Nyzo techRelease notesNyzo 572: NodeManager stability

Nyzo 572: NodeManager stability

Nyzo version 572 (commit on GitHub) improves control of memory use in NodeManager.

This version affects the verifier only. This is the only run mode that uses the NodeManager.

The openness of the Nyzo verifier, with its willingness to accept information on incoming connections, makes it especially susceptible to resource-consumption attacks. In the past, various potential attacks have been found by members of the community. The NicknameManager issue, addressed in version 500, resulted in a ∩75,000 bounty. The signature-buffer issue, addressed in version 503, resulted in a ∩225,000 bounty. The TIME_WAIT issue, addressed in version 539, resulted in a ∩30,000 bounty.

We will not recommend a specific bounty amount for this issue. We encourage the cycle to provide a bounty that appropriately acknowledges the value that report of this issue provides to the community. The fixes in this build eliminate a vulnerability that, if exploited correctly, could have resulted in a cycle stall, tremendous hassle for operators of verifiers, and loss of verification positions in the cycle.

Also, we ask that the cycle considers the NodeManager improvements in version 570 when determining the bounty. Those improvements were due to this report, but the report was not mentioned at the time due to the incomplete nature of the fixes in that version.

As the current cycle transaction mechanism is frustratingly unreliable, we do feel that delaying bounty transactions until the deployment of NTTP-3 is warranted.

The reporters of this issue have provided the following addresses for receiving a bounty:

In Node, the communicationFailureInactiveThreshold constant and communicationFailureCount field were added to encapsulate communication failure information. The identifierChangeTimestamp was eliminated, as there is no longer a restriction on frequency of IP changes.

RN_572 code 0

The methods for getting and setting identifierChangeTimestamp were removed.

RN_572 code 1

Indentation of the isActive() method was corrected. The markSuccessfulConnection() and markFailedConnection() methods were added to track reachability.

RN_572 code 2

In NodeManager, the consecutiveFailuresBeforeRemoval constant and ipAddressToFailureCountMap field used for reachability were removed. Their functionality is now encapsulated in the Node class.

The maximumNodesPerInCycleVerifier and maximumNewNodeMapSize constants were added for enforcing new limits. The newNodeIpToPortMap field stores nodes that have sent node joins to this verifier but have not yet been added to the main map of nodes, ipAddressToNodeMap.

RN_572 code 3

The meshRequestWait and meshRequestSuccessCount fields were both changed from int to AtomicInteger to improve thread safety.

RN_572 code 4

In the NodeManager.updateNode() that takes a Message argument, a check is performed to determine whether the message is a node-join response. Responses are not processed in the MeshListener, so a response would only be available as a result of an outgoing node-join message sent by this verifier.

The other overload of updateNode() no longer returns a boolean value to indicate whether the node was added to the map, and node joins are no longer immediately sent when a new node is encountered.

RN_572 code 5

When updateNode() is called with a message of type MissingBlockVoteRequest23 or MissingBlockRequest25, a successful connection is registered on the node. This has the same effect as the previous setInactiveTimestamp() method call, but code readability and encapsulation are improved.

RN_572 code 6

In NodeManager.addTemporaryLocalVerifierEntry(), a true value is passed for the new argument of updateNode(). This causes the temporary entry to be added to the primary map immediately.

RN_572 code 7

For the private overload of updateNode(), synchronization was removed and the isNodeJoinResponse argument was added. The return value was eliminated. The logic was reorganized to have fewer cases.

RN_572 code 8

At the top level, there are now only two cases. If the IP address and identifier are the same as those of a node already in the map, the ports are updated and a successful connection is marked. In all other situations, the node is treated as a new node, being removed from the map if a node is already present at the IP. Note that the result of removing and re-adding a node to the map is similar to the result of the previous logic, which reset queue timestamps when identifiers changed. The new logic, though, is simpler and easier to protect from memory attacks.

If the node is in the cycle, or if the update is due to a node-join response, the node is immediately added to the primary map, ipAddressToNodeMap. This is subject to a maximum of 6 nodes per identifier. This limit allows suitable leeway for backup and replacement nodes. Inactive nodes are eventually removed from the map, so there is no concern that cumulative IP changes for a node will limit the ability to add new nodes.

RN_572 code 9

All out-of-cycle nodes that are not due to node-join responses are added to a size-limited map for later querying. While this does not impose a limit on the memory usage of the NodeManager, it does impose a limit of the rate of increase of memory usage.

As before, the temporary entry is removed if an actual entry for the verifier is encountered.

RN_572 code 10

Synchronization was removed from the demoteIdentifier() method. The ipAddressToNodeMap is a ConcurrentHashMap, so this method is thread-safe.

RN_572 code 11

The NodeManager.markFailedConnection() method is now basically a pass-through to the Node.markFailedConnection() method. Marking nodes as inactive due to repeated communications failures is now encapsulated in the Node class.

RN_572 code 12

Similarly, the NodeManager.markSuccessfulConnection() method is now essentially a pass-through to the Node.markSuccessfulConnection() method.

RN_572 code 13

NodeManager.markNodeAsInactive() was removed. This behavior is now encapsulated in the Node class and controlled through Node.markFailedConnection() and Node.markSuccessfulConnection().

RN_572 code 14

Synchronization was removed from NodeManager.updateActiveVerifiersAndRemoveOldNodes(). All code in this method is thread-safe.

RN_572 code 15

Synchronization was removed from NodeManager.requestMesh(), and the method was renamed to reloadNodeJoinQueue() to better reflect its purpose.

Usage of meshRequestWait was updated for its change from int to AtomicInteger. The differing semantics of the post-decrement meshRequestWait-- and pre-decrement meshRequestWait.decrementAndGet() are inconsequential in this situation.

Use of MessageType.FullMeshRequest41 was eliminated to further emphasize cycle connectivity. Out-of-cycle verifiers should make their presence known to in-cycle verifiers. In-cycle verifiers need not actively search for out-of-cycle verifiers.

RN_572 code 16

Another usage of meshRequestWait was updated for its changed data type, along with two usages of meshRequestSuccessCount.

Node joins are enqueued for all IP addresses in the newNodeIpToPortMap. This is how new out-of-cycle nodes are added to the primary map. Previously, nodes would be added immediately when a node-join message was received. That means that someone would need to control an IP address for only moments in order to cause this verifier to register an entry. Now, control needs to be for much longer, as the verifier reloads its node-join queue on its own schedule, and the next reload may be several hours from when the initial join was sent. Thus, an inexpensive attack on memory consumption is changed to a situation where the attacker is essentially just another out-of-cycle verifier.

RN_572 code 17

Synchronization was removed from NodeManager.demoteInCycleNodes. The ipAddressToNodeMap is a ConcurrentHashMap, so the method is thread-safe.

RN_572 code 18

In NodeManager.persistNodes(), a placeholder is written for the identifier-change timestamp. This preserves the structure of the file.

RN_572 code 19

In NodeManager.loadPersistedNodes(), reading and setting of the identifier-change timestamp was removed.

RN_572 code 20

In Verifier.sendNodeJoinMessage(), which sends a node-join to a TrustedEntryPoint, the version of the message was updated.

RN_572 code 21

In Verifier.verifierMain(), comments and a called method's name were updated.

RN_572 code 22