Skip to content
This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Add new peers without restarting a node #32

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

eugene-babichenko
Copy link

Proposes an extension of validator client messaging to add peers to a running node without restarting it.

Signed-off-by: Yevhenii Babichenko [email protected]

Signed-off-by: Yevhenii Babichenko <[email protected]>
Signed-off-by: Yevhenii Babichenko <[email protected]>
This RFC proposes to implement the possibility to add new peers in runtime
(through the component validator endpoint) when a node is working in the
`static` peering mode. Along with that it also adds the corresponding extensions
to the off-chain permissioning model.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see the corresponding permissioning text below.
One of the main security considerations is how to prevent a bad actor from taking the validator offline. Attacks may include connecting the validator to offline or otherwise byzantine nodes. Other attacks may include the general type of exposure from adding any new remote call (input formatting etc).
The proposal adds remote access to what has so far required local system administrator access to the host. We must consider adversaries who are on the same local network. The system in most cases may be configured such that the component port is not advertised beyond the loopback, but there are valid reasons for the administrator to advertise the component port on the local network. This proposal should consider how the validator remains secure independent of the system's network configuration.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, after this explanation I have a better understanding of the permissioning requirement. Will describe a model that was discussed on the chat.

runtime and also decreases the uptime.

To resolve this problem our team proposes to add a method to add new peers to a
running node.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you also considered removing nodes? The security implications are a little more risky but about the same. The benefit of including a remove function would let us address the limits of maximum-peer-connectivity.

Copy link
Contributor

@agunde406 agunde406 Nov 30, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that it would be good to include removal functionality for discussion.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is a good idea.

@dcmiddle
Copy link
Contributor

I think this would be a useful addition to the administrative APIs. Main thing to work out are security cosiderations.

receives a new request for adding peers it:

- Validates the format of peer URIs which has to be
`tcp://IP_ADDRESS:PORT_NUMBER`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to IP addresses, this should also accept DNS names, which ZMQ will resolve: http://api.zeromq.org/2-1:zmq-tcp

`tcp://IP_ADDRESS:PORT_NUMBER`
- If the validation was successful then the validator updates its peer list and
immediately returns the `OK` response. The new peers are connected _after_
that.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the response is sent back before connecting, there is no way to notify clients of authorization errors. Is not sending an error due to authorization an intentional decision? Or should the error enum be extended and a response sent only after peering completes?

internal interface of the application. Even if we do then we can restrict the
access to that feature by using a proxy as suggested in the documentation.
- Should the system notify its user about the statuses of new connections or
leave the status check to the end user?
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My opinion would be to either have the response only be sent after the new connection is created, or expose another request type that allows the user to poll on the status of the new connection so they can detect connection failures such as authorization errors.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is not sending an error due to authorization an intentional decision

I missed it intentionally to have a discussion on how we should handle such errors.

In my opinion, it would be better to extend the enum in ClientAddPeersResponse but then I would like to discuss the implementation. The simplest approach I see here is to make the Gossip class send notifications about peer status changes. This will allow us to know if the connection attempt was ok or not ok, but will not give any specific information. To be more specific we will have to add handlers for corresponding peer messages that will notify us, for example, about authorization violations, rejected connections and so on. I think that the simplest approach should be acceptable now (at least in the system our company build).

…g that in batches.

Signed-off-by: Yevhenii Babichenko <[email protected]>
Signed-off-by: Yevhenii Babichenko <[email protected]>
Signed-off-by: Yevhenii Babichenko <[email protected]>
@eugene-babichenko
Copy link
Author

Just uploaded an update. Here is what is changed:

  • Extended the range of error codes for adding peers.
  • Added the description for peers removal.
  • Now requests for adding peers are submitted one by one instead of doing that in batches. The rationale behind that is fairly simple. Now this RFC support error codes that are returned after the connection is finished or failed. Because of that we need control over each separate connection and they may fail with different errors.
  • Added the description of the permissioning model. This is very basic and I doubt if it should be generalized somehow (added that to the "Unresloved questions" section).

text/0000-add-peers-in-runtime.md Outdated Show resolved Hide resolved
text/0000-add-peers-in-runtime.md Show resolved Hide resolved
text/0000-add-peers-in-runtime.md Outdated Show resolved Hide resolved
CLIENT_PEERS_ADD_REQUEST = 131;
CLIENT_PEERS_ADD_RESPONSE = 132;
CLIENT_PEERS_REMOVE_REQUEST = 131;
CLIENT_PEERS_REMOVE_RESPONSE = 132;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be 134

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

[request-processing]: #request-processing

The requests are received on the `component` endpoint. When the validator
receives a new request for adding peers it:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please also add what the validator would do to remove peers

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added that, and there is one question: what to do if we go below the minimum peer connectivity limit here? I have two options in my mind:

  • Just allow to do that, but it my bring the node functionality down;
  • Another option is to throw an error here in case we reached the minimum limit.

Wondering which of those two will be a better solution.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Isnt this proposal only for static peering? minimum peer connectivity is used for dynamic peering.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, sorry

@agunde406 agunde406 self-requested a review January 8, 2019 19:47
@vaporos
Copy link
Contributor

vaporos commented Jan 22, 2019

I like this proposal overall.

I wonder if this truly belongs in the Client* message namespace, or whether we should have an administrative set of messages. This is similar to the question of whether it should be exposed in the REST API, but more fundamentally whether it should be able to be bound to a different port.

@eugene-babichenko
Copy link
Author

I am not sure if we should define a separate namespace here — this falls beyond the scope of this proposal. I followed the simple logic here: if something does not belong to TPs or consensus engine, this should be in the Client* namespace.

Do you mean by "to be bound to a different port" that those messages should be on a different port than component?


# Summary
[summary]: #summary This RFC proposes to implement the possibility to add new
peers and remove the existing connections in runtime (through the component

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend: "This RFC proposes functionality to add and remove static peer connections while a validator node is running."

[summary]: #summary This RFC proposes to implement the possibility to add new
peers and remove the existing connections in runtime (through the component
validator endpoint) when a node is working in the `static` peering mode. Along
with that it also adds the corresponding extensions to the off-chain

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend: "This RFC also adds the corresponding extensions to the off-chain permssioning model."

# Motivation
[motivation]: #motivation

When an administrator adds a new node to an existing Sawtooth network he/she has

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add comma after 'network' and just pick either he or she for the gender


When an administrator adds a new node to an existing Sawtooth network he/she has
to restart a node with new peering settings. This makes any automation
significantly harder to write than if we had the possibility to add peers in the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Recommend: "Restarting the process adds substantial complexity to infrastructure automation, and incurs system downtime."

significantly harder to write than if we had the possibility to add peers in the
runtime and also decreases the uptime.

To resolve this problem our team proposes to add a method to add new peers to a

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comma after problem

list and returns the `INVALID_PEER_URI` status along with the list of faulty
peer URIs.
- If the `--maximum-peer-connectivity` parameter was provided to the validator
then the validator checks if it has reached the maximum peer connectivity and

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comma after connectivity

following:

- Validates the format of peer URI which has to be `tcp://ADDRESS:PORT_NUMBER`;
- If a peer is connected the validator removes it. Otherwise the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comma after connected

restrict access to requests that can be malicious. The workflow for the
validation is the following:

- If the `admin` role is not specified then the permissioning module will use

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comma after specified


- If the `admin` role is not specified then the permissioning module will use
the `default` policy.
- If the `default` policy is not specified then the validation of the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comma after specified

verifier checks:
- If the `admin_public_key` is allowed.
- If the `signature` is correct.
- If one of the above conditions is not satisfied then the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add comma after satisfied

- If the `default` or the `admin` policy is specified, then the permission
verifier checks:
- If the `admin_public_key` is allowed.
- If the `signature` is correct.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you expand on this? In particular, it is not clear what is being signed.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operation should also be covered by the signature. If only the URI is signed then it could be replayed in a ClientRemovePeerRequest. You could look at the batch format for an example of using a signature that covers the entirety of the message. https://github.com/hyperledger/sawtooth-core/blob/master/protos/batch.proto
There may be a simpler approach for this kind of message.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am still waiting for a reply to this message. I am not really sure whether we should use just message signatures or develop a more complicated system for future use (I believe that such system should appear as a separate and more general RFC).

@vaporos
Copy link
Contributor

vaporos commented Apr 2, 2019

Should the authorization for this feature occur during the connection conversation (see authorization.proto)? I think it may be appropriate to add ADMIN to "RoleType" and create an Admin* message namespace in addition to Client*, etc.

Signed-off-by: Yevhenii Babichenko <[email protected]>
@eugene-babichenko
Copy link
Author

Should the authorization for this feature occur during the connection conversation (see authorization.proto)?

It may be difficult to implement the authorization in the component endpoint. Should we extend the component endpoint, use the network endpoint (it already has the authorization implemented) or create a separate admin endpoint?

I think it may be appropriate to add ADMIN to "RoleType" and create an Admin* message namespace in addition to Client*, etc.

Agree on that.

@agunde406 agunde406 requested review from jsmitchell and vaporos April 17, 2019 13:08
- If the `default` or the `admin` policy is specified, then the permission
verifier checks:
- If the `admin_public_key` is allowed.
- If the `signature` is correct.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operation should also be covered by the signature. If only the URI is signed then it could be replayed in a ClientRemovePeerRequest. You could look at the batch format for an example of using a signature that covers the entirety of the message. https://github.com/hyperledger/sawtooth-core/blob/master/protos/batch.proto
There may be a simpler approach for this kind of message.

@dcmiddle
Copy link
Contributor

Should the authorization for this feature occur during the connection conversation (see authorization.proto)?

It may be difficult to implement the authorization in the component endpoint. Should we extend the component endpoint, use the network endpoint (it already has the authorization implemented) or create a separate admin endpoint?

I think an admin endpoint would be the most secure. This would facilitate firewall rules to restrict traffic. The downside is it adds one more element to a node setup. We used to get a lot of support questions stemming from configuring the existing endpoints. That said, I don't think this will add much complexity. A loopback default wouldn't need to be modified. A simple setup would then have the administrator logging into the validator host to issue these commands and the respective private key could reside on that validator host adding no other key management burden.

@eugene-babichenko
Copy link
Author

Is that OK if I reopen this pull request from another repository?

@agunde406
Copy link
Contributor

Please dont, we dont want to loose this conversation. @eugene-babichenko

@eugene-babichenko
Copy link
Author

@agunde406 The problem is I no longer have access to this fork but still want to finish this RFC.

@agunde406
Copy link
Contributor

agunde406 commented Jun 6, 2019

@eugene-babichenko Ah okay. In that case, yes it is okay to resubmit the RFC from a new fork. Please do not change commit history and link this PR in the new one.

@eugene-babichenko
Copy link
Author

@agunde406 Here is the new pull request #44

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Development

Successfully merging this pull request may close these issues.

7 participants