Allow TP to read by address prefix #34
base: main
Conversation
Force-pushed from eba5274 to 853ce13 (Compare)
This PR should not include commits from your previous RFC, should not delete the old RFC (since merging this would delete the RFC in master), and should not have any merge commits. Instead, this branch should be based on master and contain only the new commits.
Signed-off-by: ywolf <[email protected]>
Force-pushed from a3b4762 to 551a040 (Compare)
Apologies for the mess; I started again with only the one initial commit.
This RFC proposes an additional API to transaction context,
context.listAddresses(address_prefix), which will take an address prefix as input
and return a list of all existing addresses that start with the prefix.
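For illustration, a minimal sketch of how a handler-side helper might call the proposed API, assuming a Python SDK binding named list_addresses (a snake_case form of the proposed listAddresses; no such method exists in any current SDK, and the prefix value is made up):

```python
# Hypothetical helper: 'context' is the Context object handed to a handler's
# apply(); list_addresses is the API proposed by this RFC, not an existing call.
def existing_addresses_under(context, prefix='abcdef00'):
    # Returns only the addresses currently set under the prefix, not their data.
    return context.list_addresses(prefix)
```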
Do we allow the case of address_prefix being empty / none?
If so, the reference implementation lists all addresses with non-empty data at prefixes that are in the transaction's input address list.
I think it is better to follow the same logic as get_state: if the TP requests a read of addresses that are not in the 'input' address list, then return a failure.
It is up to the transaction family developer to check the return value of the API and to make sure that the 'input' address list is valid. Since the input list can accept an address prefix, I don't see this as a change of behavior.
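To illustrate the point about the input list: Sawtooth transaction headers already accept an address prefix in inputs/outputs, so a client can authorize its TP to read under a whole prefix. A partial sketch, with a made-up family name and prefix, and the remaining required header fields omitted:

```python
from sawtooth_sdk.protobuf.transaction_pb2 import TransactionHeader

# Partial sketch: signer_public_key, batcher_public_key, payload_sha512,
# nonce, etc. are omitted; family name and prefix are hypothetical.
header = TransactionHeader(
    family_name='example_family',
    family_version='1.0',
    inputs=['abcdef00'],    # a prefix is a legal input, not only full addresses
    outputs=['abcdef00'],
)
```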
Thanks
@agunde406 can you please restart your review?
Can we progress with getting this approved?
Get state by prefix is a feature available for reading via the sawtooth rest-api.
This RFC adds a method to expose this capability of reading by address prefix
to the transaction processor.
This section is very light on content. Please provide concrete examples of why this is needed. Please check the Guide-Level explanation section in the template for other information that should be in this section.
@yoni-wolf you have more thoughts in here: https://jira.hyperledger.org/browse/STL-1375 like the example of deleting all leaves under a certain node. As a use case maybe that's deleting all accounts owned by a certain user?
I think the main usage for this feature is this:
- you allocate some prefix for a member or usage
- the TP logic has some algorithm for translating data to a suffix
- the TP writes the data to the calculated prefix+suffix

Now it is very fast to find a certain piece of information: you just calculate the address again and read it. However, it is very hard and inefficient to clean it up!
With 'read by prefix' the TP can support a 'clean transaction' (a sketch follows below) that will:
- use this feature to acquire a list of all used addresses under that prefix
- make a list of them and call delete once, or call delete for each address there

How would it be implemented without 'read by prefix' (assuming you did implement an efficient lookup like this)?
One way is to try and read every possible suffix (which might be a huge list and take forever); another way is to start keeping track of every suffix you used in another address, which you will have to read, modify, and write for every suffix you add, which will slow you down. I think both are much worse.
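A sketch of what such a 'clean transaction' handler could look like, assuming the proposed API exists as list_addresses (hypothetical name) alongside the SDK's existing delete_state; the family name, namespace, and member sub-prefix are made up:

```python
import hashlib

from sawtooth_sdk.processor.handler import TransactionHandler

FAMILY_NAME = 'example_family'                                   # hypothetical
NAMESPACE = hashlib.sha512(FAMILY_NAME.encode('utf-8')).hexdigest()[:6]
MEMBER_PREFIX = NAMESPACE + '01'   # sub-prefix allocated to one member


class CleanHandler(TransactionHandler):
    family_name = FAMILY_NAME
    family_versions = ['1.0']
    namespaces = [NAMESPACE]

    def apply(self, transaction, context):
        # Proposed API (does not exist yet): all addresses currently set
        # under the member's prefix, returned without their data.
        used = context.list_addresses(MEMBER_PREFIX)
        if used:
            # delete_state exists in the current Python SDK and takes a
            # list of addresses to remove from state.
            context.delete_state(used)
```

Without read by prefix, the handler would instead have to read a bookkeeping address that tracks every used suffix, exactly as described above.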
text/0000-read-by-prefix.md
Outdated
Example of implementation can be found here:
Sawtooth-core + Python SDK changes:
https://github.com/arsulegai/sawtooth-core/tree/jira_1375
(commits: 4225dc7 and 2c3a223)
Please explain the actual implementation suggested for the validator and the SDKs in this section as well, besides just the links to the commits.
text/0000-read-by-prefix.md
Outdated
read/write/delete each address in the list.
This has a performance downside that each time transaction processor needs to
write a new address, it will need to modify the special address containing
list of all addresses. This address could be huge, when the need to actually
This argument - that the address could be huge - is also a flaw of this approach generally, since the returned address list could be that large. This should be mentioned in Drawbacks.
Added to drawbacks.
The main difference is that access to a huge set of data can be limited to only when all the data actually needs to change. With the 'old' approach of keeping an address that tracks existing addresses, the TP needs to modify this address during almost every transaction.
text/0000-read-by-prefix.md
Outdated
[drawbacks]: #drawbacks

This will need to be implemented in every sawtooth SDK and will increase
the complexity of the validator slightly.
A few other drawbacks should be mentioned here:
a) this will perform very poorly when the address list is large;
b) this will cause large I/O and CPU load on the validator when the address list is large (it is potentially very expensive, unlike the rest of the TP API);
c) once implemented it will be hard to deprecate because of our commitment to backward compatibility;
d) this seems like an anti-pattern in TP design, and may encourage others to implement poorly designed TPs.
I don't know how this is handled for the rest-api, but one salient difference between the situation for the rest-api and a transaction processor, is that only one node is impacted by the query overhead for serving the rest-api request whereas all nodes are impacted by a transaction using a similar query. If not implemented carefully, a significant performance cost could be exploited as a DOS attack on the network.
the complexity of the validator slightly.

# Rationale and alternatives
[alternatives]: #alternatives
Another alternative is to redesign the TP to not need the list of addresses at all.
that contains list of existing addresses. When need to manipulate multiple
addresses transaction processor will first read the special address, then
read/write/delete each address in the list.
This has a performance downside that each time transaction processor needs to
There is a performance penalty for maintaining a list like this, but it is probably substantially faster than the approach contained in this RFC.
in a separate address.

Having this capability allows a better and smarter address mapping this a
better transaction processor logic.
This is not a well-formed sentence.
message TpStateAddressesListRequest {
  // The context id that references a context in the contextmanager
  string context_id = 1;
  repeated string addresses = 2;
What if this request was for an address space with an exceptionally large number of addresses? Should they just be returned, or is there an upper limit, where the validator will send an error back with "request denied". If there should be an upper limit, how would we manage it?
Another approach may be a paging approach, where the request specifies the maximum number of addresses to return (with a max page size, perhaps configurable per-validator).
Same as a regular read: there is no limitation on data size in one address; it is up to the TP developer to make sure they don't write too much data to one address.
Since this only requests the address list, without the addresses' data, it won't get too big that easily.
No objection to adding paging and a maximum, only that it is not done for a regular address read; added it to the unresolved questions section.
It is not the same type of problem, because read/write are relatively efficient. Returning a list of addresses from the state tree will not be efficient as it will require a large number of underlying database reads to accomplish.
Adding an on-chain setting for maximum size of address storage would probably be a good feature too.
message TpStateAddressesListRequest {
  // The context id that references a context in the contextmanager
  string context_id = 1;
  repeated string addresses = 2;
Why is this a list? If it remains a list, we should specify in this section how the validator should process the request.
[summary]: #summary

This RFC proposes an additional API to transaction context,
context.listAddresses(address_prefix), which will take an address prefix as input
This should probably return an iterator so not all addresses must be obtained at once.
Agreed.
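A purely hypothetical sketch of how an SDK could hide paging behind such an iterator; the page-fetching call and the 'start' continuation token are assumptions, not an existing API:

```python
# send_request stands in for the SDK's internal call that would send a
# (hypothetical) paged TpStateAddressesListRequest and decode the response
# into (addresses, next_start_token).
def iter_addresses(send_request, prefix, page_size=100):
    start = ''
    while True:
        addresses, next_start = send_request(prefix, start, page_size)
        for address in addresses:
            yield address
        if not next_start:        # empty token signals the last page
            return
        start = next_start
```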
enum Status {
  STATUS_UNSET = 0;
  OK = 1;
  AUTHORIZATION_ERROR = 2;
This document should describe whether this occurs based on a comparison to the request, or whether it occurs if the list would return an address outside the allowed range. It should probably be the former.
The general concept described here, as I understand it, could be implemented as a mapping of application keys and values, in turn stored at a single address, where that address would be more specific for that particular transaction. In other words, store the list of key-value pairs at an address specific enough for the use case. Transaction processors don't generally operate on arbitrary depths of addresses - things are scoped to a particular user or org or product, etc.
@dcmiddle referenced a Jira ticket for this, which describes a situation where intkey keys are selected by key length. This presumes a direct link between keys and their address in a 1-1 manner. Keys are often generated using hashes, which would be a fixed length, regardless of original key length.
A reply in the issue does describe deleting an entire subtree, which makes sense and probably has a valid set of use cases.
I think a compelling use case needs to be presented, but ultimately don't think there is one.
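For comparison, a sketch of the single-address alternative described above, using only calls that exist in the current Python SDK (get_state/set_state); the cbor encoding, family namespace, and address derivation are assumptions modeled on the intkey example:

```python
import hashlib

import cbor  # encoding choice is an assumption; Sawtooth examples often use cbor

NAMESPACE = hashlib.sha512(b'example_family').hexdigest()[:6]   # hypothetical


def org_accounts_address(org_id):
    # One address per organization; all of its accounts live in one map there.
    return NAMESPACE + hashlib.sha512(org_id.encode('utf-8')).hexdigest()[:64]


def add_account(context, org_id, account_id, account):
    address = org_accounts_address(org_id)
    entries = context.get_state([address])               # existing SDK call
    accounts = cbor.loads(entries[0].data) if entries else {}
    accounts[account_id] = account
    context.set_state({address: cbor.dumps(accounts)})   # existing SDK call
```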
text/0000-read-by-prefix.md
Outdated
# Motivation
[motivation]: #motivation

The motivation came when looking for a way to improve processing transaction
The optimization here actually reduces the number of messages between the validator and a transaction processor in exchange for potentially larger messages.
Signed-off-by: ywolf <[email protected]>
Force-pushed from 2412b3c to bd2bf2d (Compare)
Signed-off-by: ywolf <[email protected]>
Added paging mechanism.
@peterschwarz @vaporos @agunde406 @dcmiddle Can you please review again?
There are still quite a few comments that have not been addressed.
I'm delighted to see this RFC proposed, as we hit exactly this problem running a Sawtooth Workshop aimed exclusively at developers in Hong Kong in December. When tackling a programming exercise aimed at creating a voting-app TP, what was intuitive to the developers was to read the current set of voters using the voters prefix, to determine whether someone registering was in fact already registered when handling a register-to-vote transaction. They were baffled when they discovered they couldn't do this.
This is an application design issue, and does not require the feature described in this RFC. If the application was designed to store voter registration information at a deterministic location based on information describing the voter, then the existence check becomes an O(1) operation, a direct lookup.
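A sketch of that deterministic layout; the family name and hashing scheme are assumptions modeled on common Sawtooth examples (intkey, xo), not the actual workshop app:

```python
import hashlib

VOTING_NAMESPACE = hashlib.sha512(b'voting').hexdigest()[:6]    # hypothetical


def voter_address(voter_id):
    # 70-character address: 6-char namespace + first 64 chars of the key hash.
    return VOTING_NAMESPACE + hashlib.sha512(
        voter_id.encode('utf-8')).hexdigest()[:64]


def is_registered(context, voter_id):
    # Existence check is one get_state on a known address: O(1), no prefix scan.
    return bool(context.get_state([voter_address(voter_id)]))
```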
I'm voting no on this. I believe the need for this can be avoided with appropriate state design. If there are specific examples that can't be solved for with an alternate state layout, I'm interested in exploring them. Even if such examples are presented, there are a number of practical challenges with this proposal, foremost the question of how to constrain/page the result set.
RFC to support read by address prefix from transaction processor