This repository has been archived by the owner on Feb 1, 2024. It is now read-only.

Allow TP to read by address prefix #34

Open · wants to merge 3 commits into main

Conversation

yoni-wolf (Contributor):

RFC to support read by address prefix from transaction processor

@agunde406 (Contributor) left a comment:

This PR should not include commits from your previous RFC; it should not delete the old RFC, since when this is merged it would delete that RFC in master; and it should not have any merge commits. Instead, this branch should be based on master and contain only the new commits.

@yoni-wolf (Contributor, Author):

Apologies for the mess; I started again with only the one initial commit.


This RFC proposes an additional API on the transaction context,
context.listAddresses(address_prefix), which will take an address prefix as
input and return a list of all existing addresses that start with that prefix.
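A minimal sketch of how the proposed call might behave. Note this is illustrative only: `list_addresses` is the API proposed by this RFC, not an existing Sawtooth SDK method, and `StubContext` is a stand-in for the validator-backed transaction context.

```python
class StubContext:
    """Minimal stand-in for a transaction context holding state entries."""

    def __init__(self, state):
        self._state = state  # dict: address -> data

    def list_addresses(self, address_prefix):
        """Return all existing addresses that start with the given prefix."""
        return sorted(
            addr for addr in self._state if addr.startswith(address_prefix)
        )


ctx = StubContext({
    "1cf126a1": b"x",
    "1cf126b2": b"y",
    "5b732101": b"z",
})
print(ctx.list_addresses("1cf126"))  # ['1cf126a1', '1cf126b2']
```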


Do we allow the case of address_prefix being empty / none?
If so, the reference implementation lists all addresses with non-empty data at prefixes that are in the transaction's input address list.

Contributor Author:

I think it's better to follow the same logic as get_state: if the TP requests a read of addresses that are not in the 'inputs' address list, return a failure.
It is up to the transaction family developer to check the return value of the API and to make sure the 'inputs' address list is valid. Since the inputs list can accept an address prefix, I don't see this as a change of behavior.
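The check described above might be sketched as follows. This is a hedged illustration, not actual validator code: the request is allowed only if some declared input covers the requested prefix (inputs may themselves be prefixes).

```python
def is_authorized(requested_prefix, declared_inputs):
    """Allow a listing request only when a declared input is a prefix of the
    requested prefix, i.e. the input covers the whole requested subtree.
    Name and semantics are illustrative, not from the Sawtooth codebase."""
    return any(requested_prefix.startswith(inp) for inp in declared_inputs)


print(is_authorized("1cf126ab", ["1cf126"]))  # True: input covers the prefix
print(is_authorized("1cf126ab", ["5b7321"]))  # False: outside declared inputs
```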


Thanks

@yoni-wolf (Contributor, Author):

Apologies for the mess; I started again with only the one initial commit.

@agunde406 can you please restart your review?

@yoni-wolf yoni-wolf closed this Jan 3, 2019
@yoni-wolf yoni-wolf reopened this Jan 3, 2019
@agunde406 agunde406 self-requested a review January 3, 2019 13:42
@yoni-wolf (Contributor, Author):

Can we progress with getting this approved?


Get state by prefix is a feature already available for reads via the Sawtooth
REST API. This RFC adds a method to expose this capability of reading by
address prefix to transaction processors.
Contributor:

This section is very light on content. Please provide concrete examples of why this is needed. Please check the Guide-Level explanation section in the template for other information that should be in this section.

Contributor:

@yoni-wolf you have more thoughts in here: https://jira.hyperledger.org/browse/STL-1375 like the example of deleting all leaves under a certain node. As a use case maybe that's deleting all accounts owned by a certain user?


I think the main usage for this feature is this:

  1. You allocate some prefix for a member or usage.
  2. The TP logic has some algorithm for translating data to a suffix.
  3. The TP writes the data to the calculated prefix+suffix.

Now it is very fast to find a certain piece of information: you just calculate the address again and read it. However, it is very hard and inefficient to clean it up!

With 'read by prefix', the TP can support a 'clean transaction' that will:

  1. Use this feature to acquire a list of all used addresses under that prefix.
  2. Call delete once with that list, or call delete for each address in it.

How would it be implemented without 'read by prefix' (assuming you did implement an efficient lookup like this)? One way is to try to read every possible suffix, which might be a huge list and take forever. Another way is to keep track of every suffix you used in another address, which you would have to read, modify, and write for every suffix you add, which will slow you down. Both are much worse, I think.
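The workflow above can be sketched as follows. The address scheme, `NAMESPACE`, and the `list_addresses` call are all illustrative assumptions (only `delete_state` resembles the existing SDK API); the 6 + 8 + 56 hex-character split mirrors Sawtooth's 70-character addresses.

```python
import hashlib

# Hypothetical family namespace: first 6 hex chars of a hash, per convention.
NAMESPACE = hashlib.sha512(b"myfamily").hexdigest()[:6]


def make_address(member, key):
    """Deterministic address: namespace + member prefix + data suffix."""
    member_part = hashlib.sha512(member.encode()).hexdigest()[:8]
    suffix = hashlib.sha512(key.encode()).hexdigest()[:56]
    return NAMESPACE + member_part + suffix  # 6 + 8 + 56 = 70 hex chars


def clean_member(context, member):
    """'Clean transaction': delete everything under a member's prefix.
    `context.list_addresses` is the API proposed by this RFC, not a real
    SDK method; `delete_state` stands in for the SDK's delete call."""
    member_part = hashlib.sha512(member.encode()).hexdigest()[:8]
    used = context.list_addresses(NAMESPACE + member_part)
    context.delete_state(used)  # one delete call for the whole list
    return used


print(len(make_address("alice", "k1")))  # 70
```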

An example implementation can be found here:
Sawtooth-core + Python SDK changes:
https://github.com/arsulegai/sawtooth-core/tree/jira_1375
(commits: 4225dc7 and 2c3a223)
Contributor:

Please also explain in this section the actual implementation suggested for the validator and the SDKs, beyond just the links to the commits.

read/write/delete each address in the list.
This has a performance downside: each time the transaction processor needs to
write a new address, it must also modify the special address containing the
list of all addresses. This address could be huge, when the need to actually
Contributor:

This argument (that the address could be huge) is also a flaw of the proposed approach generally, since the returned address list could be just as large. This should be mentioned in Drawbacks.

Contributor Author:

Added to drawbacks.
The main difference is that access to a huge set of data can be limited to only when you actually need to change all the data. With the 'old' approach of keeping an address that tracks existing addresses, the TP needs to modify that address during almost every transaction.

[drawbacks]: #drawbacks

This will need to be implemented in every sawtooth SDK and will increase
the complexity of the validator slightly.
Contributor:

A few other drawbacks should be mentioned here: a) this will perform very poorly when the address list is large; b) this will cause large I/O and CPU load on the validator when the address list is large (it is potentially very expensive, unlike the rest of the TP API); c) once implemented, it will be difficult to deprecate because of our commitment to backward compatibility; d) this seems like an anti-pattern in TP design, and may encourage others to implement poorly designed TPs.

Contributor:

I don't know how this is handled for the rest-api, but one salient difference between the situation for the rest-api and a transaction processor, is that only one node is impacted by the query overhead for serving the rest-api request whereas all nodes are impacted by a transaction using a similar query. If not implemented carefully, a significant performance cost could be exploited as a DOS attack on the network.

the complexity of the validator slightly.

# Rationale and alternatives
[alternatives]: #alternatives
Contributor:

Another alternative is to redesign the TP to not need the list of addresses at all.

that contains a list of existing addresses. When it needs to manipulate
multiple addresses, the transaction processor will first read the special
address, then read/write/delete each address in the list.
This has a performance downside: each time the transaction processor needs to
Contributor:

There is a performance penalty for maintaining a list like this, but it is probably substantially faster than the approach contained in this RFC.

text/0000-read-by-prefix.md
in a separate address.

Having this capability allows a better and smarter address mapping this a
better transaction processor logic.
Contributor:

This is not a well-formed sentence.

message TpStateAddressesListRequest {
    // The context id that references a context in the contextmanager
    string context_id = 1;
    repeated string addresses = 2;
Contributor:

What if this request was for an address space with an exceptionally large number of addresses? Should they just be returned, or is there an upper limit, where the validator will send an error back with "request denied". If there should be an upper limit, how would we manage it?

Another approach may be a paging approach, where the request specifies the maximum number of addresses to return (with a max page size, perhaps configurable per-validator).
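The paging approach suggested above might look like the sketch below. All names here (`query_page`, `fetch_all`, the continuation-token shape) are hypothetical, invented for illustration; no such messages or calls exist in the SDK.

```python
def fetch_all(query_page, prefix, page_size=100):
    """Accumulate addresses by repeatedly calling a paged query function.

    `query_page(prefix, start, limit)` is assumed to return a tuple
    (addresses, next_start), with next_start=None when listing is done.
    """
    results, start = [], None
    while True:
        page, start = query_page(prefix, start, page_size)
        results.extend(page)
        if start is None:
            return results


def fake_backend(data):
    """In-memory stand-in for a validator that serves paged listings."""
    def query_page(prefix, start, limit):
        matches = sorted(a for a in data if a.startswith(prefix))
        i = 0 if start is None else matches.index(start)
        nxt = matches[i + limit] if i + limit < len(matches) else None
        return matches[i:i + limit], nxt
    return query_page


qp = fake_backend({"aa1", "aa2", "aa3", "bb1"})
print(fetch_all(qp, "aa", page_size=2))  # ['aa1', 'aa2', 'aa3']
```

A per-validator maximum page size, as suggested, would simply cap `page_size` server-side before serving each page.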

Contributor Author:

Same as a regular read: there is no limitation on data size in one address; it is up to the TP developer to make sure they don't write too much data to one address.
Since this only requests the address list, without the addresses' data, it won't get too big that easily.
No objection to adding paging and a maximum, only that it is not done for a regular address read; I added it to the Unresolved questions section.

Contributor:

It is not the same type of problem, because read/write are relatively efficient. Returning a list of addresses from the state tree will not be efficient as it will require a large number of underlying database reads to accomplish.

Adding an on-chain setting for maximum size of address storage would probably be a good feature too.

message TpStateAddressesListRequest {
    // The context id that references a context in the contextmanager
    string context_id = 1;
    repeated string addresses = 2;
Contributor:

Why is this a list? If it remains a list, we should specify in this section how the validator should process the request.

[summary]: #summary

This RFC proposes an additional API on the transaction context,
context.listAddresses(address_prefix), which will take an address prefix as input
Contributor:

This should probably return an iterator so not all addresses must be obtained at once.
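The iterator idea might be sketched as a generator that lazily pulls paged results, so the full address list is never materialized at once. `query_page` is a hypothetical paged call, invented for illustration.

```python
def iter_addresses(query_page, prefix, page_size=100):
    """Yield addresses one at a time, fetching pages lazily on demand.
    `query_page(prefix, start, limit)` is assumed to return
    (addresses, next_start), next_start=None at the end."""
    start = None
    while True:
        page, start = query_page(prefix, start, page_size)
        yield from page
        if start is None:
            return


# Tiny in-memory backend to demonstrate the generator.
DATA = ["ab1", "ab2", "ab3", "cd1"]


def demo_query_page(prefix, start, limit):
    matches = [a for a in DATA if a.startswith(prefix)]
    i = 0 if start is None else matches.index(start)
    nxt = matches[i + limit] if i + limit < len(matches) else None
    return matches[i:i + limit], nxt


# The caller can stop early without ever fetching the remaining pages.
print(next(iter_addresses(demo_query_page, "ab", page_size=1)))  # ab1
```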

@peterschwarz (Contributor), Jan 16, 2019:

Agreed.

enum Status {
    STATUS_UNSET = 0;
    OK = 1;
    AUTHORIZATION_ERROR = 2;
Contributor:

This document should describe whether this occurs based on a comparison to the request, or whether it occurs if the list would return an address outside the allowed range. It should probably be the former.

@peterschwarz (Contributor) left a comment:

The general concept described here, as I understand it, could be implemented as a mapping of application keys and values, in turn stored at a single address, where that address would be more specific for that particular transaction. In other words, store the list of key-value pairs at an address specific enough for the use case. Transaction processors don't generally operate on arbitrary depths of addresses - things are scoped to a particular user or org or product, etc.

The Jira ticket @dcmiddle referenced describes a situation where intkey keys are selected by key length. This presumes a direct, 1-1 link between keys and their addresses. Keys are often generated using hashes, which are a fixed length regardless of the original key length.

A reply in the issue does describe deleting an entire subtree, which makes sense and probably has a valid set of use cases.

I think a compelling use case needs to be presented, but ultimately I don't think there is one.

# Motivation
[motivation]: #motivation

The motivation came when looking for a way to improve processing transaction
Contributor:

The optimization here actually reduces the number of messages between the validator and a transaction processor, in exchange for potentially larger messages.

Signed-off-by: ywolf <[email protected]>
@yoni-wolf (Contributor, Author):

Added paging mechanism.

@yoni-wolf (Contributor, Author):

@peterschwarz @vaporos @agunde406 @dcmiddle
Added paging mechanism to address the performance concerns.
Oron added a use case that helps explain why this feature is required.

Can you please review again?
Thanks

@agunde406 (Contributor):

There are still quite a few comments that have not been addressed.

@duncanjw commented Feb 21, 2019:

I'm delighted to see this RFC proposed as we hit exactly this problem running a Sawtooth Workshop aimed exclusively at developers in Hong Kong in December.

What was intuitive to the developers, when tackling a programming exercise aimed at creating a voting-app TP, was to read the current set of voters using the voters prefix, to determine whether someone registering was in fact already registered when handling a register-to-vote transaction.

They were baffled when they discovered they couldn't do this.

@jsmitchell:

> I'm delighted to see this RFC proposed as we hit exactly this problem running a Sawtooth Workshop aimed exclusively at developers in Hong Kong in December.
>
> What was intuitive to the developers, when tackling a programming exercise aimed at creating a voting-app TP, was to read the current set of voters using the voters prefix, to determine whether someone registering was in fact already registered when handling a register-to-vote transaction.
>
> They were baffled when they discovered they couldn't do this.

This is an application design issue, and does not require the feature described in this RFC. If the application was designed to store voter registration information at a deterministic location based on information describing the voter, then the existence check becomes an O(1) operation, a direct lookup.
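The deterministic-location design described here might be sketched as below. The namespace, address scheme, and the plain dict standing in for `context.get_state` are all illustrative assumptions, not actual application code.

```python
import hashlib

# Hypothetical voting-family namespace: first 6 hex chars of a hash.
VOTING_NS = hashlib.sha512(b"voting").hexdigest()[:6]


def voter_address(voter_id):
    """Deterministic 70-hex-char address derived from the voter's id."""
    return VOTING_NS + hashlib.sha512(voter_id.encode()).hexdigest()[:64]


def is_registered(state, voter_id):
    """O(1) existence check: one direct lookup, no prefix listing needed.
    `state` (a dict) stands in for a context.get_state call."""
    return voter_address(voter_id) in state


state = {voter_address("alice"): b"registered"}
print(is_registered(state, "alice"))  # True
print(is_registered(state, "bob"))    # False
```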

@jsmitchell left a comment:

I'm voting no on this. I believe the need for this can be avoided with appropriate state design. If there are specific examples that can't be solved for with an alternate state layout, I'm interested in exploring them. Even if such examples are presented, there are a number of practical challenges with this proposal, foremost the question of how to constrain/page the result set.
