Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

peer, main, netsync, blockchain: parallel block downloads #2226

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

kcalvinalvin
Copy link
Collaborator

This PR modifies netsync.Manager so that all the header-first blocks downloaded before the last checkpoint is done out of order by utilizing query.WorkManager from neutrino.

Gonna put it in draft for now as testing is sorta difficult and I'm not convinced it's downloading blocks faster for mainnet. By my testing it works just fine in testnet but mainnet seems to be slow when downloading blocks. Still identifying where the bottleneck is and will make adjustments accordingly.

If anyone else would like to give this a try please let me know if you see speed ups or slow downs from this PR.

@coveralls
Copy link

coveralls commented Aug 7, 2024

Pull Request Test Coverage Report for Build 12081011051

Details

  • 13 of 426 (3.05%) changed or added relevant lines in 6 files are covered.
  • 14 unchanged lines in 2 files lost coverage.
  • Overall coverage decreased (-0.4%) to 56.866%

Changes Missing Coverage Covered Lines Changed/Added Lines %
peer/peer.go 2 18 11.11%
server.go 0 18 0.0%
blockchain/accept.go 10 33 30.3%
netsync/blocklogger.go 0 54 0.0%
netsync/manager.go 0 302 0.0%
Files with Coverage Reduction New Missed Lines %
peer/peer.go 5 73.37%
netsync/manager.go 9 0.0%
Totals Coverage Status
Change from base Build 11941199528: -0.4%
Covered Lines: 29895
Relevant Lines: 52571

💛 - Coveralls

@kcalvinalvin
Copy link
Collaborator Author

Seems like the slowdowns are coming from a single peer that's slowing down the block processing. peerWorkManager having capabilities to completely disconnect a peer would help in this.

@Roasbeef
Copy link
Member

Seems like the slowdowns are coming from a single peer that's slowing down the block processing. peerWorkManager having capabilities to completely disconnect a peer would help in this.

Interesting, is the issue that the single peer is assigned blocks uniformly and is always the last one to send (haven't looked in PR at detail yet). I have this tracking issue in the neutrino repo for adding stuff like dynamic tuning, better work assignment, and also work stealing. With work stealing, the faster peers would steal the block request from the work queue of the slow peer, with faster peers helping us to be as slow as the fastest peer.

@kcalvinalvin kcalvinalvin force-pushed the 2024-04-01-parallel-ibd branch 2 times, most recently from c51d31a to 2cbb562 Compare September 6, 2024 05:51
@kcalvinalvin kcalvinalvin marked this pull request as ready for review September 9, 2024 23:41
@saubyk
Copy link

saubyk commented Oct 3, 2024

cc: @Crypt-iQ @ProofOfKeags for review

@kcalvinalvin kcalvinalvin force-pushed the 2024-04-01-parallel-ibd branch from 2cbb562 to 8f28947 Compare November 29, 2024 07:24
query.Peer is used for downloading blocks out of order during headers
first download.  Methods SubscribeRecvMsg() and OnDisconnect() are added
to abide by the interface.
ConnectedPeers returns all the currently connected peers.  This is used
to provide the query.WorkManager with all the currently connected peers
from the netsync package.
@kcalvinalvin kcalvinalvin force-pushed the 2024-04-01-parallel-ibd branch from 8f28947 to 06209a4 Compare November 29, 2024 07:56
peerDisconnectMsg is added so that we can access the peerStates map and
disconnect peers with just a string of their address without risking a
concurrent access of the map.
handleBlockMsg used to check that the block header is both valid and
then process the blocks as they come in.  It's now refactored so that
it also handles blocks that are not in order.  For out of order block
downloads handleBlockMsg would mark the block as an orphan but it's now
refactored to handle those cases.

Whenever a block that's not the next from the chain tip is received,
it's now temporarily stored in memory until the next block from the
chain tip is received.  And then all the blocks that are in sequence are
processed.
checkpointedBlocksQuery is a helper to create []*query.Request which can
be passed off to query.Workmanager to query for wire.Messages to
multiple peers.  This is useful for downloading blocks out of order from
multiple peers during ibd.
peerSubscription is added to Manager which will allow it subscribers to
receive peers through the channel whenever the Manager is aware of a new
peer that it's been connected to.  This is useful to alert
query.Workmanager that a new peer that's been connected to is eligible
to download blocks from.
ConnectedPeers returns all the currently connected peers and any new
peer that's additionally connected through the returned channel.  This
method is required for query.Workmanager as it needs ot receive peers
that it can request blocks from.
The blocks that were requested from headers are now sent over to
query.Workmanager where it will rank peers based on their speed and
request blocks from them accordingly.  This allows for quicker block
downloads as:
1: Workmanager will prioritize faster peers.
2: Workmanager is able to ask from multiple peers.
Storing block happens before the block validation is done and this can
be a bottleneck on computers with slow disks.  Allowing for concurrent
block storage saves time as the disk operation can be done in parallel
with the cpu operations of verifying the block.
Resetting the requestedBlocks state in headersFirst is problematic since
we may be banning peers that are still good.
IBD for new nodes were broken due to the version handshake failing
between nodes that recognized wtxid based relays.  Reverting the changes
that were made so that the node is able to connect to those nodes.
candidate

Since we can use all the peers we could get for ibd, don't add peers
that are not sync candidates when we're still not current.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants