-
@jxs @guillaumemichel Referring to our recent discussion, I have found out that dropping deprecated messages without penalizing peers should be possible by submitting `MessageAcceptance::Ignore`. It still puzzles me, though, how it is possible that with the default gossipsub config there are messages months old circulating in my network. I would expect gossipsub messages to be naturally gone after a minute or two. Perhaps it's important: I see that about 50% of these very old messages come from a single propagation source.
-
If this adds any context, I have been seeing a lot of these warnings in all nodes:
And more recently, one of the node operators also reported a bunch of these:
-
This is a big problem on the network right now. Any updates, @Wiezzel?
-
We've been having this issue with go-libp2p in F3 (Lotus/Filecoin), and I'm going to propose a spec update this week to fix it. The idea is to (optionally) repurpose the message nonce as an expiration timestamp.
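The idea can be sketched in plain Rust. This is only an illustration of repurposing the 64-bit nonce field as an expiry deadline, not the actual spec proposal; all names are made up:

```rust
use std::time::{SystemTime, UNIX_EPOCH};

// Current unix time in milliseconds.
fn now_ms() -> u64 {
    SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock before unix epoch")
        .as_millis() as u64
}

/// Publish side: encode an expiry deadline into the 64-bit field
/// normally used as the message nonce/seqno.
fn expiring_seqno(ttl_ms: u64) -> u64 {
    now_ms() + ttl_ms
}

/// Receive side: treat the seqno as a deadline and drop expired messages.
fn is_expired(seqno: u64, now_ms: u64) -> bool {
    now_ms > seqno
}

fn main() {
    let seqno = expiring_seqno(60_000); // message valid for one minute
    println!("expired now: {}", is_expired(seqno, now_ms()));
}
```

A nice property of this encoding is that relays need no extra protocol state: any node can compare the seqno against its own clock and silently drop stale traffic.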
-
Hey @Stebalien! I debugged the repo a bit and worked on an improvement in a fork. First of all, the problem is this: if you look at this part, you will see that the on_connection_handler function synchronises all messages. This function runs more as the number of peers increases, and right at this part it triggers the following function, which receives messages at a very high rate and puts them directly into the memcache, so memory keeps growing. These messages are also propagated again when a new node arrives.

I dug a little into where this message list comes from and realized that these messages first go through ValidationMode; this is the part where the protobuf is decoded directly. ValidationMode validates three things: the signature, the sequence_no, and the source. In rust-libp2p, sequence_no works like this: when a node starts up, it takes the current timestamp and increments it for each message. I changed sequence_no to be a full timestamp. If you look at this part, before each message is sent, sequence_no is set to the timestamp of that moment; in this section, I added a TTL check directly into the sequence_no verification. Messages that do not pass the filter go straight to the invalid list, so incoming stale messages are blocked without any processing, data extraction, or caching.

The other thing is the memory issue. Some applications have no need for past messages, but libp2p still stores them in the memcache. Normally libp2p uses a HashMap for this, but I implemented lru_cache_time instead; the idea is that messages start to be deleted after a certain time or once capacity is reached, so old messages are not kept in memory. I'm testing the fork, and after testing I'm going to open a PR for the next version. You can use this configuration on the gossipsub behaviour like this.
What do you think about that? Also @Wiezzel, does it fix your issue too? It seems we are implementing the same approach on libp2p.
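The configuration snippet referenced above did not survive the copy. As a stdlib sketch of the time- and capacity-bounded cache idea it describes (illustrative names, not the fork's actual API):

```rust
use std::collections::{HashMap, VecDeque};
use std::time::{Duration, Instant};

/// Minimal sketch of a time- and capacity-bounded message cache, standing
/// in for the lru_cache_time-style store described above.
struct TimedCache {
    ttl: Duration,
    capacity: usize,
    entries: HashMap<String, (Instant, Vec<u8>)>,
    order: VecDeque<String>, // insertion order, oldest first
}

impl TimedCache {
    fn new(ttl: Duration, capacity: usize) -> Self {
        Self { ttl, capacity, entries: HashMap::new(), order: VecDeque::new() }
    }

    fn insert(&mut self, id: String, data: Vec<u8>) {
        // At capacity: evict the oldest entry before inserting.
        if self.entries.len() >= self.capacity {
            if let Some(oldest) = self.order.pop_front() {
                self.entries.remove(&oldest);
            }
        }
        self.order.push_back(id.clone());
        self.entries.insert(id, (Instant::now(), data));
    }

    fn get(&self, id: &str) -> Option<&Vec<u8>> {
        self.entries.get(id).and_then(|(at, data)| {
            // Entries past their TTL are treated as absent.
            if at.elapsed() <= self.ttl { Some(data) } else { None }
        })
    }

    fn len(&self) -> usize {
        self.entries.len()
    }
}

fn main() {
    let mut cache = TimedCache::new(Duration::from_secs(120), 1000);
    cache.insert("msg-1".to_string(), vec![0u8; 4]);
    println!("cached entries: {}", cache.len());
}
```

The point of bounding by both time and capacity is that memory stops scaling with burst size: the cache holds at most `capacity` entries, and even those become invisible once older than `ttl`.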
-
@anilaltuner This is a viable solution to my problem. However, as @guillaumemichel pointed out to me, there are a couple of problems with that. I personally implemented a simpler solution using the message validation functionality. In my case only the most recent message from each publisher is interesting, so I just keep a hash map with the highest seen seq_no for each peer and ignore messages with lower numbers.
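A minimal stdlib sketch of that approach, with string peer ids standing in for real `PeerId`s:

```rust
use std::collections::HashMap;

/// Keep only the most recent message per publisher: track the highest
/// seqno seen for each peer and ignore anything older or equal.
struct LatestOnly {
    highest: HashMap<String, u64>,
}

impl LatestOnly {
    fn new() -> Self {
        Self { highest: HashMap::new() }
    }

    /// Returns true if the message is newer than anything seen from `peer`
    /// and should be processed; records its seqno in that case.
    fn accept(&mut self, peer: &str, seqno: u64) -> bool {
        match self.highest.get(peer) {
            Some(&seen) if seqno <= seen => false, // stale duplicate: ignore
            _ => {
                self.highest.insert(peer.to_string(), seqno);
                true
            }
        }
    }
}

fn main() {
    let mut filter = LatestOnly::new();
    println!("{}", filter.accept("peer1", 5)); // newest so far
    println!("{}", filter.accept("peer1", 3)); // stale
}
```

Since rust-libp2p derives seqnos from a timestamp that increments per message, "highest seqno wins" coincides with "most recent wins" for well-behaved peers.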
-
Apologies for being late to this thread (I was away for a bit). I might be able to add some insight here. The pubsub system is designed to publish messages as best it can throughout the network. There is no built-in mechanism to decide whether a message is old or not. Old messages can bounce around the network for a variety of reasons; some I've seen in the wild are:
In all of these scenarios, the easiest (and imo correct) way to handle this is to inform gossipsub about which messages are stale and which are not. This kind of logic is application-specific, so it has been left to the application. The way to do this is to set `validate_messages` in the config. Once a message comes in, the application should then call `report_message_validation_result`. Large queues, if outbound messages exceed the capacity of the network to upload them, can still cause messages to be late, however. It seems you've already found this solution, but I thought it might be useful to elaborate on the original design.
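For reference, in rust-libp2p this pattern looks roughly like the following. This is a sketch assuming a recent rust-libp2p release and a `behaviour` of type `gossipsub::Behaviour`; check the exact signatures for your version:

```rust
// Sketch only: requires the `libp2p` crate with the gossipsub feature.
use libp2p::gossipsub::{self, MessageAcceptance, ValidationMode};

// 1. Tell gossipsub not to forward messages until the application
//    has validated them.
let config = gossipsub::ConfigBuilder::default()
    .validation_mode(ValidationMode::Strict)
    .validate_messages() // hold forwarding until an explicit verdict
    .build()?;

// 2. After inspecting an incoming message, report a verdict: Accept
//    forwards it, Reject penalises the propagation source, and Ignore
//    drops it silently (useful for stale-but-honest messages).
behaviour.report_message_validation_result(
    &message_id,
    &propagation_source,
    MessageAcceptance::Ignore,
);
```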
-
Hey @AgeManning! Firstly, thank you for elaborating; it is quite clear. The thing is, I've changed a few more things since my last update. Yes, as you said, there is no built-in structure for deciding whether a message is old or not. We can add this as a custom check in validate_messages, but there may be a problem like this.
Apart from all this, I found that the most basic problem I had was in the send_queue, #4572. As a solution there, it is suggested that either the send_queue should be bounded or backpressure should be applied between connections. I solved it by limiting the queue, but is that still a valid suggestion, or has a better solution been developed since then?
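For the "limit the send_queue" option, the standard library's bounded channels already give the basic shape: `try_send` fails when the queue is full, forcing an explicit drop/retry decision instead of unbounded memory growth. A sketch, not the actual libp2p send_queue:

```rust
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// A bounded outbound queue holding at most `cap` in-flight messages.
fn bounded_queue(cap: usize) -> (SyncSender<Vec<u8>>, Receiver<Vec<u8>>) {
    sync_channel(cap)
}

fn main() {
    let (tx, rx) = bounded_queue(2);
    tx.try_send(vec![1]).unwrap();
    tx.try_send(vec![2]).unwrap();
    // Queue full: the caller must now decide whether to drop the message,
    // retry later, or slow the producer down (backpressure).
    assert!(matches!(tx.try_send(vec![3]), Err(TrySendError::Full(_))));
    // Draining the queue frees capacity again.
    rx.recv().unwrap();
    tx.try_send(vec![3]).unwrap();
    println!("bounded send queue demo ok");
}
```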
-
Hey, yeah, it sounds like you have huge bursts of messages in your network. For 1, it sounds like signature verification is too slow to handle the traffic. There are probably two options here: 1 - avoid signature verification; 2 - apply backpressure and decide which messages to drop. (I'll talk a bit about this later on.)
Gossipsub tries not to make decisions like these at the protocol level, because they quickly get specialised and the protocol becomes very complex and less general. In fact, adding extra configuration into gossipsub was turned down by the maintainers at the time because they thought the protocol was already getting too complex. For this reason, gossipsub tries to be dumb about what messages are being sent and passes this functionality to the application to handle. My initial reaction to your problem would be to implement backpressure via the MessageValidation functionality: let gossipsub give you the burst of messages, dump them all into a queue in your application, filter based on any metric (e.g. time), and then, for the ones that are left, do signature verification and send back MessageAcceptance::Accept. If this is not possible, I'm not against adding a configuration parameter that takes a closure that can filter messages; the problem is that it can only rely on the gossipsub protobuf, not the application-specific encoding, so only fields like from, to, msg-id, etc.
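The suggested pattern (buffer the burst, filter by a metric such as age, and only then verify and accept the survivors) can be sketched with the standard library; all names here are illustrative:

```rust
use std::collections::VecDeque;
use std::time::{Duration, Instant};

/// Application-side buffer for the MessageValidation pattern: dump the
/// burst here, discard stale entries, and only pay for signature
/// verification on what survives.
struct PendingMessages {
    max_age: Duration,
    queue: VecDeque<(Instant, Vec<u8>)>,
}

impl PendingMessages {
    fn new(max_age: Duration) -> Self {
        Self { max_age, queue: VecDeque::new() }
    }

    fn push(&mut self, msg: Vec<u8>) {
        self.queue.push_back((Instant::now(), msg));
    }

    /// Drop anything older than `max_age`; the remainder is worth
    /// verifying and reporting back as accepted.
    fn drain_fresh(&mut self) -> Vec<Vec<u8>> {
        let max_age = self.max_age;
        self.queue
            .drain(..)
            .filter(|(at, _)| at.elapsed() <= max_age)
            .map(|(_, msg)| msg)
            .collect()
    }
}

fn main() {
    let mut pending = PendingMessages::new(Duration::from_secs(2));
    pending.push(vec![1, 2, 3]);
    println!("fresh messages: {}", pending.drain_fresh().len());
}
```

The design point is ordering the cheap filter before the expensive check: stale messages never reach the signature verifier, so bursts cost memory briefly but not CPU.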
If you have a lot of burst messages, then the memcache will grow quite large, depending on the configuration parameters (which you can tweak). If we space-bound the memcache, then we'd run into footguns with the specification parameters.

Yes, there was a significant problem with send queues, which we also ran into quite a while ago, and you've rightly found that the solution is backpressure. We resolved this; here is a useful comment to track the changes: sigp/lighthouse#4918 (comment). We needed a fix quickly, and our changes to the gossipsub code base were quite large, so we didn't have time to merge upstream. We have therefore forked from rust-libp2p and currently maintain our own gossipsub which handles send-queue backpressure, along with a few other fixes. Our implementation is here: https://github.com/sigp/lighthouse/tree/stable/beacon_node/lighthouse_network/gossipsub

We are planning/in the process of upstreaming our fixes to rust-libp2p, but they are not in there yet. Essentially, our fix creates channels with different priorities. Sending published messages has a higher priority than sending IHAVE/IWANT gossip, for example. When we drop messages, lower-priority messages get dropped before higher-priority ones. These queues are also bounded and configurable, so you can choose a bound, and nothing is dropped until the queues max out. Hope it helps :)
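The priority-channel idea can be illustrated with a small stdlib sketch. This is not the sigp/lighthouse code, just the dropping policy it describes (shed low-priority gossip before refusing a publish); all names are made up:

```rust
use std::collections::VecDeque;

/// Outbound items: published messages outrank IHAVE/IWANT control gossip.
#[derive(Debug, PartialEq)]
enum Outbound {
    Publish(Vec<u8>),
    Gossip(Vec<u8>),
}

/// Bounded two-priority queue: when full, low-priority gossip is
/// dropped before a high-priority publish is refused.
struct PriorityQueue {
    bound: usize,
    high: VecDeque<Outbound>, // Publish
    low: VecDeque<Outbound>,  // Gossip (IHAVE/IWANT)
}

impl PriorityQueue {
    fn new(bound: usize) -> Self {
        Self { bound, high: VecDeque::new(), low: VecDeque::new() }
    }

    fn len(&self) -> usize {
        self.high.len() + self.low.len()
    }

    /// Returns false if the item had to be dropped.
    fn push(&mut self, item: Outbound) -> bool {
        if self.len() >= self.bound {
            // Full: evict low-priority gossip to make room for a publish,
            // otherwise drop the incoming item itself.
            if matches!(item, Outbound::Publish(_)) && !self.low.is_empty() {
                self.low.pop_front();
            } else {
                return false;
            }
        }
        match item {
            p @ Outbound::Publish(_) => self.high.push_back(p),
            g @ Outbound::Gossip(_) => self.low.push_back(g),
        }
        true
    }

    /// High-priority items always leave the queue first.
    fn pop(&mut self) -> Option<Outbound> {
        self.high.pop_front().or_else(|| self.low.pop_front())
    }
}

fn main() {
    let mut q = PriorityQueue::new(2);
    q.push(Outbound::Gossip(vec![1]));
    q.push(Outbound::Gossip(vec![2]));
    q.push(Outbound::Publish(vec![3])); // evicts one gossip item
    println!("first out: {:?}", q.pop());
}
```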
-
I have a live network of ~800 nodes which publish "ping" messages with their state every 10 seconds. This generates quite a lot of messages, but they quickly become obsolete: as a new ping is broadcast, the previous one from the same peer is no longer of any interest. When a node joins the network and subscribes to the pings topic, I see it getting flooded with thousands of messages. Therefore, I would like to ask whether the gossipsub protocol supports any of the following features: