[chore][tracker]: save most recent (archive) write index to disk #36799
base: main
- if err := persister.Set(ctx, key, buf.Bytes()); err != nil {
+ ops = append(ops, storage.SetOperation(key, buf.Bytes()))
+ if err := persister.Batch(ctx, ops...); err != nil {
For existing usage, this will be a no-op.
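A minimal sketch of why switching from `Set` to `Batch` should leave existing behavior unchanged while still letting the archive write and the index update commit together. Everything here is hypothetical: the `memStore` type with its `set`/`batch` methods, the `op` struct, `saveArchive`, and the key names `knownFiles%d` and `archiveIndex`. The real code goes through `operator.Persister` and `storage.SetOperation`, and whether a batch is truly transactional depends on the storage backend.

```go
package main

import (
	"errors"
	"fmt"
)

// op is a hypothetical stand-in for a batched storage operation (set only).
type op struct {
	Key   string
	Value []byte
}

// memStore is a hypothetical in-memory persister used only for illustration.
type memStore struct{ data map[string][]byte }

func newMemStore() *memStore { return &memStore{data: map[string][]byte{}} }

// set writes a single key, like the old persister.Set call.
func (s *memStore) set(key string, value []byte) error {
	if key == "" {
		return errors.New("empty key")
	}
	s.data[key] = value
	return nil
}

// batch stages every op on a copy and commits only if all of them succeed,
// approximating a backend that runs the whole batch in one transaction.
func (s *memStore) batch(ops ...op) error {
	staged := make(map[string][]byte, len(s.data)+len(ops))
	for k, v := range s.data {
		staged[k] = v
	}
	for _, o := range ops {
		if o.Key == "" {
			return errors.New("empty key") // nothing has been committed yet
		}
		staged[o.Key] = o.Value
	}
	s.data = staged
	return nil
}

// saveArchive writes the fileset for slot and the next index in a single batch.
// Key names are hypothetical.
func saveArchive(s *memStore, slot, pollsToArchive int, fileset []byte) error {
	next := (slot + 1) % pollsToArchive
	return s.batch(
		op{Key: fmt.Sprintf("knownFiles%d", slot), Value: fileset},
		op{Key: "archiveIndex", Value: []byte(fmt.Sprint(next))},
	)
}

func main() {
	// Existing usage: a batch holding one set op leaves the same state as set.
	a, b := newMemStore(), newMemStore()
	_ = a.set("knownFiles", []byte("v1"))
	_ = b.batch(op{Key: "knownFiles", Value: []byte("v1")})
	fmt.Println(string(a.data["knownFiles"]) == string(b.data["knownFiles"])) // true

	// Archiving: the fileset and the next index land together.
	_ = saveArchive(b, 11, 100, []byte("fileset-11"))
	fmt.Println(string(b.data["archiveIndex"])) // 12

	// A failing batch commits nothing.
	err := b.batch(op{Key: "archiveIndex", Value: []byte("13")}, op{Key: ""})
	fmt.Println(err != nil, string(b.data["archiveIndex"])) // true 12
}
```

Staging on a copy keeps the in-memory model simple; a persistent backend would rely on its own transaction instead.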
// It's best if we reset the index or else we might end up writing invalid keys
t.set.Logger.Warn("the read index was found, but it exceeds the bounds. Starting from 0")
t.archiveIndex = 0
}
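The bounds check above can be sketched in isolation. This assumes, for illustration only, that the index is persisted as a decimal string and that a `warn` callback stands in for `t.set.Logger.Warn`; `restoreArchiveIndex` is a hypothetical helper, not a function from the PR.

```go
package main

import (
	"fmt"
	"strconv"
)

// restoreArchiveIndex decodes a persisted archive index and validates it against
// pollsToArchive. The decimal-string encoding is an assumption of this sketch.
func restoreArchiveIndex(stored []byte, pollsToArchive int, warn func(string)) int {
	if len(stored) == 0 {
		// Nothing persisted yet: start at the beginning of the ring.
		return 0
	}
	idx, err := strconv.Atoi(string(stored))
	if err != nil {
		// Hypothetical message; the PR snippet only covers the out-of-bounds case.
		warn("the read index could not be decoded. Starting from 0")
		return 0
	}
	if idx < 0 || idx >= pollsToArchive {
		// For example, pollsToArchive was lowered since the index was written.
		warn("the read index was found, but it exceeds the bounds. Starting from 0")
		return 0
	}
	return idx
}

func main() {
	warn := func(msg string) { fmt.Println("WARN:", msg) }
	fmt.Println(restoreArchiveIndex([]byte("11"), 100, warn)) // 11
	fmt.Println(restoreArchiveIndex([]byte("11"), 5, warn))   // warns, then 0
}
```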
Good idea to check for this case.
However, I wonder if we can handle it better than restarting from zero. What would it take to search the archive for the most recently updated entry?
I think we could maintain some kind of data structure that records the time each archive was written, maybe just a `map[index]time.Time`. Then, when we first create the tracker, we can load it and find the most recent timestamp. We can also detect the case where `pollsToArchive` has changed and rewrite the storage to align with the new value.
For example, if we previously saved 10 archives and find that `pollsToArchive` is now 5, we can find the 5 most recent indices based on the timestamp structure, then rewrite the archive files so that these are 0-4. We should probably also delete the extras from storage.
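A rough sketch of the shrink step described above, assuming the per-slot timestamps are available as a `map[int]time.Time`. `compactArchive` and `exampleWritten` are hypothetical names. The function only computes a plan (which old slot becomes which new index, which slots to delete, and where to write next); it does not move any stored data. Performing the rewrite needs care, because a destination index may still hold a fileset that has not been moved yet.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// compactArchive keeps the newSize most recent slots, renumbers them 0..k-1 from
// oldest to newest, lists the old slots to delete, and returns the next index to
// write. Timestamps are assumed to be distinct.
func compactArchive(written map[int]time.Time, newSize int) (remap map[int]int, drop []int, next int) {
	slots := make([]int, 0, len(written))
	for i := range written {
		slots = append(slots, i)
	}
	// Order slots newest first.
	sort.Slice(slots, func(a, b int) bool {
		return written[slots[a]].After(written[slots[b]])
	})
	if newSize <= 0 {
		// Archiving disabled: nothing is kept.
		sort.Ints(slots)
		return map[int]int{}, slots, 0
	}
	keep := slots
	if len(keep) > newSize {
		drop = append(drop, keep[newSize:]...)
		keep = keep[:newSize]
	}
	remap = make(map[int]int, len(keep))
	for rank, old := range keep {
		// rank 0 is the newest slot, so it gets the highest new index.
		remap[old] = len(keep) - 1 - rank
	}
	sort.Ints(drop)
	// The slot after the newest one; wraps to 0 when the ring is full.
	return remap, drop, len(keep) % newSize
}

// exampleWritten models a 10-slot ring that has wrapped: slots 3..9 were written
// first, then slots 0..2, so slot 2 is the newest.
func exampleWritten() map[int]time.Time {
	base := time.Unix(0, 0)
	written := map[int]time.Time{}
	for i := 3; i <= 9; i++ {
		written[i] = base.Add(time.Duration(i) * time.Minute)
	}
	for i := 0; i <= 2; i++ {
		written[i] = base.Add(time.Duration(10+i) * time.Minute)
	}
	return written
}

func main() {
	remap, drop, next := compactArchive(exampleWritten(), 5)
	fmt.Println(remap) // map[0:2 1:3 2:4 8:0 9:1]
	fmt.Println(drop)  // [3 4 5 6 7]
	fmt.Println(next)  // 0
}
```

Placing the kept archives oldest-first means the next write lands on the oldest one, which preserves the ring-buffer order after the rewrite.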
@djaglowski This solution does make sense to me, but it becomes tricky when we eventually overwrite old archive data, as it is a ring buffer.
We might need to load the filesets in memory.
I'll look into a few approaches.
> it becomes tricky when we eventually overwrite old archive data, as it is a ring buffer.
Can you elaborate?
> We might need to load the filesets in memory.
If it's more than one at a time, then it defeats the point of the archive.
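One way to look at the ring-buffer concern: after the write index wraps, the highest slot number no longer identifies the newest archive, but a per-slot timestamp still does, and finding it only needs the n timestamps in memory rather than the filesets themselves. The `ring` type and `simulate` helper below are hypothetical and use a synthetic clock; this illustrates the timestamp idea from the review, not what the PR implements.

```go
package main

import (
	"fmt"
	"time"
)

// ring is a hypothetical archive with n slots that records when each slot was last
// written. Only this small timestamp map is kept in memory; filesets would be
// written to storage one at a time.
type ring struct {
	n       int
	next    int
	written map[int]time.Time
}

func newRing(n int) *ring { return &ring{n: n, written: map[int]time.Time{}} }

// write overwrites the current slot, stamps it, and advances the index modulo n.
func (r *ring) write(now time.Time) int {
	slot := r.next
	r.written[slot] = now
	r.next = (r.next + 1) % r.n
	return slot
}

// newest returns the slot with the latest timestamp, or -1 if nothing was written.
func (r *ring) newest() int {
	best := -1
	for slot, ts := range r.written {
		if best == -1 || ts.After(r.written[best]) {
			best = slot
		}
	}
	return best
}

// simulate writes polls entries into an n-slot ring using a one-second synthetic clock.
func simulate(n, polls int) *ring {
	r := newRing(n)
	base := time.Unix(0, 0)
	for i := 0; i < polls; i++ {
		r.write(base.Add(time.Duration(i) * time.Second))
	}
	return r
}

func main() {
	r := simulate(5, 12)
	// After wrapping, slot 4 is not the newest; the timestamps identify slot 1.
	fmt.Println(r.newest(), (r.newest()+1)%r.n) // 1 2
}
```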
Co-authored-by: Daniel Jaglowski <[email protected]>
This PR stores the most recent archive index to disk, similar to what the persistent queue does. It also adds `Batch` methods to `operator.Persister`, because saving the metadata and saving the index should happen as a single transaction, which can only be achieved via `Batch`.
For example, if the user has configured archiving to store 100 poll cycles, assume:
- `archiveIndex` is 11 (pointing to the next index to write).
- After a restart, we read `archiveIndex` from disk and continue from index 11.
Link to tracking issue
Related #32727
Testing
Added unit tests for checking the index.
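The restart scenario from the description can be sketched as a small round trip. The `tracker` type here is a hypothetical, stripped-down stand-in (a plain map as the store, keys `archiveIndex` and `knownFiles<N>`), not the real tracker or its unit test.

```go
package main

import (
	"fmt"
	"strconv"
)

// tracker is a hypothetical stand-in for the archiving tracker: it keeps the next
// archive slot in memory and mirrors it to a key/value store after every poll.
type tracker struct {
	store          map[string]string
	pollsToArchive int
	archiveIndex   int
}

// newTracker restores archiveIndex from the store, falling back to 0 when the value
// is missing, undecodable, or out of bounds.
func newTracker(store map[string]string, pollsToArchive int) *tracker {
	t := &tracker{store: store, pollsToArchive: pollsToArchive}
	if idx, err := strconv.Atoi(store["archiveIndex"]); err == nil && idx >= 0 && idx < pollsToArchive {
		t.archiveIndex = idx
	}
	return t
}

// endPoll archives one poll cycle into the current slot and persists the next index.
func (t *tracker) endPoll(fileset string) int {
	slot := t.archiveIndex
	t.store["knownFiles"+strconv.Itoa(slot)] = fileset
	t.archiveIndex = (slot + 1) % t.pollsToArchive
	t.store["archiveIndex"] = strconv.Itoa(t.archiveIndex)
	return slot
}

func main() {
	store := map[string]string{}
	t := newTracker(store, 100)
	for i := 0; i < 11; i++ {
		t.endPoll(fmt.Sprintf("poll-%d", i))
	}
	fmt.Println(t.archiveIndex) // 11

	// Simulated restart: a fresh tracker over the same store resumes at index 11.
	restarted := newTracker(store, 100)
	fmt.Println(restarted.archiveIndex)       // 11
	fmt.Println(restarted.endPoll("poll-11")) // 11
}
```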