Table Scan Delete File Handling: Positional and Equality Delete Support #652

sdd · 2024-09-27T07:44:31Z

This PR adds support for handling of both positional and equality delete files within table scans.

The approach taken is to include a list of delete file paths in every FileScanTask. At the moment it is assumed that this list refers to delete files that may apply to the data file in the scan, rather than having been filtered to only contain delete files that definitely do apply to this data file. Further optimisation of plan_files is expected in the future to ensure that this list is pre-filtered before being included in the FileScanTask.

The arrow reader now has the responsibility of retrieving the applicable delete files from FileIO as well as the data file. Thanks to the Object Cache already being in place, if there are file scan tasks being handled in parallel in the same process that both attempt to load the same delete file, the Object Cache should ensure that only a single load and parse occurs. I've just realised that storing data files in the Object Cache is in my local fork - we only store manifests and manifest lists in the Object Cache in this repo! I can create an issue for adding this in a follow-up PR as it brings big performance improvements in many situations.
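To illustrate the "only a single load and parse" behaviour described above, here is a minimal sketch using only the standard library. It is not the crate's Object Cache: `DeleteFileCache`, `get_or_load` and the `Vec<u64>` payload are hypothetical names chosen for the example, and it uses blocking threads rather than async tasks.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex, OnceLock};

/// Parsed contents of one delete file (hypothetical: just row positions).
type Deletes = Arc<Vec<u64>>;

/// Hypothetical cache keyed by delete file path. Each path gets one slot,
/// and the slot is initialised at most once even under concurrent access.
#[derive(Default)]
struct DeleteFileCache {
    slots: Mutex<HashMap<String, Arc<OnceLock<Deletes>>>>,
}

impl DeleteFileCache {
    /// Return the cached deletes for `path`, calling `load` only if no
    /// other caller has loaded (or is currently loading) the same path.
    fn get_or_load(&self, path: &str, load: impl FnOnce(&str) -> Vec<u64>) -> Deletes {
        // Hold the map lock only long enough to find or create the slot.
        let slot = {
            let mut slots = self.slots.lock().unwrap();
            slots.entry(path.to_string()).or_default().clone()
        };
        // Concurrent callers for the same path block here until the first
        // one finishes, then all of them share the same Arc.
        slot.get_or_init(|| Arc::new(load(path))).clone()
    }
}
```

The per-path slot means that tasks loading different delete files do not wait on each other; only tasks racing for the same path are serialised.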

The approach taken for handling each type of delete file is as follows:

Positional Deletes

Positional Delete support is implemented using the RowSelection functionality of the parquet arrow reader that we are already using. The list of applicable positional delete indices is turned into a RowSelection. If the scan already has enable_row_selection enabled and there is a scan filter predicate, then the RowSelection from this is intersected with the positional delete RowSelection to yield a single combined RowSelection.
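The conversion and intersection described above can be sketched without the parquet crate. This is an illustrative model only: `Run`, `push_run`, `deletes_to_runs` and `intersect` are hypothetical names standing in for the reader's real row selection types, and the sketch assumes the delete positions are sorted, as the delete file parser's comment in this PR states.

```rust
/// A run of consecutive rows that is either read or skipped.
/// Hypothetical stand-in for the parquet reader's row selectors.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Run {
    Select(u64),
    Skip(u64),
}

impl Run {
    /// Split a run into (keep?, length).
    fn parts(self) -> (bool, u64) {
        match self {
            Run::Select(n) => (true, n),
            Run::Skip(n) => (false, n),
        }
    }
}

/// Append `len` rows to `out`, merging with the previous run of the same kind.
fn push_run(out: &mut Vec<Run>, keep: bool, len: u64) {
    if len == 0 {
        return;
    }
    let merged = match out.last_mut() {
        Some(Run::Select(n)) if keep => { *n += len; true }
        Some(Run::Skip(n)) if !keep => { *n += len; true }
        _ => false,
    };
    if !merged {
        out.push(if keep { Run::Select(len) } else { Run::Skip(len) });
    }
}

/// Turn sorted deleted row positions into runs covering `total_rows` rows.
fn deletes_to_runs(deleted: &[u64], total_rows: u64) -> Vec<Run> {
    let mut runs = Vec::new();
    let mut next = 0u64; // first row not yet covered by a run
    for &pos in deleted {
        if pos < next || pos >= total_rows {
            continue; // ignore duplicates and out-of-range positions
        }
        push_run(&mut runs, true, pos - next);
        push_run(&mut runs, false, 1);
        next = pos + 1;
    }
    push_run(&mut runs, true, total_rows - next);
    runs
}

/// Keep a row only if both inputs select it. Stops at the shorter input.
fn intersect(a: &[Run], b: &[Run]) -> Vec<Run> {
    let mut out = Vec::new();
    let mut a_iter = a.iter().map(|r| r.parts());
    let mut b_iter = b.iter().map(|r| r.parts());
    let (mut keep_a, mut left_a) = (false, 0u64);
    let (mut keep_b, mut left_b) = (false, 0u64);
    loop {
        // Refill whichever side has been fully consumed.
        while left_a == 0 {
            match a_iter.next() {
                Some((k, n)) => { keep_a = k; left_a = n; }
                None => return out,
            }
        }
        while left_b == 0 {
            match b_iter.next() {
                Some((k, n)) => { keep_b = k; left_b = n; }
                None => return out,
            }
        }
        let n = left_a.min(left_b);
        push_run(&mut out, keep_a && keep_b, n);
        left_a -= n;
        left_b -= n;
    }
}
```

For a 10-row file with deletes at positions 2, 3 and 7, `deletes_to_runs` yields select 2, skip 2, select 3, skip 1, select 2. Intersecting that with a filter selection that keeps only rows 1 to 5 leaves rows 1, 4 and 5.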

NB: This implementation relies on all data files for the table having been written with parquet page indexes in order for positional delete file application to succeed. This seems to be the case for parquet files written by Spark or Iceberg Java, but not pyiceberg. If positional delete files are encountered and one or more of the data files that they apply to does not contain a page index, the scan will return an error. This is at least an improvement on the status quo, where positional delete files cause a scan to fail in all circumstances, and it will not be an issue for consumers whose parquet files are all written with page indexes.

Equality Deletes

All rows from all applicable equality delete files are combined together to create a single BoundPredicate. If the scan also has a filter predicate, this is ANDed with the delete predicate to form a final combined BoundPredicate that is used as before to construct the arrow RowFilter and is also used in the row group filtering.
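One way to picture the combined predicate is sketched below. The `Pred` enum is a hypothetical, much simplified stand-in for `BoundPredicate` (integer columns only, no null handling), and the exact shape of the delete predicate is an assumption: here each delete row becomes `NOT (col1 = v1 AND col2 = v2 ...)`, and those terms are ANDed together so that the result describes the rows to keep.

```rust
use std::collections::HashMap;

/// Tiny predicate tree; hypothetical stand-in for the crate's BoundPredicate.
#[derive(Debug, Clone, PartialEq)]
enum Pred {
    AlwaysTrue,
    Eq(String, i64), // column = value
    Gt(String, i64), // column > value
    And(Box<Pred>, Box<Pred>),
    Not(Box<Pred>),
}

/// A data row, modelled as column name -> integer value.
type Row = HashMap<String, i64>;

impl Pred {
    fn and(self, other: Pred) -> Pred {
        Pred::And(Box::new(self), Box::new(other))
    }

    /// Evaluate the predicate against one row.
    fn eval(&self, row: &Row) -> bool {
        match self {
            Pred::AlwaysTrue => true,
            Pred::Eq(c, v) => row.get(c) == Some(v),
            Pred::Gt(c, v) => row.get(c).map_or(false, |x| x > v),
            Pred::And(a, b) => a.eval(row) && b.eval(row),
            Pred::Not(p) => !p.eval(row),
        }
    }
}

/// Build a "keep this row" predicate from equality delete rows.
/// Each delete row lists (column, value) pairs that must all match
/// for a data row to count as deleted.
fn delete_predicate(delete_rows: &[Vec<(&str, i64)>]) -> Pred {
    delete_rows.iter().fold(Pred::AlwaysTrue, |acc, row| {
        let matches = row.iter().fold(Pred::AlwaysTrue, |m, (c, v)| {
            m.and(Pred::Eq(c.to_string(), *v))
        });
        acc.and(Pred::Not(Box::new(matches)))
    })
}
```

The scan filter is then combined with it as `filter.and(delete_predicate(&rows))`, and the single resulting predicate can be used both for row filtering and for row group pruning. The predicate grows linearly with the number of delete rows, which is one reason a set-based representation may be preferable for large delete files.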

Update

I added Equality Delete handling into this PR as it only made a difference of about 350 lines.

sdd · 2024-10-10T18:27:55Z

@Xuanwo, @liurenjie1024: This is now ready to review, PTAL when you get a chance. Looking forward to your feedback 😁

sdd · 2024-10-29T08:30:51Z

Hi @liurenjie1024 and @Xuanwo - would either of you be able to review this at some point please? I know it's a bit large, sorry. Thanks :-)

liurenjie1024 · 2024-10-29T08:33:48Z

> Hi @liurenjie1024 and @Xuanwo - would either of you be able to review this at some point please? I know it's a bit large, sorry. Thanks :-)

Hi, @sdd Thanks for your patience. In fact I already started reviewing it, and it's a little large, so it may take some time.

sdd · 2024-10-31T08:28:39Z

Hey @liurenjie1024 - sorry to make changes whilst you are reviewing. I updated the design of the DeleteFileManager as I was not happy with it.

liurenjie1024

Thanks @sdd for your patience, it's a really large PR and it took me some time to review. I think you've generally understood how deletion files work, but I have some concerns about the current code, as it mixes a lot of things together. Deletion handling is quite challenging, and I think the design of the Java implementation is quite reasonable:

I think we may need a design that splits this into smaller parts, what do you think?

liurenjie1024 · 2024-10-28T09:46:02Z

crates/iceberg/src/spec/delete_file.rs

+    // filename to a sorted list of row indices.
+    // TODO: Ignoring the stored rows that are present in
+    //   positional deletes for now. I think they only used for statistics?
+    Positional(HashMap<String, Vec<u64>>),


The Java implementation uses a roaring bitmap to save space; should we use one as well?

sdd · Dec 9, 2024

Can we switch to that in a follow-up PR? This enum is pub(crate), so we won't break any users by doing so.

liurenjie1024 · 2024-11-04T12:42:59Z

crates/iceberg/src/scan.rs

+/// Manages async retrieval of all the delete files from FileIO that are
+/// applicable to the scan. Provides references to them for inclusion within FileScanTasks
+#[derive(Debug, Clone)]
+struct DeleteFileManager {


I would suggest moving this part to a standalone module. There is also a component in the Java implementation, DeleteFileIndex, which I think is well designed. We don't need to implement all of its details in one PR, but maintaining a data structure and API similar to DeleteFileIndex and evolving it gradually would be easier.

sdd · Dec 9, 2024

Refactoring to a separate module

sdd · Dec 11, 2024

Refactored to a separate module and rebased onto the latest main, as it was getting a bit stale. I'll be working to update this PR to bring it closer to DeleteFileIndex, ideally in a way that allows me to split this into smaller PRs as well.

liurenjie1024 · 2024-11-04T12:48:27Z

crates/iceberg/src/scan.rs

+
+/// A task to scan part of file.
+#[derive(Debug, Clone, Serialize, Deserialize, PartialEq)]
+pub struct FileScanTaskDeleteFile {


This may evolve as we add more features, so I would suggest making this a crate-only data structure.

sdd · Dec 9, 2024

Are you sure? It is present inside FileScanTask, all of whose fields are pub, and which is intended for potential consumption outside of the crate.

liurenjie1024 · 2024-11-04T13:14:52Z

crates/iceberg/src/spec/delete_file.rs

+    }
+}
+
+pub(crate) async fn parse_positional_delete_file(


I think this logic is quite similar to the data file reader; I would expect it to reuse the data file reader's code.

sdd · Dec 11, 2024

Will make a start on this now

liurenjie1024 · 2024-11-04T13:15:51Z

crates/iceberg/src/spec/delete_file.rs

+    Ok(Deletes::Positional(result))
+}
+
+pub(crate) async fn parse_equality_delete_file(


Considering schema evolution, we should use the same logic as data file processing.

liurenjie1024 · 2024-11-04T13:16:44Z

crates/iceberg/src/arrow/reader.rs

+    fn equality_delete_column_to_datum_vec(
+        column: &ArrayRef,
+        field: &NestedFieldRef,
+    ) -> Result<Vec<Option<Datum>>> {


Should this be a struct set?

sdd · Dec 9, 2024

Do you mean that it should return Result<HashSet<Struct>>?
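If the suggestion is read as "collect the delete rows into a set and test data rows for membership", a rough sketch could look like the following. `DeleteKey`, `build_delete_set` and `retain_live_rows` are hypothetical names, and a vector of optional integers stands in for a `Struct` literal projected onto the equality columns.

```rust
use std::collections::HashSet;

/// One row projected onto the equality columns; `None` models a null.
/// Hypothetical stand-in for a Struct literal.
type DeleteKey = Vec<Option<i64>>;

/// Collect all equality delete rows into a set for O(1) average lookups.
fn build_delete_set(rows: impl IntoIterator<Item = DeleteKey>) -> HashSet<DeleteKey> {
    rows.into_iter().collect()
}

/// Drop every data row whose equality-column values appear in the delete set.
fn retain_live_rows(data: Vec<DeleteKey>, deletes: &HashSet<DeleteKey>) -> Vec<DeleteKey> {
    data.into_iter()
        .filter(|row| !deletes.contains(row))
        .collect()
}
```

Because `None == None` under `Option` equality, a null in a delete row matches a null in the data row here, which is how the Iceberg spec defines equality deletes on null values.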

liurenjie1024 · 2024-11-04T13:19:33Z

crates/iceberg/src/scan.rs

+        //       that are not applicable to the DataFile?
+
+        DeleteFileManagerFuture {
+            files: self.files.clone(),


This is incorrect even if we ignore pruning techniques to remove unrelated deletion files. Please see this part for details.

sdd · 2024-11-12T10:14:51Z

Thanks so much for the review on this @liurenjie1024 - I've been ill for the past week or two so I've not had chance to work through your review in detail yet. I just wanted to let you know I've seen it and will pick it up when I've recovered. 👍

liurenjie1024 · 2024-11-13T03:01:53Z

> Thanks so much for the review on this @liurenjie1024 - I've been ill for the past week or two so I've not had chance to work through your review in detail yet. I just wanted to let you know I've seen it and will pick it up when I've recovered. 👍

Hi @sdd, sorry to hear that, take care of yourself! Don't worry about this, I'll be happy to discuss it with you anytime once you're back.

Fokko · 2024-12-19T17:13:39Z

@sdd Thanks for doing all this work, could you split out the positional deletes? I think that's already a sizeable chunk.

sdd · 2024-12-19T18:12:14Z

Sure @Fokko - I'm in the middle of a refactor of what I have so far. It aligns the design a bit more closely to the Java DeleteFileIndex while still keeping the more efficient loading process from my original. I was thinking of splitting this PR into three - one that is mostly collating all the delete files into the index, and then two more that each focus on the filtering and application of the two delete types.

Xuanwo · 2024-12-20T05:13:28Z

crates/iceberg/src/delete_file_index.rs

+        // index through the receiver channel. Update the `None` inside the `RwLock` to a `Some`
+        // once the stream has been exhausted so that any consumers awaiting on the Future returned
+        // by DeleteFileIndex::get_deletes_for_data_file can proceed
+        spawn({


Hi, would you like to review #806 first? I believe we can remove most spawn and channels.

sdd · Dec 20, 2024

Sure, will take a look!

Fokko · 2024-12-20T12:13:34Z

@sdd Thank you for your understanding, looking forward to the smaller PRs 👍 From PyIceberg I've learned that there are a lot of subtle optimizations and want to make sure that we handle those correctly 👍

sdd mentioned this pull request Sep 27, 2024

Delete Files in Table Scans #630

Open

sdd force-pushed the feature/table-scan-delete-file-handling branch 3 times, most recently from 0a64237 to 28021a4 Compare October 10, 2024 18:03

sdd marked this pull request as ready for review October 10, 2024 18:26

sdd changed the title ~~WIP: Table Scan Delete File Handling~~ Table Scan Delete File Handling: Positional Delete Support Oct 10, 2024

sdd force-pushed the feature/table-scan-delete-file-handling branch from 2732a49 to 50f8a9e Compare October 24, 2024 17:46

sdd changed the title ~~Table Scan Delete File Handling: Positional Delete Support~~ Table Scan Delete File Handling: Positional and Equality Delete Support Oct 24, 2024

sdd force-pushed the feature/table-scan-delete-file-handling branch 3 times, most recently from cf8748a to 7a8d297 Compare October 28, 2024 07:28

sdd force-pushed the feature/table-scan-delete-file-handling branch from cc5dba4 to df4e86a Compare October 31, 2024 08:23

liurenjie1024 reviewed Nov 4, 2024


c-thiel mentioned this pull request Nov 16, 2024

Tracking issue: Writing iceberg tables #346

Closed

7 tasks

Fokko self-requested a review November 28, 2024 15:24

sdd force-pushed the feature/table-scan-delete-file-handling branch 2 times, most recently from 2ff526f to 091a249 Compare December 11, 2024 19:07

Xuanwo reviewed Dec 20, 2024


Fokko mentioned this pull request Dec 20, 2024

feat: exposing delete files in task #625

Closed

feat: add delete_file_index and populate in table scan

4c21f00

sdd force-pushed the feature/table-scan-delete-file-handling branch from c55e986 to 4c21f00 Compare December 21, 2024 09:20
