Normalize basePath in targetTables in ConversionController #602

Merged: 1 commit, Dec 18, 2024

vinishjail97

What is the purpose of the pull request

When Iceberg tables are used as the source, their basePath and data path can differ. When such tables are synchronized to Hudi and Delta, the .hoodie and _delta_log directories should be created under the data path (basePath/data/.hoodie and basePath/data/_delta_log), but the current code generates the metadata at basePath (for example, basePath/.hoodie).
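
To make the intended path resolution concrete, here is a minimal sketch in plain Java. It is not the code from this PR: the class `TargetBasePathSketch`, the method `resolveTargetBasePath`, and the string-based format check are all hypothetical names chosen for illustration, and the sketch assumes the source table exposes both a basePath and a data path as strings.

```java
import java.util.Objects;

/** Hypothetical sketch; the names here are illustrative and not taken from the XTable codebase. */
final class TargetBasePathSketch {

  private TargetBasePathSketch() {}

  /**
   * Picks the directory under which target metadata (.hoodie, _delta_log) would be written.
   * Assumption: for an ICEBERG source whose data path differs from basePath, the data path wins;
   * in every other case the basePath is used unchanged (apart from trailing slashes).
   */
  static String resolveTargetBasePath(String sourceFormat, String basePath, String dataPath) {
    Objects.requireNonNull(sourceFormat, "sourceFormat");
    Objects.requireNonNull(basePath, "basePath");
    String normalizedBase = stripTrailingSlashes(basePath);
    if (dataPath == null || dataPath.isEmpty()) {
      return normalizedBase;
    }
    String normalizedData = stripTrailingSlashes(dataPath);
    boolean icebergSource = "ICEBERG".equalsIgnoreCase(sourceFormat);
    return icebergSource && !normalizedData.equals(normalizedBase) ? normalizedData : normalizedBase;
  }

  /** Removes trailing '/' characters so that "tbl/data/" and "tbl/data" compare equal. */
  static String stripTrailingSlashes(String path) {
    int end = path.length();
    while (end > 1 && path.charAt(end - 1) == '/') {
      end--;
    }
    return path.substring(0, end);
  }

  public static void main(String[] args) {
    String base = "/tmp/warehouse/tbl";
    // Iceberg source with a separate data directory: metadata goes under basePath/data.
    System.out.println(resolveTargetBasePath("ICEBERG", base, base + "/data") + "/.hoodie");
    // Non-Iceberg source: the basePath is kept as-is.
    System.out.println(resolveTargetBasePath("HUDI", base, base + "/data") + "/.hoodie");
  }
}
```

With these assumptions, the first call resolves to `/tmp/warehouse/tbl/data/.hoodie` and the second to `/tmp/warehouse/tbl/.hoodie`, which matches the before and after behavior described above.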

Brief change log

  • Fixes the target base paths when the source table format is ICEBERG and basePath differs from dataPath

Verify this pull request

This change added tests and can be verified as follows:

  • xtable-core/src/test/java/org/apache/xtable/ITConversionController.java
  • xtable-core/src/test/java/org/apache/xtable/conversion/TestConversionUtils.java

@vinishjail97 vinishjail97 merged commit 6a5f2b4 into apache:main Dec 18, 2024
2 checks passed
@vinishjail97 vinishjail97 deleted the IcebergSourceFix branch December 18, 2024 21:43
* A few table formats need the metadata to be located at the root level of the data files. E.g., an
* Iceberg table generated through Spark will have two directories, basePath/data and
* basePath/metadata. For synchronizing the Iceberg metadata to Hudi and Delta, the target metadata
* needs to be present in basePath/data/.hoodie and basePath/data/_delta_log.

Hi @vinishjail97,
This observation is not accurate for Delta Lake. It is OK for the _delta_log directory to be present at the basePath level.
