Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-2990: Only call hsync() and hflush() on supported filesystems #2991

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

CZuegner
Copy link

@CZuegner CZuegner commented Aug 13, 2024

Instead of log the unsupported call check capabilities and call only on supported filesystems - e.g. S3A does not.

Rationale for this change

When stream into an HadoopOutputFile on S3A a waring gets logged: Application invoked the Syncable API against stream writing to XXX. This is Unsupported
https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/troubleshooting_s3a.html#UnsupportedOperationException_.E2.80.9CS3A_streams_are_not_Syncable._See_HADOOP-17597..E2.80.9D

What changes are included in this PR?

Instead of log the unsupported call (hflush() and hsync()) check capabilities and call only on supported filesystems - whereas S3A is not.

Are these changes tested?

Yes

Are there any user-facing changes?

No

Closes: #2990

@@ -51,7 +51,9 @@ public void write(byte[] b, int off, int len) throws IOException {
}

public void sync() throws IOException {
wrapped.hsync();
if (wrapped.hasCapability("hsync")) {
wrapped.hsync();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@steveloughran Does this make sense to you?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hsync is actually really expensive. even on hdfs you shouldn't be calling it as it doesn't return until the data has been replicated and persisted.

S3A tells you off because some apps (hbase etc) really rely on sync to commit work. there's switch to actually fail...turning that on let us find which bits of code actually expected it to work.

Ideally code shouldn't be using outside of work whey want to be 100% confident data is written.

My IDE doesn't show any uses of the method...is there some through reflection?

Copy link
Contributor

@steveloughran steveloughran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

code LGTM -though I'd question the need to explicitly call them.

hsync() is more expensive than hflush(); it's the one which guarantees the data is persisted. I'd only consider it in close(), and swallow any UnsupportedOperationException everywhere ... in case there's an external implementation which fails or someone has turned s3a failure on.

@@ -51,7 +51,9 @@ public void write(byte[] b, int off, int len) throws IOException {
}

public void sync() throws IOException {
wrapped.hsync();
if (wrapped.hasCapability("hsync")) {
wrapped.hsync();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hsync is actually really expensive. even on hdfs you shouldn't be calling it as it doesn't return until the data has been replicated and persisted.

S3A tells you off because some apps (hbase etc) really rely on sync to commit work. there's switch to actually fail...turning that on let us find which bits of code actually expected it to work.

Ideally code shouldn't be using outside of work whey want to be 100% confident data is written.

My IDE doesn't show any uses of the method...is there some through reflection?

@@ -62,7 +64,9 @@ public void flush() throws IOException {
@Override
public void close() throws IOException {
try (FSDataOutputStream fdos = wrapped) {
fdos.hflush();
if (fdos.hasCapability("hflush")) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. adding a delegating hasCapability() might be handy
  2. consider what to do on a sync failure. close() will sill be invoked but any exception thrown would probably be from the hflush() failure.

@wgtmac
Copy link
Member

wgtmac commented Aug 15, 2024

The CI failures are related:

[INFO] -------------------------------------------------------------
Error:  COMPILATION ERROR : 
[INFO] -------------------------------------------------------------
Error:  /home/runner/work/parquet-java/parquet-java/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopPositionOutputStream.java:[54,16] cannot find symbol
  symbol:   method hasCapability(java.lang.String)
  location: variable wrapped of type org.apache.hadoop.fs.FSDataOutputStream
Error:  /home/runner/work/parquet-java/parquet-java/parquet-hadoop/src/main/java/org/apache/parquet/hadoop/util/HadoopPositionOutputStream.java:[67,15] cannot find symbol
  symbol:   method hasCapability(java.lang.String)
  location: variable fdos of type org.apache.hadoop.fs.FSDataOutputStream

@steveloughran
Copy link
Contributor

that compiler failure means you are running against a very old version of hadoop, 2.8 or earlier as the change is from https://issues.apache.org/jira/browse/HDFS-11644

Keeping the entire hadoop-2/2.7.3 is really preventing the library from using the modern, especially cloud-friendlier APIs -including hadoop 2.9 APIs to probe for capabilities.

Compare with spark which is on 3.4.0.

Cut it and everyone's life will be much better. Doesn't have to be 3.4.x, but the latest 3.3.x release (3.3.x)

@@ -62,7 +64,9 @@ public void flush() throws IOException {
@Override
public void close() throws IOException {
try (FSDataOutputStream fdos = wrapped) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looking at this more, while it does guarantee wrapped.close() is called and is very pretty, there's a risk that if hflush() raises an exception then anything raised by wrapped.close() is lost.
when working to s3, it's that close() where the write to S3 takes place and is where the more important failures are likely to be raised.
It might be best to catch and log the hflush failure and always call wrapped.close()

@steveloughran
Copy link
Contributor

steveloughran commented Sep 27, 2024

commented on this again.

that warning is only printed once per process, though it is potentially a sign of a dangerous mismatch between application code and the apps (hbase, some streaming logs)

what we could do (and I'll take a hadoop PR) to give that warning message a new log name which is only used for this message. org.apache.hadoop.fs.s3a.needless for example. 😀

you can have it in hadoop 3.4.1 if you do a timely PR

otherwise, #2944 will fix the build problems

@davidvoit
Copy link

@steveloughran I'm a colleague of Christian and we worked together on this patch. If we add a own category this makes it just easier to filter out the message, but in the end hsync just don't make any sense for parquet and object storeage, or would you disagree?

I think the hasCapalipty route is the best one here. Object storage are always atomic so don't need hsync at all. For hdfs the code still does do the hsync as it has the capality. The warning still makes absolute sense for stuff like hbase, which should not be used as is together with an object storage driver, but parquet as is doesn't has this requirment, and works fine without hsync.

Sure we can wait for 2944 or if we should change something just let us now :-)

@steveloughran
Copy link
Contributor

hsync is overkill, as is flushing. It does make sense for things like streaming logs where you want to be confident it has been persisted even if your app crashes. The applications generally creating Parquet files (hive, spark...) implement failure resilience at a higher level. I wouldn't bother at all.

@davidvoit
Copy link

So should we modify the pull request and just remove the hsync and hflush calls?

@steveloughran
Copy link
Contributor

worksforme.
the only special case is: is someone using this in any commit algorithm where returning is viewed as a sign that it has been persisted? I'm thinking of stuff like iceberg here

hflush and hsync are only needed for special cases and are overkill here
@CZuegner CZuegner force-pushed the s3_syncable_support branch from a429c5b to 7ec03c3 Compare October 14, 2024 08:39
@CZuegner
Copy link
Author

I've changed the PR according your thoughts.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Only call hsync() and hflush() on supported filesystems
4 participants