
reader can not read any data until the writer close the file #215

Open
quantsword opened this issue Mar 27, 2017 · 3 comments

@quantsword commented Mar 27, 2017

I ran two programs in separate processes. One program created a file and appended 64 KB at a time until the file size reached 1 GB. The other program was launched after the writer started (once the file was created) to read the same file concurrently. In my first attempt, the reader could not read anything, even after the writer finished and exited. I then added a call to "UpdateFilesize" to refresh the file size whenever a read returned 0; with that change, the reader could read data once the writer finished.

I believe the reader should be allowed to read even if the writer has not closed the file yet. It is fine for the writer to keep the lease on the last chunk, but completed chunks should be readable by everybody.

@mckurt
Contributor

mckurt commented Mar 31, 2017

Hi,

I replicated this setup: one writer and one reader running concurrently. As soon as a chunk was fully written, i.e. became stable, the reader started reading that chunk. However, my setup used 1x replication without striping.

Mehmet

@mikeov
Contributor

mikeov commented Mar 31, 2017

Striped files, including RS files, cannot be read until they are closed; this is by design: the close call is what sets the logical file size (EOF). Until a striped file is closed, its logical file size remains 0.

@quantsword
Author

Thanks for your response.

Here is the rationale for why this is needed.

Scenario #1: it is typical for a data-processing pipeline to use files to connect the output of one processing step to the input of the next. Step 1 generates items and writes them to a file; step 2 reads that file and does further processing. If step 2 is a map process (no sorting, shuffling, etc. needed), it can start processing as soon as data is readable from the file. After it has consumed all available items, it can wait until more data becomes ready.

Scenario #2: data is appended to a QFS file while an index is built at the same time. A data request can arrive at any time for any record. If a record is available according to the index (the index can be a local file rather than a QFS file), the reader will try to read the QFS file at the specified location. It can wait a little, but not until the QFS file is closed, since appends will keep happening.
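The tail-follow pattern in scenario #1 can be sketched with plain POSIX files. This is only a conceptual illustration of the desired behavior (refresh the visible size, read any newly completed bytes, otherwise wait), not the QFS client API:

```python
import os
import tempfile

def tail_read(path, offset, chunk=64 * 1024):
    """Read up to `chunk` bytes appended past `offset`; return (data, new_offset)."""
    size = os.stat(path).st_size        # refresh the file size first
    if size <= offset:
        return b"", offset              # nothing new yet; caller retries later
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(min(chunk, size - offset))
    return data, offset + len(data)

# Simulated pipeline: the writer appends, the reader consumes new bytes
# without ever waiting for the writer to close the file.
with tempfile.NamedTemporaryFile(delete=False) as tf:
    path = tf.name
consumed = b""
offset = 0
with open(path, "ab") as writer:
    for i in range(4):
        writer.write(b"record-%d;" % i)
        writer.flush()                  # make the append visible to the reader
        data, offset = tail_read(path, offset)
        consumed += data                # every record is read before close
os.remove(path)
```

On a local filesystem this works because `stat` always reflects appended bytes; the issue here is that for striped QFS files the logical size is not updated until close, so the `stat` step returns 0.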

My question: after QFS has collected 6 stripes and generated 3 recovery stripes, it pushes these 9 stripes to 9 chunk servers. At that point the write client could notify the meta server with an updated file size. Are there any design concerns with this approach?
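To make the arithmetic behind that suggestion concrete: only complete stripe groups (6 data stripes plus 3 recovery stripes here) are fully recoverable, so the size the client could safely report rounds the appended bytes down to a whole group. A minimal sketch, assuming a 64 KB stripe size (the stripe size is an assumption for illustration, not a QFS default I am asserting):

```python
def readable_size(bytes_appended, data_stripes=6, stripe_size=64 * 1024):
    """Largest logical size covered by complete stripe groups.

    Each complete group holds data_stripes * stripe_size bytes of data
    (recovery stripes add redundancy, not logical size), so round down
    to a whole number of groups.
    """
    group = data_stripes * stripe_size          # data bytes per complete group
    return (bytes_appended // group) * group

# With 6 data stripes of 64 KB, one group covers 393216 bytes:
print(readable_size(1_000_000))  # -> 786432 (two complete groups)
```

The last partial group would stay unreadable until the writer either completes it or closes the file, which matches keeping the lease on the last chunk.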

Thanks.
