Benchmark section removed from README #899

Open
wants to merge 1 commit into
base: main

@themisir (Contributor) commented Feb 3, 2022

No description provided.

@PawlikMichal25

@themisir Why are you considering removing the benchmark?

@themisir
Contributor Author

> @themisir Why are you considering removing the benchmark?

It does not reflect the current state of the ecosystem. The benchmarks are old, and a lot has changed since then. Some points might also be "unfair" to compare on a benchmark basis, so I don't want people to make decisions based on misleading data.

Also, either some benchmark steps were not tested correctly back then or something else is going on, because when I run the benchmark myself the results are different. For some reason, lazy boxes performed worse than regular boxes.

@PawlikMichal25

It's a fair point that we should keep the benchmark up to date.
I think it's normal for a benchmark to compare one specific thing, though, so it's the developer's responsibility to consider whether a given benchmark relates to their use case.

I'm not sure how things stand right now, but I actually adjusted and ran the benchmark myself for my article. The results were somewhat different, but the conclusion was the same: Hive is fast.

Perhaps the benchmark should be updated or described differently, but I think it's responsible to include some benchmark if we're claiming in the documentation that Hive is "blazing fast".

PS: I'm just a random guy from the internet, so sorry for "sticking my nose" in here, but I've always liked Hive and considered speed to be one of its most important benefits :)

@themisir
Contributor Author

themisir commented Mar 27, 2022

Yeah, I've updated the benchmark and added new comparison points. You can see the final result here. What's interesting is that lazy reads were apparently way slower than what's shown in the current README. I think that's because Hive has to read data from the file when doing lazy reads, which is the whole point (otherwise Hive has to load all the data into memory, which might be an issue if the data source is large). But that also means that, for efficiency, more conventional solutions like SQLite make more sense than Hive for data-heavy workloads.
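
To make the trade-off concrete, here is a minimal Python sketch of the two read strategies. It is only a toy illustration, not Hive's actual code, API, or file format: the `EagerStore` and `LazyStore` classes, the one-JSON-record-per-line layout, and the `demo` helper are all hypothetical names invented for this example. The assumption it illustrates is the one described above: an eager store pays memory up front so reads are dictionary lookups, while a lazy store keeps only an index in memory and pays a file read on every access.

```python
import json
import os
import tempfile


class EagerStore:
    """Toy store: loads every record into memory when opened."""

    def __init__(self, path):
        self._data = {}
        with open(path, "rb") as f:
            for line in f:
                key, value = json.loads(line)
                self._data[key] = value

    def get(self, key):
        # No file access here, only an in-memory lookup.
        return self._data[key]


class LazyStore:
    """Toy store: keeps only key -> byte offset in memory."""

    def __init__(self, path):
        self._path = path
        self._offsets = {}
        with open(path, "rb") as f:
            while True:
                offset = f.tell()
                line = f.readline()
                if not line:
                    break
                key, _ = json.loads(line)  # the value is discarded, not cached
                self._offsets[key] = offset

    def get(self, key):
        # Every read opens the file, seeks, and decodes one record.
        with open(self._path, "rb") as f:
            f.seek(self._offsets[key])
            _, value = json.loads(f.readline())
            return value


def write_records(path, records):
    """Write one JSON [key, value] pair per line (hypothetical layout)."""
    with open(path, "wb") as f:
        for key, value in records.items():
            f.write(json.dumps([key, value]).encode("utf-8") + b"\n")


def demo():
    """Return True if both stores read back the same value."""
    records = {f"key{i}": "x" * 100 for i in range(1000)}
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        write_records(path, records)
        eager, lazy = EagerStore(path), LazyStore(path)
        return eager.get("key42") == lazy.get("key42") == records["key42"]
    finally:
        os.remove(path)
```

Both stores return the same values; the difference is where the cost lands. Timing repeated `get` calls on each store would be one way to see why a lazy read can look slower in a read benchmark even though it uses less memory.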

> PS: I'm just a random guy from the internet, so sorry for "sticking my nose" in here, but I've always liked Hive and considered speed to be one of its most important benefits :)

Don't worry, I'm actually not sure about removing it either, which is why I didn't want to merge this PR yet. Maybe I should update the data instead of removing it.
