Provide benchmarks testing against available query-server implementations. #5
Some preliminary benchmark results, using a method similar to the one from the article:
The first two are using couchjs and the last two are with couch-chakra. Note how the first run (after deleting the design doc, compacting the DB and running the view cleanup) differs from the design doc update, given that it also runs in just one process instead of 8 as in the update case. I'm not quite sure why this is. My prior understanding of CouchDB's MapReduce was that it traverses the whole B-Tree structure and applies map to each item, building a new B-Tree each time. But with these results it might be a bit different...
Please ignore the part about running better on the first view creation. Apparently in those cases CouchDB had only indexed one shard out of 8, and the next query actually indexed the whole view. So only the "2nd time" runs really matter for the benchmarking process.
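The reset-and-query procedure described above could be scripted roughly as follows. This is only a sketch, not the method actually used for these results: it assumes CouchDB's standard HTTP endpoints (`_compact`, `_view_cleanup`, and the design document view URL), an unauthenticated server, and a hypothetical `send` callable so the timing logic can be exercised without a live server. Compaction runs asynchronously in CouchDB, so a real benchmark would also need to wait for it to finish before querying.

```python
import json
import time
import urllib.request


def couch_request(base_url, method, path, body=None):
    """Send one JSON request to CouchDB and return the decoded response."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    req = urllib.request.Request(
        base_url.rstrip("/") + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def time_view_builds(db, ddoc, ddoc_body, view, send, clock=time.perf_counter):
    """Drop and re-create a design doc, then time two consecutive view queries.

    `send(method, path, body=None)` is a hypothetical transport callable.
    Both timings are returned; per the thread, only the second one is
    considered meaningful.
    """
    ddoc_path = f"/{db}/_design/{ddoc}"
    current = send("GET", ddoc_path)                      # fetch the current revision
    send("DELETE", f"{ddoc_path}?rev={current['_rev']}")  # delete the design doc
    send("POST", f"/{db}/_compact", {})                   # compact the DB (asynchronous)
    send("POST", f"/{db}/_view_cleanup", {})              # remove stale index files
    send("PUT", ddoc_path, ddoc_body)                     # re-create the design doc

    # limit=1 keeps the response small; the query still triggers indexing.
    view_path = f"{ddoc_path}/_view/{view}?limit=1"
    timings = []
    for _ in range(2):
        start = clock()
        send("GET", view_path)
        timings.append(clock() - start)
    return timings
```

Against a real server, `send` could be `functools.partial(couch_request, "http://localhost:5984")`, where the URL is an assumption about the local setup.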
Just checked out your repository https://github.com/excieve/dragnet, great work and write-up on benchmarking views! I'm just curious, do you think it would be possible to provide a synthetic data set for benchmarking based on your real-world data set? I'm aware that your data is most certainly private, but maybe you could come up with some sort of anonymization procedure. It's just that real-world data is so much more meaningful for benchmarks!
Another thing I'd be curious about is the average CPU time and memory used during indexing. I'd like to see how ChakraCore performs there in comparison to SpiderMonkey, especially since ChakraCore is supposed to be optimized for IoT scenarios.
Good news also from the binary protocol side: I have all the view tests running now. CouchApp-specific functions still fail, but I'm going to ignore those for the moment. There's still some cleanup work to do and, more importantly, some minor modifications in CouchDB itself, but conceptually I'm on the right track!
Actually, the dataset I'm using is completely public. It's a subset of the tax and property ownership declarations which all public employees in Ukraine have to submit annually (and upon certain events) to an official online registry. After that they are in the public domain and available from the national agency for corruption prevention, like this one for instance: https://public-api.nazk.gov.ua/v1/declaration/3371ace7-177b-44d6-ba2a-53e023f740be. We're planning to analyse them all continuously, but for testing purposes I'm only operating on a subset. I can provide you with a file suitable to feed into this import script, or maybe just archive CouchDB's data volume with this dataset already imported. Which do you prefer?
In all cases query servers don't really consume much CPU time due to the I/O bottleneck. The only case when I've seen a query server get more CPU than CouchDB was CPython with multiple iterations in the function (which did not get optimised like in JITed PyPy or JS runtimes). But I agree, it would be nice to monitor those and report too. Will see if I can get it in the benchmarks.
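One way such CPU and memory figures might be collected on Linux is to sample `/proc/<pid>/stat` for the query server process during indexing. The sketch below is an assumption about how this could be done, not something from the benchmarks in this thread; the helper names are hypothetical, and it relies on the field layout documented in proc(5): `utime` and `stime` are fields 14 and 15 (in clock ticks), and `rss` is field 24 (in pages).

```python
import os


def parse_proc_stat(stat_line, clk_tck, page_size):
    """Extract CPU seconds and resident memory from a /proc/<pid>/stat line."""
    # The command name (field 2) may contain spaces or parentheses,
    # so split on the last ')' and count fields from there (field 3 onward).
    rest = stat_line.rsplit(")", 1)[1].split()
    utime_ticks = int(rest[11])   # field 14: user-mode CPU time
    stime_ticks = int(rest[12])   # field 15: kernel-mode CPU time
    rss_pages = int(rest[21])     # field 24: resident set size in pages
    return {
        "cpu_seconds": (utime_ticks + stime_ticks) / clk_tck,
        "rss_bytes": rss_pages * page_size,
    }


def sample_process(pid):
    """Read one sample for a running process (Linux only)."""
    with open(f"/proc/{pid}/stat") as f:
        line = f.read()
    return parse_proc_stat(
        line, os.sysconf("SC_CLK_TCK"), os.sysconf("SC_PAGE_SIZE")
    )
```

Sampling `sample_process(pid)` before and after an indexing run and subtracting the `cpu_seconds` values gives the CPU time spent by that process during the run; the pid of the query server would have to be looked up separately.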
That's awesome! Please let me know when there's something to try out and I will be happy to test.
That's definitely an interesting dataset :) One thing I thought might be convenient is to provide a publicly available Cloudant instance with the demo data, from which interested parties could replicate. As far as I understand it's only free up to 1GB, but maybe that's enough! But I'll take whatever works and is easiest for you, it shouldn't take too much of your time either!
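Pulling such a demo database into a local CouchDB would go through the standard `_replicate` endpoint. A minimal sketch of the request, assuming full URLs for both source and target; the function name and both URLs are placeholders for illustration, not real instances:

```python
import json


def replication_request(source_url, target_url):
    """Build the method, path and JSON body for a one-off CouchDB replication."""
    body = {
        "source": source_url,
        "target": target_url,
        "create_target": True,  # create the local database if it is missing
    }
    return "POST", "/_replicate", json.dumps(body)


# Hypothetical URLs; substitute the actual shared instance and local server.
method, path, payload = replication_request(
    "https://example.cloudant.com/declarations",
    "http://localhost:5984/declarations",
)
```

The resulting request would be sent to the local CouchDB with a `Content-Type: application/json` header and, on most setups, admin credentials.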
FYI: https://medium.com/@excieve/benchmarking-couchdb-views-abb7a0a891b2#.v6px6aid2
Might be something along the lines of this blog post: http://blog.idempotent.ca/2016/12/19/couchdb-indexing-benchmark/
@excieve also mentions working on some benchmarking in #2 (comment).