
Provide benchmarks testing against available query-server implementations. #5

Open
dmunch opened this issue Mar 5, 2017 · 6 comments


@dmunch
Owner

dmunch commented Mar 5, 2017

Might be something along the lines of this blog post: http://blog.idempotent.ca/2016/12/19/couchdb-indexing-benchmark/

@excieve also mentions working on some benchmarking in
#2 (comment)

@excieve

excieve commented Mar 6, 2017

Some preliminary benchmark results, using a method similar to the one from the article:

1st time (after compaction and view cleanup)
$ python3 utils/addview.py data/reference/income_map_function.js -u admin -p admin -D map_tests_js -v read_income_test -l javascript -b
INFO:dragnet.addview:average = 657.98 (all_tasks) 657.98 (one task)
INFO:dragnet.addview:total time = 55s

2nd time (updating a view with a slightly modified function)
$ python3 utils/addview.py data/reference/income_map_function.js -u admin -p admin -D map_tests_js -v read_income_test -l javascript -b
INFO:dragnet.addview:average = 1196.54 (all_tasks) 149.87 (one task)
INFO:dragnet.addview:total time = 249s

1st time (after compaction and view cleanup)
$ python3 utils/addview.py data/reference/income_map_function.es6 -u admin -p admin -D map_tests_ch -v read_income_test -l chakra -b
INFO:dragnet.addview:average = 676.65 (all_tasks) 676.65 (one task)
INFO:dragnet.addview:total time = 52s

2nd time (updating a view with a slightly modified function)
$ python3 utils/addview.py data/reference/income_map_function.es6 -u admin -p admin -D map_tests_ch -v read_income_test -l chakra -b
INFO:dragnet.addview:average = 1213.02 (all_tasks) 147.49 (one task)
INFO:dragnet.addview:total time = 243s

The first two runs are using couchjs and the last two are with couch-chakra. Note how the first run (after deleting the design doc, compacting the DB and running the view cleanup) differs from the design-doc update: it also runs in just one process instead of 8, as in the update case. I'm not quite sure why this is. My prior understanding of CouchDB's MapReduce was that it traverses the whole B-tree structure, applies map to each item, and builds a new B-tree each time. But with these results it might work a bit differently...
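For anyone trying to reproduce these figures: the per-task numbers presumably come from polling CouchDB's standard `/_active_tasks` endpoint while the view builds. A minimal sketch in Python (the endpoint and its fields are standard CouchDB; the exact averaging done by `addview.py` is my assumption, and the credentials/URL are placeholders):

```python
import json
import urllib.request

COUCH_URL = "http://admin:admin@localhost:5984"  # hypothetical local instance


def average_rate(tasks):
    """Average docs/sec across running indexer tasks (assumed formula for
    the 'average = ...' figures in the log lines above)."""
    rates = [t["changes_done"] / max(t["updated_on"] - t["started_on"], 1)
             for t in tasks if t.get("type") == "indexer"]
    return sum(rates) / len(rates) if rates else 0.0


def sample_indexers(base_url=COUCH_URL):
    """Take one sample of CouchDB's /_active_tasks endpoint."""
    with urllib.request.urlopen(base_url + "/_active_tasks") as resp:
        return json.load(resp)
```

Polling `sample_indexers()` in a loop and feeding the samples to `average_rate()` should roughly recover the numbers reported above.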

@excieve

excieve commented Mar 6, 2017

Please ignore the part about the first view creation running better. Apparently in those cases CouchDB had only indexed one shard out of 8, and the next query then indexed the whole view. So only the "2nd time" runs really matter for the benchmarking process.

@dmunch
Owner Author

dmunch commented Mar 7, 2017

Just checked out your repository https://github.com/excieve/dragnet, great work and write-up on benchmarking views!

I'm just curious: do you think it would be possible to provide a synthetic dataset for benchmarking based on your real-world dataset? I'm aware that your data is most certainly private, but maybe you could come up with some sort of anonymization procedure. It's just that real-world data is so much more meaningful for benchmarks!

Another thing I'd be curious about is the average CPU time and memory used during indexing. I'd like to see how ChakraCore compares to SpiderMonkey there, especially since ChakraCore is supposed to be optimized for IoT scenarios.

Good news from the binary-protocol side too: I have all the view tests running now. CouchApp-specific functions still fail, but I'm going to ignore those for the moment. There's still some cleanup work to do and, more importantly, some minor modifications in CouchDB itself, but conceptually I'm on the right track!

@excieve

excieve commented Mar 7, 2017

Actually, the dataset I'm using is completely public. It's a subset of the tax and property ownership declarations that all public employees in Ukraine have to submit annually (and upon certain events) to an official online registry. After that they are in the public domain, available from the National Agency for Corruption Prevention, like this one for instance: https://public-api.nazk.gov.ua/v1/declaration/3371ace7-177b-44d6-ba2a-53e023f740be.

We're planning to analyse them all in a continuous manner, but for testing purposes I'm only operating on a subset. I can provide you with a file suitable for feeding into this import script, or maybe just an archive of CouchDB's data volume with this dataset already imported. Which do you prefer?
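For what it's worth, a file of documents can be loaded with CouchDB's standard `_bulk_docs` endpoint regardless of which option is chosen. A minimal, generic sketch (the database URL and batching are placeholders, not the actual import script linked above):

```python
import json
import urllib.request


def bulk_docs_payload(docs):
    """Wrap a list of documents in the body expected by /{db}/_bulk_docs."""
    return json.dumps({"docs": docs}).encode("utf-8")


def import_batch(db_url, docs):
    """POST one batch of documents; db_url is e.g.
    http://admin:admin@localhost:5984/map_tests_js (hypothetical)."""
    req = urllib.request.Request(
        db_url + "/_bulk_docs",
        data=bulk_docs_payload(docs),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)  # one status object per document
```

Importing in batches of a few thousand documents keeps request sizes manageable on larger datasets.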

Another thing I'd be curious about is average CPU time and memory used during indexing.

In all cases the query servers don't really consume much CPU time, due to the I/O bottleneck. The only case where I've seen a query server get more CPU than CouchDB was CPython with multiple iterations in the function (which didn't get optimised away as in JITted PyPy or the JS runtimes). But I agree, it would be nice to monitor and report those too. I'll see if I can get that into the benchmarks.
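One way to sample that on Linux without extra dependencies is reading `/proc` for the query-server processes. A rough sketch (the process name `couchjs` is an assumption, and this is Linux-only):

```python
import os
import re


def rss_kib(status_text):
    """Parse VmRSS (resident memory, KiB) out of /proc/<pid>/status contents."""
    m = re.search(r"^VmRSS:\s+(\d+)\s+kB", status_text, re.MULTILINE)
    return int(m.group(1)) if m else 0


def query_server_memory(name="couchjs"):
    """Total resident memory (KiB) of all processes whose comm matches name."""
    total = 0
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/comm") as f:
                if f.read().strip() != name:
                    continue
            with open(f"/proc/{pid}/status") as f:
                total += rss_kib(f.read())
        except OSError:
            continue  # process exited between listing and reading
    return total
```

Sampling `query_server_memory()` periodically during a view build would give the average (and peak) memory profile per query-server implementation.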

Good news also from the binary protocol side: I have all the view tests running now.

That's awesome! Please let me know when there's something to try out and I will be happy to test.

@dmunch
Owner Author

dmunch commented Mar 8, 2017

That's definitely an interesting dataset :)

One thing I thought might be convenient is to provide a publicly available Cloudant instance with the demo data, which interested parties could replicate from. As far as I understand it's only free up to 1 GB, but maybe that's enough! But I'll take whatever works and is easiest for you; it shouldn't take too much time either!
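Pulling the data from such an instance would then just be a one-shot call to CouchDB's standard `/_replicate` endpoint. A sketch with hypothetical URLs:

```python
import json
import urllib.request


def replicate_request(source, target, create_target=True):
    """Body for CouchDB's /_replicate endpoint (one-shot pull replication)."""
    return json.dumps({
        "source": source,
        "target": target,
        "create_target": create_target,
    }).encode("utf-8")


def pull_dataset(local_url, remote_db, local_db):
    """Pull a remote database into a local one, e.g.
    pull_dataset("http://admin:admin@localhost:5984",
                 "https://example.cloudant.com/declarations",  # hypothetical
                 "declarations")."""
    req = urllib.request.Request(
        local_url + "/_replicate",
        data=replicate_request(remote_db, local_db),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```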

@excieve

excieve commented Mar 11, 2017

FYI: https://medium.com/@excieve/benchmarking-couchdb-views-abb7a0a891b2#.v6px6aid2
It includes a link to the raw dataset for importing. I'll see if I can get a Cloudant instance going though.
