Second round of applycal optimisations #224

bmerry · 2019-02-14T09:07:11Z

The main speedups come from:

- Building a separate dask graph for the corrections, so that the calculations can be reused when loading vis, flags and weights jointly.
- Expanding CategoricalData (for B and K) into Python lists of arrays, which can be indexed much faster. The downside is that all cal sensors are extracted when the file is opened rather than lazily, but for now that only happens if the user explicitly requested applycal, in which case it was presumably going to happen sooner or later anyway.

A major change is that calibration solutions are now computed and applied before stage1 selection. This may be slower in cases where not all the data is accessed: for example, if the user selects only a subset of inputs, the calibration correction is still computed for all inputs and baselines. On the other hand, it means there is now a distinct 'postproc' flag bit that can be selected.
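A minimal sketch of the shared-corrections idea, assuming dask and NumPy are installed. The correction function, the "NaN means no solution" convention, scaling weights by 1/|c|^2 and putting 'postproc' in bit 7 are illustrative assumptions, not katdal's actual code. The point is that vis, flags and weights all depend on one `corrections` array, so computing them together in a single `dask.compute` call evaluates each correction chunk only once.

```python
import dask
import dask.array as da
import numpy as np

POSTPROC = np.uint8(1 << 7)  # assumed bit position for the 'postproc' flag
calls = []  # records each real invocation of the correction function


def compute_corrections(vis_block):
    """Hypothetical per-chunk correction: 0.5 everywhere, NaN (no solution) in channel 0."""
    calls.append(vis_block.shape)
    corr = np.full(vis_block.shape, 0.5 + 0j, dtype=np.complex64)
    corr[:, 0, :] = np.nan
    return corr


# Raw data with shape (dumps, channels, baselines), two chunks along time
chunks = (2, 8, 3)
raw_vis = da.ones((4, 8, 3), chunks=chunks, dtype=np.complex64)
raw_weights = da.ones((4, 8, 3), chunks=chunks, dtype=np.float32)
raw_flags = da.zeros((4, 8, 3), chunks=chunks, dtype=np.uint8)

# One correction array shared by all three outputs; passing meta stops dask
# from calling the function on an empty block to infer the output type
corrections = da.map_blocks(compute_corrections, raw_vis, dtype=np.complex64,
                            meta=np.empty((0, 0, 0), dtype=np.complex64))
no_solution = da.isnan(corrections)

vis = raw_vis * corrections
weights = da.where(no_solution, 0.0, raw_weights / abs(corrections) ** 2)
flags = raw_flags | da.where(no_solution, POSTPROC, np.uint8(0))

# Computing jointly lets dask evaluate each shared correction chunk once
v, w, f = dask.compute(vis, weights, flags, scheduler='sync')
```

Computing `vis`, `weights` and `flags` in three separate `compute` calls would instead run the correction function once per chunk for each output.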

I also implemented SR-1214, creating a katdal.flags module.

Querying a CategoricalData is relatively slow. To avoid doing it per chunk, look up the values when the file is opened and store them in a list indexed by dump. Also pack all the arguments passed through from_block_function into a single object so that dask doesn't try to hash the individual arrays.
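A minimal sketch of the per-dump lookup, assuming a hypothetical representation of a categorical sensor as sorted event dumps plus one value (e.g. a solution array) per event; this is not the CategoricalData API. Each segment repeats a reference to the same array, so the list costs one pointer per dump rather than a copy of each solution, and a per-chunk lookup becomes plain list indexing.

```python
import numpy as np


def expand_to_dumps(event_dumps, values, n_dumps):
    """Return a list with one entry per dump.

    values[i] applies from dump event_dumps[i] up to (but excluding) the next
    event; dumps before the first event get None (no solution yet).
    event_dumps must be sorted in increasing order.
    """
    per_dump = [None] * n_dumps
    # Clamp event positions so events past the end cannot grow the list
    bounds = [min(d, n_dumps) for d in event_dumps] + [n_dumps]
    for value, start, end in zip(values, bounds[:-1], bounds[1:]):
        # Same array object repeated: no copies of the solution are made
        per_dump[start:end] = [value] * (end - start)
    return per_dump


# Hypothetical bandpass solutions (2 channels) that change at dumps 1 and 3
b_first = np.array([1 + 0j, 2 + 0j])
b_second = np.array([3 + 0j, 4 + 0j])
solutions = expand_to_dumps([1, 3], [b_first, b_second], n_dumps=5)
print(solutions[2] is b_first)  # True: dump 2 uses the first solution
```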

The FLAG_NAMES and FLAG_DESCRIPTIONS in katdal.flags are re-imported into h5datav2, h5datav3 and visdatav4 (for compatibility), and also into the top-level namespace. There are also definitions for the individual bits (e.g. DATA_LOST_BIT = 3, DATA_LOST = 1 << DATA_LOST_BIT); for now I haven't pulled those into the top-level namespace, because it's not obvious that katdal.DATA_LOST is a flag, whereas katdal.flags.DATA_LOST is self-explanatory. For katdal.flags.DATA_LOST to work, I removed the code that deletes the modules from the top-level namespace. This addresses SR-1214 (although it does not add a `flags_raw` array to datasets).
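A minimal sketch of the kind of bit definitions described above. Only DATA_LOST_BIT = 3 and the existence of a 'postproc' bit come from this PR; the other names, the position of the postproc bit and the `flag_names_set` helper are assumptions for illustration, so check katdal.flags itself for the real values.

```python
# Hypothetical flag bit layout, one name per bit position (bit 0 first)
FLAG_NAMES = ('reserved0', 'static', 'cam', 'data_lost',
              'ingest_rfi', 'predicted_rfi', 'cal_rfi', 'postproc')

DATA_LOST_BIT = FLAG_NAMES.index('data_lost')  # 3, as in the PR description
DATA_LOST = 1 << DATA_LOST_BIT                 # mask value 8
POSTPROC_BIT = FLAG_NAMES.index('postproc')    # assumed to be bit 7
POSTPROC = 1 << POSTPROC_BIT                   # mask value 128


def flag_names_set(flag_byte):
    """Return the names of all flag bits set in one flag value (hypothetical helper)."""
    return [name for bit, name in enumerate(FLAG_NAMES) if flag_byte & (1 << bit)]


print(flag_names_set(DATA_LOST | POSTPROC))  # ['data_lost', 'postproc']
```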

ludwigschwardt

Awesome stuff!

Just some minor nitpicks and musings.

katdal/__init__.py

katdal/applycal.py

katdal/flags.py

ludwigschwardt · 2019-02-19T08:59:30Z

katdal/visdatav4.py

+            corrected_flags = self._make_corrected(apply_flags_correction, self.source.data.flags)
+            corrected_weights = self._make_corrected(apply_weights_correction, self.source.data.weights)
+            self._corrected = VisFlagsWeights(corrected_vis, corrected_flags, corrected_weights,
+                                              name='corrected')


What's in a name? More options:

- L1
- sdp_l1
- 1543471660_sdp_l1, to match the default L0 name

I'm not fussy, and I don't have a clear picture of what else you're stuffing into the name. Pick something and I'll make it so. Another option would be to incorporate self.source.data.name into the name so that you know where it came from.

How about starting with self.source.data.name? If it contains 'sdp_l0', replace it with 'sdp_l1', else append '_corrected'.

Done, although slightly differently because I didn't have your exact recipe open when I did the change. Feel free to tweak it on the branch.
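The naming recipe suggested in this thread (start with self.source.data.name; if it contains 'sdp_l0', replace it with 'sdp_l1', else append '_corrected') can be sketched as below. The function name is hypothetical and, as the reply notes, the actual change was implemented slightly differently.

```python
def corrected_name(source_name):
    """Derive a name for the corrected data container from the source name."""
    if 'sdp_l0' in source_name:
        return source_name.replace('sdp_l0', 'sdp_l1')
    return source_name + '_corrected'


print(corrected_name('1543471660_sdp_l0'))  # 1543471660_sdp_l1
print(corrected_name('my_dataset'))         # my_dataset_corrected
```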

katdal/visdatav4.py

katdal/applycal.py

katdal/visdatav4.py

bmerry · 2019-02-19T11:48:41Z

Should be ready for another look.

bmerry added 9 commits February 12, 2019 17:24

Create a dask array holding applycal corrections

cc4917d

This actually reduces performance, but it is a stepping stone to reusing a single correction array across vis, flags and weights.

Work-in-progress on shared applycal

c54a415

Additions to mvf_read_benchmark

a71fc79

- Add --workers option
- Add some warmup outside the loop so that once-off overheads (like poking lazy indexers to make them produce the dask graphs) don't skew the results.
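A sketch of the warm-up pattern, with a --workers option parsed by argparse. The --iterations option and the `parse_args` and `time_loads` helpers are hypothetical; the actual mvf_read_benchmark code is not shown on this page, and how the worker count reaches dask is left out.

```python
import argparse
import time


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description='Sketch of a read benchmark')
    # Hypothetical wiring: how the worker count is passed to dask is not shown
    parser.add_argument('--workers', type=int, default=None,
                        help='number of worker threads to use')
    parser.add_argument('--iterations', type=int, default=3)
    return parser.parse_args(argv)


def time_loads(load, iterations, warmup=True):
    """Return the mean wall-clock time per call of load().

    One untimed call is made first so that once-off costs (e.g. lazily
    building dask graphs) are not included in the average.
    """
    if warmup:
        load()
    start = time.perf_counter()
    for _ in range(iterations):
        load()
    return (time.perf_counter() - start) / iterations


args = parse_args(['--workers', '4'])
print(args.workers)  # 4
```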

Fix broken applycal

dca92c7

It had an uninitialised value that meant no corrections were being applied.

Update unit test for applycal changes

ef971e8

Also add some more documentation.

Fix typo in docstring

db3c827

Fix minor bugs in from_block_function

47d7e61

Pulling in the new code from dask/dask#4476.

bmerry assigned ludwigschwardt Feb 14, 2019

bmerry requested a review from ludwigschwardt February 14, 2019 09:07

Merge remote-tracking branch 'origin/master' into applycal-opt2

5eafb64

ludwigschwardt requested changes Feb 19, 2019

Improvements from pull request review

9488b6f

Mostly cosmetic, but fixes flags (they were being built from the original flags not the "corrected" flags).

bmerry and others added 2 commits February 20, 2019 09:21

More PR tweaks

e013443

- Make flag corrections actually work
- Change how the name of the _corrected VisFlagsWeights is computed

Tweak derived name of corrected data container

a753e44

ludwigschwardt approved these changes Feb 20, 2019

bmerry merged commit 622b1ef into master Feb 20, 2019

bmerry deleted the applycal-opt2 branch February 20, 2019 08:33
