-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Second round of applycal optimisations #224
Conversation
This actually reduces performance, but it is a stepping stone to reusing a single correction array across vis, flags and weights.
- Add --workers option - Add some warmup outside the loop so that once-off overheads (like poking lazy indexers to make them produce the dask graphs) doesn't skew the results.
Querying a CategoricalData is relatively slow. To avoid doing it per chunk, look up the values when the file is opened and just make a list indexed by dump. Also pack all the arguments that are going through from_block_function into an object so that dask doesn't try to hash the individual arrays.
It had an uninitialised value that meant no corrections were being applied.
Also add some more documentation.
Pulling in the new code from dask/dask#4476.
The FLAG_NAMES and FLAG_DESCRIPTIONS are then re-imported to h5datav2, h5datav3 and visdatav4 (for compatibility), and also in the top-level namespace. There are also definitions for the individual bits (e.g. DATA_LOST_BIT = 3, DATA_LOST = 1 << DATA_LOST_BIT); for now I didn't pull those into the top-level namespace because it's not obvious that they represent flags when written as katdal.DATA_LOST, but katdal.flags.DATA_LOST is self-explanatory. In order for katdal.flags.DATA_LOST to work, I removed the code that deletes the modules from the top-level namespace. This addresses SR-1214 (although it does not introduce a `flags_raw` array to datasets).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome stuff!
Just some minor nitpicks and musings.
katdal/visdatav4.py
Outdated
corrected_flags = self._make_corrected(apply_flags_correction, self.source.data.flags) | ||
corrected_weights = self._make_corrected(apply_weights_correction, self.source.data.weights) | ||
self._corrected = VisFlagsWeights(corrected_vis, corrected_flags, corrected_weights, | ||
name='corrected') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's in a name? More options:
- L1
- sdp_l1
- 1543471660_sdp_l1, to match the default L0 name
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not fussy, and I don't have a clear picture of what else you're stuffing into name
. Pick something and I will make it so. Another option would be to incorporate self.source.data.name
into the name so that you know where it's coming from.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How about starting with self.source.data.name
? If it contains 'sdp_l0', replace it with 'sdp_l1', else append '_corrected'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, although slightly differently because I didn't have your exact recipe open when I did the change. Feel free to tweak it on the branch.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
Mostly cosmetic, but fixes flags (they were being built from the original flags not the "corrected" flags).
Should be ready for another look. |
- Make flag corrections actually work - Change how the name of the _corrected VisFlagsWeights is computed
The main speedups come from
A major change is that calibration solutions are now computed and applied prior to stage1 selection. This may be slower in some cases where not all the data is accessed e.g. if the user is only selecting a subset of inputs, the calibration correction will still be done on all inputs and baselines. On the other hand, it means we now have a distinct 'postproc' flag bit that can be selected.
I also implemented SR-1214, creating a katdal.flags module.