Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Second round of applycal optimisations #224

Merged
merged 13 commits into from
Feb 20, 2019
Merged

Second round of applycal optimisations #224

merged 13 commits into from
Feb 20, 2019

Conversation

bmerry
Copy link
Contributor

@bmerry bmerry commented Feb 14, 2019

The main speedups come from

  • Building a separate dask graph from the corrections, so that the calculations can be recycled when loading vis, flags, weights jointly.
  • Expanding CategoricalData (for B and K) into Python lists of arrays, which can be indexed much faster. This does have the downside of extracting all cal sensors when the file is opened rather than lazily, but for now that's only going to happen if the user explicitly requested applycal and so it is presumably going to happen sooner or later.

A major change is that calibration solutions are now computed and applied prior to stage1 selection. This may be slower in some cases where not all the data is accessed e.g. if the user is only selecting a subset of inputs, the calibration correction will still be done on all inputs and baselines. On the other hand, it means we now have a distinct 'postproc' flag bit that can be selected.

I also implemented SR-1214, creating a katdal.flags module.

This actually reduces performance, but it is a stepping stone to reusing
a single correction array across vis, flags and weights.
- Add --workers option
- Add some warmup outside the loop so that once-off overheads (like
  poking lazy indexers to make them produce the dask graphs) doesn't
  skew the results.
Querying a CategoricalData is relatively slow. To avoid doing it per
chunk, look up the values when the file is opened and just make a list
indexed by dump.

Also pack all the arguments that are going through from_block_function
into an object so that dask doesn't try to hash the individual arrays.
It had an uninitialised value that meant no corrections were being
applied.
Also add some more documentation.
The FLAG_NAMES and FLAG_DESCRIPTIONS are then re-imported to h5datav2,
h5datav3 and visdatav4 (for compatibility), and also in the top-level
namespace. There are also definitions for the individual bits (e.g.
DATA_LOST_BIT = 3, DATA_LOST = 1 << DATA_LOST_BIT); for now I didn't
pull those into the top-level namespace because it's not obvious that
they represent flags when written as katdal.DATA_LOST, but
katdal.flags.DATA_LOST is self-explanatory.

In order for katdal.flags.DATA_LOST to work, I removed the code that
deletes the modules from the top-level namespace.

This addresses SR-1214 (although it does not introduce a `flags_raw`
array to datasets).
Copy link
Contributor

@ludwigschwardt ludwigschwardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome stuff!

Just some minor nitpicks and musings.

katdal/__init__.py Show resolved Hide resolved
katdal/applycal.py Show resolved Hide resolved
katdal/applycal.py Show resolved Hide resolved
katdal/applycal.py Show resolved Hide resolved
katdal/applycal.py Outdated Show resolved Hide resolved
katdal/flags.py Outdated Show resolved Hide resolved
corrected_flags = self._make_corrected(apply_flags_correction, self.source.data.flags)
corrected_weights = self._make_corrected(apply_weights_correction, self.source.data.weights)
self._corrected = VisFlagsWeights(corrected_vis, corrected_flags, corrected_weights,
name='corrected')
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's in a name? More options:

  • L1
  • sdp_l1
  • 1543471660_sdp_l1, to match the default L0 name

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fussy, and I don't have a clear picture of what else you're stuffing into name. Pick something and I will make it so. Another option would be to incorporate self.source.data.name into the name so that you know where it's coming from.

Copy link
Contributor

@ludwigschwardt ludwigschwardt Feb 19, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about starting with self.source.data.name? If it contains 'sdp_l0', replace it with 'sdp_l1', else append '_corrected'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, although slightly differently because I didn't have your exact recipe open when I did the change. Feel free to tweak it on the branch.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

katdal/visdatav4.py Outdated Show resolved Hide resolved
katdal/applycal.py Show resolved Hide resolved
katdal/visdatav4.py Outdated Show resolved Hide resolved
Mostly cosmetic, but fixes flags (they were being built from the
original flags not the "corrected" flags).
@bmerry
Copy link
Contributor Author

bmerry commented Feb 19, 2019

Should be ready for another look.

bmerry and others added 2 commits February 20, 2019 09:21
- Make flag corrections actually work
- Change how the name of the _corrected VisFlagsWeights is computed
@bmerry bmerry merged commit 622b1ef into master Feb 20, 2019
@bmerry bmerry deleted the applycal-opt2 branch February 20, 2019 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants