Structure analysis API proposal
The original dendrogram package contains code for basic structure analysis. Dendro-core needs this as well. There are two layers of analysis, which roughly map from general to astronomy-specific use cases. These can be thought of as summary statistics over the x, y, z coordinates and values of a scalar intensity I. For simulation data, we will assume analysis of a 2- or 3-dimensional scalar field such as density or gravitational potential.
There are three natural layers of abstraction:
At the lowest level of abstraction (level 0), the "basic properties" summarize a collection of voxels (we can generically call this a structure). At a minimum, there is one scalar field defined at each voxel (usually an intensity for a PPV observation, or a density for a PPP cube). However, there could be more. A dendrogram defines one set of structures (the branches and leaves), but the basic quantities layer need not and should not assume that the input structures come from a dendrogram.
At a higher level of abstraction are "semantically rich" quantities (level 1)-- these are more or less directly derived from the basic quantities, but have physically meaningful units. This layer requires additional metadata about each structure: what each axis refers to, whether the underlying field is an intensity / density / other field, the physical pixel scale, etc.
The highest level of abstraction is a set of utilities that iterate through dendrogram structures and extract an ensemble of semantically rich quantities for all structures (level 2).
- Flux-like : The sum of the pixel values
- Location-like : Intensity-weighted centroid
- Extent-like : Intensity-weighted second moments along each axis, and covariances of the same.
- Derived Quantities
- Position angle, major and minor axes (from diagonalizing the covariance tensor)
- Equivalent rotations in the third dimension (corresponding to velocity gradients in PPV case)
- Surface Area / Perimeter (optional, not currently implemented)
- Volume / Area (optional, not currently implemented)
Semantically, we should probably avoid analysis-specific vocabulary in the "Basic Quantities" layer and refer to them as things like `Moment0` for the integrated flux and `Moment2` for the n x n covariance matrix (for n dimensions).
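As a concrete illustration of the level-0 layer, the basic moments for an arbitrary collection of voxels might be computed along these lines (the names `moment0`, `moment1`, `moment2` are placeholders for whatever vocabulary is settled on, not an existing API):

```python
import numpy as np

def basic_moments(indices, values):
    """Level-0 summary statistics for a collection of voxels.

    indices : (n_voxels, n_dim) array of voxel coordinates
    values  : (n_voxels,) array of the scalar field at each voxel
    """
    values = np.asarray(values, dtype=float)
    indices = np.asarray(indices, dtype=float)

    # flux-like: sum of the pixel values
    moment0 = values.sum()

    # location-like: intensity-weighted centroid
    moment1 = (values[:, None] * indices).sum(axis=0) / moment0

    # extent-like: n x n intensity-weighted covariance of the coordinates
    delta = indices - moment1
    moment2 = (values[:, None, None]
               * delta[:, :, None] * delta[:, None, :]).sum(axis=0) / moment0
    return moment0, moment1, moment2
```

The position angle and major/minor axes in the "derived quantities" layer then follow from diagonalizing `moment2` (e.g. with `np.linalg.eigh`).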
- deconvolved second moments (major / minor axes)
- radius : The geometric mean of the major and minor axes
- luminosity : intensity combined with distance
- mass : Usually, approximated by applying a conversion factor to the flux
- virial parameter
- pressure
- n.b. Most analyses will require including additional fields in the analysis framework.
- gravitational potential
- kinetic energies
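A sketch of how a few of these level-1 quantities could fall out of the level-0 moments, given per-structure metadata. The inputs `pixel_scale_pc` and `mass_per_flux` are hypothetical stand-ins for the metadata layer (WCS/distance information and a flux-to-mass conversion factor), not part of any existing interface:

```python
import numpy as np

def derived_quantities(moment0, moment2, pixel_scale_pc, mass_per_flux):
    """Hypothetical level-1 quantities from level-0 moments.

    moment0        : summed flux of the structure
    moment2        : spatial covariance matrix (pixel units)
    pixel_scale_pc : physical size of one pixel (pc), from distance + WCS
    mass_per_flux  : conversion factor from summed flux to mass
    """
    # major/minor axes: sqrt of the eigenvalues of the spatial covariance
    eigvals = np.linalg.eigvalsh(np.asarray(moment2)[:2, :2])
    minor, major = np.sqrt(eigvals) * pixel_scale_pc

    radius = np.sqrt(major * minor)   # geometric mean of major and minor axes
    mass = mass_per_flux * moment0    # flux scaled by a conversion factor
    return major, minor, radius, mass
```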
The same as above, though we can measure a 3D orientation vector and the extents along all 3 principal axes, and rely on external velocity information for the velocity dispersion, virial parameter, and pressure.
These quantities can be computed for any collection of voxels. The most natural collection of voxels is each leaf/branch. However, we should also be able to sample these quantities at different contour levels within a leaf/branch.
There is some tension between analysis that makes the most sense for PPV spectral data cubes, versus more generic cases. We should strive for a natural API to address PP maps, PPV spectral data cubes, and PPP simulation data. These might need to be separate classes.
The original "levelprops" code considers three paradigms for dealing with how structures are truncated: The "bijection" paradigm assumes that a structure has a sharp boundary, and all of the emission interior to the boundary belongs to the structure. The "clipping" paradigm assumes that a structure has a sharp boundary, and only the emission above the contour level of the boundary is part of the structure. The "extrapolation" paradigm assumes that a structure extends beyond the boundary.
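The difference between the bijection and clipping paradigms can be illustrated with a toy flux computation (a sketch, not existing code):

```python
import numpy as np

# values of the voxels interior to a structure's boundary
data = np.array([1.0, 2.0, 3.0, 5.0])

# contour level that defines the boundary
contour = 1.0

# bijection: all of the emission interior to the boundary
# belongs to the structure
flux_bijection = data.sum()

# clipping: only the emission above the contour level of the
# boundary is part of the structure
flux_clipping = (data - contour).sum()
```

Extrapolation would instead fit the trend of a property as a function of contour level and extrapolate it to the zero-intensity contour, which is why its mechanics are harder to pin down.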
EWR: I think the "clipping" approach is not useful to port into the python analysis. In most use cases, it's not physical. The "extrapolation" approach is useful, but the underlying mechanics of how to do the extrapolation are ill-explored; at present, this utilizes the `cprops` framework. It seems appropriate to implement "bijection" immediately and then worry about "extrapolation" at a later step in the development process.
One question to address: given that the new implementation does not rely on a specific set of contour levels, what should be used for determining properties? Potentially, all of the elements of the data set associated with the branch could be used.
I'm not yet sure what the interface should look like. Here are some options:
```python
# or PPPCataloger, PPCataloger, ...
c = PPVCataloger(...wcs info, distance, etc...)

# extract all quantities for a given leaf or branch,
# optionally considering the subset above a given contour.
# output is... a numpy record?
# could also specify a list of contours, returning a record array
property = c.extract(leaf_or_branch, contour=None)

# or: same thing, but specify a boolean mask for more control
# over which voxels are/are not considered
property = c.extract(leaf_or_branch, mask=None)
```
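Whichever signature is chosen, a minimal bijection-paradigm extractor could look like the sketch below. The class and method names are illustrative only, and the structure is represented here simply as an array of voxel indices, which sidesteps the (unsettled) question of what a leaf/branch object exposes:

```python
import numpy as np

class Cataloger:
    """Illustrative extractor; real classes might be PPVCataloger, etc."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def extract(self, indices, contour=None):
        """Summary statistics for the voxels at `indices` (n_voxels, n_dim),
        optionally restricted to values above `contour`."""
        indices = np.asarray(indices)
        values = self.data[tuple(indices.T)]
        if contour is not None:
            keep = values > contour
            indices, values = indices[keep], values[keep]
        flux = values.sum()
        centroid = (values[:, None] * indices.astype(float)).sum(axis=0) / flux
        return {"flux": flux, "centroid": centroid}
```

Returning a dict here stands in for whatever record type is chosen (a numpy record, for instance).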
The analysis of simulations suggests that whatever `levelprops`-like functionality gets implemented will require the ability to apply the proposed analysis class to data sets other than the original.