Structure analysis API proposal
The original dendrogram package contains code for basic structure analysis. Dendro-core needs this as well. There are two layers of analysis, which roughly map from general to astronomy-specific use cases. These can be thought of as summary statistics over the x, y, z coordinates and values of a scalar intensity I. For simulation data, we will assume analysis of a 2- or 3-dimensional scalar field such as density or gravitational potential.
There are three natural layers of abstraction:
At the lowest level of abstraction (level 0), the "basic properties" summarize a collection of voxels (we can generically call this a structure). At a minimum, there is one scalar field defined at each voxel (usually an intensity for a PPV observation, or a density for a PPP cube). However, there could be more. A dendrogram defines one set of structures (the branches and leaves), but the basic quantities layer need not and should not assume that the input structures come from a dendrogram.
At a higher level of abstraction are "semantically rich" quantities (level 1)-- these are more or less directly derived from the basic quantities, but have physically meaningful units. This layer requires additional metadata about each structure: what each axis refers to, whether the underlying field is an intensity / density / other field, the physical pixel scale, etc.
The highest level of abstraction is a set of utilities that iterate through dendrogram structures and extract an ensemble of semantically rich quantities for all structures (level 2).
- Flux-like : The sum of the pixel values
- Location-like : Intensity-weighted centroid
- Extent-like : Intensity-weighted second moments along each axis, and covariances of the same.
- Derived Quantities
- Position angle, major and minor axes (from diagonalizing the covariance tensor)
- Equivalent rotations in the third dimension (corresponding to velocity gradients in PPV case)
- Surface Area / Perimeter (optional, not currently implemented)
- Volume / Area (optional, not currently implemented)
Semantically, we should probably avoid analysis-specific vocabulary in the "Basic Quantities" layer and refer to them as things like `Moment0` for the integrated flux and `Moment2` for the n x n covariance matrix (for n dimensions).
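As a concrete illustration of the level-0 layer, the basic moments for an arbitrary collection of voxels might be computed along these lines (the names `moment0`, `moment1`, `moment2` are placeholders for whatever vocabulary is settled on, not an existing API):

```python
import numpy as np

def basic_moments(indices, values):
    """Level-0 summary statistics for a collection of voxels.

    indices : (n_voxels, n_dim) array of voxel coordinates
    values  : (n_voxels,) array of the scalar field at each voxel
    """
    values = np.asarray(values, dtype=float)
    indices = np.asarray(indices, dtype=float)

    # flux-like: sum of the pixel values
    moment0 = values.sum()

    # location-like: intensity-weighted centroid
    moment1 = (values[:, None] * indices).sum(axis=0) / moment0

    # extent-like: n x n intensity-weighted covariance of the coordinates
    delta = indices - moment1
    moment2 = (values[:, None, None]
               * delta[:, :, None] * delta[:, None, :]).sum(axis=0) / moment0
    return moment0, moment1, moment2
```

The position angle and major/minor axes in the "derived quantities" layer then follow from diagonalizing `moment2` (e.g. with `np.linalg.eigh`).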
- deconvolved second moments (major / minor axes)
- radius : The geometric mean of the major and minor axes
- luminosity : intensity combined with distance
- mass : Usually, approximated by applying a conversion factor to the flux
- virial parameter
- pressure
- n.b. Most analyses will require including additional fields in the analysis framework.
- gravitational potential
- kinetic energies
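A sketch of how a few of these level-1 quantities could fall out of the level-0 moments, given per-structure metadata. The inputs `pixel_scale_pc` and `mass_per_flux` are hypothetical stand-ins for the metadata layer (WCS/distance information and a flux-to-mass conversion factor), not part of any existing interface:

```python
import numpy as np

def derived_quantities(moment0, moment2, pixel_scale_pc, mass_per_flux):
    """Hypothetical level-1 quantities from level-0 moments.

    moment0        : summed flux of the structure
    moment2        : spatial covariance matrix (pixel units)
    pixel_scale_pc : physical size of one pixel (pc), from distance + WCS
    mass_per_flux  : conversion factor from summed flux to mass
    """
    # major/minor axes: sqrt of the eigenvalues of the spatial covariance
    eigvals = np.linalg.eigvalsh(np.asarray(moment2)[:2, :2])
    minor, major = np.sqrt(eigvals) * pixel_scale_pc

    radius = np.sqrt(major * minor)   # geometric mean of major and minor axes
    mass = mass_per_flux * moment0    # flux scaled by a conversion factor
    return major, minor, radius, mass
```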
The same as above, though we can measure a 3D orientation vector and the extents along all 3 principal axes, and rely on external velocity information for the velocity dispersion, virial parameter, and pressure.
These quantities can be computed for any collection of voxels. The most natural collection of voxels is each leaf/branch. However, we should also be able to sample these quantities at different contour levels within a leaf/branch.
There is some tension between analysis that makes the most sense for PPV spectral data cubes, versus more generic cases. We should strive for a natural API to address PP maps, PPV spectral data cubes, and PPP simulation data. These might need to be separate classes.
The original "levelprops" code considers three paradigms for dealing with how structures are truncated: The "bijection" paradigm assumes that a structure has a sharp boundary, and all of the emission interior to the boundary belongs to the structure. The "clipping" paradigm assumes that a structure has a sharp boundary, and only the emission above the contour level of the boundary is part of the structure. The "extrapolation" paradigm assumes that a structure extends beyond the boundary.
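The difference between the bijection and clipping paradigms can be illustrated with a toy flux computation (a sketch, not existing code):

```python
import numpy as np

# values of the voxels interior to a structure's boundary
data = np.array([1.0, 2.0, 3.0, 5.0])

# contour level that defines the boundary
contour = 1.0

# bijection: all of the emission interior to the boundary
# belongs to the structure
flux_bijection = data.sum()

# clipping: only the emission above the contour level of the
# boundary is part of the structure
flux_clipping = (data - contour).sum()
```

Extrapolation would instead fit the trend of a property as a function of contour level and extrapolate it to the zero-intensity contour, which is why its mechanics are harder to pin down.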
EWR: I think the "clipping" approach is not useful to port into the python analysis. In most use cases, it's not physical. The "extrapolation" approach is useful, but the underlying mechanics of how to do the extrapolation are ill-explored; at present, this utilizes the `cprops` framework. It seems appropriate to implement "bijection" immediately and then worry about "extrapolation" at a later step in the development process.
One question to address: given that the new implementation does not rely on a specific set of contour levels, what should be used for determining properties? Potentially, all of the elements of the data set associated with the branch could be used.
I'm not yet sure what the interface should look like. Here are some options:
```python
# or PPPCataloger, PPCataloger, ...
c = PPVCataloger(...wcs info, distance, etc...)

# extract all quantities for a given leaf or branch,
# optionally considering the subset above a given contour.
# output is... a numpy record?
# could also specify a list of contours, returning a record array
property = c.extract(leaf_or_branch, contour=None)

# or: same thing, but specify a boolean mask for more control
# over which voxels are/are not considered
property = c.extract(leaf_or_branch, mask=None)
```
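Whichever signature is chosen, a minimal bijection-paradigm extractor could look like the sketch below. The class and method names are illustrative only, and the structure is represented here simply as an array of voxel indices, which sidesteps the (unsettled) question of what a leaf/branch object exposes:

```python
import numpy as np

class Cataloger:
    """Illustrative extractor; real classes might be PPVCataloger, etc."""

    def __init__(self, data):
        self.data = np.asarray(data, dtype=float)

    def extract(self, indices, contour=None):
        """Summary statistics for the voxels at `indices` (n_voxels, n_dim),
        optionally restricted to values above `contour`."""
        indices = np.asarray(indices)
        values = self.data[tuple(indices.T)]
        if contour is not None:
            keep = values > contour
            indices, values = indices[keep], values[keep]
        flux = values.sum()
        centroid = (values[:, None] * indices.astype(float)).sum(axis=0) / flux
        return {"flux": flux, "centroid": centroid}
```

Returning a dict here stands in for whatever record type is chosen (a numpy record, for instance).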
The analysis of simulations suggests that whatever `levelprops`-like functionality gets implemented will require the ability to apply the proposed analysis class to data sets other than the original.