Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

min_delta fails on "flat-topped" leaves with no parent structure #176

Open
jbaade opened this issue Sep 30, 2021 · 1 comment
Open

min_delta fails on "flat-topped" leaves with no parent structure #176

jbaade opened this issue Sep 30, 2021 · 1 comment

Comments

@jbaade
Copy link

jbaade commented Sep 30, 2021

I noticed some unexpected behavior while testing dendrograms on random 1D spectra, to learn about how the algorithm worked. The file given is just an array of Gaussian noise with stdev 1, with a Gaussian function added with mean 50, amplitude 6, and stdev 2. I made a large number of varying spectra and collected examples of failed detections; some of them were too broken up by noise, which makes sense, but the rest had this sort of profile:
bad_spectrum

bad_spectrum_file.txt

Loading that file into an array and running Dendrogram.compute(spectrum, min_value=3, min_delta=1, min_npix=3), I get an empty dendrogram, because only one structure is detected at all, and it fails the independence test for min_delta. Lines 84-91 in pruning.py:

    def result(structure, index=None, value=None):
        if value is None:
            if structure.parent is not None:
                return (structure.height - structure.parent.height) >= delta

            return (structure.vmax - structure.vmin) >= delta
        return (structure.vmax - value) >= delta
    return result

Here, structure.parent is None because the collection of points above min_value=3, at x=48 through x=51, forms the only structure, so line 89 compares the highest (5.25) and lowest (5.03) value in the structure and returns False.

I might have misunderstood the algorithm example in the documentation, but why this line compares vmax and vmin isn't very clear to me. My understanding was that the purpose of min_delta was to set a cutoff for distinguishing a substructure from noise. Why would a large-enough structure far above the min_value threshold not ought to be counted if it happens to be alone and to have a plateauing profile? This is just the sort of shape that can appear upon sampling a noisy Gaussian; I expect it might occur in the 2D case as well as 1D. Is this behavior intentional, and if so, what's the reasoning behind it?

@keflavich
Copy link
Contributor

I think you're right, this sort of object should be included, and I think you've pinpointed a line that incorrectly excludes it. This warrants a deeper review. I'd like to hear from others whether there's any good explanation for this.

I also suggest we add this example to the test suite as a regression test.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants