Tree.is_empty method #2640

hyanwong · 2022-11-18T10:54:55Z

As discussed in #2600 (comment), it would be good to have an standard way of checking if a tree is "empty". However, there are 2 definitions of empty:

No topology, anywhere. This is what happens when we delete_intervals e.g. when we simulate a certain region, or when we remove the flanking regions in tsinfer.
No topology reachable from the roots (where a root can also be subject to the root_threshold parameter, such that isolated samples may not always be considered a root).

I think that an is_empty() method would be a good place to document these two different meanings. So something like:

def is_empty(self, check_roots=None) -> bool:
    """
    Check if this tree is "empty" (i.e. has no topology). A tree is empty if it has no edges. However, it
    may also be considered empty if it contains edges which represent "dead branches" (i.e. not
    reachable from the :meth:`~Tree.roots` of the tree). To consider a tree like this as empty, specify
    ``check_roots=True``.

    :param bool check_roots: Should we also consider a tree empty if it has topology which is unconnected
        to any of the roots of the tree. Default: ``None`` treated as ``False``.
    :return: ``True`` if this tree is empty, ``False`` otherwise.
    """
    if check_roots:
        if tree.num_roots == tree.num_samples():  # A common case, when root_threshold is 1
            return True
        else:
            # More inefficient if this is an empty tree with many samples, e.g. a missing region of genome
            return set(tree.roots) <= set(tree.tree_sequence.samples())
    else:
         return self.num_edges == 0

The text was updated successfully, but these errors were encountered:

hyanwong · 2022-12-23T19:07:21Z

Hmm, my logic above is flawed: we can't simply count the number or roots and samples, because a tree could have a single edge above a each sample (not empty, but same number of roots as samples). Similarly, we can't simply look to see if the roots are all samples, as they could have edges descending from the samples, leading nowhere. So I think we are forced to be longwinded here and iterate through each root checking that it has no children:

    def is_empty(self, check_roots=None) -> bool:
        """
        Check if this tree is "empty" (i.e. has no topology). A tree is empty if it has
        no edges. However, it may also be considered empty if it contains edges which
        represent :ref:`dead branches<sec_data_model_tree_dead_leaves_and_branches>`
        (i.e. not reachable from the :meth:`~Tree.roots` of the tree). To consider such
        a tree as empty too, which is more involved, specify ``check_roots=True``.
        
        Note that this is purely a property of the topology. An "empty" tree can still
        contain sites and there may even be mutations on those sites.
    
        :param bool check_roots: Should we also consider a tree empty if it has
            topology but the topology is unconnected to any of the roots of the tree?
            Default: ``None`` treated as ``False``.
        :return: ``True`` if this tree is empty, ``False`` otherwise.
        """
        if self.num_edges == 0:
            return True

        if not check_roots:
            return False

        # Exhaustively check the roots: it's not simply enough to check that the roots
        # are all samples, as they could still have children 
        for u in self.roots:
            if self.num_children(u) != 0:
                return False
        return True

Fixes tskit-dev#2640

jeromekelleher · 2023-01-08T14:32:57Z

I don't understand why we're worried about the check_roots case - what's the application for this?

I think it would be simpler to just define a tree as empty if it has zero edges. We can come up with another word to describe a tree that only contains dead branches, if we need it.

hyanwong · 2023-01-08T14:51:55Z

ISTR that the main "reason" was to document the various definitions of empty for the user. But we could indeed just define an empty tree as having no edges at all, and document somewhere else that trees can "appear" empty (e.g. when plotting or when traversing their nodes in the normal way), but may not be strictly empty by the formal definition.

I guess the thought was that having the param forces people to think about what they mean by "empty" when they are looking at a tree. But this could be done by e.g. a ..note:: in the docstring?

hyanwong · 2023-01-08T14:55:15Z

I guess the thought was that having the param forces people to think about what they mean by "empty" when they are looking at a tree. But this could be done by e.g. a ..note:: in the docstring?

Then we could also point out that an "empty" tree can still have e.g. sites defined in it, which is another weird edge case.

jeromekelleher · 2023-01-08T15:10:05Z

Or just choose another word, like branchless or something? (i.e., a tree that has zero branches)

hyanwong · 2023-01-08T16:01:40Z

Or just choose another word, like branchless or something? (i.e., a tree that has zero branches)

Hmm, not a bad idea. Would we therefore be happy with e.g. ts.trees(skip_branchless=True) rather than ts.trees(skip_empty=True)? It sounds less intuitive to me, although it's more precise, which is good. Perhaps we should ask others in the tskit community for feedback here?

jeromekelleher · 2023-01-08T20:13:12Z

It seems to me that what we're really interested in here is missingness, so why don't we call it is_missing?

The point is to skip trees representing missing data, so it would be easier to follow if we used the word "missing" rather than adding another more-or-less equivalent term.

I guess the most consistent definition would be that the tree contains no non isolated samples or something?

jeromekelleher · 2023-01-08T20:25:15Z

There is definitely an argument for keeping it simple and saying a tree is "missing" if it has zero edges, though.

hyanwong · 2023-01-08T20:31:54Z

It seems to me that what we're really interested in here is missingness, so why don't we call it is_missing?

Yes, but I would be worried that a missing tree and missing data might get confused. Missing data is really a property of sites, whereas you can have missing tree topology even in a tree sequence with no mutations. So I think we need to tread carefully here. Maybe something to ask for opinions on during a tskit.dev meeting?

hyanwong · 2023-01-10T13:35:35Z

We decided that it's too soon to implement this since it's just as easy to do tree.num_edges == 0 or tree.total_branch_length==0

hyanwong added a commit to hyanwong/tskit that referenced this issue Dec 23, 2022

Implement is_empty method

ec7378c

Fixes tskit-dev#2640

hyanwong mentioned this issue Dec 23, 2022

Implement is_empty method #2668

Closed

3 tasks

hyanwong added a commit to hyanwong/tskit that referenced this issue Dec 23, 2022

Implement is_empty method

ece21ff

Fixes tskit-dev#2640

jeromekelleher mentioned this issue Jan 8, 2023

Method or attribute for start and end of tree sequence data #2671

Closed

hyanwong closed this as completed Jan 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Tree.is_empty method #2640

Tree.is_empty method #2640

hyanwong commented Nov 18, 2022 •

edited

Loading

hyanwong commented Dec 23, 2022

Uh oh!

jeromekelleher commented Jan 8, 2023

Uh oh!

hyanwong commented Jan 8, 2023 •

edited

Loading

Uh oh!

hyanwong commented Jan 8, 2023

Uh oh!

jeromekelleher commented Jan 8, 2023 •

edited

Loading

Uh oh!

hyanwong commented Jan 8, 2023

Uh oh!

jeromekelleher commented Jan 8, 2023

Uh oh!

jeromekelleher commented Jan 8, 2023

Uh oh!

hyanwong commented Jan 8, 2023 •

edited

Loading

Uh oh!

hyanwong commented Jan 10, 2023

Uh oh!

Tree.is_empty method #2640

Tree.is_empty method #2640

Comments

hyanwong commented Nov 18, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

hyanwong commented Dec 23, 2022

Uh oh!

jeromekelleher commented Jan 8, 2023

Uh oh!

hyanwong commented Jan 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hyanwong commented Jan 8, 2023

Uh oh!

jeromekelleher commented Jan 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hyanwong commented Jan 8, 2023

Uh oh!

jeromekelleher commented Jan 8, 2023

Uh oh!

jeromekelleher commented Jan 8, 2023

Uh oh!

hyanwong commented Jan 8, 2023 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

hyanwong commented Jan 10, 2023

Uh oh!

hyanwong commented Nov 18, 2022 •

edited

Loading

hyanwong commented Jan 8, 2023 •

edited

Loading

jeromekelleher commented Jan 8, 2023 •

edited

Loading

hyanwong commented Jan 8, 2023 •

edited

Loading