Tree.is_empty method #2640
Hmm, my logic above is flawed: we can't simply compare the number of roots with the number of samples, because a tree could have a single edge above each sample (not empty, but with the same number of roots as samples). Similarly, we can't simply check whether the roots are all samples, as they could have edges descending from them that lead nowhere. So I think we are forced to be long-winded here and iterate through each root, checking that it has no children:

```python
def is_empty(self, check_roots=None) -> bool:
    """
    Check if this tree is "empty" (i.e. has no topology). A tree is empty if it has
    no edges. However, it may also be considered empty if it contains edges which
    represent :ref:`dead branches<sec_data_model_tree_dead_leaves_and_branches>`
    (i.e. not reachable from the :meth:`~Tree.roots` of the tree). To consider such
    a tree as empty too, which is more involved, specify ``check_roots=True``.

    Note that this is purely a property of the topology. An "empty" tree can still
    contain sites and there may even be mutations on those sites.

    :param bool check_roots: Should we also consider a tree empty if it has
        topology but the topology is unconnected to any of the roots of the tree?
        Default: ``None`` treated as ``False``.
    :return: ``True`` if this tree is empty, ``False`` otherwise.
    """
    if self.num_edges == 0:
        return True
    if not check_roots:
        return False
    # Exhaustively check the roots: it's not enough simply to check that the roots
    # are all samples, as they could still have children
    for u in self.roots:
        if self.num_children(u) != 0:
            return False
    return True
```
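To make the counterexamples above concrete, here is a minimal, self-contained sketch that does not use tskit at all. `ToyTree` is a hypothetical stand-in storing only nodes, samples and (parent, child) edges; its `roots` follow our understanding of tskit's rule (a parentless node with at least `root_threshold` samples at or below it), and its `is_empty` mirrors the logic proposed above.

```python
class ToyTree:
    """Hypothetical stand-in for a tskit Tree: nodes 0..num_nodes-1, samples, edges."""

    def __init__(self, num_nodes, samples, edges, root_threshold=1):
        self.num_nodes = num_nodes
        self.samples = set(samples)
        self.edges = list(edges)  # (parent, child) pairs
        self.root_threshold = root_threshold

    def children(self, u):
        return [c for p, c in self.edges if p == u]

    def has_parent(self, u):
        return any(c == u for _, c in self.edges)

    def num_samples_below(self, u):
        # Count u itself if it is a sample, plus every sample in its subtree
        own = 1 if u in self.samples else 0
        return own + sum(self.num_samples_below(c) for c in self.children(u))

    @property
    def roots(self):
        # Parentless nodes subtending at least root_threshold samples
        return [
            u
            for u in range(self.num_nodes)
            if not self.has_parent(u)
            and self.num_samples_below(u) >= self.root_threshold
        ]

    def is_empty(self, check_roots=False):
        if len(self.edges) == 0:
            return True
        if not check_roots:
            return False
        # Empty only if no root has any children (all topology is dead branches)
        return all(len(self.children(u)) == 0 for u in self.roots)


# One edge above each sample: as many roots as samples, yet not empty
stubs = ToyTree(4, {0, 1}, [(2, 0), (3, 1)])
print(stubs.roots, stubs.is_empty(check_roots=True))  # [2, 3] False

# Every root is a sample, but sample 0 has an edge leading to a non-sample node
dangling = ToyTree(3, {0, 1}, [(0, 2)])
print(dangling.roots, dangling.is_empty(check_roots=True))  # [0, 1] False

# A dead branch (3 -> 2) that no sample can reach
dead = ToyTree(4, {0, 1}, [(3, 2)])
print(dead.is_empty(), dead.is_empty(check_roots=True))  # False True
```

The toy walks the edge list repeatedly, so it is only meant to illustrate the definitions, not to be efficient.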
I don't understand why we're worried about the …; I think it would be simpler to just define a tree as empty if it has zero edges. We can come up with another word to describe a tree that only contains dead branches, if we need it.
ISTR that the main "reason" was to document the various definitions of empty for the user. But we could indeed just define an empty tree as having no edges at all, and document somewhere else that trees can "appear" empty (e.g. when plotting or when traversing their nodes in the normal way) but may not be strictly empty by the formal definition. I guess the thought was that having the param forces people to think about what they mean by "empty" when they are looking at a tree. But this could be done by e.g. a …
Then we could also point out that an "empty" tree can still have e.g. sites defined in it, which is another weird edge case.
Or just choose another word, like …
Hmm, not a bad idea. Would we therefore be happy with e.g. …
It seems to me that what we're really interested in here is missingness, so why don't we call it `is_missing`? The point is to skip trees representing missing data, so it would be easier to follow if we used the word "missing" rather than adding another more-or-less equivalent term. I guess the most consistent definition would be that the tree contains no non-isolated samples, or something like that?
There is definitely an argument for keeping it simple and saying a tree is "missing" if it has zero edges, though.
Yes, but I would be worried that a missing tree and missing data might get confused. Missing data is really a property of sites, whereas you can have missing tree topology even in a tree sequence with no mutations. So I think we need to tread carefully here. Maybe something to ask for opinions on during a tskit.dev meeting?
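The two candidate definitions of "missing" discussed above can be compared with a small stdlib-only sketch. The names `is_missing` and `has_no_edges` are hypothetical (not confirmed tskit API), a tree is reduced to a set of sample IDs plus (parent, child) edge pairs, and we assume "isolated" means a sample with neither a parent nor children.

```python
def is_isolated(u, edges):
    """True if node u appears in no edge, i.e. has neither a parent nor children."""
    return all(u != parent and u != child for parent, child in edges)


def is_missing(samples, edges):
    """Hypothetical definition: no sample is connected to any other node."""
    return all(is_isolated(u, edges) for u in samples)


def has_no_edges(edges):
    """The simpler alternative: the tree has zero edges."""
    return len(edges) == 0


samples = {0, 1}
cases = {
    "no edges": [],
    "dead branch only": [(3, 2)],
    "one sample attached": [(2, 0)],
    "full topology": [(2, 0), (2, 1)],
}
for name, edges in cases.items():
    print(f"{name}: missing={is_missing(samples, edges)}, "
          f"no_edges={has_no_edges(edges)}")
```

The "dead branch only" case is where the two definitions disagree: every sample is isolated, yet the tree still has an edge.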
We decided that it's too soon to implement this since it's just as easy to do |
As discussed in #2600 (comment), it would be good to have a standard way of checking whether a tree is "empty". However, there are two definitions of empty:

1. … `delete_intervals` … e.g. when we simulate a certain region, or when we remove the flanking regions in tsinfer.
2. … (… `root_threshold` parameter, such that isolated samples may not always be considered a root).

I think that an `is_empty()` method would be a good place to document these two different meanings. So something like:
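Separately from the proposal itself, the `root_threshold` subtlety in the second definition can be illustrated with a minimal sketch. The `roots` function below is a hypothetical helper that follows our understanding of tskit's rule (a node with no parent that has at least `root_threshold` samples at or below it); it is not the tskit implementation.

```python
def roots(num_nodes, samples, edges, root_threshold=1):
    """Parentless nodes with at least root_threshold samples at or below them."""
    children = {u: [] for u in range(num_nodes)}
    has_parent = set()
    for parent, child in edges:
        children[parent].append(child)
        has_parent.add(child)

    def samples_below(u):
        # u itself counts if it is a sample
        return int(u in samples) + sum(samples_below(c) for c in children[u])

    return [
        u
        for u in range(num_nodes)
        if u not in has_parent and samples_below(u) >= root_threshold
    ]


samples = {0, 1, 2}
edges = [(3, 0), (3, 1)]  # sample 2 is isolated
print(roots(4, samples, edges))                    # [2, 3]
print(roots(4, samples, edges, root_threshold=2))  # [3]
print(roots(3, samples, [], root_threshold=2))     # []
```

With `root_threshold=2` the isolated sample 2 is no longer a root, and a tree with no edges has no roots at all, so any check that loops over the roots (like the `check_roots=True` branch in the discussion) returns `True` vacuously.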