-
Notifications
You must be signed in to change notification settings - Fork 77
Randomly resolve polytomies #809
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
Cool, @hyanwong! I assume we will want a method like this in tskit at some point. One thing I wanted to flag - adding new nodes may invalidate the mutation time requirements if the tree sequence has them, for example if a mutation is just above one of the polytomies to be broken then it would need to be moved above the oldest node in the new set that replaces the polytomy. Clearly an edge-case but thought I should flag! |
That's a good point @benjeffery, thanks. Actually, I break the polytomy below the focal node, so that the polytomy node does not change time: it's mutations on any nodes below the polytomy which will need checking. I think I can see how to do this: at the moment I set the delta time ( What do we do if the time differences are so small that they start to encounter floating point accuracy errors. Or is this so unlikely that we don't care? |
I think we detect the situation and error out. Only other option is some horrible recursive shuffling right? |
Yes, I guess so. The question is whether we actively look for this and raise a specific error, or just expect it to bomb out (with e.g. time[parent] <= time[child]) when we try to convert to a tree sequence. Shall I actually work this up into a PR, if you think it's a useful extra method on a tree_sequence? I guess , following other examples, I could make it an in-place method on a TableCollection, and then create a idempotent (is that the right word?) version for a tree sequence. |
Hi Yan Wong,
This looks great! Personally, I would prefer to raise a specific error
rather than time[parent]<=time[child] which is less informative from a
user's perspective.
Haven't got a chance to try it out myself but I think this feature would be
useful for many others. Wonder what other people in the tskit group think
about this feature.
btw, how to access the tskit in which you can supply the include_terminal
param to edge_diff? It's not in the latest release, right?
Best,
Yilei
…On Fri, Aug 28, 2020 at 4:25 AM Yan Wong ***@***.***> wrote:
Yes, I guess so. The question is whether we actively look for this and
raise a specific error, or just expect it to bomb out (with e.g.
time[parent] <= time[child]) when we try to convert to a tree sequence.
Shall I actually work this up into a PR, if you think it's a useful extra
method on a tree_sequence? I guess , following other examples, I could make
it an in-place method on a TableCollection, and then create a idempotent
(is that the right work) version for a tree sequence.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#809 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ALNWTMEUUCFZHRW4FGYFWKLSC5SWNANCNFSM4QNL6YRA>
.
|
Hi @hyl317 - to try it out, until #787 is merged you'll need to install my branch directly, e.g.
Then try out the code. ISWYM about the specific error, it's just a bit more work to do properly! |
Hi @hyl317 and @awohns - you can now test this using a single install via the PR I just made. The name has changed (for the time being) to
Simply call like
|
I'd be happy to help review code for this. |
Great, thanks @brianzhang01 - that would be really useful. The PR is at #815 but I don't know if we want to implement out own PRNG, so that we can have an equivalent C function. |
Just to note that the opposite: collapse edges into polytomies, only retaining those supported by mutations, is at #2926 |
Uh oh!
There was an error while loading. Please reload this page.
I've finally got some working code to randomly resolve polytomies in a tree sequence. It seems to work for the admittedly small sample of inferred trees that I have tried. It tries to be clever by not resolving per tree, but per polytomy (hence if an identical polytomy spans several trees, it will only resolve the polytomy once, creating fewer edges than a per-tree approch, which should mean it scales to larger sample sizes better). I'm posting it here so that:
a. I don't lose it (!)
b. We can decide if we want something like this in the base
tskit
library, as a method on a tree sequence.c. @hyl317 can use it if he wants, for ARGweaver compatibility (although his current solution may be faster)
Obviously it needs a fair bit of tidying up. It's also slow, but there's also considerable scope for optimization, I think. It requires the PR I made at #787
The code can be tested using something like this:
The text was updated successfully, but these errors were encountered: