Skip to content

Provide feedback or adaptive epsilon for split polytomies? #1088

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
hyanwong opened this issue Dec 13, 2020 · 11 comments
Closed

Provide feedback or adaptive epsilon for split polytomies? #1088

hyanwong opened this issue Dec 13, 2020 · 11 comments

Comments

@hyanwong
Copy link
Member

hyanwong commented Dec 13, 2020

We deliberately removed both the feedback about what epsilon value to used, and the ability to calculate a sensible epsilon value in split_polytomies, but that's now wasted a day of compute time because my simulations all aborted, firstly with the default, and again when I set epsilon to 1e-20, both with the somewhat unhelpful error message:

_tskit.LibraryError: ('time[parent] must be greater than time[child]', 'Epsilon=1e-20 not small enough to create new nodes below a polytomy, due to the time of one of child nodes being too close. ')

since I need to split polytomies before saving results. I'm now trying with 1e-40, but that might not work either (I guess this is because of trying to split polytomies in nodes produced via tsinfer path compression, which are placed by some small epsilon above their child nodes). So having an adaptive epsilon seems like a good plan here. Or if not that, at least suggesting what a possible working value of epsilon should be. I previously made some suggestions how to do this here.

@hyanwong hyanwong changed the title Adaptive epsilon for split polytomies? Feedback or adaptive epsilon for split polytomies? Dec 13, 2020
@hyanwong hyanwong changed the title Feedback or adaptive epsilon for split polytomies? Provide feedback or adaptive epsilon for split polytomies? Dec 13, 2020
@hyanwong
Copy link
Member Author

I'm currently still failing on epsilon = 1e-60, which is weird because PC_ANCESTOR_INCREMENT (1.0 / (1LL << 32)) which is ~2.33e-10. I'm assuming I'll hit floating point problems if I go beyond epsilon = 1e-127. These all seem far too extreme values, so I'm wondering if there's something weird about the (inferred) TS whose trees I'm trying to split.

@hyanwong
Copy link
Member Author

Attached is the 1-tree sequence, which indeed has a minimum time between nodes of ~2.33e-10, but which still fails to split_polytomies with epsilon=1e-100:

import tskit
ts = tskit.load("tmp.trees")
tree = ts.first()
time_diffs = np.array([ts.node(tree.parent(n)).time - ts.node(n).time for n in ts.first().nodes() if tree.parent(n) != tskit.NULL])
print(min(time_diffs))  # gives 2.3283064365386963e-10
tree.split_polytomies(epsilon=1e-100)  # Fails with "Epsilon=1e-100 not small enough to create new nodes below a polytomy"

tmp.trees.zip

@jeromekelleher
Copy link
Member

Or if not that, at least suggesting what a possible working value of epsilon should be. I previously made some suggestions how to do this here.

I don't think suggesting a value will make any difference here, since the suggested value would only work in the specific case that has just failed. In situations like this, it's very likely that it would fail again soon afterwards with a slightly smaller value. So, suggested values would just be frustrating, IMO.

Adaptive is tricky. If the values need to be this small, then I'm not sure it would work anyway, as we'd likely hit DBL_EPS. I'll take a look at the example, this seems weird.

@hyanwong
Copy link
Member Author

I have a PR (about to put in) that collects the bad nodes and then at the end reports the worst, with a recommended epsilon. I think that should allay your first point.

Thanks for looking into this. I was about to do so once I figured out how to best send informative error messages (i.e. the PR), so I could debug easily.

@hyanwong
Copy link
Member Author

PR now at #1089, which might help to debug the case above too.

@hyanwong
Copy link
Member Author

OK, so this is weird. My code is indeed useful:

import tskit
ts = tskit.load("~/tmp.trees")
ts.first().split_polytomies(random_seed=1)
# New code gives: ValueError: Child node 2153 of polytomy at node 2158 too close for epsilon=1e-10; try epsilon=5.820766091346741e-11
ts.first().split_polytomies(random_seed=1, epsilon = 5.820766091346741e-11)  # Works - hurrah!
ts.first().split_polytomies(random_seed=1, epsilon = 1e-20)  # Fails - WTF? 

@benjeffery
Copy link
Member

benjeffery commented Dec 14, 2020

I think 1+1e-20 == 1 right? I thought DBL_EPSILON was around 10^-16? So your last line failure is because the subtraction time-epsilon is a no-op.

@hyanwong
Copy link
Member Author

Ah, this might indeed be the problem. So we can't have differences between nodes that are smaller than some value, where that value is relative to the size of the node time, somewhere around 1e-16 times smaller?

@hyanwong
Copy link
Member Author

This is likely to catch out people like me, I think. We should warn about it somewhere. Perhaps when we fail to make the TS event after checking for minimum epsilon times in my PR #1089

@jeromekelleher
Copy link
Member

I'm working up a PR based on using nextafter and giving good errors @hyanwong. There's a few problems with your approach, it's simpler to code it up.

@hyanwong
Copy link
Member Author

hyanwong commented Dec 14, 2020

Sure. I would be careful about using nextafter though, as we might want to leave space to insert even more nodes in between the split ones, for instance if we have methods to do path compression an an existing TS. Some sort of even-spacing might be better (or at least, giving the option for even spacing. I would have thought that we can simply calculate the number of extra nodes needed, and for each polytomy find epsilon = (closest child->parent distance)/num_new_nodes - that will be the most conservative value for epsilon needed only when we resolve into a ladder tree.

jeromekelleher added a commit to jeromekelleher/tskit that referenced this issue Dec 14, 2020
@mergify mergify bot closed this as completed in 7d07b4a Dec 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants