Skip to content

Add option to compute_mutation_times to sample times uniformly along the edge. #849

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
benjeffery opened this issue Sep 11, 2020 · 7 comments
Labels
C API Issue is about the C API enhancement New feature or request

Comments

@benjeffery
Copy link
Member

benjeffery commented Sep 11, 2020

When compute_mutation_times was added we discussed randomly assigning times rather than the current method of using the midpoint (#513 (comment)). We decided to leave the source of randomness as an input to avoid setting up an MIT-license compatible RNG in tskit. However, since then we also have discussed the need of randomness for splitting polytomies (#815 (comment)) and it seems that https://www.pcg-random.org/download.html may fit the bill. I suggest we attempt to add an RNG to tskit for mutation times and see how it goes. Any thoughts? I'd be happy to take this on if there is consensus.

This came up in tskit-dev/msprime#977

@brianzhang01
Copy link
Member

Just caught up on the RNG discussion here on GPL vs. MIT. As long as the uniform sampling is implemented in tskit or msprime, I'd be happy, and I am sympathetic to the argument that randomness shouldn't be too prominent in tskit. Given a mutation with unknown time, a user should be able to do a few simple calls through the Python API to get the bottom and top nodes and do the sampling themselves if they really wanted to. @petrelharp

In other words, I'm a bit confused what the compute_mutation_times function is supposed to offer in the first place. Since tree sequences are typically generated in msprime or read in from file, I would think that any relevant mutation information should already be present when the tree sequence is instantiated.

@brianzhang01
Copy link
Member

Also, what is the current behavior for a mutation above a node with no parents? And what do we propose we change it to under pseudorandom uniform sampling?

@brianzhang01
Copy link
Member

brianzhang01 commented Sep 11, 2020

Here's an example of computing these things using the Python API and making a histogram of all mutation times. It's relatively efficient.

def get_mutation_times(ts, random_seed=None):
    if random_seed is not None:
        np.random.seed(random_seed)
    mutation_info = [[ts.site(m.site).position, m.node] for m in ts.mutations()]
    # mutation_info.sort()  # it's already sorted so we don't need to sort
    mutation_times = []
    mutation_index = 0
    position, node = mutation_info[mutation_index]
    for tree in ts.trees():
        while mutation_index < len(mutation_info) and tree.interval[0] <= position and position < tree.interval[1]:
            # process this mutation
            parent = tree.parent(node)
            if parent == tskit.NULL:
                raise RuntimeError("This case not supported")
            mutation_times.append(np.random.uniform(tree.time(node), tree.time(parent)))
            mutation_index += 1
            if mutation_index < len(mutation_info):
                position, node = mutation_info[mutation_index]

    assert len(mutation_times) == len(mutation_info)
    return mutation_times

mutation_times = get_mutation_times(my_ts)
interval_bins = np.linspace(0, 1e5, 50)
histogram = np.histogram(mutation_times, interval_bins, density=False)[0]

@petrelharp
Copy link
Contributor

And what do we propose we change it to under pseudorandom uniform sampling?

I think the only thing we can do is to put the times equal to the time of the node below them.

@jeromekelleher
Copy link
Member

I agree that we'll probably need randomness in tskit at some point and using the RNG that you mention above is probably the way to go. I'm with @brianzhang01 and the moment though, in wondering whether this is something we want to take on right now. I'm happy either way though. @petrelharp, do you see a use for this random version of compute_mutation_times?

@petrelharp
Copy link
Contributor

No, I'm with @brianzhang01 - if someone needs to do this, the python version is easy and efficient.

@benjeffery
Copy link
Member Author

Cool, we'll shelve this for now then.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C API Issue is about the C API enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants