-
Notifications
You must be signed in to change notification settings - Fork 77
Put intervals.py into tskit #2636
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Note that some HapMap files omit the first column with the chromosome name (e.g. https://ftp.ncbi.nlm.nih.gov/hapmap/recombination/latest/rates/). I wonder if in |
Codecov Report
@@ Coverage Diff @@
## main #2636 +/- ##
==========================================
+ Coverage 93.90% 93.95% +0.04%
==========================================
Files 27 28 +1
Lines 27655 27872 +217
Branches 1269 1302 +33
==========================================
+ Hits 25969 26186 +217
Misses 1648 1648
Partials 38 38
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
This will be needed for the more flexible tskit HMM matching algorithm. |
Hmm, codecov is increasingly flakey, have triggered a re-run. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's more uncovered here than I'd like. I know it will be covered by the msprime tests eventually, but it should be covered here.
Thanks for looking at this. I agree about the coverage (I was waiting a bit to see what CI picked up). I'll add tests later in the week and ping you when done @benjeffery. |
Oh - before I add more tests, just to check that my general approach in this PR is a sensible one, @benjeffery? |
Yeah, seems good to me! |
As we are relicensing this code from GPL to MIT, we need to look at the git history and make sure that everyone who has contributed agrees to this change. Looking at the file in the msprime code:
So, we have five contributors, @jeromekelleher, @hyanwong, @petrelharp @grahamgower and @awohns. Can each of these contributors please confirm that they wish to relicense this code from GPLv3 to MIT? |
I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT. |
Oh, well spotted.
Yes, I am happy to relicense all of my contributions to this code (originally developed in msprime) from GPLv3 to MIT. |
I also approve the relicensing of this code originally developed in msprime from GPLv3 to MIT. |
Me too. I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT. |
Coverage looks OK now. Just waiting to the the remaining approval from @petrelharp |
I also took the liberty of modifying |
I notice that there's a "to do" in the deprecated def physical_to_genetic(self, x):
return self.map.get_cumulative_mass(x)
def genetic_to_physical(self, genetic_x):
if self.map.total_mass == 0:
# If we have a zero recombination rate throughout then everything
# except L maps to 0.
return self.get_sequence_length() if genetic_x > 0 else 0
if genetic_x == 0:
return self.map.position[0]
# TODO refactor this to this to use get_cumulative_mass() function / add the
# corresponding high-level function to the rate map.
index = np.searchsorted(self.map._cumulative_mass, genetic_x) - 1
y = (
self.map.position[index]
+ (genetic_x - self.map._cumulative_mass[index]) / self.map.rate[index]
)
return y I guess we want to implement a class RateMap:
...
def cumulative_mass_lookup(self, cumulative_mass):
"""
Returns a set of genomic positions corresponding to the passed-in
cumulative mass values. This is the inverse of :meth:`get_cumulative_mass`
.. note:: If the RateMap is a "genetic map" measuring the
rate of recombination along a chromosome, this function will return
the chromosomal position (i.e. the physical distance) for the provided
genetic map values.
:param numpy.ndarray cumulative_mass: The values for which to return genomic
positions.
:return: An array of positions along the genome, the same length as
``cumulative_mass``
:rtype: numpy.ndarray
"""
if self.total_mass == 0:
# If we have a zero rate throughout then anything above 0 maps to L.
return self.sequence_length if cumulative_mass > 0 else 0
if cumulative_mass == 0:
return 0
map_c_mass = self.get_cumulative_mass()
i = np.searchsorted(map_c_mass, cumulative_mass) - 1
positions = self.position[i] + (cumulative_mass - map_c_mass[i]) / self.rate[i]
return x |
I approve the license change! |
I don't think we care about the We want to keep the code provenance clear here, so let's do the minimum necessary to port across in this PR. Can you update the commit message that copies in the intervals.py code to include the agreements from each of the contributors please @hyanwong? Ideally we'd have a commit that copies it in directly with no changes, so that the updates are obvious later if/when someone does a code review for license violations. We want to be completely clear about licensing issues. |
To match the msprime `text_table` function
Allows it to be used by other tabular outputting routines. Also put the "limit" functionality into a single util function
The authors of this module and associated tests have agreed to release the code originally developed for the msprime package under the MIT licence (see tskit-dev#2636). Specifically: @jeromekelleher: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @hyanwong: "I am happy to relicense all of my contributions to this code (originally developed in msprime) from GPLv3 to MIT." @awohns: "I also approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @grahamgower: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @petrelharp: "I approve the license change!"
d0dd629
to
d83900d
Compare
Done - commit f246afe is the port, with associated approvals. The other commits are just to get this working in tskit. I will PR the other changes separately. |
The authors of this module and associated tests have agreed to release the code originally developed for the msprime package under the MIT licence (see tskit-dev#2636). Specifically: @jeromekelleher: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @hyanwong: "I am happy to relicense all of my contributions to this code (originally developed in msprime) from GPLv3 to MIT." @awohns: "I also approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @grahamgower: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @petrelharp: "I approve the license change!"
Rework a little to use the built-in tskit unicode and html output functions
@@ -0,0 +1,600 @@ | |||
# | |||
# MIT License | |||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to keep the original copyright notices I'm afraid.
I think it might be easier to start this with a new branch:
- first copy in the msprime code as-is, with no changes. Make a commit, with the message including text from contributors like we have
- Next update the utils code as required, and commit
- Then make the changes to intervals.py and tests needed to bring them into tskit, and commit. This includes new copyright headers in addition to the existing ones.
Sorry I know this is tedious, but licencing issues always are and we really don't want there to be any grey areas with it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh yes, with a bit of cherry picking from these commits and rebasing you should be able to get to this setup reasonably easily. I can do it if you like.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree about avoiding grey areas. I think we need to copy the tests as well as the intervals.py file. I was hoping to do this so that the intermediate commit still passed tests, but I guess you are saying we don't need to worry about that.
How do we make it clear in the phrasing that the code is under MIT and GPL? Also, if all contributors agree to the change, surely we can re-release just under MIT (although ISWYM about the copyright rather than the licence)? But I guess you've looked into this a lot and there's some reason we have to jointly keep GPL too, @jeromekelleher
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW, if you are happy to do it, and it's easy, go ahead @jeromekelleher . That way it'll be sure to be done as you want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave it with me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks
The authors of this module and associated tests have agreed to release the code originally developed for the msprime package under the MIT licence (see tskit-dev#2636). Specifically: @jeromekelleher: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @hyanwong: "I am happy to relicense all of my contributions to this code (originally developed in msprime) from GPLv3 to MIT." @awohns: "I also approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @grahamgower: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @petrelharp: "I approve the license change!"
Superseded by #2678 |
The authors of this module and associated tests have agreed to release the code originally developed for the msprime package under the MIT licence (see tskit-dev#2636). Specifically: @jeromekelleher: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @hyanwong: "I am happy to relicense all of my contributions to this code (originally developed in msprime) from GPLv3 to MIT." @awohns: "I also approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @grahamgower: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @petrelharp: "I approve the license change!"
The authors of this module and associated tests have agreed to release the code originally developed for the msprime package under the MIT licence (see tskit-dev#2636). Specifically: @jeromekelleher: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @hyanwong: "I am happy to relicense all of my contributions to this code (originally developed in msprime) from GPLv3 to MIT." @awohns: "I also approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @grahamgower: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @petrelharp: "I approve the license change!"
The authors of this module and associated tests have agreed to release the code originally developed for the msprime package under the MIT licence (see #2636). Specifically: @jeromekelleher: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @hyanwong: "I am happy to relicense all of my contributions to this code (originally developed in msprime) from GPLv3 to MIT." @awohns: "I also approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @grahamgower: "I approve the relicensing of this code originally developed in msprime from GPLv3 to MIT." @petrelharp: "I approve the license change!"
Description
As noted in tskit-dev/tsinfer#753 (comment). Some reworking was needed to use the built-in tskit table formatting functions when outputting the RateMap to unicode or html. Once a stable tskit release with this is released, we can remove it from msprime and simply treat
mprime.RateMap
as an alias fortskit.RateMap