
Rounding error in sample_work.py #524

@siwhitehouse


The logic in https://github.com/pwyf/aid-transparency-tracker/blob/original-version/iatidq/sample_work/sample_work.py can cause a rounding error which leads to n-1 files being sampled, where n is the total number of activity files an organisation publishes and n<20.

The code in question is reproduced below:


        total = int(sum([x.results_data / 100. * x.results_num
                         for x in ag_results]))
        if total <= num_samples:
            indexes = range(total)
        else:
            indexes = sorted(random.sample(range(total), num_samples))

The above code attempts to work out how many files to sample from the aggregate results table in the database. For each row in the table for that test and organisation, it takes the fraction of results that pass the test (x.results_data / 100.) and multiplies it by the total number of results (x.results_num). It then sums these values over all of the rows and finally casts the sum to an integer.
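To make the calculation concrete, here is a minimal sketch of the summing step on its own. AggResult is a hypothetical stand-in for a row of the aggregate results table that models only the two attributes the snippet uses, passing_count is a hypothetical helper name, and the row values are invented for illustration.

```python
from collections import namedtuple

# Hypothetical stand-in for a row of the aggregate results table; only the
# two attributes used by sample_work.py are modelled here.
AggResult = namedtuple("AggResult", ["results_data", "results_num"])


def passing_count(ag_results):
    """Return the number of passing results summed over all rows (as a float)."""
    # results_data is a pass percentage, so dividing by 100. gives a fraction;
    # multiplying by results_num gives the number of passes in that row.
    return sum(x.results_data / 100. * x.results_num for x in ag_results)


rows = [AggResult(results_data=100.0, results_num=4),
        AggResult(results_data=50.0, results_num=2)]
print(passing_count(rows))  # 5.0
```

Here every row divides evenly, so the float sum is exactly a whole number and the later int() cast does no harm.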

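It's the last step that causes the problem. In Python 2.7, int(x) truncates toward zero, discarding everything after the decimal point. When x.results_data / 100. * x.results_num does not come out as an exact whole number (which happens for values of x.results_num such as 3, 7, 11, 13, etc., where the pass percentage cannot be represented exactly), a value that should be 1 can instead be something like 0.9999, and casting it to an int turns it into 0. That is the rounding error.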
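The truncation can be reproduced in isolation. This sketch assumes, purely for illustration, a row where 1 of 3 results passes and results_data holds the percentage rounded to 33.33; the value actually stored in the database may differ, but any value that lands just below a whole number behaves the same way.

```python
# Hypothetical row: 1 of 3 results passes, percentage stored as 33.33.
results_data = 33.33
results_num = 3

product = results_data / 100. * results_num
print(product)              # a value just below 1, not 1.0
print(int(product))         # 0: int() truncates toward zero
print(int(round(product)))  # 1: round() goes to the nearest whole number
```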

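This can be 'fixed' by changing int(...) to int(round(...)) in the code. round() rounds to the nearest whole number, so in these cases, where the sum falls just short of a whole number, it rounds up. In Python 2.7 round() returns a float, so the result still needs to be wrapped in int() before it is passed to range().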
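A minimal sketch of the proposed fix applied to the snippet above, wrapped in a hypothetical sample_indexes function so it can run on its own. AggResult is a hypothetical row type, and num_samples=20 in the usage line is an assumption based on the n < 20 condition described in this issue, not a value taken from the repository.

```python
import random
from collections import namedtuple

# Hypothetical row type standing in for the aggregate results table.
AggResult = namedtuple("AggResult", ["results_data", "results_num"])


def sample_indexes(ag_results, num_samples):
    """Choose which passing results to sample, rounding instead of truncating."""
    # round() gives the nearest whole number, so 2.9999 becomes 3 rather than 2.
    # In Python 2.7 round() returns a float, so wrap it in int() for range().
    total = int(round(sum(x.results_data / 100. * x.results_num
                          for x in ag_results)))
    if total <= num_samples:
        # Fewer passing results than the sample size: take all of them.
        return list(range(total))
    # Otherwise pick num_samples distinct indexes at random, in order.
    return sorted(random.sample(range(total), num_samples))


rows = [AggResult(33.33, 3), AggResult(100.0, 2)]
print(sample_indexes(rows, 20))  # [0, 1, 2]
```

Rounding to the nearest integer is reasonable here because the true count of passing results is always a whole number, so this works as long as the accumulated floating-point error stays well below 0.5.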
