Incorrect handling of product versions from OSV source #3201

Open
@gluesmith2021

I found two issues with version number handling in the OSV source parsing. Because they are related, I am reporting them together in a single issue:

  • bug: duplicated ranges instead of multiple ranges
    • The DB is populated with identical rows, whereas each row is expected to carry a different affected version range.
  • probably incorrect: using the "ecosystem" version as the product version
    • Current behavior: ecosystem "versions" (which can be arbitrary strings) show up in the database instead of product versions. This is misleading whenever a product version has to be checked against affected version ranges.
    • Expected behavior: unreliable versions or "non-version" strings do not appear in the cve_range table for a product (perhaps by not inserting the row at all). Even when the product is the ecosystem itself, so that such "version" strings make sense, they remain unusable programmatically.

Duplicated Ranges

According to the OSV format, the entries in affected[].ranges[].events are not required to be unique. For instance, a single range can contain many "introduced" and "fixed" objects, as in https://osv-vulnerabilities.storage.googleapis.com/Android/ASB-A-219942275.json

However, the event loop in the current code reuses the same affected object for each introduced/fixed occurrence. When parsing the above CVE, this same object is appended four times to the affected_data list. Then, when the DB is populated, it ends up with four identical rows, all holding the last version range encountered, instead of one row per version range.

A new object should be created for each range instead.
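
The aliasing problem can be reproduced outside the tool with a minimal sketch. Everything here is hypothetical and only illustrates the pattern: the function names, the dict keys (versionStartIncluding, versionEndExcluding) and the version strings are made up for the example and are not taken from the actual parser or from the linked OSV record.

```python
def buggy_ranges(events, product="expat"):
    """Mimics the reported bug: one dict is mutated and appended repeatedly."""
    affected = {"product": product}
    rows = []
    for event in events:
        if "introduced" in event:
            affected["versionStartIncluding"] = event["introduced"]
        if "fixed" in event:
            affected["versionEndExcluding"] = event["fixed"]
            rows.append(affected)  # appends a reference to the same object
    return rows


def fixed_ranges(events, product="expat"):
    """Builds a fresh dict for every introduced/fixed pair."""
    rows = []
    start = None
    for event in events:
        if "introduced" in event:
            start = event["introduced"]
        elif "fixed" in event:
            rows.append(
                {
                    "product": product,
                    "versionStartIncluding": start,
                    "versionEndExcluding": event["fixed"],
                }
            )
            start = None
    if start is not None:
        # An "introduced" event without a later "fixed" is an open-ended range.
        rows.append(
            {
                "product": product,
                "versionStartIncluding": start,
                "versionEndExcluding": None,
            }
        )
    return rows


# Illustrative events only (not the content of the linked record).
events = [
    {"introduced": "1.0"},
    {"fixed": "1.2"},
    {"introduced": "2.0"},
    {"fixed": "2.1"},
]

buggy = buggy_ranges(events)
fixed = fixed_ranges(events)
```

With these events, buggy contains the same dict twice, so both rows show the last range, while fixed contains two distinct rows, one per range.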

ECOSYSTEM Versions

Many version numbers should probably not be parsed in the first place, because they are ecosystem versions, though that is open to interpretation.

According to the above OSV format again, if affected[].ranges[].type is ECOSYSTEM:

The versions introduced and fixed are arbitrary, uninterpreted strings specific to the package ecosystem, which does not conform to SemVer 2.0’s version ordering.

Basically, such a string is probably not the product version, and it can be anything, even a name. From the linked OSV sample, 12L:2022-09-01 is pushed to the DB as if it were a version of the expat product.

So, in the same function as above, should such events be ignored, as they already are for the "GIT" range type? affected[].versions would then be used instead.
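
As a rough sketch of what ignoring these ranges could look like, the filter below drops both "GIT" and "ECOSYSTEM" ranges before any events are read. The helper name, the constant and the sample entry are hypothetical; only the range type names (SEMVER, ECOSYSTEM, GIT) and the 12L:2022-09-01 string come from the OSV format and the sample discussed above.

```python
# Range types whose events are not comparable product versions (assumption
# made for this sketch: ECOSYSTEM is treated the same way as GIT).
IGNORED_RANGE_TYPES = {"GIT", "ECOSYSTEM"}


def interpretable_ranges(affected_entry):
    """Return only the ranges whose events can be read as product versions."""
    return [
        rng
        for rng in affected_entry.get("ranges", [])
        if rng.get("type") not in IGNORED_RANGE_TYPES
    ]


# Made-up entry for illustration, not the real linked record.
sample_entry = {
    "package": {"name": "expat"},
    "ranges": [
        {"type": "ECOSYSTEM", "events": [{"introduced": "12L:2022-09-01"}]},
        {"type": "SEMVER", "events": [{"introduced": "1.0.0"}, {"fixed": "1.2.0"}]},
        {"type": "GIT", "events": [{"introduced": "0"}]},
    ],
}

kept = interpretable_ranges(sample_entry)
```

Here kept holds only the SEMVER range, so the ecosystem string never reaches the database.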

But here's another problem: affected[].versions can be anything as well, and it is not even required when the "ECOSYSTEM" type is used. Furthermore, as the documentation says:

The infrastructure and tooling provided by https://osv.dev also provides automation for auto-populating the versions list based on supported ECOSYSTEM ranges as part of the ingestion process.

In short, for the "ECOSYSTEM" type, affected[].versions can be just as unreliable a source of product version numbers as the ECOSYSTEM ranges themselves. It might be best to ignore them rather than include them as a best-effort guess.
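
One possible policy, sketched under stated assumptions: if an affected entry has any ECOSYSTEM range, its versions list is treated as untrusted and skipped entirely. The function name and the sample entries are hypothetical, and whether this is the right rule for the tool is exactly the open question raised above.

```python
def trusted_versions(affected_entry):
    """Return affected[].versions only when no ECOSYSTEM range is present.

    Assumption for this sketch: a versions list next to an ECOSYSTEM range
    may have been auto-populated from that range, so it is ignored.
    """
    ranges = affected_entry.get("ranges", [])
    if any(rng.get("type") == "ECOSYSTEM" for rng in ranges):
        return []
    return list(affected_entry.get("versions", []))


# Made-up entries for illustration.
ecosystem_entry = {
    "ranges": [{"type": "ECOSYSTEM", "events": [{"introduced": "12L:2022-09-01"}]}],
    "versions": ["12L:2022-09-01"],
}
semver_entry = {
    "ranges": [{"type": "SEMVER", "events": [{"introduced": "1.0.0"}]}],
    "versions": ["1.0.0", "1.1.0"],
}

skipped = trusted_versions(ecosystem_entry)
accepted = trusted_versions(semver_entry)
```

With this rule, skipped is empty and accepted keeps both version strings.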

Labels: bug, help wanted