This query about SemOpenAlex works:

    PREFIX dct: <http://purl.org/dc/terms/>
    PREFIX Service: <http://www.metaphacts.com/ontologies/platform/service/>
    PREFIX entitylookup: <http://www.metaphacts.com/ontologies/platform/service/entitylookup/>
    SELECT * WHERE {
      SERVICE Service:entityLookup {
        ?subject entitylookup:entityName "semopen" ;
          entitylookup:limit 100 ;
          entitylookup:score ?score ;
          entitylookup:rank ?rank .
      }
      ?subject dct:title ?title
    } ORDER BY DESC(?score) DESC(?rank)

It returns 7 Works: 3 "true" works, plus 4 variants/versions thereof:
| subject | rank | title | comment |
|---|---|---|---|
| https://semopenalex.org/work/W4388144113 | 90.0 | SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples | published |
| https://semopenalex.org/work/W4385682125 | 45.0 | SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples | arxiv preprint |
| https://semopenalex.org/work/W4393797504 | 38.0 | SemOpenAlex Embeddings | Zenodo "all versions" |
| https://semopenalex.org/work/W4393798967 | 38.0 | SemOpenAlex Embeddings | Zenodo new version |
| https://semopenalex.org/work/W4393895619 | 38.0 | SemOpenAlex Embeddings | Zenodo old version |
| https://semopenalex.org/work/W4393691335 | 36.0 | RDF Knowledge Graph SemOpenAlex-SemanticWeb | Zenodo "all versions" |
| https://semopenalex.org/work/W4393746966 | 35.0 | RDF Knowledge Graph SemOpenAlex-SemanticWeb | Zenodo only version |
If this pattern holds, then over half of all Works in SOA are duplicates.
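To make the arithmetic behind that estimate explicit, here is a minimal sketch that groups the seven rows from the table by title and counts the surplus copies. Matching on a normalized title is only an assumption used for illustration (it is not how OpenAlex or SemOpenAlex identify works), and the helper name `group_by_title` is hypothetical.

```python
from collections import defaultdict

# (work URL, title) pairs copied from the result table above.
rows = [
    ("https://semopenalex.org/work/W4388144113",
     "SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples"),
    ("https://semopenalex.org/work/W4385682125",
     "SemOpenAlex: The Scientific Landscape in 26 Billion RDF Triples"),
    ("https://semopenalex.org/work/W4393797504", "SemOpenAlex Embeddings"),
    ("https://semopenalex.org/work/W4393798967", "SemOpenAlex Embeddings"),
    ("https://semopenalex.org/work/W4393895619", "SemOpenAlex Embeddings"),
    ("https://semopenalex.org/work/W4393691335",
     "RDF Knowledge Graph SemOpenAlex-SemanticWeb"),
    ("https://semopenalex.org/work/W4393746966",
     "RDF Knowledge Graph SemOpenAlex-SemanticWeb"),
]


def group_by_title(pairs):
    """Group work URLs by a normalized title (assumed heuristic, not an official key)."""
    groups = defaultdict(list)
    for url, title in pairs:
        groups[title.strip().casefold()].append(url)
    return dict(groups)


groups = group_by_title(rows)
total = len(rows)             # all Works returned by the query
distinct = len(groups)        # "true" works under the title heuristic
duplicates = total - distinct # surplus variants/versions

print(f"{total} works, {distinct} distinct titles, {duplicates} duplicates")
print(f"duplicate share: {duplicates / total:.0%}")
```

Title matching would of course miss variants with differing titles and could merge unrelated works that share a title, so it only illustrates the count for this small sample.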
The problem is especially galling for the last case: every Zenodo resource, even if it has no versions at all, is represented by two Zenodo DOIs and consequently twice in SOA.
I know that these problems come from OpenAlex, but I don't know where to report them.
I also understand that deduplicating works is not easy, e.g. the arXiv preprint doesn't state the DOI of the published version.
But maybe at least you can remove the Zenodo "all versions" URL/DOI, and maybe all old versions?