Skip to content

Option for disabling document text method #25

@m0dd0

Description

@m0dd0

First, thanks for this very helpful library!

For many of the papers I read your algorithm works fine and finds the correct doi.
But as you already mention in the README, for some papers the used document_text method results in a wrong doi as the doi of other papers appear first.
Unfortunately this is very often the case for papers of certain conferences I read often as they contain arxiv IDs in the references and do not contain their own doi anywhere else in the text. At the same time, when I comment out the document_text method, I get pretty good results with the fourth method.
I am wondering if one of the following features might help to reduce these type of errors:

  • only using the first pages to look for doi in text
  • having an option to disable certain steps in the search process
  • being able to customize the order of the search methods

Do you think one of these options (or smth else) is something which the library would benefit from and can be implemented with a reasonable effort? If so, I can see if I find the time to turn my current "comment-out-workaround" into a mergable feature.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions