Allow bot changes to be indexed #5617
Merged
Closes #5700
Currently bot edits are being excluded from the regular ongoing Solr index updates. This PR removes the deliberate bot exclusion.
Bots are by far the main source of new imports into Open Library. Excluding them from the index means that newly added works and editions are not discoverable, and every newly added author gets an empty page showing 0 works,
e.g. https://openlibrary.org/authors/OL9405192A/Edward_L._Parker (which my bot recently added from an archive.org record).
It should list the work https://openlibrary.org/works/OL24955945W ; instead, the author page looks like a junk, unlinked record. This appears to be the default behaviour for newly imported authors, and it won't be resolved without manual reindexing. I thought issues had been raised about 0-work authors in the past, but I can't locate one right now.
This is just one example of many, and it appears to be the current standard behaviour with the exclude bots flag set.
Technical
Since the Solr 8 update I have noticed manual edits being reflected pretty promptly (in less than the previously stated 15 minutes, although I wasn't timing accurately). Indexing seems noticeably faster than before, so Solr 8 has been a very good improvement. I hope the load from bot edits won't cause a performance problem. OL needs to keep current with more books, though, so it needs to be able to index its own content, which the bots are providing.
Bots also perform multiple clean-up and fix tasks that aren't being reflected in the search index. I noticed this problem after merging 10K editions and works in a librarian-requested bot task: the changes were written, but there was no obvious effect.
As far as performance goes, I manually submitted the 10K records for reindexing through the admin interface in successive batches of 1000 (copying and pasting through the web UI). The whole lot was picked up successfully and the expected index update occurred promptly, so I think Solr can handle large batches like this as they happen.
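For reference, the manual batching described above amounts to something like the sketch below. It only shows how 10K keys split into batches of 1000. The `batch_keys` helper and the generated keys are hypothetical placeholders, not Open Library code, and the sketch does not submit anything to the admin interface.

```python
from typing import Iterator, List


def batch_keys(keys: List[str], batch_size: int = 1000) -> Iterator[List[str]]:
    """Yield consecutive slices of `keys`, each at most `batch_size` long."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(keys), batch_size):
        yield keys[start:start + batch_size]


# Placeholder keys for illustration only; real keys would come from the bot task.
keys = [f"/works/OL{n}W" for n in range(1, 10001)]
batches = list(batch_keys(keys))

# Each batch could then be pasted into the reindex form as newline-separated text.
first_batch_text = "\n".join(batches[0])
```

With 10,000 keys and a batch size of 1000, this produces ten batches, matching the ten copy-and-paste rounds I did by hand.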
Testing
Screenshot
Stakeholders
@cdrini
@mekarpeles