
Doc Scraper removing old index on 2nd run #236

@gmourier

Description

Initially created by @munim
2 days ago

Dear team,

I am trying out Meilisearch and indexing our site using the docs-scraper project from Meilisearch. It worked to some extent, but when I ran the scraper again with the same command, it deleted all the existing documents and started indexing from scratch. Here's what I did:

  1. Created a Docker network and started Meilisearch with Docker:

$ docker run -it --rm \
    -p 7700:7700 \
    -e MEILI_MASTER_KEY='123' \
    -v $(pwd)/meili_data:/meili_data \
    --network="meilisearch-test-01" \
    getmeili/meilisearch:v0.28 \
    meilisearch --env="development"

  2. Created a scraper config file as described in the project README (a minimal sketch of it is shown after this list).

  3. Started the scraper with the following command:

$ docker run -t --rm \
    -e MEILISEARCH_HOST_URL=http://exciting_banach:7700 \
    -e MEILISEARCH_API_KEY=123 \
    --network="meilisearch-test-01" \
    -v `pwd`/test-scraper.config.json:/docs-scraper/config.json \
    getmeili/docs-scraper:latest pipenv run ./docs_scraper config.json

  4. It took around 30 minutes to scrape 50K pages.
  5. I reran the scraper after making some changes to the config.
  6. Now I see that all my previous entries have been removed from Meilisearch and new entries are being added.
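For reference, the config file followed the structure from the docs-scraper README. A minimal sketch of what it contained (the index_uid, start URL, and selectors below are illustrative placeholders, not my exact values):

{
  "index_uid": "docs",
  "start_urls": ["https://docs.example.com/"],
  "selectors": {
    "lvl0": ".navbar h1",
    "lvl1": "article h1",
    "lvl2": "article h2",
    "lvl3": "article h3",
    "text": "article p, article li"
  }
}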

My question is: how can I update the entries on subsequent runs rather than removing the old entries and recreating everything from scratch?
