How can I disable cache completely? #369
Comments
Hello and thank you for your interest in Crawlee! This seems closely related to #351. Could you please re-check that you get an empty string if you run this after removing the storage directory? I can imagine getting an empty string on a second run without deleting the storage (because of both |
After some debugging I found a workaround to avoid re-using the cache. Basically we have to ensure that each time the crawler runs it uses a different request queue, e.g. like this:
It would be great if we could disable caching entirely, but this works for now.
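The workaround described above (a different request queue for every run) could be sketched roughly as follows. This is only a minimal sketch, not the snippet from the original comment: `fresh_queue_name` and `open_fresh_queue` are hypothetical helpers, and it assumes Crawlee for Python is installed and that `RequestQueue.open(name=...)` is available from `crawlee.storages`.

```python
import uuid


def fresh_queue_name(prefix: str = "crawl") -> str:
    """Build a request-queue name that differs on every call.

    Hypothetical helper, not part of Crawlee.
    """
    return f"{prefix}-{uuid.uuid4().hex}"


async def open_fresh_queue():
    """Open a brand-new named request queue for a single crawler run.

    Assumption: Crawlee for Python is installed and provides
    `RequestQueue.open(name=...)` in `crawlee.storages`.
    """
    # Imported lazily so the name helper above works without Crawlee.
    from crawlee.storages import RequestQueue

    return await RequestQueue.open(name=fresh_queue_name())
```

The returned queue would then be passed to the crawler when it is constructed. As far as I can tell, the keyword argument for that has changed between releases, so check the documentation for your installed version. Because every run gets a new name, requests handled in an earlier run are not treated as already processed, but old queues will accumulate in the storage directory unless they are cleaned up.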
Thanks. If it helps, I found out during debugging that the problem seems to be in that there’s the same instance of |
Yes, that is the case. We're carrying a lot of historical baggage here, and maybe this mechanism won't even be necessary in the end. Until then, I'm happy that you found a workaround.
Has anyone found any solutions instead of just changing the |
I don't know of anything, but we're refactoring the storage code so that this can work as expected out of the box. #1107 is the first step towards this.
I am trying to write a simple function to crawl a website, and I don't want Crawlee to cache anything (each time I call this function, it should do everything from scratch).
Here is my attempt so far. I tried with persist_storage=False and purge_on_start=True in the configuration, and also with removing the storage directory entirely, but I keep getting either a concatenated result of all the requests or an empty result when I delete the storage directory.
Also, is there a way to simply get the result of the crawl as a string, without using Dataset?
Any help is appreciated 🤗 Thank you in advance!