A simple library for extracting text from any PDF in Python x AWS.
    pip install textasaurus

Get an API key from the textasaurus API

    TEXTASAURUS_API_KEY=Your_API_KEY

Run single file

    textasaurus your_file.pdf

Run file directory

    textasaurus your_files/

Import in Python

    from textasaurus import Textasaurus
    dino = Textasaurus('YOUR_API_KEY')
    dino.analyze('my_file.pdf')

    from textasaurus import Textasaurus
    dino = Textasaurus('YOUR_API_KEY')
    dino.analyze('my_files/')

Extract raw text from your PDFs for data analysis or machine learning model training.
Skip the frustration of dealing with the current Python libraries for working with PDFs.
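The Python usage above hardcodes the API key in the constructor call. As a minimal sketch, the key could be read from the TEXTASAURUS_API_KEY environment variable instead. The helper names `load_api_key` and `analyze_path` are hypothetical and not part of textasaurus. Only `Textasaurus(key)` and `analyze(path)` come from this README, and what `analyze` returns is not documented here, so the sketch passes it through unchanged.

```python
import os


def load_api_key(env=None):
    """Return the key stored in TEXTASAURUS_API_KEY, or raise a clear error.

    Hypothetical helper: `env` defaults to os.environ and can be replaced
    with a plain dict for testing.
    """
    env = os.environ if env is None else env
    key = env.get("TEXTASAURUS_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set TEXTASAURUS_API_KEY before running")
    return key


def analyze_path(path, env=None):
    """Run Textasaurus on a single PDF or on a directory of PDFs."""
    # Imported lazily so load_api_key() works without the package installed.
    from textasaurus import Textasaurus

    dino = Textasaurus(load_api_key(env))
    # The README does not say what analyze() returns, so the result is
    # handed back to the caller as-is.
    return dino.analyze(path)
```

With the environment variable set, `analyze_path('my_file.pdf')` or `analyze_path('my_files/')` would mirror the two examples above without the key appearing in source code.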

