Bookmarks tagged [web-content-extracting]

https://github.com/Alir3z4/html2text

Convert HTML to Markdown-formatted text.


https://github.com/michaelhelmick/lassie

Web Content Retrieval for Humans.


https://github.com/coleifer/micawber

A small library for extracting rich content from URLs.


https://github.com/codelucas/newspaper

News extraction, article extraction and content curation in Python.


https://github.com/buriy/python-readability

Fast Python port of arc90's readability tool.


https://github.com/kennethreitz/requests-html

Pythonic HTML Parsing for Humans.


https://github.com/miso-belica/sumy

A module for automatic summarization of text documents and HTML pages.


https://github.com/deanmalmgren/textract

Extract text from any document: Word, PowerPoint, PDFs, etc.


https://github.com/gaojiuli/toapi

Every web site provides APIs.