Over the years, web scraping has become a personal hobby and a way to challenge myself and practice my skills. Most of the projects from this period were never released publicly, so I decided to organize them and publish the code here on GitHub and the data on Kaggle.
My interest in data science encouraged me to use web scraping to analyze data I care about, such as games and anime.
This repository will contain the code used to produce the data distributed on Kaggle, along with a step-by-step explanation of the process. Have fun with me as I venture into various sites with unstructured data.
Disclaimer: This repository is a personal project, distributed under the MIT License, for practicing web scraping and sharing free data that people can use for exploratory data analysis. I do not recommend using it for other purposes. Use at your own risk.
I use Python exclusively, along with some of its packages, such as:
- BeautifulSoup
- Requests
- CloudScraper
Remember to respect each site's request limit so that you do not cause any harm.
You can suggest any site for this project: just send me an email with the site and the reason it should be part of this repository.
Below are all the projects I have done so far, with their links. I hope you have a lot of fun.
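One way to respect a site's request limit is to enforce a minimum pause between consecutive requests. The sketch below is a minimal, standard-library illustration and not necessarily the code used in these projects: `Throttle`, `fetch_all`, and the `get` parameter are hypothetical names, and the right delay depends on each site's policy.

```python
import time


class Throttle:
    """Enforce a minimum interval between consecutive calls to wait().

    Hypothetical helper for illustration; `clock` and `sleep` are
    injectable so the pacing logic can be checked without real delays.
    """

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous wait(), None before the first

    def wait(self):
        """Sleep only as long as needed; return the number of seconds slept."""
        slept = 0.0
        if self._last is not None:
            remaining = self.min_interval - (self._clock() - self._last)
            if remaining > 0:
                self._sleep(remaining)
                slept = remaining
        self._last = self._clock()
        return slept


def fetch_all(urls, throttle, get):
    """Fetch each URL in order, pausing via `throttle` before every request.

    `get` is any callable that takes a URL and returns the page content.
    """
    pages = []
    for url in urls:
        throttle.wait()
        pages.append(get(url))
    return pages
```

With Requests, `get` could be something like `lambda url: requests.get(url, timeout=10).text`, and `Throttle(2.0)` would then leave at least two seconds between the start of one request and the next. Passing the fetch function in keeps the pacing logic independent of the HTTP client.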
| # | project | category | github | kaggle |
|---|---|---|---|---|
| 01 | anime-planet | comics | Link | Link |
| 02 | tapas | comics | Link | Link |
| 03 | toomics | comics | Link | Link |
| 04 | jmlr | articles | Link | Link |
| 05 | webtoons | comics | Link | Link |
| 06 | afk-arena | games | Link | |
| 07 | arknights | games | Link | Link |
| 08 | justwatch | streamings | Link | Multiple Links¹ |
| 09 | funko pop | collectibles | Link | Link |
| 10 | a24 | movies | | |
| 11 | | | | |
| 12 | | | | |
| 13 | | | | |
| 14 | | | | |
¹ Each streaming service has its own link. Below is a list of the links for all streaming services:
| streamings | kaggle |
|---|---|
| hbo max | Link |
| hulu | Link |
| netflix | Link |
| amazon prime | Link |
| paramount | Link |
| disney+ | Link |
| crunchyroll | Link |
| dark matter | Link |
| rakuten viki | Link |
Copyright (c) 2022 Victor Soeiro
This project is licensed under the MIT License
If you have any questions or suggestions, send an email to victor.soeiro.araujo@gmail.com.