Multi Repository Github Parallel Crawler

This script crawls multiple GitHub repositories in parallel using the etcr crawler and saves the output to a PostgreSQL database.

Current compatibility Status:

✅ Mac Silicon
❓ Mac Intel
❓ Ubuntu
❓ Windows

Contribution requested: This script has only been tested on Apple silicon Macs. If you are on Ubuntu, an Intel Mac, or Windows, please try this tool and let me know whether it works. PRs that improve compatibility are welcome.

Requirements

Docker installed on your machine is the only requirement.
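
As a quick pre-flight check, something like the following can confirm that Docker is on your PATH before you start. This is only a sketch: `require_command` is a hypothetical helper name and is not part of this repository.

```shell
# Return 0 if the named command is on PATH; otherwise print a hint and return 1.
# require_command is a hypothetical helper, not something shipped with this repo.
require_command() {
    if command -v "$1" >/dev/null 2>&1; then
        return 0
    fi
    echo "Missing required command: $1" >&2
    return 1
}
```

Calling `require_command docker` before running the crawler gives a clearer error than a failure halfway through the script.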

How to run

1. Clone this repository to your machine.
2. Copy .env.sample and rename the copy to .env. You can adjust the settings to suit the output you want.
3. Copy repo_list.env.sample and rename the copy to repo_list.env. Put the repositories and your tokens (multiple, if you have them) in this file. Each token is used for one project, and the assignment is random.
4. Make sure run_for_repos.sh is executable. If it is not, make it executable: sudo chmod +x run_for_repos.sh
5. Run ./run_for_repos.sh inside the cloned folder.
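
The copy and chmod steps above can be sketched as a small helper. `prepare_checkout` is a hypothetical function name, and the sketch assumes you own the cloned files, so it calls `chmod` without `sudo`. It does not overwrite a .env or repo_list.env that you have already edited.

```shell
# Prepare a cloned checkout: copy the sample env files and make the runner executable.
# prepare_checkout is a hypothetical helper; pass the cloned folder (defaults to ".").
prepare_checkout() {
    dir="${1:-.}"
    for name in .env repo_list.env; do
        # Keep an existing file so earlier edits are not lost.
        if [ ! -f "$dir/$name" ]; then
            cp "$dir/$name.sample" "$dir/$name" || return 1
        fi
    done
    chmod +x "$dir/run_for_repos.sh"
}
```

From inside the cloned folder, `prepare_checkout . && ./run_for_repos.sh` would then cover the steps after cloning, once repo_list.env has been filled in.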

NOTE: Only repo_list.env needs to be modified before running the script. The .env settings can be used as-is.

.env file setting

db_host: Name of the PostgreSQL database container, or the host address of a remote database.
db_port: Port of the database you want to connect to.
db_user: Database username.
db_pass: Database password.
repositories: A list of repositories that you want to crawl. The format is "owner1/repo1,owner2/repo2,owner3/repo3".
token: Your GitHub token pool. You can create a token here: https://github.com/settings/tokens?type=beta
db_folder: Folder on your machine where the database files are stored.
data_dump_folder: Folder mounted from your machine for exporting or importing databases.
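
For orientation, the database-related keys might look like this in a .env file. Every value below is a hypothetical placeholder rather than the content of .env.sample. The only non-arbitrary value is 5432, which is the default PostgreSQL port.

```shell
# Hypothetical .env values; replace them with your own.
db_host=postgres            # assumed container name, or a remote host address
db_port=5432                # PostgreSQL's default port
db_user=crawler             # placeholder username
db_pass=change_me           # placeholder password
db_folder=./db_data         # assumed local folder for database files
data_dump_folder=./dumps    # assumed local folder mounted for export/import
```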

repo_list.env setting

repositories: A list of repositories that you want to crawl. The format is "owner1/repo1,owner2/repo2,owner3/repo3".
token: Your GitHub token pool. You can create a token here: https://github.com/settings/tokens?type=beta
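
Because a single misplaced comma turns one repository into two invalid entries, it can help to check the repositories value before a run. The sketch below is not part of this repository, and `validate_repositories` is a hypothetical name. It assumes every entry must have exactly the owner/repo form with no spaces.

```shell
# Check that a comma-separated repositories value uses the owner/repo form.
# Prints each malformed entry to stderr and returns non-zero if any is found.
# validate_repositories is a hypothetical helper, not shipped with this repo.
validate_repositories() {
    status=0
    old_ifs=$IFS
    IFS=','
    for entry in $1; do
        case "$entry" in
            */*/* | /* | */ | *" "*) bad=1 ;;   # extra slash, empty side, or a space
            */*) bad=0 ;;                       # exactly owner/repo
            *) bad=1 ;;                         # no slash at all (or empty)
        esac
        if [ "$bad" -eq 1 ]; then
            echo "Malformed repository entry: '$entry'" >&2
            status=1
        fi
    done
    IFS=$old_ifs
    return $status
}
```

For example, `validate_repositories "owner1/repo1,owner2/repo2"` succeeds, while a value containing a bare `owner3` entry is reported and the function returns non-zero.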

Questions/Issues?

Feel free to open an issue if you run into any problems.

