Commit 6222ebc

improved documentationsv
1 parent fe636cb commit 6222ebc

2 files changed: +20 -10 lines changed

data/README.md

Lines changed: 17 additions & 9 deletions
@@ -1,14 +1,22 @@
 # Data Folder
 
 
-- `huggingface_sort_by_createdAt_top996939.json.zip`: Compressed JSON file with metadata of all models from HuggingFace.
-
-- `GH_data_safetensor.json`: GitHub PRs collected.
-- `SO_data_safetensor.json`: StackOverflow posts collected.
-- `GH_data_safetensor_sorted.csv`: GitHub PRs sorted by cosine similarity (descending).
-- `SO_data_safetensor_sorted.csv`: StackOverflow posts sorted by cosine similarity (descending).
-- `huggingface_sort_by_createdAt_top996939_commits_N_M.csv`: HuggingFace commit history for models N to M.
-- `huggingface_sort_by_createdAt_top996939_errors_N_M.csv`: Error logs for models N to M.
-- `huggingface_sort_by_createdAt_top996939_selected.json`: Selected models from HuggingFace to be analyzed based on our filtering criteria (see scripts/notebooks).
+- `huggingface_sort_by_createdAt_top996939.json.zip`:
+Compressed JSON file with metadata of all models from HuggingFace.
+- `huggingface_sort_by_createdAt_top996939_selected.json`:
+Selected models from HuggingFace to be analyzed based on our filtering criteria.
+- `GH_data_safetensor.json`:
+GitHub PRs collected.
+- `SO_data_safetensor.json`:
+StackOverflow posts collected.
+- `GH_data_safetensor_sorted.csv`:
+GitHub PRs sorted by cosine similarity (descending).
+- `SO_data_safetensor_sorted.csv`:
+StackOverflow posts sorted by cosine similarity (descending).
+- `huggingface_sort_by_createdAt_top996939_commits_N_M.csv`:
+HuggingFace commit history for models N to M.
+- `huggingface_sort_by_createdAt_top996939_errors_N_M.csv`:
+Error logs for models N to M.
+
 
 
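The first entry in the list is a zipped JSON file. As a minimal sketch, it could be read without unpacking it to disk, assuming the archive holds a single `.json` member; the helper name `load_zipped_json` is hypothetical and the structure of the decoded metadata is not described in this commit.

```python
import json
import zipfile


def load_zipped_json(zip_path):
    """Parse the first .json member of a zip archive and return the decoded object."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive contains one JSON file; take the first match.
        json_members = [name for name in archive.namelist() if name.endswith(".json")]
        if not json_members:
            raise ValueError(f"no .json member found in {zip_path}")
        with archive.open(json_members[0]) as handle:
            # json.load accepts a binary file object and decodes UTF-8 bytes.
            return json.load(handle)
```

For example, `load_zipped_json("data/huggingface_sort_by_createdAt_top996939.json.zip")` would return whatever top-level object the file stores; the path is relative to the repository root, which is also an assumption.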

scripts/README.md

Lines changed: 3 additions & 1 deletion
@@ -47,7 +47,7 @@ pip install .
 It will select model repositories and save the filtered list at `../data/huggingface_sort_by_createdAt_topN_selected.json`.
 
 #### Step 3: Getting the commit history of the models
-- `get_models_history.py`: Script to get metadata of all models from HuggingFace.
+- `get_commit_logs.py`: Script to get metadata of all models from HuggingFace.
 It will produce commit history for each model repository and save it on the data folder.
 It requires the start and end index of the models to be processed. This script will take a long time to run (~1 day).
 ```bash
@@ -58,6 +58,8 @@ Example: below it will process the first 517 models and then the next 517 models
 python get_commit_logs.py 0 517
 python get_commit_logs.py 517 1035
 ```
+It will save the commit history for each model in the `../data/` folder.
+File names will be `huggingface_sort_by_createdAt_topN_commits_<first_index>_<last_index>.csv` and `huggingface_sort_by_createdAt_topN_commits_<first_index>_<last_index>.csv`.
 
 #### Step 4: Merging the commit history into a single CSV file
 - `./merge_csvs.sh`: Merge the CSV files for the commit history into a single file.
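The contents of `merge_csvs.sh` are not part of this commit, so the following is only a Python sketch of what Step 4 describes, not the script itself. It assumes the chunk files follow the `..._commits_<first_index>_<last_index>.csv` naming quoted above and that every chunk starts with the same header row; `find_chunks` and `merge_chunks` are hypothetical helper names.

```python
import csv
import re
from pathlib import Path

# Assumed naming scheme, taken from the README text: ..._commits_<first>_<last>.csv
CHUNK_RE = re.compile(r"_commits_(\d+)_(\d+)\.csv$")


def find_chunks(data_dir):
    """Return the commit-history chunk files sorted by their first model index."""
    chunks = []
    for path in Path(data_dir).iterdir():
        match = CHUNK_RE.search(path.name)
        if match:
            chunks.append((int(match.group(1)), path))
    chunks.sort(key=lambda item: item[0])
    return [path for _, path in chunks]


def merge_chunks(data_dir, out_path):
    """Concatenate the chunk CSVs into out_path, keeping a single header row.

    Returns the number of data rows written.
    """
    chunks = find_chunks(data_dir)  # resolved before the output file is created
    rows_written = 0
    header_written = False
    with open(out_path, "w", newline="", encoding="utf-8") as out_file:
        writer = csv.writer(out_file)
        for chunk in chunks:
            with open(chunk, newline="", encoding="utf-8") as in_file:
                reader = csv.reader(in_file)
                header = next(reader, None)
                if header is None:
                    continue  # empty chunk, nothing to copy
                if not header_written:
                    writer.writerow(header)
                    header_written = True
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```

Sorting by the first index keeps the merged rows in the same model order as the original chunks (`0 517` before `517 1035`). Files that do not match the `_commits_` pattern, such as the error logs, are left out of the merge.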
