-- `GH_data_safetensor_sorted.csv`: GitHub PRs sorted by cosine similarity (descending).
-- `SO_data_safetensor_sorted.csv`: StackOverflow posts sorted by cosine similarity (descending).
-- `huggingface_sort_by_createdAt_top996939_commits_N_M.csv`: HuggingFace commit history for models N to M.
-- `huggingface_sort_by_createdAt_top996939_errors_N_M.csv`: Error logs for models N to M.
-- `huggingface_sort_by_createdAt_top996939_selected.json`: Selected models from HuggingFace to be analyzed based on our filtering criteria (see scripts/notebooks).
scripts/README.md (3 additions & 1 deletion)
@@ -47,7 +47,7 @@ pip install .
 It will select model repositories and save the filtered list at `../data/huggingface_sort_by_createdAt_topN_selected.json`.

 #### Step 3: Getting the commit history of the models
-- `get_models_history.py`: Script to get metadata of all models from HuggingFace.
+- `get_commit_logs.py`: Script to get metadata of all models from HuggingFace.
 It will produce commit history for each model repository and save it on the data folder.
 It requires the start and end index of the models to be processed. This script will take a long time to run (~1 day).
```bash
@@ -58,6 +58,8 @@ Example: below it will process the first 517 models and then the next 517 models
 python get_commit_logs.py 0 517
 python get_commit_logs.py 517 1035
```
+It will save the commit history for each model in the `../data/` folder.
+File names will be `huggingface_sort_by_createdAt_topN_commits_<first_index>_<last_index>.csv` and `huggingface_sort_by_createdAt_topN_errors_<first_index>_<last_index>.csv`.

 #### Step 4: Merging the commit history into a single CSV file
 - `./merge_csvs.sh`: Merge the CSV files for the commit history into a single file.
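
For readers who want to see the batching in Step 3 and the merge in Step 4 in one place, here is a minimal Python sketch. It is not the repository's `get_commit_logs.py` or `merge_csvs.sh`. The helper names `batch_ranges`, `commits_file_name`, and `merge_csvs` are hypothetical, the index ranges are assumed to be half-open (start inclusive, end exclusive), and the per-batch CSV files are assumed to share the same header row.

```python
import csv


def batch_ranges(total, batch_size):
    """Yield (start, end) pairs covering 0..total; end is exclusive (an assumption)."""
    for start in range(0, total, batch_size):
        yield start, min(start + batch_size, total)


def commits_file_name(top_n, start, end):
    """Build a per-batch file name following the pattern described in the README."""
    return f"huggingface_sort_by_createdAt_top{top_n}_commits_{start}_{end}.csv"


def merge_csvs(paths, out_path):
    """Concatenate CSV files that share a header, writing the header only once."""
    header = None
    with open(out_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        for path in paths:
            with open(path, newline="", encoding="utf-8") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"header mismatch in {path}")
                writer.writerows(reader)
```

A caller would typically loop over `batch_ranges(...)` to decide which start and end indices to pass to the script, then hand the resulting per-batch file paths (in sorted order) to `merge_csvs`. Failing on a header mismatch is a deliberate choice here, so that a malformed batch file is noticed instead of being merged silently.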