32 changes: 32 additions & 0 deletions .github/workflows/ci.yml
@@ -0,0 +1,32 @@
name: CI

on:
  push:
    branches: [main, modernize-packaging]
  pull_request:
    branches: [main]

jobs:
  lint:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      - run: pip install ruff
      - run: ruff check src/ tests/
      - run: ruff format --check src/ tests/

  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12", "3.13"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - run: pip install -e ".[dev]"
      - run: pytest
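
A note on CI speed: `actions/setup-python@v5` has built-in pip caching. If dependency installs become slow, the setup step in either job could be extended like this (an optional tweak, not part of the workflow above):

```yaml
- uses: actions/setup-python@v5
  with:
    python-version: ${{ matrix.python-version }}
    cache: "pip"  # cache pip downloads between workflow runs
```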
76 changes: 74 additions & 2 deletions .gitignore
@@ -1,5 +1,77 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
*.egg-info/
*.egg

# Virtual environments
.venv/
venv/
ENV/
env/
bin/
include/

# Installer logs
pip-log.txt
pip-delete-this-directory.txt
pip-selfcheck.json

# Unit test / coverage
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/

# mypy / pyright
.mypy_cache/
.pyright/

# ruff
.ruff_cache/

# Environments
*.env
.env

# IDE
.idea/
.vscode/
*.swp
*.swo
*~

# OS
.DS_Store
Thumbs.db

# Project-specific
hashtags/
*.session
7 changes: 7 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,7 @@
repos:
  - repo: https://github.com/astral-sh/ruff-pre-commit
    rev: v0.8.6
    hooks:
      - id: ruff
        args: [--fix]
      - id: ruff-format
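
To activate these hooks locally, the standard pre-commit workflow applies:

```bash
pip install pre-commit
pre-commit install          # run ruff on every commit from now on
pre-commit run --all-files  # one-off pass over the entire repo
```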
150 changes: 129 additions & 21 deletions README.md
@@ -1,35 +1,143 @@
# Instagram Hashtag Crawler

Crawl Instagram hashtags and collect post metadata (likes, comments, captions, user profiles) without a developer account.

Uses [instaloader](https://instaloader.github.io/) under the hood.

## Installation

```bash
pip install .
```

With browser cookie support (auto-extract session from Chrome, Firefox, etc.):

```bash
pip install ".[browser]"
```

For development:

```bash
pip install -e ".[dev,browser]"
```
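
The `browser` and `dev` extras are defined in the project's `pyproject.toml`. As a rough sketch of what such an extras table looks like (the package names below are illustrative assumptions, not copied from the real file):

```toml
[project.optional-dependencies]
# Hypothetical pins -- consult pyproject.toml for the actual dependencies.
browser = ["browser-cookie3"]
dev = ["pytest", "ruff", "pre-commit"]
```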

## Usage

### Crawl hashtags

```bash
# Using browser cookies (recommended — auto-extracts session from your browser)
instagram-hashtag-crawler --browser chrome -t foodporn

# If logged in on a non-default Chrome profile, specify the cookie file
instagram-hashtag-crawler --browser chrome \
  --cookie-file ~/Library/Application\ Support/Google/Chrome/Profile\ 1/Cookies \
  -t foodporn

# Using username/password
instagram-hashtag-crawler -u YOUR_USERNAME -p YOUR_PASSWORD -t foodporn

# Multiple hashtags from a file
instagram-hashtag-crawler --browser chrome -f targets.txt

# With options
instagram-hashtag-crawler --browser chrome -t foodporn \
  --max-posts 500 \
  --output-dir ./data \
  -v
```

### Multi-hashtag AND search

Pass `-t` multiple times to find posts that contain **all** specified hashtags:

```bash
# Posts tagged with BOTH #foodporn AND #pizza
instagram-hashtag-crawler --browser chrome -t foodporn -t pizza

# Three-way AND
instagram-hashtag-crawler --browser chrome -t food -t pizza -t italy
```
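
Internally this amounts to an intersection filter over one hashtag's feed. The project's own implementation isn't reproduced here, but a minimal sketch of the idea using instaloader's `Hashtag` and `Post.caption_hashtags` APIs might look like:

```python
import instaloader

def and_search(tags: list[str], max_posts: int = 100):
    """Yield posts whose captions carry every tag in `tags` (illustrative sketch)."""
    loader = instaloader.Instaloader()
    wanted = {t.lower().lstrip("#") for t in tags}
    # Walk the first tag's feed and keep posts that also mention the others.
    feed = instaloader.Hashtag.from_name(loader.context, tags[0]).get_posts()
    matched = 0
    for post in feed:
        if wanted.issubset(set(post.caption_hashtags)):
            yield post
            matched += 1
            if matched >= max_posts:
                break
```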

Output is saved as `food_AND_pizza.json` (tags sorted alphabetically, joined by `_AND_`).
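
The naming scheme is easy to reproduce:

```python
tags = ["pizza", "food"]
filename = "_AND_".join(sorted(tags)) + ".json"  # -> "food_AND_pizza.json"
```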

You can also run it as a module:

```bash
python -m instagram_hashtag_crawler --browser chrome -t foodporn
```

### Export to CSV

```bash
instagram-hashtag-export --json-dir ./hashtags --csv-dir ./output
```
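
The exporter flattens each JSON array of posts into one CSV per file. If you need a custom export, a self-contained sketch (not the bundled `instagram-hashtag-export` implementation, and with an assumed subset of columns) could look like:

```python
import csv
import json
from pathlib import Path

# Assumed columns; pick whichever post fields you need.
COLUMNS = ["shortcode", "username", "like_count", "comment_count", "date"]

def json_dir_to_csv(json_dir: str, csv_dir: str) -> None:
    """Write each <tag>.json array of post objects out as <tag>.csv."""
    out = Path(csv_dir)
    out.mkdir(parents=True, exist_ok=True)
    for src in Path(json_dir).glob("*.json"):
        posts = json.loads(src.read_text(encoding="utf-8"))
        with open(out / f"{src.stem}.csv", "w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=COLUMNS, extrasaction="ignore")
            writer.writeheader()
            writer.writerows(posts)
```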

### Options

| Flag | Description | Default |
|------|-------------|---------|
| `--browser` | Auto-extract session from browser (chrome, firefox, safari, edge, brave, etc.) | — |
| `--cookie-file` | Path to browser cookie file (for non-default profiles) | — |
| `-u`, `--username` | Instagram username (not needed with `--browser`) | — |
| `-p`, `--password` | Instagram password (not needed with `--browser`) | — |
| `-t`, `--target` | Hashtag to crawl (without `#`). Repeat for AND search. | — |
| `-f`, `--targetfile` | File with hashtags, one per line | — |
| `--output-dir` | Directory for JSON output | `./hashtags` |
| `--max-posts` | Max posts per hashtag | `100` |
| `--min-posts` | Min posts required | `1` |
| `--since` | Unix timestamp — only collect newer posts | — |
| `--session-file` | Path to save/load session (with `-u`/`-p`) | — |
| `-v`, `--verbose` | Debug logging | off |

### Target file format

One hashtag per line, no `#` prefix:

```
delicious
dish
foodpornography
```

See [`examples/targets.txt`](examples/targets.txt) for a sample.

## Output

Each hashtag produces a JSON file in the output directory:

```
hashtags/
  delicious.json
  dish.json
  food_AND_pizza.json   # multi-hashtag AND result
```

Each JSON file contains an array of post objects with fields like `shortcode`, `user_id`, `username`, `like_count`, `comment_count`, `caption`, `tags`, `pic_url`, `date`, and profile metadata.
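
For illustration, one post object might look like this (all values invented; the exact field set depends on the crawler version):

```json
{
  "shortcode": "CxYz123AbCd",
  "user_id": 123456789,
  "username": "example_user",
  "like_count": 42,
  "comment_count": 7,
  "caption": "Homemade pizza night! #foodporn #pizza",
  "tags": ["foodporn", "pizza"],
  "pic_url": "https://example.com/p/CxYz123AbCd.jpg",
  "date": "2025-01-15T18:30:00"
}
```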

## Development

```bash
# Install dev dependencies
pip install -e ".[dev,browser]"

# Lint
ruff check src/ tests/
ruff format --check src/ tests/

# Test
pytest

# Pre-commit hooks
pre-commit install
```

## Requirements

- Python 3.10+
- An Instagram account (no developer/API access needed)

## License

MIT
75 changes: 0 additions & 75 deletions __init__.py

This file was deleted.
