Create a searchable image tag website using Python and Elasticlunr.js.
This project uses a local BLIP-2 captioning model to automatically tag your images, generates thumbnails, and provides a web interface to search them.
- Python 3.8 or newer
- Pillow for image processing
- Transformers and PyTorch for the BLIP‑2 model
- spaCy with the `en_core_web_sm` language model
- scikit-image for JPEG recompression metrics
- tqdm for progress bars
- Optional JPEG compression with `-J`/`--jpegli` uses the jpeglib Python package.
Thumbnails are generated at 256×256 pixels by default, so ensure you have enough disk space for the resized copies.
- Install Dependencies: Ensure you have Python installed, then install the necessary libraries. Specific versions may vary, but you'll typically need:

      pip install Pillow scikit-image transformers torch torchvision torchaudio spacy tqdm jpeglib
      python -m spacy download en_core_web_sm

  (Note: `torch` installation can vary based on your system and CUDA availability. Refer to the official PyTorch website for specific instructions if needed.)
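After installing, a quick check that the packages are importable can save a failed pipeline run. The following is a minimal sketch, not part of the repository: the helper name `missing_modules` is hypothetical, and the module names are the usual import names for the packages listed above (for example, Pillow is imported as `PIL` and scikit-image as `skimage`).

```python
import importlib.util

# Import names, which differ from some PyPI package names
# (Pillow -> PIL, scikit-image -> skimage).
REQUIRED_MODULES = ["PIL", "skimage", "transformers", "torch", "spacy", "tqdm", "en_core_web_sm"]
OPTIONAL_MODULES = ["jpeglib"]  # only needed for -J/--jpegli


def missing_modules(names):
    """Return the names in `names` that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Report what is missing without importing the (large) packages themselves.
    print("Missing required modules:", missing_modules(REQUIRED_MODULES) or "none")
    print("Missing optional modules:", missing_modules(OPTIONAL_MODULES) or "none")
```

This only confirms that each module can be found; it does not verify versions or CUDA support.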
- Process Your Images: Navigate to the repository directory and run the main pipeline script, providing the path to your image folder. Typical locations are `%USERPROFILE%\Pictures` on Windows or `~/Pictures` on Linux/macOS.

      python run_pipeline.py [PATH_TO_YOUR_IMAGES] [-I PATH_TO_YOUR_IMAGES] [-O OUTPUT_DIR] [-R | --recurse] [-C | --clear] [-Z | --compress] [-J | --jpegli] [-A | --add] [-D | --delete] [-V | --verbose] [-S [PORT]]

  Windows users: Avoid quoting a path that ends with a single backslash, because the backslash escapes the closing quote. Either remove the trailing backslash or escape it as `\\` so additional flags are parsed correctly.

  This script will:
  - Scan the `PATH_TO_YOUR_IMAGES` directory (positional or via `-I`/`--input`) for JPG, JPEG, and PNG files. Use `-R`/`--recurse` to include subfolders.
  - Generate descriptive tags for each image using a local BLIP-2 model.
  - Create 256×256 thumbnails for each image and store them in the output directory (default `img/thumbs/`). An optional watermark from `img/overlay/watermark.png` may be applied if `make_thumbs.py` (called by the pipeline) is configured for it. Thumbnail file names now include a short hash of the original path so duplicates across folders or extensions will never collide.
  - Optionally clear the contents of the output folder first when using `-C`/`--clear`.
  - Enable additional JPEG compression with `-Z`/`--compress`, or use the `jpeglib` library with `-J`/`--jpegli`. These options are mutually exclusive.
  - Compile all tag information into `data.json`, which is used by the search interface.
  - Show per-image progress bars so you know exactly how many files remain.
  - Use `-V`/`--verbose` to print per-image details instead of progress bars.
  - Use `-A`/`--add` to append new images without rebuilding existing entries, or `-D`/`--delete` to remove records and thumbnails for images in the folder.
  - Use `-S [PORT]` to automatically launch the local server after processing. Omit `PORT` to use `serve.py`'s default.
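The option handling, file scan, and thumbnail naming described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in `run_pipeline.py` or `make_thumbs.py`: the function names, the `dest` names for `-O` and `-S`, the use of SHA-1 with an 8-character digest, and the `.jpg` thumbnail extension are all hypothetical.

```python
import argparse
import hashlib
from pathlib import Path

IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png"}


def build_parser():
    """Build a parser mirroring the flags listed in the usage line."""
    parser = argparse.ArgumentParser(prog="run_pipeline.py")
    parser.add_argument("path", nargs="?", help="image folder (positional form)")
    parser.add_argument("-I", "--input", help="image folder (flag form)")
    parser.add_argument("-O", dest="output", default="img/thumbs", metavar="OUTPUT_DIR")
    parser.add_argument("-R", "--recurse", action="store_true")
    parser.add_argument("-C", "--clear", action="store_true")
    parser.add_argument("-A", "--add", action="store_true")
    parser.add_argument("-D", "--delete", action="store_true")
    parser.add_argument("-V", "--verbose", action="store_true")
    # -Z and -J are documented as mutually exclusive.
    jpeg = parser.add_mutually_exclusive_group()
    jpeg.add_argument("-Z", "--compress", action="store_true")
    jpeg.add_argument("-J", "--jpegli", action="store_true")
    # None: do not serve; "default": -S given without PORT; otherwise the PORT string.
    parser.add_argument("-S", dest="serve", nargs="?", const="default", default=None, metavar="PORT")
    return parser


def find_images(folder, recurse=False):
    """Return JPG, JPEG, and PNG files in `folder`, optionally including subfolders."""
    pattern = "**/*" if recurse else "*"
    return sorted(
        p for p in Path(folder).glob(pattern)
        if p.is_file() and p.suffix.lower() in IMAGE_EXTENSIONS
    )


def thumb_name(image_path, digest_len=8):
    """Derive a thumbnail file name that embeds a short hash of the full original path."""
    path = Path(image_path)
    digest = hashlib.sha1(str(path.resolve()).encode("utf-8")).hexdigest()[:digest_len]
    return f"{path.stem}_{digest}.jpg"
```

With this scheme, `a.jpg` in one folder and `a.png` in another map to different thumbnail names because the hash covers the whole path, including the extension. Note that with an optional `PORT`, placing `-S` directly before the positional path would make the parser read the path as the port, so `-S` is safest at the end of the command.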
- Run the Web Server: If you didn't use `-S` during the pipeline step, start the local web server manually:

      python serve.py

  (On Linux/macOS, you might need to use `python3 serve.py`.)

  Then open your web browser and go to `http://localhost:8000` (or the port specified by `serve.py`) to view and search your images.
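For readers curious what a simple local server of this kind involves, here is a minimal sketch built on Python's standard `http.server` module. It is not the contents of `serve.py`; the function name `make_server` is hypothetical, and the default port of 8000 is taken from the URL above.

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

DEFAULT_PORT = 8000  # assumed from the URL mentioned above


def make_server(directory=".", port=DEFAULT_PORT):
    """Create a static-file server for `directory`, bound to localhost only."""
    # The `directory` keyword tells the handler which folder to serve files from.
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("localhost", port), handler)
```

Calling `make_server().serve_forever()` from the repository root would serve `index.html`, `app.js`, `data.json`, and the thumbnails until interrupted with Ctrl+C. Binding to `localhost` keeps the site unreachable from other machines on the network.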
- `index.html`: The main page for the image search.
- `app.js`: Handles the client-side logic, including Elasticlunr.js setup and search functionality.
- `data.json`: Contains the image tags and metadata for the search index (generated by `run_pipeline.py`).
- `img/thumbs/`: Default directory where thumbnails are stored.
- `run_pipeline.py`: The main script to process your images (tagging and thumbnail generation).
- `make_thumbs.py`: Script for generating thumbnails, typically called by `run_pipeline.py`.
- `serve.py`: A simple Python HTTP server to run the website locally.
- Make the partial rendering loop stop when you click a result before it is finished.
- Add transactional folders (e.g., IN, PROCESSED, ERROR) for more efficient content addition.
- Check EXIF/File attributes for "Time Created" to compare against a "last sync" date for incremental updates.
- Add a script to search network drives for images.
- Conduct thorough testing, including corner cases.
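The incremental-update item above could start from something like the sketch below. It is only an illustration of the idea, not planned code: it compares each file's modification time against a "last sync" timestamp, which is a simpler stand-in for the EXIF or "Time Created" attributes the item mentions, and the function name `changed_since` is hypothetical.

```python
import os


def changed_since(paths, last_sync):
    """Return the paths modified after `last_sync` (seconds since the epoch)."""
    # Modification time is used here as a substitute for creation time or EXIF data.
    return [p for p in paths if os.path.getmtime(p) > last_sync]
```

Only the returned files would then need tagging and new thumbnails, with the sync timestamp stored after each successful run.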