twobob/Ki

Ki

Create a searchable image tag website using Python and Elasticlunr.js.

This project uses a local BLIP-2 captioning model to automatically tag your images, generates thumbnails, and provides a web interface to search them.

Requirements

  • Python 3.8 or newer
  • Pillow for image processing
  • Transformers and PyTorch for the BLIP‑2 model
  • spaCy with the en_core_web_sm language model
  • scikit-image for JPEG recompression metrics
  • tqdm for progress bars
  • jpeglib (optional) for JPEG compression with -J/--jpegli
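
To confirm that the packages listed above are importable before starting a long tagging run, a small check along these lines may help. It is a sketch rather than part of the repository: the script and the missing_packages helper are hypothetical, and the mapping assumes the usual import names (Pillow imports as PIL, scikit-image as skimage, and the spaCy model installs as the en_core_web_sm package).

```python
import importlib.util

# Package name -> import name. These are the usual import names; the helper
# and this mapping are illustrative and not part of the Ki repository.
REQUIRED = {
    "Pillow": "PIL",
    "transformers": "transformers",
    "torch": "torch",
    "spaCy": "spacy",
    "en_core_web_sm (spaCy model)": "en_core_web_sm",
    "scikit-image": "skimage",
    "tqdm": "tqdm",
}
# Only needed when running the pipeline with -J/--jpegli.
OPTIONAL = {"jpeglib": "jpeglib"}


def missing_packages(packages):
    """Return the package names whose import module cannot be found."""
    return [
        name
        for name, module in packages.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    for label, group in (("required", REQUIRED), ("optional", OPTIONAL)):
        missing = missing_packages(group)
        print(f"Missing {label} packages: {', '.join(missing) or 'none'}")
```

The check only looks for the modules on the import path. It does not load the BLIP-2 weights or verify that PyTorch can see a GPU.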

Thumbnails are generated at 256×256 pixels by default, so ensure you have enough disk space for the resized copies.

How to Use

  1. Install Dependencies: Ensure you have Python installed. Then, install the necessary libraries. While specific versions may vary, you'll typically need:

    pip install Pillow scikit-image transformers torch torchvision torchaudio spacy tqdm jpeglib
    python -m spacy download en_core_web_sm 

    (Note: torch installation can vary based on your system and CUDA availability. Refer to the official PyTorch website for specific instructions if needed.)

  2. Process Your Images: Navigate to the repository directory and run the main pipeline script, passing the path to your image folder. Typical locations are %USERPROFILE%\Pictures on Windows and ~/Pictures on Linux/macOS.

    python run_pipeline.py [PATH_TO_YOUR_IMAGES] [-I PATH_TO_YOUR_IMAGES] [-O OUTPUT_DIR] [-R | --recurse] [-C | --clear] [-Z | --compress] [-J | --jpegli] [-A | --add] [-D | --delete] [-V | --verbose] [-S [PORT]]

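    Windows users: Do not quote a path that ends with a single backslash (for example "%USERPROFILE%\Pictures\"). The sequence \" is read as an escaped quote, so the closing quote is lost and the flags that follow are absorbed into the path. Either remove the trailing backslash or double it (\\) so that additional flags are parsed correctly.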

    This script will:

    • Scan the PATH_TO_YOUR_IMAGES directory (positional or via -I/--input) for JPG, JPEG, and PNG files. Use -R/--recurse to include subfolders.
    • Generate descriptive tags for each image using a local BLIP-2 model.
    • Create 256×256 thumbnails for each image and store them in the output directory (default img/thumbs/). An optional watermark from img/overlay/watermark.png may be applied if make_thumbs.py (called by the pipeline) is configured for it. Thumbnail file names include a short hash of the original path, so files that share a name across folders, or differ only by extension, do not overwrite each other.
    • Clear the contents of the output folder first when run with -C/--clear.
    • Apply additional JPEG compression when run with -Z/--compress, or compress with the jpeglib library when run with -J/--jpegli. These options are mutually exclusive.
    • Compile all tag information into data.json, which is used by the search interface.
    • Show per-image progress bars so you know exactly how many files remain, or print per-image details instead when run with -V/--verbose.
    • Append new images without rebuilding existing entries when run with -A/--add, or remove the records and thumbnails for images in the folder when run with -D/--delete.
    • Launch the local server automatically after processing when run with -S [PORT]. Omit PORT to use serve.py's default.
  3. Run the Web Server: If you didn't use -S during the pipeline step, start the local web server manually:

    python serve.py

    (On Linux/macOS, you might need to use python3 serve.py.)

    Then, open your web browser and go to http://localhost:8000 (or the port specified by serve.py) to view and search your images.
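
The command-line flags described in step 2 can be modelled with argparse roughly as follows. This is a sketch built only from the usage line above, not the actual parser in run_pipeline.py: build_parser, the dest names, and the use of 0 to mean "serve on the default port" are all assumptions made for illustration.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented run_pipeline.py flags."""
    parser = argparse.ArgumentParser(prog="run_pipeline.py")
    # The image folder can be given positionally or via -I/--input.
    parser.add_argument("path", nargs="?", help="folder of images to process")
    parser.add_argument("-I", "--input", help="folder of images to process")
    # Only the short form -O appears in the usage line; dest is hypothetical.
    parser.add_argument("-O", dest="output", default="img/thumbs",
                        metavar="OUTPUT_DIR", help="thumbnail output directory")
    parser.add_argument("-R", "--recurse", action="store_true")
    parser.add_argument("-C", "--clear", action="store_true")
    # -Z and -J are documented as mutually exclusive.
    jpeg = parser.add_mutually_exclusive_group()
    jpeg.add_argument("-Z", "--compress", action="store_true")
    jpeg.add_argument("-J", "--jpegli", action="store_true")
    parser.add_argument("-A", "--add", action="store_true")
    parser.add_argument("-D", "--delete", action="store_true")
    parser.add_argument("-V", "--verbose", action="store_true")
    # -S takes an optional PORT. None means "do not serve"; the const value 0
    # is an assumed marker for "serve on serve.py's default port".
    parser.add_argument("-S", dest="serve", nargs="?", type=int,
                        const=0, default=None, metavar="PORT")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["~/Pictures", "-R", "-Z", "-S", "8080"])
    print(args)
```

With this layout, passing both -Z and -J makes argparse exit with a usage error, which matches the documented restriction. Placing -S last, or giving it an explicit port, avoids argparse trying to read the image folder as the port number.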

Project Structure Highlights

  • index.html: The main page for the image search.
  • app.js: Handles the client-side logic, including Elasticlunr.js setup and search functionality.
  • data.json: Contains the image tags and metadata for the search index (generated by run_pipeline.py).
  • img/thumbs/: Default directory where thumbnails are stored.
  • run_pipeline.py: The main script to process your images (tagging and thumbnail generation).
  • make_thumbs.py: Script for generating thumbnails, typically called by run_pipeline.py.
  • serve.py: A simple Python HTTP server to run the website locally.
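
The hashed thumbnail names mentioned in the pipeline description could be produced along the following lines. The exact scheme used by make_thumbs.py is not documented here, so this is only a sketch: thumb_name and digest_chars are hypothetical names, and the choice of SHA-1, eight hex characters, and a .jpg extension are assumptions.

```python
import hashlib
from pathlib import Path


def thumb_name(original, digest_chars=8):
    """Return a thumbnail file name that embeds a short hash of the source path."""
    original = Path(original)
    # Hash the full path string, so the same file name in two folders
    # (or with two different extensions) yields two different digests.
    digest = hashlib.sha1(original.as_posix().encode("utf-8")).hexdigest()
    return f"{original.stem}_{digest[:digest_chars]}.jpg"


if __name__ == "__main__":
    for source in ("holiday/cat.jpg", "pets/cat.jpg", "pets/cat.png"):
        print(source, "->", thumb_name(source))
```

All three sources share the stem cat, yet each gets its own thumbnail name because the digest is taken over the whole path. A truncated digest makes a clash very unlikely rather than impossible, so a longer digest_chars trades shorter names for a smaller collision risk.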

TODO/MAYBES:

  • Make the partial rendering loop stop when you click a result before it is finished.
  • Add transactional folders (e.g., IN, PROCESSED, ERROR) for more efficient content addition.
  • Check EXIF/File attributes for "Time Created" to compare against a "last sync" date for incremental updates.
  • Add a script to search network drives for images.
  • Conduct thorough testing, including corner cases.

About

Elasticlunr implementation to wrap tagged image outputs
