This repository presents the code and resources for the research project "Interpretable Prototype: A Bridge from Text to Time-Series for LLM"
This project explores how Large Language Models (LLMs) can be leveraged for time-series analysis tasks, including forecasting, classification, and anomaly detection.
To overcome the modality gap between time-series data and natural language, we introduce a novel method that aligns time-series embeddings with interpretable text prototypes.
Motivation
- Generalizability: Traditional time-series models often require task-specific architectures and domain expertise, which limits generalization across tasks.
- LLM Potential: LLMs show strong few-shot/zero-shot capabilities, but cannot directly handle raw time-series due to the modality mismatch.
- Solution: Use interpretable text prototypes to align time-series embeddings with the LLM's text embedding space.
Key Idea
- Use text prototypes (e.g., "Trend", "Volatility", "Consumption") to represent key characteristics of time-series data.
- Train a model to align time-series tokens with these interpretable text embeddings.
- Improve interpretability, alignment, and performance on downstream tasks.
Text Prototype Selection
- General: Trend, Seasonality, Cyclicality, Volatility, ...
- Dataset-specific: Electricity, Usage, Household, ...
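As a concrete illustration, such prototype vectors can be taken directly from a pre-trained LM's token embedding table. The sketch below assumes a GPT-2 backbone via Hugging Face `transformers`; the word lists and helper names are illustrative, not the repository's actual API.

```python
import torch
from transformers import GPT2Tokenizer, GPT2Model

# Illustrative prototype words (see the lists above); the dataset-specific
# set would change with the dataset.
GENERAL = ["trend", "seasonality", "cyclicality", "volatility"]
DATASET_SPECIFIC = ["electricity", "usage", "household"]

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
backbone = GPT2Model.from_pretrained("gpt2")
wte = backbone.get_input_embeddings()  # frozen token embedding table (vocab, d_model)

def prototype_embedding(word: str) -> torch.Tensor:
    """Embed one prototype word; average sub-word tokens if it splits."""
    ids = tokenizer(word, return_tensors="pt")["input_ids"][0]
    with torch.no_grad():
        return wte(ids).mean(dim=0)  # (d_model,)

# Fixed prototype matrix P: (num_prototypes, d_model)
P = torch.stack([prototype_embedding(w) for w in GENERAL + DATASET_SPECIFIC])
```

Because the prototypes are ordinary word embeddings, every later similarity score maps back to a human-readable concept.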
Embedding Training
- Use a CKA (Centered Kernel Alignment) loss to align time-series embeddings with the text embeddings.
- Fixed text prototypes + learnable weights, for interpretability and training efficiency.
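The repository's exact loss is not reproduced here; the following is a standard linear-kernel CKA in PyTorch, turned into a loss as `1 - CKA` so that maximizing alignment minimizes the loss.

```python
import torch

def linear_cka(X: torch.Tensor, Y: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
    """Linear-kernel CKA between X (n, p) and Y (n, q), rows = paired samples.

    Returns a scalar in [0, 1]; 1 means the two representations are
    identical up to rotation and isotropic scaling.
    """
    X = X - X.mean(dim=0, keepdim=True)  # center each feature over samples
    Y = Y - Y.mean(dim=0, keepdim=True)
    hsic_xy = (Y.T @ X).norm(p="fro") ** 2
    hsic_xx = (X.T @ X).norm(p="fro")
    hsic_yy = (Y.T @ Y).norm(p="fro")
    return hsic_xy / (hsic_xx * hsic_yy + eps)

def cka_alignment_loss(ts_emb: torch.Tensor, text_emb: torch.Tensor) -> torch.Tensor:
    """Loss that pulls time-series embeddings toward the text embedding space."""
    return 1.0 - linear_cka(ts_emb, text_emb)
```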
Downstream Tasks
- Apply the aligned embeddings to forecasting and classification (see the end-to-end sketch under Architecture below).
Interpretability
- Visualize attention maps and cosine similarity between time steps and prototypes.
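A minimal sketch of the similarity visualization, assuming `ts_tokens` holds aligned per-time-step embeddings of shape (T, d) and `P` is the prototype matrix from the earlier sketch; `prototype_names` is an illustrative label list.

```python
import torch
import torch.nn.functional as F
import matplotlib.pyplot as plt

def plot_prototype_similarity(ts_tokens: torch.Tensor, P: torch.Tensor,
                              prototype_names: list[str]) -> None:
    """Heatmap of cosine similarity: one row per prototype, one column per time step."""
    sim = F.normalize(ts_tokens, dim=-1) @ F.normalize(P, dim=-1).T  # (T, K)
    plt.figure(figsize=(8, 4))
    plt.imshow(sim.T.detach().cpu().numpy(), aspect="auto", cmap="viridis")
    plt.yticks(range(len(prototype_names)), prototype_names)
    plt.xlabel("time step")
    plt.colorbar(label="cosine similarity")
    plt.tight_layout()
    plt.show()
```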
Architecture
- Pre-trained LLM backbone (Transformer blocks)
- Time-series embedder
- Text-prototype alignment module
- Cosine similarity + attention analysis
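One plausible way these components compose end to end is sketched below; the module names, attention formulation, and dimensions are assumptions for illustration, not the repository's actual code. The prototype matrix and LLM blocks stay frozen, while the embedder, prototype weights, and task head train.

```python
import torch
import torch.nn as nn

class PrototypeAlignedModel(nn.Module):
    """Sketch: time-series embedder -> prototype alignment -> frozen LLM -> head."""

    def __init__(self, prototypes: torch.Tensor, llm: nn.Module,
                 patch_len: int = 16, d_model: int = 768, n_classes: int = 7):
        super().__init__()
        self.embedder = nn.Linear(patch_len, d_model)        # time-series embedder
        self.register_buffer("P", prototypes)                # fixed text prototypes (K, d)
        self.proto_weights = nn.Parameter(torch.zeros(prototypes.size(0)))
        self.llm = llm                                       # e.g. a Hugging Face GPT2Model
        for p in self.llm.parameters():                      # freeze the backbone
            p.requires_grad = False
        self.head = nn.Linear(d_model, n_classes)            # ElectricDevices has 7 classes

    def forward(self, patches: torch.Tensor):
        # patches: (batch, num_patches, patch_len)
        z = self.embedder(patches)                           # (B, N, d)
        w = self.proto_weights.softmax(dim=0).unsqueeze(1)   # learnable prototype weights (K, 1)
        scores = z @ (self.P * w).T / z.size(-1) ** 0.5      # (B, N, K)
        attn = scores.softmax(dim=-1)                        # attention over prototypes
        z_text = attn @ self.P                               # tokens re-expressed in text space
        h = self.llm(inputs_embeds=z_text).last_hidden_state # frozen Transformer blocks
        return self.head(h.mean(dim=1)), attn                # logits + interpretable attention
```

Training would combine the task loss on the logits with `cka_alignment_loss(z, z_text)` from the sketch above; the returned `attn` is what the attention-map visualizations inspect.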
ElectricDevices Dataset
- Behavioral energy usage data from UK households
- 251 households, sampled every 2 minutes over 24 hours
- 720 time steps per sequence (24 hours at 2-minute intervals)
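A loading sketch, assuming UCR-style text files (one row per sequence: class label first, then the values); the actual filenames and delimiter under `ElectricDevices/` may differ.

```python
import numpy as np

def load_split(path: str):
    """Load a UCR-style split: column 0 = class label, rest = series values."""
    data = np.loadtxt(path)          # use delimiter="," for comma-separated variants
    labels = data[:, 0].astype(int)
    series = data[:, 1:]             # (n_samples, sequence_length)
    return series, labels

# Hypothetical filename following the UCR naming convention.
X_train, y_train = load_split("ElectricDevices/ElectricDevices_TRAIN.txt")
print(X_train.shape)                 # expected sequence length: 720, per the description above
```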
📁 prototype-alignment-LLM/
```
├── test
├── ElectricDevices/         # Time-series datasets and labels
├── code/                    # Model architecture and training scripts
├── visualization/           # Visualizations of embeddings
├── visulization_semantic/   # Visualizations of embeddings in the semantic space, matched with words
└── README.md
```
Results
- CKA score improvements before & after alignment
- Prototype attention maps
- Interpretability with domain-relevant text concepts
Related Work
- OneFitsAll: Pretrained LM for Time-Series (NeurIPS 2023)
- Time-LLM (ICLR 2024)
- CALF (arXiv 2024)
- TEST (ICLR 2024)
- X-VILA (arXiv 2024)
Author
- Jiyun Kim, Department of Computer Science & Engineering, Korea University
- DAIS Lab (Lab Meeting 11.29)