Meta-Lingo

A Modern Multimodal Corpus Research Software

Features | Installation | Quick Start | Documentation | License

Overview

Meta-Lingo is a comprehensive desktop application designed for corpus linguistics research. Built with modern technologies (Electron + React + Python FastAPI), it provides powerful tools for multimodal corpus management, linguistic analysis, and annotation.

Features

Corpus Management

Multimodal Support: Text, audio, and video files with drag-and-drop upload
Audio Transcription: Whisper Large V3 Turbo with word-level timestamps
Forced Alignment: Wav2Vec2 word-level alignment for English audio (automatic)
Pitch Extraction: TorchCrepe F0 extraction for English audio (automatic)
Video Analysis: YOLOv8 object detection and CLIP semantic classification
Automatic Annotation: SpaCy NLP (POS/NER/Dependency), USAS semantic domains, MIPVU metaphor identification
Metadata Management: Language, author, source, text type with tag system

Analysis Tools

Module	Description
Word Frequency	Frequency analysis with POS filtering, lemma/word form selection, visualization
N-gram Analysis	2-6 gram support, nested grouping, Sankey diagrams
Keyword Extraction	TF-IDF, TextRank, YAKE!, RAKE, and 9 keyness statistics methods
Collocation	KWIC search with 6 modes, CQL query language, CQL Builder
Synonym Analysis	WordNet integration with network visualization
Semantic Domain	USAS-based analysis with dual view (by domain/by word)
Metaphor Analysis	MIPVU-based detection; 3-step pipeline (word filter → rules → Clause model); source color-coding by POS
Word Sketch	Grammar pattern analysis (50 relations), logDice scoring, difference comparison
Topic Modeling	BERTopic, LDA, LSA, NMF with dynamic topic analysis
Bibliography	Refworks parsing (WOS/CNKI), shadow corpus for abstracts, network visualization, burst detection; analysis modules support corpus/literature toggle and library selection (all / by keyword / manual).

Annotation Mode

Text Annotation: Sentence-level display, intelligent segmentation, batch annotation
Multimodal Annotation: Video frame tracking, DAW-style timeline, YOLO overlay
Audio Waveform Annotation: Wavesurfer.js waveform visualization with word alignment, pitch curve overlay, box drawing annotation (English audio only)
Framework Management: 49 preset frameworks (SFL, UAM, etc.), custom framework support
Inter-coder Reliability: Fleiss' Kappa, Cohen's Kappa, Krippendorff's Alpha, Gold Standard support (plain text archives only)
Syntax Visualization: Constituency and dependency parsing

Additional Features

Dictionary Lookup: Macmillan, Longman Collocations with fuzzy search
Bilingual Interface: Chinese and English with real-time switching
Custom Wallpaper: Personalized application background
Export Options: CSV, PNG, SVG for all visualizations

System Architecture

+----------------------------------------------------------+
|                      Meta-Lingo                           |
+----------------------------------------------------------+
|  Frontend (Electron + React + TypeScript)                 |
|  - Material-UI components                                 |
|  - Zustand state management                               |
|  - D3.js / Plotly.js visualizations                       |
|  - i18next internationalization                           |
+----------------------------------------------------------+
|                    HTTP REST API                          |
+----------------------------------------------------------+
|  Backend (Python FastAPI)                                 |
|  - SpaCy NLP processing                                   |
|  - USAS semantic tagging (PyMUSAS)                        |
|  - MIPVU metaphor detection (DeBERTa)                     |
|  - BERTopic / LDA / LSA / NMF topic modeling              |
|  - Whisper / YOLO / CLIP multimodal analysis              |
+----------------------------------------------------------+
|  Data Storage                                             |
|  - SQLite database (metadata)                             |
|  - File system (corpora, annotations)                     |
+----------------------------------------------------------+

Tech Stack

Frontend

Technology	Purpose
Electron 28+	Desktop application framework
React 18	UI framework
TypeScript 5	Type safety
Material-UI 5	Component library
D3.js 7	Data visualization
Plotly.js	Interactive charts

Backend

Technology	Purpose
Python 3.12	Runtime environment
FastAPI	Web framework
SpaCy 3.8+	NLP processing
PyMUSAS	Semantic tagging
BERTopic	Topic modeling
Transformers	Whisper/CLIP models
Ultralytics	YOLOv8

Installation

Download

Visit our official website to download the latest version:

https://tltanium.github.io/meta-lingo-website/

Source code in this repository is provided for reference and academic verification only. Please use the official distribution above to run Meta-Lingo.

Quick Start

After installing from the website, launch the application and follow the in-app guidance. For documentation, use the Help module inside the application.

Documentation

In-app Help: Access via the Help module with bilingual documentation
API Documentation: http://localhost:8000/docs (when backend is running)

API Overview

Category	Endpoints
Corpus	`/api/corpus/*` - CRUD, upload, annotation
Analysis	`/api/analysis/*` - Word frequency, N-gram, keywords, etc.
Collocation	`/api/collocation/*` - KWIC search, CQL parsing
Topic Modeling	`/api/topic-modeling/*` - BERTopic, LDA, LSA, NMF
Annotation	`/api/annotation/`, `/api/framework/`
Word Sketch	`/api/sketch/*` - Grammar patterns, difference
Bibliography	`/api/biblio/*` - Libraries, visualization

Full API documentation available at /docs endpoint.

Models & Resources

Meta-Lingo integrates several pre-trained models:

Model	Purpose	Source
Whisper Large V3 Turbo	Audio transcription	OpenAI
Wav2Vec2-base-960h	Forced alignment (English)	Facebook
TorchCrepe Full	Pitch extraction (F0)	maxrmorrison/torchcrepe
YOLOv8	Object detection	Ultralytics
CLIP ViT-Large-Patch14	Image classification	OpenAI
SpaCy en/zh_core_web_lg	NLP processing	Explosion
DeBERTa-v3-large-clause-metaphor	MIPVU metaphor detection (F1 75.83)	tommyleo2077
Sentence-BERT	Text embeddings	sentence-transformers

Contributing

This project is currently maintained for academic research purposes. For bug reports or feature requests, please open an issue.

Changelog

v3.9.56 (2026-03-07)

Metaphor Analysis — Clause-only pipeline: Removed HiTZ model entirely. All tokens now annotated by a single deberta-v3-large-clause-metaphor model using full-sentence context (max_length=192). 3-step pipeline: word-form filter → SpaCy rule filter → Clause model. Function words (IN/DT/RB/RP) keep orange tag (finetuned); other words use green tag (clause). Legacy hitz source in existing annotations treated as clause (green). Help docs updated with Clause model accuracy (Precision 78.08%, Recall 73.69%, F1 75.83; DT F1 90.87, IN F1 87.87).

v3.9.55 (2026-03-07)

Sentiment Analysis — USAS mode: Search panel adds "USAS Semantic Domain" mode; results aggregate sentiment scores by domain code with full domain name tooltip; word cloud uses domain names; CSV export adds domain_name column.

v3.9.54–v3.9.51 (2026-03-06)

Bibliography Visualization: PDF export rewritten via Electron IPC (printToPDF) to fix blank-page issue on large documents. Paper column with PDF upload and first-page thumbnail. 11 AI-generated fields per entry (research goal, questions, design, conclusions, mechanism, contribution, limitations, value, dialogue, future work, summary). Batch AI generation for multiple entries. Column visibility control. Export to styled PDF report.

v3.9.46 (2026-03-04)

Sentiment Analysis (NRC): Full NRC-EmoLex annotation added to corpus pipeline after MIPVU. New analysis page with polarity (pie chart + word cloud) and emotion dimensions (radar chart + word cloud). Result table cross-links to collocation/word sketch/N-gram/semantic domain. Backend: nrc_service.py, sentiment_analysis_service.py, POST /api/analysis/sentiment.

v3.9.44–v3.9.45 (2026-03-02)

Cross-module links default to case-insensitive search. Collocation wordlist search mode (multi-word input, one per line).

v3.9.43 (2026-03-02)

Bibliography: Bulk delete for selected entries. Relevance rating (0–5 stars), tags, and notes columns added to entry table and detail dialog. CSV export.

v3.9.36 (2026-02-27)

Metaphor Analysis: Added Clause model (deberta-v3-large-clause-metaphor) to MIPVU pipeline for function-word annotation. POS-group statistics (IN/DT/RB/RP/OTHER metaphor rates) shown in results table header.

v3.9.34–v3.9.35 (2026-02-26–27)

Cross-module corpus selection sync across all analysis modules. Topic modeling bibliography mode with publication year for dynamic analysis.

v3.9.27–v3.9.33 (2026-02-24–26)

AI Assistant: Robot icon in all analysis modules' left panel (requires Ollama or OpenAI-compatible API); sends current page state as context. OpenAI-compatible API support in Settings (address / key / model). Cross-module library-mode link sync fixes.

v3.9.22–v3.9.26 (2026-02-24)

Semantic domain analysis: CQL cross-link, word cloud, domain name display. Collocation network expand on click, MinSense fix, Word Sketch Difference word-form/lemma mode. Topic modeling: N-gram preprocessing mode, LDA/LSA/NMF dynamic topic analysis.

v3.9.15–v3.9.18 (2026-02-17–22)

Praat acoustic analysis: Spectrogram, formants (F1–F5), intensity, HNR, jitter, shimmer. Chinese audio full visualization support. Corpus building script (saves/corpus/corpus_building.py) for 13 English corpora.

v3.9.0–v3.9.14 (2026-02-08–16)

Ridge plot SVG/PNG full export, CQL top-level OR operator and template auto-fill. Collocation search mode (lemma/word form). Result table search fix across all modules. Unified UI spacing and labeling. Cross-module N-gram link.

v3.8.96–v3.8.99 (2026-01-28–2026-02-08)

Audio waveform annotation (Wavesurfer.js + TorchCrepe pitch + box drawing). Full annotation pipeline for audio/video transcripts. Inter-coder reliability gold standard fix. CQL distance selector fix.

v3.8.86–v3.8.95 (2026-01-18–28)

LLM topic naming (Ollama). USAS annotation modes (rule / neural / hybrid). Stopword removal (20+ languages). Custom wallpaper. Keyword extraction enhancements. Theme/Rheme auto-annotation. Dark theme for all topic modeling visualizations.

For the full version history, see PROJECT.md or the Git commit log.

License

Meta-Lingo Software License (Non-Commercial)

Meta-Lingo is an independently developed corpus research software by Tommy Leo, protected under the Copyright Law of the People's Republic of China.

This software is licensed only for:

Personal learning
Academic research
Non-commercial corpus analysis and linguistic research

Commercial use is prohibited without written permission.

See LICENSE_CN.txt (Chinese) or LICENSE_EN.txt (English) for full terms.

Name		Name	Last commit message	Last commit date
Latest commit History 55 Commits
assets		assets
backend		backend
build/mcp_server/mcp_server		build/mcp_server/mcp_server
data		data
docs		docs
electron		electron
help		help
mcp-extension		mcp-extension
memory		memory
models		models
public		public
saves		saves
scripts		scripts
src		src
.gitignore		.gitignore
CITATION		CITATION
LICENSE		LICENSE
LICENSE_CN.txt		LICENSE_CN.txt
LICENSE_EN.txt		LICENSE_EN.txt
README.md		README.md
corpus_building.py		corpus_building.py
excludes.txt		excludes.txt
package.json		package.json
requirements.txt		requirements.txt

Folders and files

Latest commit

History

Repository files navigation

Meta-Lingo

Overview

Features

Corpus Management

Analysis Tools

Annotation Mode

Additional Features

System Architecture

Tech Stack

Frontend

Backend

Installation

Download

Quick Start

Documentation

API Overview

Models & Resources

Contributing

Changelog

v3.9.56 (2026-03-07)

v3.9.55 (2026-03-07)

v3.9.54–v3.9.51 (2026-03-06)

v3.9.46 (2026-03-04)

v3.9.44–v3.9.45 (2026-03-02)

v3.9.43 (2026-03-02)

v3.9.36 (2026-02-27)

v3.9.34–v3.9.35 (2026-02-26–27)

v3.9.27–v3.9.33 (2026-02-24–26)

v3.9.22–v3.9.26 (2026-02-24)

v3.9.15–v3.9.18 (2026-02-17–22)

v3.9.0–v3.9.14 (2026-02-08–16)

v3.8.96–v3.8.99 (2026-01-28–2026-02-08)

v3.8.86–v3.8.95 (2026-01-18–28)

License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages