I hold an M.A. in Language, Mind, Technology from Adam Mickiewicz University. This interdisciplinary program combines linguistics with computer science, cognitive science, and AI to study language using both theoretical and experimental methods.
During a 3-year paid research internship on the PRODIS project, I built and maintained the full machine learning and data processing stack. This included a first-of-its-kind phoneme-level GPT model for Polish, CI pipelines for survey processing, transcription QA tools, a batch ASR wrapper, and a custom web interface for data collection. I also developed an internal GUI tool to track the progress of data collection.
Outside of research, I build cross-platform tools, games, and backend infrastructure in Python and C++. My projects emphasize automation, reproducibility, and performance. I've used Linux since 2016 and maintain a self-hosted server.
Highlights include:
- model: Python CLI pipeline for training a phoneme-level GPT model on Polish IPA, with a custom tokenizer, TOML-based configs, and multithreaded scripts for formant extraction, surprisal prediction, alignment, and stress annotation.
- survey: Python CI-based tool for cleaning and standardizing survey exports, including translation and structural validation.
- asr: Python CLI wrapper around Whisper for batch ASR with stereo-to-mono conversion and model/language selection.
- header-warden: C++ CLI multithreaded static analysis tool that reports missing standard library headers in C++ code.
- fattura: C++ GUI app for editing a transcription verification status CSV, with autosave.
- aegyo: C++ GUI app for learning Korean Hangul with full mouse and keyboard input.
- vroom: In-progress C++ GUI 2D racing game with arcade drift physics, procedurally generated tracks, and waypoint AI.
More: ryouze.net/projects
- Website: ryouze.net
- LinkedIn: Jan Foremski
- Email: [email protected]