What's New
Block-based OpenAI API
Language#with_openai: A new block-based API for OpenAI integration. Yields an OpenAIHelper instance that is configured once and reused for all calls within the block, making it efficient for batch processing with pipe.
Doc#linguistic_summary: Generates a JSON summary of spaCy's linguistic analysis (tokens, entities, noun chunks, sentences) that can be passed directly to LLMs as context. Sections and token attributes are fully customizable.
OpenAIHelper#chat: Convenient system:/user: shortcuts for building messages, with a raw: option for access to the full API response.
OpenAIHelper#embeddings: Standalone embeddings method that accepts text directly.
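The "configured once and reused" idea behind the block-based API can be sketched with the standard library alone. This is a rough illustration, not the gem's implementation: `SketchHelper` and `with_sketch_helper` are hypothetical stand-ins, and the helper records requests instead of calling OpenAI. The `role`/`content` message shape follows the usual chat message format; how the gem builds its messages internally is an assumption here.

```ruby
# Hypothetical stand-in for the real helper: it records each request
# instead of sending it to OpenAI.
class SketchHelper
  attr_reader :requests

  def initialize(model:)
    @model = model
    @requests = []
  end

  # Builds a messages array from the system:/user: shortcuts.
  def chat(system: nil, user: nil)
    messages = []
    messages << { role: "system", content: system } if system
    messages << { role: "user", content: user } if user
    @requests << { model: @model, messages: messages }
    "stub reply #{@requests.size}"
  end
end

# Hypothetical block-based entry point: the helper is built once,
# yielded to the caller, and reused for every call inside the block.
def with_sketch_helper(model:)
  helper = SketchHelper.new(model: model)
  yield helper
  helper
end

helper = with_sketch_helper(model: "gpt-5-mini") do |ai|
  ["first text", "second text"].each do |text|
    ai.chat(system: "Analyze using the linguistic data.", user: text)
  end
end

puts helper.requests.size
```

Because the helper is created outside the loop, both calls share the same model setting, which is the property that makes the real block form suited to batch work with pipe.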
Quality Improvements
- Bug fix: Doc#ents now returns proper Span objects instead of raw Python objects
- Security: Model name validation in Language#initialize to prevent injection
- OpenAI: Temperature handling for o-series models, 429 retry with exponential backoff, client reuse, dimensions/response_format parameters, tool call depth limit
- Code quality: Unified map usage across 17 methods, improved respond_to_missing? with py_hasattr?, optimized Doc#initialize retry loop
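Two of the items above, model name validation and 429 retry with exponential backoff, can be illustrated with plain Ruby. The following is a minimal sketch under stated assumptions, not the gem's code: the allow-list pattern, the `RateLimited` error class, and the `with_backoff` and `valid_model_name?` names are all hypothetical, and the actual rules and retry limits used by the gem are not given in these notes.

```ruby
# Hypothetical allow-list for model names: letters, digits, "_", "-", ".".
MODEL_NAME_PATTERN = /\A[A-Za-z0-9_.\-]+\z/

def valid_model_name?(name)
  name.is_a?(String) && name.match?(MODEL_NAME_PATTERN)
end

# Hypothetical error type standing in for an HTTP 429 response.
class RateLimited < StandardError; end

# Retries the block on RateLimited, doubling the delay after each failure.
# The sleeper is injectable so the sketch can be exercised without waiting.
def with_backoff(max_retries: 3, base_delay: 1.0, sleeper: ->(s) { sleep(s) })
  attempt = 0
  begin
    yield attempt
  rescue RateLimited
    raise if attempt >= max_retries
    sleeper.call(base_delay * (2**attempt))
    attempt += 1
    retry
  end
end

# Simulate two rate-limited attempts followed by a success.
delays = []
result = with_backoff(sleeper: ->(s) { delays << s }) do |attempt|
  raise RateLimited if attempt < 2
  "ok after #{attempt} retries"
end

puts valid_model_name?("en_core_web_sm")
puts valid_model_name?("x'); import os")
puts result
p delays
```

Anchoring the pattern with `\A` and `\z` matters: it rejects strings that merely contain a valid name, which is the usual way such a check guards against a name being interpolated into code run on the Python side.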
New Features
- Token#idx for character offset access
- Span#to_s for string representation
- Language#memory_zone for spaCy 3.8+ memory management
- PhraseMatcher support via Language#phrase_matcher
- instance_variables_to_inspect for Ruby 4.0+ compatibility
Dependencies
- Added base64 gem (Ruby 3.4+)
- Added fiddle gem (Ruby 4.0+)
Tests
- 24 new tests added (81 total)
- All passing on Ruby 3.4.6 and Ruby 4.0.1
Example
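require "ruby-spacy"

nlp = Spacy::Language.new("en_core_web_sm")
texts = ["The bank approved the loan.", "I sat on the river bank."]

# The helper is configured once and reused for every document in the batch.
nlp.with_openai(model: "gpt-5-mini") do |ai|
  nlp.pipe(texts).each do |doc|
    # Pass spaCy's analysis to the model as JSON context.
    result = ai.chat(
      system: "Analyze using the linguistic data.",
      user: doc.linguistic_summary
    )
    puts result
  end
end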