Context
I've been studying Pipecat through the official examples, and the storytelling-chatbot example stands out as a compelling concept for demonstrating multimodal AI capabilities. However, based on reviewing the codebase and comparing it to newer examples like local-smart-turn, word-wrangler-gemini-live, and the foundational examples in the main Pipecat repo, the storytelling example appears to use patterns that could benefit from modernization.
The storytelling example is particularly valuable because it demonstrates:
- Multi-turn interactive narratives
- Image generation synchronized with narration
- Voice-driven user input for "choose your own adventure" experiences
- Integration of Gemini 2.0 LLM with Google Imagen
However, the implementation could leverage more recent Pipecat architectural patterns to improve clarity and extensibility.
🎯 Proposed Modernization
1. Structured Text Segmentation with PatternPairAggregator + LLMTextProcessor
Current approach (assumed): Text parsing likely uses manual regex or string splitting to separate narration from image prompts.
Modern approach: Use PatternPairAggregator with XML-style tags for clean segmentation:
```python
from pipecat.processors.llm_text_processor import LLMTextProcessor
from pipecat.utils.text.pattern_pair_aggregator import MatchAction, PatternPairAggregator

# Configure pattern aggregator for story segments
pattern_aggregator = PatternPairAggregator()

# Define patterns for different content types
pattern_aggregator.add_pattern(
    type="narration",
    start_pattern="<narration>",
    end_pattern="</narration>",
    action=MatchAction.AGGREGATE,
)
pattern_aggregator.add_pattern(
    type="image_prompt",
    start_pattern="<image_prompt>",
    end_pattern="</image_prompt>",
    action=MatchAction.AGGREGATE,
)

# Create processor to segment LLM output
llm_text_processor = LLMTextProcessor(text_aggregator=pattern_aggregator)
```
Benefits:
- No manual regex parsing
- Clear separation between narration and image generation instructions
- Structured metadata attached to each segment
- Easier to extend with additional segment types (e.g., sound effects, scene transitions)
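To make the tagging concrete, here is a minimal, self-contained sketch of the segmentation the aggregator would perform on tagged LLM output. The story text and the `segment_story` helper are hypothetical; this is a one-shot, regex-based stand-in for illustration only, whereas the aggregator would do the same job incrementally over streamed tokens:

```python
import re

# Example of the tagged output the system prompt would ask the LLM to produce
# (tag names from this proposal; the story content itself is made up).
story_chunk = (
    "<narration>The dragon circled the tower twice, then landed.</narration>"
    "<image_prompt>A red dragon perched on a stone tower at dusk</image_prompt>"
)

def segment_story(text: str) -> list[tuple[str, str]]:
    """Split tagged LLM output into (segment_type, content) pairs.

    A non-streaming stand-in for what the pattern aggregator yields
    as matched pairs arrive.
    """
    pattern = re.compile(r"<(narration|image_prompt)>(.*?)</\1>", re.DOTALL)
    return [(m.group(1), m.group(2)) for m in pattern.finditer(text)]

segments = segment_story(story_chunk)
```

Each `(type, content)` pair maps onto one downstream action: narration segments go to TTS, image-prompt segments go to the image generator.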
2. Dedicated Orchestration with StoryOrchestratorProcessor
Proposal: Create a custom processor that manages story flow and coordinates multimodal outputs.
```python
class StoryOrchestratorProcessor(FrameProcessor):
    """Coordinates story pages, image generation, and narration sequencing."""

    def __init__(self, image_generator):
        super().__init__()
        self._current_page = 0
        self._pages = []
        self._image_generator = image_generator

    async def process_frame(self, frame, direction):
        # FrameProcessor subclasses should let the base class see every frame first
        await super().process_frame(frame, direction)

        # Handle narration segments
        if isinstance(frame, AggregatedTextFrame) and frame.aggregated_by == "narration":
            await self._queue_narration(frame.text)
        # Handle image prompts
        elif isinstance(frame, AggregatedTextFrame) and frame.aggregated_by == "image_prompt":
            await self._generate_and_queue_image(frame.text)

        await self.push_frame(frame, direction)
```
Responsibilities:
- Maintain story state (current page, history)
- Trigger image generation for each `<image_prompt>` segment
- Forward narration frames to the TTS pipeline
- Handle user input for story choices
This separates orchestration logic from parsing logic, making the pipeline more modular.
3. Synchronize Narration & Images with Frame Observers
Current approach (assumed): Timing may be implicit or based on delays.
Modern approach: Use frame observers to detect narration completion:
```python
class StorySyncObserver(FrameObserver):
    """Observes TTS completion to trigger the next story page."""

    def __init__(self, orchestrator):
        self._orchestrator = orchestrator

    async def on_frame(self, frame):
        if isinstance(frame, TTSAudioEndFrame):
            # Narration finished, show the next image
            await self._orchestrator.advance_to_next_page()
```
Flow:
- Narration 1 plays → `TTSAudioEndFrame` → Show Image 2 → Play Narration 2
- Narration 2 plays → `TTSAudioEndFrame` → Show Image 3 → Play Narration 3
This creates deterministic, page-by-page storytelling without timing hacks.
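The event-driven sequencing above can be sketched without any Pipecat dependencies. In this self-contained illustration (all names hypothetical), an `asyncio.Event` stands in for `TTSAudioEndFrame` reaching the observer, and the loop only advances to the next page once the "audio" signals completion:

```python
import asyncio

async def play_narration(text: str, done: asyncio.Event) -> None:
    """Stand-in for TTS playback; setting the event plays the role
    of TTSAudioEndFrame reaching the observer."""
    await asyncio.sleep(0.01)  # simulated audio duration
    done.set()

async def run_story(pages: list[str]) -> list[str]:
    events: list[str] = []
    for i, narration in enumerate(pages, start=1):
        events.append(f"show image {i}")  # display the page's image
        done = asyncio.Event()
        asyncio.create_task(play_narration(narration, done))
        events.append(f"play narration {i}")
        await done.wait()                 # advance only on "audio end"
    return events

events = asyncio.run(run_story(["Once upon a time...", "The dragon appeared."]))
```

Because progression is gated on a completion signal rather than a fixed `sleep`, the pacing stays correct no matter how long each narration actually takes.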
4. Clear Component Separation
| Responsibility | Component |
| --- | --- |
| Text segmentation | `LLMTextProcessor` with `PatternPairAggregator` |
| Story flow / page management | `StoryOrchestratorProcessor` |
| Image generation | `StoryImageProcessor` (or existing Imagen integration) |
| Narration timing | `StorySyncObserver` (observes `TTSAudioEndFrame`) |
| UI updates | Transport image frames |
✅ Benefits of Modernization
- Educational Value: Demonstrates recommended Pipecat patterns for multimodal agents
- Extensibility: Easy to add features like:
  - Multiple story paths with user choice
  - Background music synchronized with scene changes
  - Video generation (future)
  - Story progress UI
- Maintainability: Clear separation of concerns makes debugging easier
- Consistency: Aligns with patterns in newer examples (`local-smart-turn`, `simple-chatbot`)
- Performance: Async image generation doesn't block narration pipeline
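The performance point can be demonstrated with a small, self-contained sketch (the `generate_image` stub and prompt are invented stand-ins for the Imagen call): generation is kicked off as a background task and only awaited when the result is actually needed, so narration frames keep flowing in the meantime:

```python
import asyncio

async def generate_image(prompt: str) -> bytes:
    """Stand-in for an Imagen call; the sleep simulates network latency."""
    await asyncio.sleep(0.01)
    return f"image-bytes:{prompt}".encode()

async def main() -> tuple[list[str], bytes]:
    # Kick off generation without awaiting it, so narration keeps flowing.
    image_task = asyncio.create_task(generate_image("a red dragon at dusk"))

    narrated: list[str] = []
    for sentence in ["The dragon circled the tower.", "Then it landed."]:
        narrated.append(sentence)  # narration work continues unblocked

    image = await image_task  # awaited only when the page is displayed
    return narrated, image

narrated, image = asyncio.run(main())
```

The design choice here is simply `asyncio.create_task` plus a deferred `await`, which is how a processor can avoid stalling the pipeline on a slow external call.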
📝 Implementation Checklist
If this proposal is approved, I'm happy to contribute a PR with:
- Prompt updates introducing `<narration>` and `<image_prompt>` tags
- `PatternPairAggregator` configuration for segmentation
- `StoryOrchestratorProcessor` for flow control
- `StorySyncObserver` for narration/image synchronization
🤔 Questions for Maintainers
- Should this be a refactor of the existing storytelling-chatbot, or a new example (e.g., `storytelling-chatbot-v2`)?
- Are there specific Pipecat patterns you'd like demonstrated in this example?
Additional Deliverables (if helpful)
I can also provide:
- ✔ Pipeline architecture diagram (ASCII or Mermaid)
- ✔ Step-by-step migration guide from old to new pattern
- ✔ Comparative example showing before/after code
This example has great potential to showcase Pipecat's multimodal capabilities. Modernizing it would provide a clearer learning path for developers building similar experiences.