
Integration: Add langextract as a local visual trace layer (observability HTML viewer + CLI) #1412

@MervinPraison

Description


Overview

Add first-class integration with google/langextract so that a user can run a PraisonAI workflow and view the full execution — agent boundaries, LLM turns, tool calls, final output — as an interactive, self-contained HTML visualization that highlights every step grounded in the source prompt/input text.

This is the visualization analogue of the n8n integration (praisonai n8n open) we just shipped. Where n8n gave us a visual editor for workflows, langextract will give us a visual viewer for workflow executions — a zero-server, zero-sign-up way to inspect what an agent actually did, grounded in the exact spans of input it reasoned about.

Primary UX target:

# Run any YAML workflow with langextract observability
praisonai agents.yaml --observe langextract
# -> produces trace.jsonl + trace.html, opens browser to HTML viewer

# Or render an existing trace/session to HTML
praisonai langextract view trace.jsonl --open

Python API target:

from praisonaiagents import Agent
from praisonai.observability import LangextractSink, LangextractSinkConfig

agent = Agent(name="researcher", instructions="Summarize text")
with LangextractSink(config=LangextractSinkConfig(output_path="run.html")):
    agent.start("Long input document ...")
# run.jsonl + run.html written; open run.html to explore the trace

Background

What is langextract?

"A Python library for extracting structured information from unstructured text using LLMs with precise source grounding and interactive visualization." β€” google/langextract README

Key properties that make it ideal as PraisonAI's visualization layer:

  1. Precise source grounding — every extraction maps to an exact char_interval of the source text.
  2. Self-contained interactive HTML — lx.visualize("extractions.jsonl") produces a single HTML file with highlights, a timeline, and filters. No server, no sign-up.
  3. Stable data model — lx.data.Extraction(extraction_class, extraction_text, attributes, char_interval) + lx.data.AnnotatedDocument is small and round-trippable.
  4. Apache-2.0 — compatible with PraisonAI licensing.
  5. Optional dependency — only activated when the user opts in.
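
The round-trippable data model (property 3) can be illustrated with plain dicts in a JSONL-like shape. This is a sketch, not the real schema: the field names mirror the lx.data classes above, but the exact on-disk format is owned by langextract.

```python
# Minimal sketch of the round-trippable data model using plain dicts.
# Field names mirror lx.data.Extraction / lx.data.AnnotatedDocument;
# the real on-disk JSONL schema is langextract's, so treat this as illustrative.
import json

extraction = {
    "extraction_class": "tool_call",
    "extraction_text": "web_search",
    "attributes": {"agent_name": "researcher"},
    "char_interval": None,  # ungrounded events carry no source offsets
}
doc = {"document_id": "praisonai-run", "text": "input ...", "extractions": [extraction]}

line = json.dumps(doc)  # one AnnotatedDocument per JSONL line
assert json.loads(line)["extractions"][0]["extraction_text"] == "web_search"
```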

Why this is valuable

  • Zero-config review UX — today, to understand what an agent did, the user has to trawl terminal logs, Langfuse cloud, or enable verbose mode. A single local HTML file is dramatically simpler.
  • Grounding debugging — for agents that read large inputs (docs, web pages, transcripts), seeing which spans the agent actually used vs. hallucinated is the #1 debugging ask. langextract renders this natively.
  • Shareable, offline — the HTML is self-contained; drop it into Slack or a PR for review.
  • Complements — does not replace — Langfuse: Langfuse is cloud SaaS + production tracing; langextract is local-file, run-time review.
  • Matches the n8n pattern — "one command, external UI becomes your eyes on the workflow" is proven user-friendly.

Current ecosystem state

Today users who want visual trace review must either:

  • Use --observe langfuse (requires cloud sign-up, API keys, internet).
  • Write a custom notebook that wires Agent events into a plotting library.
  • Use output="actions" which only prints to the terminal.

langextract closes the "local-only, zero-install-extra-infra, self-contained HTML review" gap.


Architecture Analysis

Current Implementation

Existing trace / observability infrastructure (ready for reuse — DRY):

| Component | Path | Role |
|---|---|---|
| ActionEvent / ActionEventType | src/praisonai-agents/praisonaiagents/trace/protocol.py | Canonical event schema (AGENT_START/END, TOOL_START/END, ERROR, OUTPUT) |
| TraceSinkProtocol | src/praisonai-agents/praisonaiagents/trace/protocol.py | Pluggable sink interface (emit, flush, close) |
| ContextTraceEmitter + get_context_emitter() | src/praisonai-agents/praisonaiagents/trace/context_events.py | Already wired into Agent.chat/Agent.start — emits agent_start, agent_end, llm_response, tool_start, tool_end |
| LangfuseSink | src/praisonai/praisonai/observability/langfuse.py | Reference implementation of a TraceSinkProtocol adapter — we copy this pattern |
| --observe flag | src/praisonai/praisonai/cli/app.py:124-153 | Already parses the provider name; currently only accepts langfuse. Extend with langextract. |
| _setup_langfuse_observability() | src/praisonai/praisonai/cli/app.py:153 | Pattern for how a sink is wired up at CLI entry — copy for _setup_langextract_observability |

Zero existing langextract integration in any of:

  • src/praisonai-agents/praisonaiagents/tools/ (no langextract_tools.py)
  • src/praisonai/praisonai/observability/ (only langfuse.py)
  • examples/python/tools/ (only the user's unofficial ~/test/langextract/app.py)
  • PraisonAI-tools/ (nothing)
  • PraisonAIDocs/docs/observability/ (not covered)

So this is a greenfield addition that slots cleanly into existing extension points — no protocol changes, no breaking changes.

Key File Locations

| File | Purpose | Why it matters here |
|---|---|---|
| src/praisonai-agents/praisonaiagents/trace/protocol.py | ActionEvent, TraceSinkProtocol, NoOpSink, ListSink | Our new adapter implements TraceSinkProtocol. Zero changes here. |
| src/praisonai-agents/praisonaiagents/trace/context_events.py | ContextTraceEmitter, ContextEvent (with richer LLM token / content fields) | Source of the events we map to lx.data.Extraction. Zero changes here. |
| src/praisonai/praisonai/observability/langfuse.py (306 lines) | Reference TraceSinkProtocol adapter | Template for the new LangextractSink. |
| src/praisonai/praisonai/observability/__init__.py | Lazy-loaded exports | Register LangextractSink / LangextractSinkConfig here. |
| src/praisonai/praisonai/cli/app.py:124-153 | --observe typer option + _setup_*_observability | Extend to accept langextract and wire up the sink. |
| src/praisonai/praisonai/cli/commands/ | Per-feature CLI sub-apps | New langextract.py sub-app with view, render, open commands. |
| src/praisonai-agents/praisonaiagents/tools/ | Built-in tools | New optional langextract_tools.py wrapping lx.extract as a callable tool for agents. |

Data-flow (proposed)

Agent.start(input_text)
  └─ ContextTraceEmitter.agent_start/llm_response/tool_start/tool_end/agent_end
       └─ LangextractSink.emit(ActionEvent)          ← NEW
            └─ maps event → lx.data.Extraction
                 · extraction_class  = "agent" | "llm_turn" | "tool_call" | "final_output" | "error"
                 · extraction_text   = verbatim span from input_text (or from agent output for output classes)
                 · char_interval     = exact offsets (None for non-grounded events)
                 · attributes        = {agent_name, tool_name, duration_ms, tokens, finish_reason, status}
            └─ buffered list of Extractions
       └─ on agent_end / close():
            · build lx.data.AnnotatedDocument(text=input_text, extractions=[...])
            · lx.io.save_annotated_documents([doc], output_name=path.jsonl)
            · html = lx.visualize(path.jsonl)
            · write path.html, optionally webbrowser.open()
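
The grounding step in the flow above can be sketched as a pure helper. The name ground_span is illustrative (not an existing API): spans found verbatim in the source get exact offsets, everything else stays ungrounded with char_interval=None.

```python
# Sketch of the grounding decision in the proposed event -> Extraction mapping.
# `ground_span` is a hypothetical helper name, not a PraisonAI or langextract API.
from typing import Optional, Tuple

def ground_span(source: str, span: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) character offsets of `span` in `source`, or None
    when the span is not a verbatim substring (tool names, errors, etc.)."""
    if not span:
        return None
    start = source.find(span)
    if start == -1:
        return None  # ungrounded: rendered in the side panel, not inline
    return (start, start + len(span))

source = "Summarize the quarterly revenue figures for ACME Corp."
assert ground_span(source, "quarterly revenue") == (14, 31)  # exact offsets
assert ground_span(source, "web_search") is None             # ungrounded event
```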

Gap Analysis

| Area | Current State | Gap | Severity | Placement |
|---|---|---|---|---|
| Observability adapter | Only LangfuseSink exists | No LangextractSink implementing TraceSinkProtocol | Critical (blocks feature) | praisonai/observability/langextract.py (wrapper) |
| Event→Extraction mapping | No mapper; ContextEvent has rich fields already | Need a pure function that grounds events against input text and produces an lx.data.Extraction list | Critical | praisonai/observability/langextract.py |
| CLI observe provider | --observe langfuse only; hard-coded check at cli/app.py:151 | Need langextract accepted; need _setup_langextract_observability() | High | praisonai/cli/app.py |
| CLI sub-command | No langextract command | Need praisonai langextract view, render, open for ad-hoc + session rendering | High | praisonai/cli/commands/langextract.py |
| Built-in tool | tavily_tools.py, agentql examples — nothing for langextract | Optional langextract_tools.py exposing lx.extract as a tool (so agents can use langextract, not only be visualized by it) | Medium | praisonaiagents/tools/langextract_tools.py |
| Python export | No public symbols | Export LangextractSink, LangextractSinkConfig via lazy __getattr__ in praisonai/observability/__init__.py | High | Lazy loader already exists; just add two entries |
| Packaging | langextract not in any extras | Add [langextract] extra to src/praisonai/pyproject.toml optional-dependencies table | High | pyproject.toml |
| Docs | No page | Add PraisonAIDocs/docs/observability/langextract.mdx with quick-start, API, CLI, examples | Medium | Docs repo |
| Examples | Only the user's unofficial ~/test/langextract/app.py | Add examples/python/observability/langextract_basic.py and langextract_with_tools.py | Medium | examples/python/observability/ |
| Tests | None | Unit tests for the event→Extraction mapper + sink + CLI wiring; a real agentic smoke test that runs an Agent and asserts HTML + JSONL are produced | High | src/praisonai/tests/test_langextract_integration.py |
| Perf (import time) | — | langextract is heavy; must be a lazy import gated behind the extra | High | Implementation discipline |

No existing test, example, doc, or CLI contract is broken by any of the above — this is strictly additive.


Proposed Implementation

Phase 1: Minimal (MVP) — LangextractSink adapter + --observe langextract

  1. Add the adapter — src/praisonai/praisonai/observability/langextract.py:
"""
Langextract TraceSinkProtocol Implementation for PraisonAI.

Provides LangextractSink adapter that implements TraceSinkProtocol from the core SDK,
producing self-contained interactive HTML visualizations of agent runs grounded in
the original input text.

Architecture:
- Core SDK (praisonaiagents): Defines TraceSinkProtocol (unchanged)
- Wrapper (praisonai): Implements LangextractSink adapter (this file)
- Pattern: Protocol-driven design per AGENTS.md §4.1 — mirrors LangfuseSink
"""

from __future__ import annotations
import os
import threading
import webbrowser
from dataclasses import dataclass
from pathlib import Path
from typing import Any, Dict, List, Optional

from praisonaiagents.trace.protocol import (
    ActionEvent,
    ActionEventType,
    TraceSinkProtocol,
)


@dataclass
class LangextractSinkConfig:
    """Configuration for the langextract trace sink."""
    output_path: str = "praisonai-trace.html"
    jsonl_path: Optional[str] = None           # derived from output_path if None
    document_id: str = "praisonai-run"
    auto_open: bool = False                     # open HTML in browser on close()
    include_llm_content: bool = True            # include response text in attributes
    include_tool_args: bool = True
    enabled: bool = True


class LangextractSink:
    """
    Implements `TraceSinkProtocol` by accumulating ActionEvents and, on `close()`,
    rendering them as a langextract AnnotatedDocument + interactive HTML.

    Grounding strategy:
      - We record the first AGENT_START's `metadata["input"]` as the source text.
      - OUTPUT events produce extractions grounded against the agent's output.
      - TOOL_* events produce ungrounded extractions (char_interval=None) whose
        `attributes` carry the tool name, args summary, duration, status.
      - AGENT_START/END bracket a run; we emit a single parent "agent" extraction
        spanning the whole document for overview.
    """

    __slots__ = ("_config", "_lock", "_events", "_source_text", "_closed")

    def __init__(self, config: Optional[LangextractSinkConfig] = None) -> None:
        self._config = config or LangextractSinkConfig()
        self._lock = threading.Lock()
        self._events: List[ActionEvent] = []
        self._source_text: Optional[str] = None
        self._closed = False

    # ---- TraceSinkProtocol -------------------------------------------------

    def emit(self, event: ActionEvent) -> None:
        if not self._config.enabled or self._closed:
            return
        with self._lock:
            # Capture source text from first AGENT_START
            if (
                self._source_text is None
                and event.event_type == ActionEventType.AGENT_START.value
                and event.metadata
            ):
                self._source_text = event.metadata.get("input") or ""
            self._events.append(event)

    def flush(self) -> None:
        pass  # no-op; HTML is built on close()

    def close(self) -> None:
        if self._closed:
            return
        self._closed = True
        try:
            self._render()
        except Exception as e:
            # Observability must never break the agent
            import logging
            logging.getLogger(__name__).warning("LangextractSink render failed: %s", e)

    # ---- Context manager (enables `with LangextractSink(...):`) -----------

    def __enter__(self) -> "LangextractSink":
        # Register on the global emitter for the duration of the run
        from praisonaiagents.trace import get_context_emitter
        get_context_emitter().add_sink(self)
        return self

    def __exit__(self, exc_type, exc, tb) -> None:
        self.close()

    # ---- Rendering ---------------------------------------------------------

    def _render(self) -> None:
        # Lazy import — langextract is optional
        import langextract as lx  # type: ignore

        source = self._source_text or ""
        extractions = list(self._events_to_extractions(lx, source))
        doc = lx.data.AnnotatedDocument(
            document_id=self._config.document_id,
            text=source,
            extractions=extractions,
        )

        jsonl = self._config.jsonl_path or (Path(self._config.output_path).with_suffix(".jsonl").as_posix())
        Path(jsonl).parent.mkdir(parents=True, exist_ok=True)
        lx.io.save_annotated_documents([doc], output_name=os.path.basename(jsonl), output_dir=os.path.dirname(jsonl) or ".")

        html = lx.visualize(jsonl)
        html_text = html.data if hasattr(html, "data") else html
        Path(self._config.output_path).write_text(html_text, encoding="utf-8")

        if self._config.auto_open:
            webbrowser.open(f"file://{Path(self._config.output_path).resolve()}")

    def _events_to_extractions(self, lx, source: str):
        """Pure mapper: ActionEvent list -> lx.data.Extraction generator."""
        for ev in self._events:
            et = ev.event_type
            attrs: Dict[str, Any] = {
                "agent_name": ev.agent_name,
                "duration_ms": ev.duration_ms,
                "status": ev.status,
            }
            if et == ActionEventType.AGENT_START.value:
                yield lx.data.Extraction(
                    extraction_class="agent_run",
                    extraction_text=(source[:200] if source else ev.agent_name or "agent"),
                    attributes={**attrs, "kind": "start"},
                )
            elif et == ActionEventType.TOOL_START.value:
                yield lx.data.Extraction(
                    extraction_class="tool_call",
                    extraction_text=ev.tool_name or "tool",
                    attributes={
                        **attrs,
                        "tool_name": ev.tool_name,
                        "tool_args": ev.tool_args if self._config.include_tool_args else None,
                    },
                )
            elif et == ActionEventType.TOOL_END.value:
                yield lx.data.Extraction(
                    extraction_class="tool_result",
                    extraction_text=ev.tool_result_summary or "(empty)",
                    attributes={**attrs, "tool_name": ev.tool_name},
                )
            elif et == ActionEventType.OUTPUT.value:
                yield lx.data.Extraction(
                    extraction_class="final_output",
                    extraction_text=(ev.metadata or {}).get("content", "")[:1000],
                    attributes=attrs,
                )
            elif et == ActionEventType.ERROR.value:
                yield lx.data.Extraction(
                    extraction_class="error",
                    extraction_text=ev.error_message or "error",
                    attributes=attrs,
                )
            # AGENT_END is summary-only — skip for now; could produce a run-stats extraction
  2. Extend the CLI --observe flag — src/praisonai/praisonai/cli/app.py:
# Replace cli/app.py:150-153
if observe:
    if observe == "langfuse":
        _setup_langfuse_observability(verbose=verbose)
    elif observe == "langextract":
        _setup_langextract_observability(verbose=verbose)
    else:
        raise typer.BadParameter(
            f"Unsupported observe provider: {observe}. "
            "Choose one of: langfuse, langextract."
        )

_setup_langextract_observability() mirrors _setup_langfuse_observability(): builds a LangextractSinkConfig from env vars (PRAISONAI_LANGEXTRACT_OUTPUT, PRAISONAI_LANGEXTRACT_AUTO_OPEN), constructs LangextractSink, registers it on the global ContextTraceEmitter.
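
A sketch of that env-var parsing, using a stand-in dataclass. SinkConfigSketch and config_from_env are hypothetical names; the real function would build LangextractSinkConfig, construct the sink, and register it on the global emitter.

```python
# Hypothetical sketch of the env parsing inside _setup_langextract_observability().
# `SinkConfigSketch` stands in for LangextractSinkConfig; the env var names are
# the ones proposed above (PRAISONAI_LANGEXTRACT_OUTPUT / _AUTO_OPEN).
from dataclasses import dataclass
from typing import Mapping

@dataclass
class SinkConfigSketch:
    output_path: str = "praisonai-trace.html"
    auto_open: bool = False

def config_from_env(env: Mapping[str, str]) -> SinkConfigSketch:
    """Translate the observability env vars into a sink config."""
    return SinkConfigSketch(
        output_path=env.get("PRAISONAI_LANGEXTRACT_OUTPUT", "praisonai-trace.html"),
        auto_open=env.get("PRAISONAI_LANGEXTRACT_AUTO_OPEN", "").lower()
        in ("1", "true", "yes"),
    )

cfg = config_from_env({"PRAISONAI_LANGEXTRACT_OUTPUT": "run.html",
                       "PRAISONAI_LANGEXTRACT_AUTO_OPEN": "true"})
assert cfg.output_path == "run.html" and cfg.auto_open is True
```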

  3. Register lazy exports — append to src/praisonai/praisonai/observability/__init__.py:
# In __getattr__
elif name == "LangextractSink":
    from .langextract import LangextractSink
    return LangextractSink
elif name == "LangextractSinkConfig":
    from .langextract import LangextractSinkConfig
    return LangextractSinkConfig
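
The lazy-export mechanism relied on here is module-level __getattr__ (PEP 562). A self-contained demonstration with a synthetic in-memory module (observability_demo is a made-up name, not a real package):

```python
# Demonstrates the PEP 562 module-level __getattr__ pattern the wrapper's
# observability package uses for lazy exports. "observability_demo" is a
# synthetic module built in memory purely for illustration.
import sys
import types

mod = types.ModuleType("observability_demo")

def _module_getattr(name):
    if name == "LangextractSink":
        # A real package would do: from .langextract import LangextractSink
        return type("LangextractSink", (), {})
    raise AttributeError(name)

mod.__getattr__ = _module_getattr  # PEP 562 hook lives in the module namespace
sys.modules["observability_demo"] = mod

import observability_demo
cls = observability_demo.LangextractSink  # resolved lazily, on first access
assert cls.__name__ == "LangextractSink"
```

Until an attribute is first accessed, nothing heavy is imported, which is exactly what keeps import time flat.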
  4. Packaging extra — src/praisonai/pyproject.toml:
[project.optional-dependencies]
langextract = ["langextract>=1.0.0"]

Phase 2: Production — CLI sub-app + tool + session rendering

  1. New CLI sub-app — src/praisonai/praisonai/cli/commands/langextract.py:
import typer
from pathlib import Path
from typing import Optional

app = typer.Typer(name="langextract", help="Render PraisonAI traces with langextract.")


@app.command(name="view")
def view(
    jsonl_path: Path = typer.Argument(..., help="Path to annotated-documents JSONL"),
    output_html: Path = typer.Option("trace.html", "--output", "-o"),
    no_open: bool = typer.Option(False, "--no-open"),
):
    """Render an existing annotated-documents JSONL to an interactive HTML."""
    import langextract as lx
    import webbrowser

    html = lx.visualize(str(jsonl_path))
    html_text = html.data if hasattr(html, "data") else html
    output_html.write_text(html_text, encoding="utf-8")
    typer.echo(f"✅ Wrote {output_html}")
    if not no_open:
        webbrowser.open(f"file://{output_html.resolve()}")


@app.command(name="render")
def render(
    yaml_path: Path = typer.Argument(..., help="PraisonAI YAML workflow"),
    output_html: Path = typer.Option("workflow.html", "--output", "-o"),
    no_open: bool = typer.Option(False, "--no-open"),
    api_url: Optional[str] = typer.Option(None, "--api-url"),
):
    """Run a workflow end-to-end with LangextractSink attached, then open the HTML."""
    from praisonai.observability import LangextractSink, LangextractSinkConfig
    from praisonai import PraisonAI

    sink = LangextractSink(
        config=LangextractSinkConfig(output_path=str(output_html), auto_open=not no_open)
    )
    # attach sink to the global trace emitter for the duration of the run
    from praisonaiagents.trace import get_context_emitter
    get_context_emitter().add_sink(sink)
    try:
        result = PraisonAI(agent_file=str(yaml_path)).main()
        typer.echo(result)
    finally:
        sink.close()
    typer.echo(f"✅ Trace rendered: {output_html}")
  2. Built-in tool (Phase 2b) — src/praisonai-agents/praisonaiagents/tools/langextract_tools.py:
"""Thin wrapper exposing lx.extract as a callable tool."""
from typing import Any, Dict, List


def langextract_extract(
    text: str,
    prompt_description: str,
    examples: List[Dict[str, Any]],
    model_id: str = "gemini-2.5-flash",
    extraction_passes: int = 1,
    max_workers: int = 10,
) -> Dict[str, Any]:
    """Run langextract over `text` using the given prompt and few-shot examples.
    Returns the serialized extraction result dict."""
    try:
        import langextract as lx  # lazy
    except ImportError as e:
        raise ImportError("pip install 'praisonai[langextract]'") from e

    # Convert dict examples -> lx.data.ExampleData (DRY helper elsewhere)
    ex_objs = [_dict_to_example(lx, e) for e in examples]
    result = lx.extract(
        text_or_documents=text,
        prompt_description=prompt_description,
        examples=ex_objs,
        model_id=model_id,
        extraction_passes=extraction_passes,
        max_workers=max_workers,
    )
    return {
        "text": result.text,
        "extractions": [
            {
                "extraction_class": e.extraction_class,
                "extraction_text": e.extraction_text,
                "attributes": e.attributes or {},
                "char_interval": (
                    {"start": e.char_interval.start_pos, "end": e.char_interval.end_pos}
                    if e.char_interval else None
                ),
            }
            for e in result.extractions
        ],
    }


def _dict_to_example(lx, d: Dict[str, Any]):
    return lx.data.ExampleData(
        text=d["text"],
        extractions=[
            lx.data.Extraction(
                extraction_class=e["extraction_class"],
                extraction_text=e["extraction_text"],
                attributes=e.get("attributes"),
            )
            for e in d.get("extractions", [])
        ],
    )

Agents can then use this tool directly:

from praisonaiagents import Agent
from praisonaiagents.tools.langextract_tools import langextract_extract

agent = Agent(
    name="extractor",
    instructions="Extract entities from provided text",
    tools=[langextract_extract],
)
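
For reference, the examples argument is a list of plain dicts that _dict_to_example converts. An illustrative shape, assuming it mirrors lx.data.ExampleData (no LLM call involved here):

```python
# Illustrative shape of one few-shot example dict for langextract_extract.
# The nested keys mirror lx.data.Extraction; validated with plain Python only.
example = {
    "text": "Dr. Ada Lovelace pioneered computing.",
    "extractions": [
        {
            "extraction_class": "person",
            "extraction_text": "Ada Lovelace",
            "attributes": {"role": "pioneer"},
        }
    ],
}

# Few-shot extraction_text should appear verbatim in the example text,
# consistent with langextract's source-grounding model.
for ext in example["extractions"]:
    assert ext["extraction_text"] in example["text"]
```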

Files to Create / Modify

New files

| File | Purpose |
|---|---|
| src/praisonai/praisonai/observability/langextract.py | LangextractSink + LangextractSinkConfig adapter |
| src/praisonai/praisonai/cli/commands/langextract.py | praisonai langextract view / render / open sub-commands |
| src/praisonai-agents/praisonaiagents/tools/langextract_tools.py | Optional built-in tool wrapping lx.extract |
| src/praisonai/tests/test_langextract_integration.py | Unit + integration + smoke tests (mirrors test_n8n_integration.py) |
| src/praisonai/tests/fixtures/sample_trace.jsonl | Deterministic fixture for mapper tests |
| examples/python/observability/langextract_basic.py | "Run an agent → open HTML" minimal example |
| examples/python/observability/langextract_with_tools.py | Multi-tool agent rendered as an annotated workflow |
| PraisonAIDocs/docs/observability/langextract.mdx | Quick-start, API, CLI, examples, screenshot |

Modified files

| File | Change |
|---|---|
| src/praisonai/praisonai/observability/__init__.py | Add LangextractSink/LangextractSinkConfig to the lazy __getattr__; keep __all__ symbolic for TYPE_CHECKING |
| src/praisonai/praisonai/cli/app.py | Replace the hard-coded observe != "langfuse" check with a dispatch that also accepts langextract; add _setup_langextract_observability() |
| src/praisonai/praisonai/cli/commands/__init__.py (or wherever sub-apps register) | Register the new langextract Typer sub-app |
| src/praisonai/pyproject.toml | Add langextract = ["langextract>=1.0.0"] under [project.optional-dependencies] |
| src/praisonai-agents/praisonaiagents/tools/__init__.py | Lazy export langextract_extract |
| PraisonAIDocs/docs/observability/index.mdx (if it exists) / sidebar config | Add a sidebar entry for the new page |

Zero files need to be removed; zero public APIs change.


Technical Considerations

Dependencies

  • langextract>=1.0.0 (Apache-2.0) — pulls google-genai, openai, httpx, beautifulsoup4, pydantic. ~50 MB install.
  • Must be an optional extra (pip install 'praisonai[langextract]') — not a hard dependency.

Performance impact

  • Zero when --observe langextract is not set: LangextractSink is never imported; the lazy __getattr__ in praisonai/observability/__init__.py keeps import time flat.
  • With the sink active: a bounded in-memory list of ActionEvents (typically tens to hundreds per run). Rendering happens exactly once, in close() — amortized across the whole run.
  • langextract itself is only imported inside _render() — adding the sink alone (without triggering close()) does not pull in the heavy dependency.
  • Target: no measurable impact on import praisonaiagents (<200 ms invariant per AGENTS.md §4.2).

Safety / approval

  • Read-only observability — no network calls in the default path (lx.visualize is pure local templating). No user data leaves the machine.
  • If users also use langextract_extract as a tool, that calls an LLM — it must honor the agent's existing approval / policy hooks automatically (no new hook needed; the tool just runs like any other tool).
  • HTML output is written to a user-chosen path; nothing is executed, only rendered.

Multi-agent safety

  • LangextractSink instances are per-run, not global. cli/app.py wires one sink per CLI invocation.
  • Internally _events is guarded by threading.Lock (same pattern as LangfuseSink).
  • For AgentTeam / concurrent agents: each agent's ContextTraceEmitter gets the same sink, which accumulates all events and groups them by agent_name in the rendered HTML.

Backward compatibility

  • Strictly additive: no existing behavior changes. --observe langfuse continues to work identically.
  • TraceSinkProtocol is not modified.
  • New optional extra — default installs are unchanged.

Import-time discipline (MUST)

  • src/praisonai/praisonai/observability/langextract.py must not import langextract at module scope. Use import langextract as lx inside _render() only.
  • Likewise langextract_tools.py imports inside the function body.

Acceptance Criteria

  • pip install 'praisonai[langextract]' works and pulls langextract.
  • from praisonai.observability import LangextractSink, LangextractSinkConfig succeeds without triggering a langextract import.
  • praisonai agents.yaml --observe langextract runs the workflow and writes both praisonai-trace.jsonl and praisonai-trace.html.
  • praisonai langextract view trace.jsonl writes an HTML file and (by default) opens it in the browser.
  • praisonai langextract render agents.yaml -o run.html runs the workflow and opens run.html.
  • praisonai --observe langextract agents.yaml when langextract is NOT installed fails with a clear message pointing to the extra (not an unhandled ImportError).
  • Python API: the with LangextractSink(...) context manager works (implement __enter__/__exit__ in addition to close()).
  • Unit tests cover the event→Extraction mapper with ≥90% line coverage; tests are deterministic (no network, no LLM calls).
  • Real agentic smoke test: runs a simple Agent with one tool, asserts the JSONL is valid and the HTML contains the agent name, tool name, and input text.
  • Multi-agent test: two agents running sequentially produce one HTML that shows both agents distinctly.
  • Cold-import benchmark: python -c "import praisonaiagents" is within ±2% of main (no regression).
  • Cold-import benchmark: python -c "from praisonai.observability import LangextractSink" does not import langextract.
  • Docs page PraisonAIDocs/docs/observability/langextract.mdx published with: prerequisites, one-liner quick-start, screenshot of HTML viewer, CLI reference, Python API reference.
  • Added to the --observe help text: Enable observability (langfuse, langextract).
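
To keep the unit tests deterministic, the lazy import langextract inside _render() can be satisfied with a stub module instead of the real ~50 MB dependency. A sketch using only the standard library; the stubbed surface (data.Extraction, data.AnnotatedDocument, io, visualize) is an assumption based on the calls used in the adapter above.

```python
# Sketch: stubbing the lazily imported langextract module so mapper/sink tests
# run with no network, no LLM, and no real langextract install. The stubbed
# surface mirrors the calls the proposed _render() makes.
import sys
import types
from unittest import mock

def make_fake_langextract() -> types.ModuleType:
    fake = types.ModuleType("langextract")
    fake.data = types.SimpleNamespace(
        Extraction=lambda **kw: kw,            # record kwargs instead of real objects
        AnnotatedDocument=lambda **kw: kw,
    )
    fake.io = types.SimpleNamespace(save_annotated_documents=lambda *a, **k: None)
    fake.visualize = lambda path: "<html>stub</html>"
    return fake

with mock.patch.dict(sys.modules, {"langextract": make_fake_langextract()}):
    import langextract as lx  # resolves to the stub inside this block only
    ext = lx.data.Extraction(extraction_class="tool_call", extraction_text="search")
    assert ext == {"extraction_class": "tool_call", "extraction_text": "search"}
    assert lx.visualize("any.jsonl") == "<html>stub</html>"
```

mock.patch.dict restores sys.modules on exit, so the stub never leaks into other tests.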

Implementation Notes

Key files to read first

  1. src/praisonai/praisonai/observability/langfuse.py (306 lines) — mirror this pattern exactly for the new sink.
  2. src/praisonai-agents/praisonaiagents/trace/protocol.py — ActionEvent and TraceSinkProtocol (unchanged).
  3. src/praisonai-agents/praisonaiagents/trace/context_events.py — how events are emitted from Agent.chat; note that LLM_RESPONSE and CONTEXT_SNAPSHOT carry the richest data.
  4. src/praisonai/praisonai/cli/app.py:115-180 — where --observe is parsed and dispatched.
  5. examples/python/observability/ — existing examples for the shape of a good example file.
  6. Langextract upstream docs: https://github.com/google/langextract#3-visualize-the-results

Critical integration points

  1. Event capture: hook into get_context_emitter() (already global, already wired into Agent.chat). Use the same registration pattern used by LangfuseSink — no SDK changes needed.
  2. Source text grounding: the source text comes from AGENT_START.metadata["input"]. If missing (e.g., programmatic agent.chat() without input metadata), fall back to the concatenation of all prompts. Document this clearly.
  3. Close timing: close() must run even when the agent errors out. Use a try/finally in _setup_langextract_observability so close() fires at interpreter shutdown (via atexit.register) and on exception.
  4. lx.visualize return shape: may be a plain string or an IPython.display.HTML object (.data attribute). Handle both branches (see the LangfuseSink .get() / hasattr pattern).
  5. Char-grounding for ungrounded events: tool calls and errors are not in the source text. Leave char_interval=None; langextract explicitly supports this and will render them in the side panel, not inline.
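
The close-timing discipline in point 3 can be sketched with a toy sink: atexit.register covers interpreter shutdown, while an explicit close on the error path covers in-process failures. DemoSink is illustrative only.

```python
# Toy illustration of the close-timing discipline: close() must be idempotent
# (atexit may fire it a second time) and must run even when the agent errors.
import atexit

class DemoSink:
    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        if self.closed:
            return  # idempotent: safe to call from both atexit and finally
        self.closed = True

sink = DemoSink()
atexit.register(sink.close)  # safety net at interpreter shutdown

try:
    raise RuntimeError("agent failed mid-run")
except RuntimeError:
    pass
finally:
    sink.close()  # explicit close on the error path

assert sink.closed
```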

Testing commands

# Unit + integration
pytest src/praisonai/tests/test_langextract_integration.py -v

# Lazy-import check — must NOT trigger langextract import
python - <<'PY'
import sys
from praisonai.observability import LangextractSink
assert 'langextract' not in sys.modules, "LangextractSink import pulled heavy dep!"
print("OK: LangextractSink is lazy")
PY

# Real agentic smoke test
cat > /tmp/agents.yaml <<'YAML'
name: Demo
agents:
  writer:
    role: Writer
    goal: Write a haiku
    llm: gpt-4o-mini
YAML
praisonai /tmp/agents.yaml --observe langextract --quiet
ls -la praisonai-trace.html praisonai-trace.jsonl
python -c "import pathlib; html=pathlib.Path('praisonai-trace.html').read_text(); assert '<html' in html.lower() and 'writer' in html.lower(), 'HTML missing agent'; print('OK')"

# CLI sub-app
praisonai langextract view praisonai-trace.jsonl --no-open -o /tmp/out.html
test -s /tmp/out.html && echo "OK: view produced HTML"
