Description
The official OpenCTI documentation does not currently warn administrators about the destructive impact of running FLUSHDB or FLUSHALL on a live OpenCTI Redis instance, nor does it provide recovery procedures when this occurs.
This is a recurring operational scenario: when Redis memory grows unexpectedly (often due to connector queue backlogs), administrators may attempt to resolve it by flushing Redis. Doing so destroys critical platform state and triggers a cascading failure that is difficult to diagnose without understanding OpenCTI's internal architecture.
Problem
OpenCTI uses Redis for:
- Work tracking — each connector ingestion job is tracked via work IDs stored in Redis
- Distributed locks — preventing duplicate entity creation during concurrent ingestion
- Stream coordination — live stream and TAXII data sharing state
- Caching — API response caching and session data
Running FLUSHDB or FLUSHALL destroys all of this state. However, RabbitMQ queues survive (they're in a separate system), creating orphaned bundles that reference work IDs that no longer exist.
The failure chain
- Redis is flushed → all work-tracking state is destroyed
- RabbitMQ still has queued bundles referencing now-dead work IDs
- Workers dequeue bundles → attempt to update work status → Redis returns "work doesn't exist"
- Platform throws `WORK_NOT_ALIVE` errors ("Work is no longer alive, no request can be done within the context of this work")
- Workers cannot complete bundles → retry or stall
- Result: CPU burn on ingest nodes with zero Elasticsearch writes, and a massive queue backlog that never drains
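A quick way to confirm this failure mode from the command line (a sketch; the worker pod label and the Redis key pattern are assumptions, not documented OpenCTI internals, so adjust them to your deployment):

```shell
# Count recent occurrences of the error signature in worker logs
# (the app=opencti-worker label is an assumption)
kubectl logs -l app=opencti-worker --tail=500 | grep -c "WORK_NOT_ALIVE"

# Check whether any work-tracking keys survive in Redis
# (the work_* pattern is an assumed naming convention)
redis-cli --scan --pattern 'work_*' | head -n 20

# Compare against RabbitMQ queue depth: a large backlog combined with
# no surviving work keys points at orphaned bundles
rabbitmqctl list_queues name messages
```

If the backlog is large while the key scan returns nothing, the queued bundles almost certainly reference work IDs that no longer exist.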
Symptoms
- `WORK_NOT_ALIVE` errors in platform/worker logs
- Queue backlog growing or not draining despite healthy infrastructure
- Elasticsearch idle (zero write rejections, zero active merges) despite large queue
- Ingest node CPU imbalance — some pods hot (retry loops), others idle
- Works stuck "In Progress" with zero completed operations
Requested Documentation
A section in the Troubleshooting page (or a dedicated page) covering:
1. Warning: Never run FLUSHDB/FLUSHALL on a live OpenCTI Redis
- What state is stored in Redis and why it's critical
- What happens when it's destroyed (the failure chain above)
2. Recovery Procedure
When FLUSHDB has already been run:
- Purge stale connector queues in RabbitMQ (bundles referencing dead work IDs)
- Reset affected connector state in OpenCTI
- Restart ingest/worker pods
- Restart platform pods
- Restart connectors (they will create new work IDs)
- Monitor for `WORK_NOT_ALIVE` errors clearing
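The recovery steps above might be sketched as follows (the queue and deployment names are hypothetical placeholders; connector state reset has no assumed CLI and is done in the UI or via the API):

```shell
# 1. Purge stale connector queues in RabbitMQ (queue name is hypothetical)
rabbitmqctl purge_queue push_connector-example

# 2. Reset affected connector state from the OpenCTI UI or GraphQL API;
#    no CLI equivalent is assumed here.

# 3-4. Restart ingest/worker pods, then platform pods
#      (deployment names are hypothetical)
kubectl rollout restart deployment/opencti-worker
kubectl rollout restart deployment/opencti-platform

# 5. Restart connectors so they register fresh work IDs
kubectl rollout restart deployment/connector-example

# 6. Watch for WORK_NOT_ALIVE errors clearing from worker logs
kubectl logs -l app=opencti-worker -f | grep WORK_NOT_ALIVE
```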
3. Safe Alternatives When Redis Memory Is High
- Set `maxmemory` with a `noeviction` policy to prevent unbounded growth
- Use `redis-cli --bigkeys` to identify what's consuming memory
- Use stream trimming for event stream growth
- Purge specific RabbitMQ queues (not Redis) for connector backlogs
- Surgical key deletion for specific stuck locks
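Some of the alternatives above, as commands (a sketch: the stream name, queue name, and lock-key name are assumptions to verify against your own deployment before running anything):

```shell
# Identify what is actually consuming memory before deleting anything
redis-cli --bigkeys

# Trim the event stream instead of flushing the database
# (the stream name is an assumption)
redis-cli XTRIM stream.opencti MAXLEN '~' 2000000

# Drain a connector backlog at the queue, not in Redis
# (queue name is hypothetical)
rabbitmqctl purge_queue push_connector-example

# Delete one specific stuck lock rather than the whole database
# (key name is hypothetical)
redis-cli DEL locks:entity:example
```

Each of these removes only the targeted state, leaving work tracking, locks, and stream coordination intact.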
4. Recommended Redis Configuration
- `maxmemory`: should be set explicitly (do not rely on the container OOM killer)
- `maxmemory-policy`: must be `noeviction`
- Monitoring thresholds for memory, blocked clients, slowlog
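As a starting point, the settings above could be applied and persisted like this (the 8gb cap is illustrative, not a recommendation for every deployment):

```shell
# Explicit memory cap with noeviction, persisted back to redis.conf
redis-cli CONFIG SET maxmemory 8gb
redis-cli CONFIG SET maxmemory-policy noeviction
redis-cli CONFIG REWRITE

# Values worth alerting on: memory usage, blocked clients, slow commands
redis-cli INFO memory | grep used_memory_human
redis-cli INFO clients | grep blocked_clients
redis-cli SLOWLOG GET 10
```

With `noeviction`, memory pressure surfaces as explicit write errors instead of silent key loss, which is far easier to diagnose than the aftermath of a flush.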
Context
This documentation request is based on a real production incident at a customer site with 161M+ documents, 6 ingest nodes, and 12 workers. `FLUSHDB` was run to address Redis memory pressure, causing a 2+ week outage of ingestion processing. The root cause was non-obvious: all infrastructure components (disk I/O, Elasticsearch, Redis itself) appeared healthy, yet the write pipeline was completely stalled by orphaned work IDs.
The existing FAQ entry in internal documentation ("Is it safe to flush the redis cache?") provides a brief warning but lacks the failure chain explanation, symptoms, recovery procedure, and safe alternatives needed for operational use.
Labels
Documentation, Troubleshooting