On-device PII anonymization module for high-privacy AI workflows. Detects and replaces Personally Identifiable Information (PII) with semantically valuable placeholder tags while maintaining an encrypted mapping for rehydration.
npm install rehydra
Works in Node.js, Bun, and browsers.
- Structured PII Detection: Regex-based detection for emails, phones, IBANs, credit cards, IPs, URLs
- Soft PII Detection: ONNX-powered NER model for names, organizations, locations (auto-downloads on first use if enabled)
- Semantic Enrichment: AI/MT-friendly tags with gender/location attributes
- Secure PII Mapping: AES-256-GCM encrypted storage of original PII values
- Cross-Platform: Works identically in Node.js, Bun, and browsers
- Configurable Policies: Customizable detection rules, thresholds, and allowlists
- Validation & Leak Scanning: Built-in validation and optional leak detection
npm install rehydra
For Bun support, see Bun Support.
npm install rehydra onnxruntime-web
<script type="module">
// Import directly from your dist folder or CDN
import { createAnonymizer } from './node_modules/rehydra/dist/index.js';
// onnxruntime-web is automatically loaded from CDN when needed
</script>
The full workflow for privacy-preserving LLM processing:
import {
createAnonymizer,
decryptPIIMap,
rehydrate,
InMemoryKeyProvider
} from 'rehydra';
// 1. Create a key provider (required to decrypt later)
const keyProvider = new InMemoryKeyProvider();
// 2. Create anonymizer with key provider
const anonymizer = createAnonymizer({
ner: { mode: 'quantized' },
semantic: { enabled: true },
keyProvider: keyProvider
});
await anonymizer.initialize();
// 3. Anonymize before translation
const original = 'Hello John Smith from Acme Corp in Berlin!';
const result = await anonymizer.anonymize(original);
console.log(result.anonymizedText);
// "Hello <PII type="PERSON" gender="male" id="1"/> from <PII type="ORG" id="2"/> in <PII type="LOCATION" scope="city" id="3"/>!"
// 4. Translate (or do other AI workloads that preserve placeholders)
const translated = await yourAIWorkflow(result.anonymizedText, { from: 'en', to: 'de' });
// "Hallo <PII type="PERSON" gender="male" id="1"/> von <PII type="ORG" id="2"/> in <PII type="LOCATION" scope="city" id="3"/>!"
// 5. Decrypt the PII map using the same key
const encryptionKey = await keyProvider.getKey();
const piiMap = await decryptPIIMap(result.piiMap, encryptionKey);
// 6. Rehydrate - replace placeholders with original values
const rehydrated = rehydrate(translated, piiMap);
// "Hallo John Smith von Acme Corp in Berlin!"
// 7. Clean up
await anonymizer.dispose();
For structured PII such as emails, phone numbers, IBANs, and credit cards:
import { anonymizeRegexOnly } from 'rehydra';
const result = await anonymizeRegexOnly(
'Contact [email protected] or call +49 30 123456. IBAN: DE89370400440532013000'
);
console.log(result.anonymizedText);
// "Contact <PII type="EMAIL" id="1"/> or call <PII type="PHONE" id="2"/>. IBAN: <PII type="IBAN" id="3"/>"
The NER model is downloaded automatically on first use (~280 MB for quantized):
import { createAnonymizer } from 'rehydra';
const anonymizer = createAnonymizer({
ner: {
mode: 'quantized', // or 'standard' for full model (~1.1 GB)
onStatus: (status) => console.log(status),
}
});
await anonymizer.initialize(); // Downloads model if needed
const result = await anonymizer.anonymize(
'Hello John Smith from Acme Corp in Berlin!'
);
console.log(result.anonymizedText);
// "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="2"/> in <PII type="LOCATION" id="3"/>!"
// Clean up when done
await anonymizer.dispose();
Add gender and location scope for better machine translation:
import { createAnonymizer } from 'rehydra';
const anonymizer = createAnonymizer({
ner: { mode: 'quantized' },
semantic: {
enabled: true, // Downloads ~12 MB of semantic data on first use
onStatus: (status) => console.log(status),
}
});
await anonymizer.initialize();
const result = await anonymizer.anonymize(
'Hello Maria Schmidt from Berlin!'
);
console.log(result.anonymizedText);
// "Hello <PII type="PERSON" gender="female" id="1"/> from <PII type="LOCATION" scope="city" id="2"/>!"
Full documentation is available at https://docs.rehydra.ai.
import { createAnonymizer, InMemoryKeyProvider } from 'rehydra';
const anonymizer = createAnonymizer({
// NER configuration
ner: {
mode: 'quantized', // 'standard' | 'quantized' | 'disabled' | 'custom'
backend: 'local', // 'local' (default) | 'inference-server'
autoDownload: true, // Auto-download model if not present
onStatus: (status) => {}, // Status messages callback
onDownloadProgress: (progress) => {
console.log(`${progress.file}: ${progress.percent}%`);
},
// For 'inference-server' backend:
inferenceServerUrl: 'http://localhost:8080',
// For 'custom' mode only:
modelPath: './my-model.onnx',
vocabPath: './vocab.txt',
},
// Semantic enrichment (adds gender/scope attributes)
semantic: {
enabled: true, // Enable MT-friendly attributes
autoDownload: true, // Auto-download semantic data (~12 MB)
onStatus: (status) => {},
onDownloadProgress: (progress) => {},
},
// Encryption key provider
keyProvider: new InMemoryKeyProvider(),
// Custom policy (optional)
defaultPolicy: { /* see Policy section */ },
});
await anonymizer.initialize();

| Mode | Description | Size | Auto-Download |
|---|---|---|---|
| 'disabled' | No NER, regex only | 0 | N/A |
| 'quantized' | Smaller model, ~95% accuracy | ~280 MB | Yes |
| 'standard' | Full model, best accuracy | ~1.1 GB | Yes |
| 'custom' | Your own ONNX model | Varies | No |
Fine-tune ONNX Runtime performance with session options:
const anonymizer = createAnonymizer({
ner: {
mode: 'quantized',
sessionOptions: {
// Graph optimization level: 'disabled' | 'basic' | 'extended' | 'all'
graphOptimizationLevel: 'all', // default
// Threading (Node.js only)
intraOpNumThreads: 4, // threads within operators
interOpNumThreads: 1, // threads between operators
// Memory optimization
enableCpuMemArena: true,
enableMemPattern: true,
}
}
});
By default, Rehydra uses:
- Node.js: CPU (fastest for quantized models)
- Browsers: CPU (WASM)
For NVIDIA GPU acceleration with CUDA/TensorRT, use the inference server backend (see GPU Acceleration).
For high-throughput production deployments, Rehydra supports GPU-accelerated inference via a dedicated inference server. This is useful for large documents.
const anonymizer = createAnonymizer({
ner: {
backend: 'inference-server',
inferenceServerUrl: 'http://localhost:8080',
}
});
await anonymizer.initialize();
Performance Comparison:
| Text Size | CPU (local) | GPU (server) |
|---|---|---|
| Short (~40 chars) | 4.3ms | 62ms |
| Medium (~500 chars) | 26ms | 73ms |
| Long (~2000 chars) | 93ms | 117ms |
| Entity-dense | 13ms | 68ms |
Local CPU inference is faster for most use cases because the GPU path adds network overhead. GPU is beneficial for batch processing and large documents.
Backend Options:
| Backend | Description | Latency (2K chars) |
|---|---|---|
| 'local' | CPU inference (default) | ~93ms |
| 'inference-server' | GPU server (enterprise) | ~117ms |
Creates a reusable anonymizer instance:
const anonymizer = createAnonymizer({
ner: { mode: 'quantized' }
});
await anonymizer.initialize();
const result = await anonymizer.anonymize('text');
await anonymizer.dispose();
One-off anonymization (regex-only by default):
import { anonymize } from 'rehydra';
const result = await anonymize('Contact [email protected]');
One-off anonymization with NER:
import { anonymizeWithNER } from 'rehydra';
const result = await anonymizeWithNER(
'Hello John Smith',
{ mode: 'quantized' }
);
Fast regex-only anonymization:
import { anonymizeRegexOnly } from 'rehydra';
const result = await anonymizeRegexOnly('Card: 4111111111111111');
Decrypts the PII map for rehydration:
import { decryptPIIMap } from 'rehydra';
const piiMap = await decryptPIIMap(result.piiMap, encryptionKey);
// Returns Map<string, string> where key is "PERSON:1" and value is "John Smith"
Replaces placeholders with original values:
import { rehydrate } from 'rehydra';
const original = rehydrate(translatedText, piiMap);
interface AnonymizationResult {
// Text with PII replaced by placeholder tags
anonymizedText: string;
// Detected entities (without original text for safety)
entities: Array<{
type: PIIType;
id: number;
start: number;
end: number;
confidence: number;
source: 'REGEX' | 'NER';
}>;
// Encrypted PII mapping (for later rehydration)
piiMap: {
ciphertext: string; // Base64
iv: string; // Base64
authTag: string; // Base64
};
// Processing statistics
stats: {
countsByType: Record<PIIType, number>;
totalEntities: number;
processingTimeMs: number;
modelVersion: string;
leakScanPassed?: boolean;
};
}

| Type | Description | Detection | Semantic Attributes |
|---|---|---|---|
| EMAIL | Email addresses | Regex | - |
| PHONE | Phone numbers (international) | Regex | - |
| IBAN | International Bank Account Numbers | Regex + Checksum | - |
| BIC_SWIFT | Bank Identifier Codes | Regex | - |
| CREDIT_CARD | Credit card numbers | Regex + Luhn | - |
| IP_ADDRESS | IPv4 and IPv6 addresses | Regex | - |
| URL | Web URLs | Regex | - |
| CASE_ID | Case/ticket numbers | Regex (configurable) | - |
| CUSTOMER_ID | Customer identifiers | Regex (configurable) | - |
| PERSON | Person names | NER | gender (male/female/neutral) |
| ORG | Organization names | NER | - |
| LOCATION | Location/place names | NER | scope (city/country/region) |
| ADDRESS | Physical addresses | NER | - |
| DATE_OF_BIRTH | Dates of birth | NER | - |
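The table lists "Regex + Luhn" for credit cards and "Regex + Checksum" for IBANs. The sketch below shows what such post-regex validation could look like, using the standard Luhn algorithm and the ISO 13616 mod-97 check. The function names `passesLuhn` and `passesIbanChecksum` are hypothetical and are not part of the rehydra API; the library's own validators may differ.

```typescript
// Luhn checksum: double every second digit counting from the right,
// subtract 9 from any doubled value above 9, and require sum % 10 === 0.
function passesLuhn(cardNumber: string): boolean {
  const digits = cardNumber.replace(/[\s-]/g, '');
  if (!/^\d{12,19}$/.test(digits)) return false;
  let sum = 0;
  for (let i = 0; i < digits.length; i++) {
    let d = digits.charCodeAt(digits.length - 1 - i) - 48;
    if (i % 2 === 1) {
      d *= 2;
      if (d > 9) d -= 9;
    }
    sum += d;
  }
  return sum % 10 === 0;
}

// IBAN mod-97 check: move the first four characters to the end, map
// letters to numbers (A=10 ... Z=35), and require a remainder of 1.
function passesIbanChecksum(iban: string): boolean {
  const compact = iban.replace(/\s/g, '').toUpperCase();
  if (!/^[A-Z]{2}\d{2}[A-Z0-9]{11,30}$/.test(compact)) return false;
  const rearranged = compact.slice(4) + compact.slice(0, 4);
  let remainder = 0;
  for (let i = 0; i < rearranged.length; i++) {
    const code = rearranged.charCodeAt(i);
    const value = code >= 65 ? code - 55 : code - 48; // 'A' -> 10, '0' -> 0
    // A letter contributes two decimal digits, a digit contributes one.
    remainder = (remainder * (value > 9 ? 100 : 10) + value) % 97;
  }
  return remainder === 1;
}

console.log(passesLuhn('4111111111111111'));               // true
console.log(passesIbanChecksum('DE89370400440532013000')); // true
```

Running a checksum after the regex match is a common way to cut false positives: a random 16-digit number passes Luhn only about one time in ten.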
import { createAnonymizer, PIIType } from 'rehydra';
const anonymizer = createAnonymizer({
ner: { mode: 'quantized' },
defaultPolicy: {
// Which PII types to detect
enabledTypes: new Set([PIIType.EMAIL, PIIType.PHONE, PIIType.PERSON]),
// Confidence thresholds per type (0.0 - 1.0)
confidenceThresholds: new Map([
[PIIType.PERSON, 0.8],
[PIIType.EMAIL, 0.5],
]),
// Terms to never treat as PII
allowlistTerms: new Set(['Customer Service', 'Help Desk']),
// Enable semantic enrichment (gender/scope)
enableSemanticMasking: true,
// Enable leak scanning on output
enableLeakScan: true,
},
});
Add domain-specific patterns:
import { createCustomIdRecognizer, PIIType, createAnonymizer } from 'rehydra';
const customRecognizer = createCustomIdRecognizer([
{
name: 'Order Number',
pattern: /\bORD-[A-Z0-9]{8}\b/g,
type: PIIType.CASE_ID,
},
]);
const anonymizer = createAnonymizer();
anonymizer.getRegistry().register(customRecognizer);
Models and semantic data are cached locally for offline use.

| Data | macOS | Linux | Windows |
|---|---|---|---|
| NER Models | ~/Library/Caches/rehydra/models/ | ~/.cache/rehydra/models/ | %LOCALAPPDATA%/rehydra/models/ |
| Semantic Data | ~/Library/Caches/rehydra/semantic-data/ | ~/.cache/rehydra/semantic-data/ | %LOCALAPPDATA%/rehydra/semantic-data/ |
In browsers, data is stored using:
- IndexedDB: For semantic data and smaller files
- Origin Private File System (OPFS): For large model files (~280 MB)
Data persists across page reloads and browser sessions.
import {
// Model management
isModelDownloaded,
downloadModel,
clearModelCache,
listDownloadedModels,
// Semantic data management
isSemanticDataDownloaded,
downloadSemanticData,
clearSemanticDataCache,
} from 'rehydra';
// Check if model is downloaded
const hasModel = await isModelDownloaded('quantized');
// Manually download model with progress
await downloadModel('quantized', (progress) => {
console.log(`${progress.file}: ${progress.percent}%`);
});
// Check semantic data
const hasSemanticData = await isSemanticDataDownloaded();
// List downloaded models
const models = await listDownloadedModels();
// Clear caches
await clearModelCache('quantized'); // or clearModelCache() for all
await clearSemanticDataCache();
The PII map is encrypted using AES-256-GCM via the Web Crypto API (works in both Node.js and browsers).
import {
InMemoryKeyProvider, // For development/testing
ConfigKeyProvider, // For production with pre-configured key
KeyProvider, // Interface for custom implementations
generateKey,
} from 'rehydra';
// Development: In-memory key (generates random key, lost on page refresh)
const devKeyProvider = new InMemoryKeyProvider();
// Production: Pre-configured key
// Generate key: openssl rand -base64 32
const keyBase64 = process.env.PII_ENCRYPTION_KEY; // or read from config
const prodKeyProvider = new ConfigKeyProvider(keyBase64);
// Custom: Implement KeyProvider interface
class SecureKeyProvider implements KeyProvider {
async getKey(): Promise<Uint8Array> {
// Retrieve from secure storage, HSM, keychain, etc.
return await getKeyFromSecureStorage();
}
}
- Never log the raw PII map - Always use encrypted storage
- Persist the encryption key securely - Use platform keystores (iOS Keychain, Android Keystore, etc.)
- Rotate keys - Implement key rotation for long-running applications
- Enable leak scanning - Catch any missed PII in output
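This document does not describe how leak scanning works internally. As a rough illustration of the idea only, the sketch below strips placeholder tags from anonymized output and re-runs a few simplified patterns over what remains; any hit suggests PII that detection missed. The name `scanForLeaks` and the patterns are assumptions made for this example, not the library's implementation.

```typescript
// Simplified, illustrative patterns; a real detector set would be broader.
const LEAK_PATTERNS: Array<{ type: string; pattern: RegExp }> = [
  { type: 'EMAIL', pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { type: 'IBAN', pattern: /\b[A-Z]{2}\d{2}[A-Z0-9]{11,30}\b/g },
  { type: 'IPV4', pattern: /\b(?:\d{1,3}\.){3}\d{1,3}\b/g },
];

interface LeakFinding {
  type: string;
  match: string;
}

function scanForLeaks(anonymizedText: string): LeakFinding[] {
  // Remove placeholder tags first so their attributes are never flagged.
  const withoutTags = anonymizedText.replace(/<PII\b[^>]*\/>/g, ' ');
  const findings: LeakFinding[] = [];
  for (const { type, pattern } of LEAK_PATTERNS) {
    // String.prototype.match with a global regex returns every match.
    const matches = withoutTags.match(pattern);
    if (matches) {
      for (const match of matches) findings.push({ type, match });
    }
  }
  return findings;
}

const clean = scanForLeaks('Contact <PII type="EMAIL" id="1"/> today');
const leaky = scanForLeaks('Contact <PII type="EMAIL" id="1"/> or bob@example.com');
console.log(clean.length); // 0
console.log(leaky);        // [ { type: 'EMAIL', match: 'bob@example.com' } ]
```

A scan like this can only catch structured PII; missed names or organizations would need a second NER pass.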
For applications that need to persist encrypted PII maps (e.g., chat applications where you need to rehydrate later), use sessions with built-in storage providers.

| Provider | Environment | Persistence | Use Case |
|---|---|---|---|
| InMemoryPIIStorageProvider | All | None (lost on restart) | Development, testing |
| SQLitePIIStorageProvider | Node.js, Bun only* | File-based | Server-side applications |
| IndexedDBPIIStorageProvider | Browser | Browser storage | Client-side applications |

*Not available in browser builds. Use IndexedDBPIIStorageProvider for browser applications.
Note: The piiStorageProvider is only used when you call anonymizer.session(). Calling anonymizer.anonymize() directly does NOT save to storage; the encrypted PII map is only returned in the result for you to handle manually.
// ❌ Storage NOT used - you must handle the PII map yourself
const result = await anonymizer.anonymize('Hello John!');
// result.piiMap is returned but NOT saved to storage
// ✅ Storage IS used - auto-saves and auto-loads
const session = anonymizer.session('conversation-123');
const result = await session.anonymize('Hello John!');
// result.piiMap is automatically saved to storage
For simple use cases where you don't need persistence:
import { createAnonymizer, decryptPIIMap, rehydrate, InMemoryKeyProvider } from 'rehydra';
const keyProvider = new InMemoryKeyProvider();
const anonymizer = createAnonymizer({
ner: { mode: 'quantized' },
keyProvider,
});
await anonymizer.initialize();
// Anonymize
const result = await anonymizer.anonymize('Hello John Smith!');
// Translate (or other processing)
const translated = await translateAPI(result.anonymizedText);
// Rehydrate manually using the returned PII map
const key = await keyProvider.getKey();
const piiMap = await decryptPIIMap(result.piiMap, key);
const original = rehydrate(translated, piiMap);
For applications that need to persist PII maps across requests/restarts:
import {
createAnonymizer,
InMemoryKeyProvider,
SQLitePIIStorageProvider,
} from 'rehydra';
// 1. Setup storage (once at app start)
const storage = new SQLitePIIStorageProvider('./pii-maps.db');
await storage.initialize();
// 2. Create anonymizer with storage and key provider
const anonymizer = createAnonymizer({
ner: { mode: 'quantized' },
keyProvider: new InMemoryKeyProvider(),
piiStorageProvider: storage,
});
await anonymizer.initialize();
// 3. Create a session for each conversation
const session = anonymizer.session('conversation-123');
// 4. Anonymize - auto-saves to storage
const result = await session.anonymize('Hello John Smith from Acme Corp!');
console.log(result.anonymizedText);
// "Hello <PII type="PERSON" id="1"/> from <PII type="ORG" id="1"/>!"
// 5. Later (even after app restart): rehydrate - auto-loads and decrypts
const translated = await translateAPI(result.anonymizedText);
const original = await session.rehydrate(translated);
console.log(original);
// "Hello John Smith from Acme Corp!"
// 6. Optional: check existence or delete
await session.exists(); // true
await session.delete(); // removes from storage
Each session ID maps to a separate stored PII map:
// Different chat sessions
const chat1 = anonymizer.session('user-alice-chat');
const chat2 = anonymizer.session('user-bob-chat');
await chat1.anonymize('Alice: Contact me at [email protected]');
await chat2.anonymize('Bob: My number is +49 30 123456');
// Each session has independent storage
await chat1.rehydrate(translatedText1); // Uses Alice's PII map
await chat2.rehydrate(translatedText2); // Uses Bob's PII map
Within a session, entity IDs are consistent across multiple anonymize() calls:
const session = anonymizer.session('chat-123');
// Message 1: User provides contact info
const msg1 = await session.anonymize('Contact me at [email protected]');
// → "Contact me at <PII type="EMAIL" id="1"/>"
// Message 2: References same email + new one
const msg2 = await session.anonymize('CC: [email protected] and [email protected]');
// → "CC: <PII type="EMAIL" id="1"/> and <PII type="EMAIL" id="2"/>"
// ↑ Same ID (reused) ↑ New ID
// Message 3: No PII
await session.anonymize('Please translate to German');
// Previous PII preserved
// All messages can be rehydrated correctly
await session.rehydrate(msg1.anonymizedText); // ✓
await session.rehydrate(msg2.anonymizedText); // ✓
This ensures that follow-up messages referencing the same PII produce consistent placeholders, and rehydration works correctly across the entire conversation.
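To make the ID-reuse behaviour concrete, here is a minimal, self-contained sketch of how a session could keep placeholders stable across calls. It handles emails only, keeps everything in memory without encryption, and uses a single ID counter; `SessionSketch` and its internals are hypothetical and only mirror the behaviour described above, not the library's actual session implementation. The email addresses are made-up example values.

```typescript
class SessionSketch {
  // "TYPE:value" -> id, so a repeated value gets its earlier id back.
  private readonly idsByValue = new Map<string, number>();
  // "TYPE:id" -> original value, mirroring the "PERSON:1" key format.
  private readonly valuesByKey = new Map<string, string>();
  private nextId = 1;

  anonymize(text: string): string {
    const emailPattern = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g;
    return text.replace(emailPattern, (value: string): string => {
      const lookupKey = `EMAIL:${value}`;
      let id = this.idsByValue.get(lookupKey);
      if (id === undefined) {
        id = this.nextId++;
        this.idsByValue.set(lookupKey, id);
        this.valuesByKey.set(`EMAIL:${id}`, value);
      }
      return `<PII type="EMAIL" id="${id}"/>`;
    });
  }

  rehydrate(text: string): string {
    // Tolerates extra attributes (e.g. gender, scope) between type and id.
    const tagPattern = /<PII type="([A-Z_]+)"[^>]*? id="(\d+)"\/>/g;
    return text.replace(tagPattern, (tag: string, type: string, id: string): string => {
      const original = this.valuesByKey.get(`${type}:${id}`);
      return original === undefined ? tag : original; // unknown tags stay as-is
    });
  }
}

const sketch = new SessionSketch();
const first = sketch.anonymize('Contact me at alice@example.com');
const second = sketch.anonymize('CC: alice@example.com and bob@example.com');
console.log(first);  // Contact me at <PII type="EMAIL" id="1"/>
console.log(second); // CC: <PII type="EMAIL" id="1"/> and <PII type="EMAIL" id="2"/>
console.log(sketch.rehydrate(second)); // CC: alice@example.com and bob@example.com
```

Because the value-to-ID map outlives each call, a translated reply that mentions placeholder 1 can be rehydrated no matter which message first introduced it.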
The SQLite provider works on both Node.js and Bun with automatic runtime detection.
Note: SQLitePIIStorageProvider is not available in browser builds. When bundling for the browser with Vite or webpack, use IndexedDBPIIStorageProvider instead. The browser-safe build automatically excludes SQLite to avoid bundling Node.js dependencies.
// Node.js / Bun only
import { SQLitePIIStorageProvider } from 'rehydra';
// Or explicitly: import { SQLitePIIStorageProvider } from 'rehydra/storage/sqlite';
// File-based database
const storage = new SQLitePIIStorageProvider('./data/pii-maps.db');
await storage.initialize();
// Or in-memory for testing
const testStorage = new SQLitePIIStorageProvider(':memory:');
await testStorage.initialize();
Dependencies:
- Bun: Uses built-in bun:sqlite (no additional install needed)
- Node.js: Requires better-sqlite3:
npm install better-sqlite3
import {
createAnonymizer,
InMemoryKeyProvider,
IndexedDBPIIStorageProvider,
} from 'rehydra';
// Custom database name (defaults to 'rehydra-pii-storage')
const storage = new IndexedDBPIIStorageProvider('my-app-pii');
const anonymizer = createAnonymizer({
ner: { mode: 'quantized' },
keyProvider: new InMemoryKeyProvider(),
piiStorageProvider: storage,
});
await anonymizer.initialize();
// Use sessions as usual
const session = anonymizer.session('browser-chat-123');
const result = await session.anonymize('Hello John!');
const original = await session.rehydrate(result.anonymizedText);
The session object provides these methods:
interface AnonymizerSession {
readonly sessionId: string;
anonymize(text: string, locale?: string, policy?: Partial<AnonymizationPolicy>): Promise<AnonymizationResult>;
rehydrate(text: string): Promise<string>;
load(): Promise<StoredPIIMap | null>;
delete(): Promise<boolean>;
exists(): Promise<boolean>;
}
Entries persist forever by default. Use cleanup() on the storage provider to remove old entries:
// Delete entries older than 7 days
const count = await storage.cleanup(new Date(Date.now() - 7 * 24 * 60 * 60 * 1000));
// Or delete specific sessions
await session.delete();
// List all stored sessions
const sessionIds = await storage.list();
The library works seamlessly in browsers without any special configuration.
- First-use downloads: NER model (~280 MB) and semantic data (~12 MB) are downloaded on first use
- ONNX runtime: Automatically loaded from CDN if not bundled
- Offline support: After initial download, everything works offline
- Storage: Uses IndexedDB and OPFS - data persists across sessions
The package uses conditional exports to automatically provide a browser-safe build when bundling for the web. This means:
- Automatic: Vite, webpack, esbuild, and other modern bundlers will automatically use dist/browser.js
- No Node.js modules: The browser build excludes SQLitePIIStorageProvider and other Node.js-specific code
- Tree-shakable: Only the code you use is included in your bundle
// package.json exports (simplified)
{
"exports": {
".": {
"browser": "./dist/browser.js",
"node": "./dist/index.js",
"default": "./dist/index.js"
}
}
}
This library works with Bun. Since onnxruntime-node is a native Node.js addon, Bun uses onnxruntime-web:
bun add rehydra onnxruntime-web
Usage is identical: the library auto-detects the runtime.
Benchmarks on Apple M-series (CPU) and NVIDIA T4 (GPU). Run npm run benchmark:compare to measure on your hardware.
| Backend | Short (~40 chars) | Medium (~500 chars) | Long (~2K chars) | Entity-dense |
|---|---|---|---|---|
| Regex-only | 0.38 ms | 0.50 ms | 0.91 ms | 0.35 ms |
| NER CPU | 4.3 ms | 26 ms | 93 ms | 13 ms |
| NER GPU | 62 ms | 73 ms | 117 ms | 68 ms |
Local CPU inference is faster than GPU for typical workloads due to network overhead. GPU servers are beneficial for high-throughput batch processing where many requests can be parallelized.
| Backend | Short (ops/sec) | Medium (ops/sec) | Long (ops/sec) |
|---|---|---|---|
| Regex-only | ~2,640 | ~2,017 | ~1,096 |
| NER CPU | ~234 | ~38 | ~11 |
| NER GPU | ~16 | ~14 | ~9 |
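The figures in the table above line up with the latency table if they are read as sequential operations per second, that is, roughly 1000 divided by the latency in milliseconds. That reading is an inference from the numbers rather than something the benchmark output states, and `opsPerSecond` is a helper written only for this example.

```typescript
// Sequential throughput implied by a per-call latency: 1000 ms / latency.
function opsPerSecond(latencyMs: number): number {
  return 1000 / latencyMs;
}

// NER CPU latencies taken from the latency table (in milliseconds).
const nerCpuLatenciesMs: Array<[string, number]> = [
  ['short', 4.3],
  ['medium', 26],
  ['long', 93],
];

for (const [label, latencyMs] of nerCpuLatenciesMs) {
  console.log(`${label}: ~${Math.round(opsPerSecond(latencyMs))} ops/sec`);
}
// short: ~233 ops/sec
// medium: ~38 ops/sec
// long: ~11 ops/sec
```

These values land within about 1% of the NER CPU row. Parallel or batched requests, as on a GPU server, would not follow this single-call relationship.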
| Model | Size | First-Use Download |
|---|---|---|
| Quantized NER | ~265 MB | ~30s on fast connection |
| Standard NER | ~1.1 GB | ~2min on fast connection |
| Semantic Data | ~12 MB | ~5s on fast connection |
| Use Case | Recommended Backend |
|---|---|
| Structured PII only (email, phone, IBAN) | Regex-only |
| General use with name/org/location detection | NER CPU (default) |
| High-throughput batch processing (1000s of docs) | NER GPU |
| Privacy-sensitive / zero-knowledge required | NER CPU (data never leaves device) |
Note: Local CPU inference now outperforms GPU for most use cases due to network overhead elimination. The trie-based tokenizer provides O(token_length) lookups instead of O(vocab_size), making local inference practical for production use.
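The complexity claim in the note can be illustrated with a small vocabulary trie. The sketch below is a generic longest-prefix lookup, not the library's tokenizer: `VocabTrie`, its methods, and the toy vocabulary are hypothetical. Each lookup walks at most one node per character of the matched token, regardless of how many entries the vocabulary holds.

```typescript
class TrieNode {
  readonly children = new Map<string, TrieNode>();
  tokenId: number | null = null; // set when a vocabulary entry ends here
}

class VocabTrie {
  private readonly root = new TrieNode();

  insert(token: string, tokenId: number): void {
    let node = this.root;
    for (let i = 0; i < token.length; i++) {
      let next = node.children.get(token[i]);
      if (!next) {
        next = new TrieNode();
        node.children.set(token[i], next);
      }
      node = next;
    }
    node.tokenId = tokenId;
  }

  // Longest vocabulary entry that is a prefix of `text` starting at `start`.
  // Work done is proportional to the match length, not the vocabulary size.
  longestMatch(text: string, start: number): { tokenId: number; length: number } | null {
    let node = this.root;
    let best: { tokenId: number; length: number } | null = null;
    for (let i = start; i < text.length; i++) {
      const next = node.children.get(text[i]);
      if (!next) break;
      node = next;
      if (node.tokenId !== null) {
        best = { tokenId: node.tokenId, length: i - start + 1 };
      }
    }
    return best;
  }
}

const vocab = new VocabTrie();
vocab.insert('ber', 1);
vocab.insert('berlin', 2);
vocab.insert('lin', 3);
console.log(vocab.longestMatch('berlin', 0)); // { tokenId: 2, length: 6 }
console.log(vocab.longestMatch('berg', 0));   // { tokenId: 1, length: 3 }
```

A linear scan over the vocabulary would instead compare the input against every entry, which is where the O(vocab_size) cost comes from.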
| Environment | Version | Notes |
|---|---|---|
| Node.js | >= 18.0.0 | Uses native onnxruntime-node |
| Bun | >= 1.0.0 | Requires onnxruntime-web |
| Browsers | Chrome 86+, Firefox 89+, Safari 15.4+, Edge 86+ | Uses OPFS for model storage |
# Install dependencies
npm install
# Run tests
npm test
# Build
npm run build
# Lint
npm run lint
For development or custom models:
# Requires Python 3.8+
npm run setup:ner # Standard model
npm run setup:ner:quantized # Quantized model
License: MIT
