AI Systems · Full-Stack · DevOps · Startup Engineer
I build things that scale. I've done it multiple times. I'll do it again.
I'm actively looking for roles in Software Engineering and DevOps / Cloud Infrastructure where I can own hard problems end-to-end and move fast. I thrive in startup-speed environments, but I bring the engineering discipline of a much larger org. If you're building something ambitious, let's talk.
vadhani.k@northeastern.edu · (617) 560-0171
I'm the engineer who joined Resemble AI and built their entire voice agents platform from zero. Architecture, infrastructure, real-time streaming, multi-agent orchestration, and enterprise deployment. All of it. Before that I engineered event-driven microservices at a Sequoia-backed company, led a full platform redesign, and built a multi-agent RAG system for a Northeastern-backed research lab. From scratch. Every time.
I don't wait for tickets. I identify the problem, design the solution, build it, instrument it, and ship it. That's just how I work.
I've operated across the full stack: frontend, backend, cloud infrastructure, AI/ML pipelines, DevOps. I'm equally at home optimizing a Kubernetes cluster as I am tuning vLLM inference or building a React dashboard. What stays constant is the standard: production-grade, observable, scalable, and fast.
MS Computer Software Engineering · Northeastern University, Boston · GPA 3.8
BS Computer Engineering · University of Mumbai · GPA 3.6
Resemble AI · Google & Sony backed
- Built the entire platform from scratch: SIP trunk integration across Twilio, Telnyx, and BYO-SIP, full call lifecycle management, MCP integration, RAG pipelines, tool calling, and multi-agent orchestration for real-time in-call handoffs between sales, technical, and billing agents. Scaled to 1,000+ concurrent enterprise calls.
- Went deep into vLLM internals: KV cache allocation, continuous batching, speculative decoding. Cut TTFT by 37%.
- Built custom WebSocket streaming pipelines with backpressure handling and connection multiplexing to hit sub-800ms p95 end-to-end latency.
- GPU-aware auto-scaling and model sharding brought inference costs down by 38%.
- Built the full observability layer: per-session tracking of end-of-utterance delay, STT transcription latency, LLM TTFT, TTS TTFB, tokens per second, and character counts for cost profiling.
- Extended the platform into a post-call suite with recording, transcription, and a management dashboard that competed with Otter.ai and Fireflies and generated paying customers.
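The backpressure handling in the streaming pipeline is easier to see in code. A minimal sketch of the idea, assuming an asyncio-style pipeline — the class and names here are illustrative, not the actual implementation: a bounded queue makes a fast producer await instead of buffering unboundedly, which is what keeps end-to-end latency bounded under load.

```python
import asyncio

class BackpressuredStream:
    """Toy audio-chunk stream: a bounded queue forces a fast producer
    to await (backpressure) instead of buffering without limit."""

    def __init__(self, max_buffered_chunks: int = 8):
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_buffered_chunks)

    async def send(self, chunk: bytes) -> None:
        # Blocks when the consumer falls behind -- this await IS the backpressure.
        await self._queue.put(chunk)

    async def recv(self) -> bytes:
        return await self._queue.get()

async def demo() -> list[bytes]:
    stream = BackpressuredStream(max_buffered_chunks=2)
    received: list[bytes] = []

    async def producer():
        for i in range(5):
            await stream.send(f"chunk-{i}".encode())
        await stream.send(b"")  # end-of-stream sentinel

    async def consumer():
        while True:
            chunk = await stream.recv()
            if not chunk:
                break
            await asyncio.sleep(0.01)  # simulate a slow downstream (e.g. TTS playback)
            received.append(chunk)

    await asyncio.gather(producer(), consumer())
    return received
```

The real pipeline layers connection multiplexing on top of this, but the bounded-buffer-plus-await pattern is the core of why the producer can't outrun the consumer.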
Resemble AI · Google & Sony backed
- Designed and shipped a live deepfake detection system for Google Meet, Teams, and Zoom.
- Engineered per-participant audio and video stream ingestion feeding proprietary deepfake models in real time.
- Wired up automated real-time host alerting in production, triggered when model confidence exceeds 90%.
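The alerting rule above reduces to a small gate. This is an illustrative stand-in (class and method names are hypothetical; the real system consumes per-stream model scores): confidence thresholding plus per-participant de-duplication, so the host is alerted once rather than on every frame.

```python
class HostAlerter:
    """Toy alerting gate: fire a host alert the first time a participant's
    deepfake confidence clears the threshold, then suppress repeats."""

    def __init__(self, threshold: float = 0.90):
        self.threshold = threshold            # matches the 90%+ figure above
        self._already_alerted: set[str] = set()

    def process(self, participant_id: str, confidence: float) -> bool:
        """Return True exactly once per participant who crosses the threshold."""
        if confidence >= self.threshold and participant_id not in self._already_alerted:
            self._already_alerted.add(participant_id)
            return True
        return False
```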
Dash Labs · Northeastern University backed
- Built a multi-agent RAG pipeline from scratch: LangChain orchestration, FAISS vector indexing, and sentence-transformer embeddings for automated essay evaluation.
- Built the entire product around it: frontend, backend, API layer, and integration points between the agent pipeline and application layer.
- Benchmarked grading accuracy across Llama 3.1 and DeepSeek-R1 using single-shot prompting, chain-of-thought, and multi-agent strategies to pick the right model for production.
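The retrieval step of that pipeline can be sketched in a few lines. This is a toy stand-in, with a bag-of-words vector playing the role of the sentence-transformer embeddings and a brute-force cosine ranking playing the role of FAISS — all function names here are illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in for a sentence-transformer embedding:
    # a sparse bag-of-words vector instead of a dense one.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Brute-force nearest-neighbor ranking; FAISS does this at index scale.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]
```

The retrieved chunks then feed the agent prompts; the structure is the same whether the vectors are bag-of-words or 768-dimensional embeddings.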
Avataar · Sequoia & Tiger Global backed
- Engineered a JWT session governance system with server-side token rotation and invalidation. Result: +33% paid conversions among free-tier users.
- Led a platform-wide responsive redesign with a breakpoint-driven layout system, adaptive component rendering, and lazy-loaded viewport-specific assets. Result: +57% mobile signups.
- Architected event-driven microservices on Node.js and Express for social interactions with async message queues and read-replica routing. Result: -43% API response times.
- Built a multi-tier caching layer using AWS CloudFront, S3, and cache invalidation policies with origin shield and edge-optimized distribution. Result: -38% thumbnail load times globally.
Languages
AI / ML & LLMs
Full-Stack
Cloud & DevOps
Databases
I own the outcome, not the task. I don't execute tickets and hand off. I identify the problem, design the solution, build it, ship it, and monitor it. If something breaks at 2am, I already have alerts set up.
I move startup fast with production standards. Short cycles, fast feedback, aggressive iteration. But with proper observability, error handling, and architecture baked in from the start. I've seen what happens when you skip those steps. I don't skip them.
I go deep. I've debugged vLLM KV cache behavior. I've traced WebSocket backpressure to the byte level. I've profiled SIP signaling latency end-to-end. When something is slow or broken, I don't guess. I instrument, measure, and fix.
I build for scale from day one. Not premature optimization. But architecture decisions that don't require a full rewrite at 10x traffic. Concurrency models, resource management, and deployment infrastructure designed to grow.
Agile isn't a process to me. It's a mindset. Deliver incrementally. Validate constantly. Adapt quickly. Ship.
Advanced LLM inference optimization · Real-time multimodal AI systems · Large-scale DevOps and platform engineering