AI Systems · Full-Stack · DevOps · Startup Engineer
I build things that scale. I've done it multiple times. I'll do it again.
I'm actively looking for roles in Software Engineering and DevOps / Cloud Infrastructure where I can own hard problems end-to-end and move fast. I thrive in startup-speed environments, but I bring the engineering discipline of a much larger org. If you're building something ambitious, let's talk.
vadhani.k@northeastern.edu · (617) 560-0171
I'm the engineer who joined Resemble AI and built their entire voice agents platform from zero. Architecture, infrastructure, real-time streaming, multi-agent orchestration, and enterprise deployment. All of it. Before that I engineered event-driven microservices at a Sequoia-backed company, led a full platform redesign, and built a multi-agent RAG system for a Northeastern-backed research lab. From scratch. Every time.
I don't wait for tickets. I identify the problem, design the solution, build it, instrument it, and ship it. That's just how I work.
I've operated across the full stack: frontend, backend, cloud infrastructure, AI/ML pipelines, DevOps. I'm equally at home optimizing a Kubernetes cluster as I am tuning vLLM inference or building a React dashboard. What stays constant is the standard: production-grade, observable, scalable, and fast.
MS Computer Software Engineering · Northeastern University, Boston · GPA 3.8
BS Computer Engineering · University of Mumbai · GPA 3.6
Resemble AI · Google & Sony backed
- Built the entire platform from scratch: SIP trunk integration across Twilio, Telnyx, and BYO-SIP, full call lifecycle management, MCP integration, RAG pipelines, tool calling, and multi-agent orchestration for real-time in-call handoffs between sales, technical, and billing agents. Scaled to 1,000+ concurrent enterprise calls.
- Went deep into vLLM internals: KV cache allocation, continuous batching, speculative decoding. Cut TTFT by 37%.
- Built custom WebSocket streaming pipelines with backpressure handling and connection multiplexing to hit sub-800ms p95 end-to-end latency.
- GPU-aware auto-scaling and model sharding brought inference costs down by 38%.
- Built the full observability layer: per-session tracking of end-of-utterance delay, STT transcription latency, LLM TTFT, TTS TTFB, tokens per second, and character counts for cost profiling.
- Extended the platform into a post-call suite with recording, transcription, and a management dashboard that competed with Otter.ai and Fireflies and generated paying customers.
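The backpressure handling in the streaming pipeline is easier to see in code. A minimal sketch of the idea, assuming an asyncio-style pipeline — the class and names here are illustrative, not the actual implementation: a bounded queue makes a fast producer await instead of buffering unboundedly, which is what keeps end-to-end latency bounded under load.

```python
import asyncio

class BackpressuredStream:
    """Toy audio-chunk stream: a bounded queue forces a fast producer
    to await (backpressure) instead of buffering without limit."""

    def __init__(self, max_buffered_chunks: int = 8):
        self._queue: asyncio.Queue = asyncio.Queue(maxsize=max_buffered_chunks)

    async def send(self, chunk: bytes) -> None:
        # Blocks when the consumer falls behind -- this await IS the backpressure.
        await self._queue.put(chunk)

    async def recv(self) -> bytes:
        return await self._queue.get()

async def demo() -> list[bytes]:
    stream = BackpressuredStream(max_buffered_chunks=2)
    received: list[bytes] = []

    async def producer():
        for i in range(5):
            await stream.send(f"chunk-{i}".encode())
        await stream.send(b"")  # end-of-stream sentinel

    async def consumer():
        while True:
            chunk = await stream.recv()
            if not chunk:
                break
            await asyncio.sleep(0.01)  # simulate a slow downstream (e.g. TTS playback)
            received.append(chunk)

    await asyncio.gather(producer(), consumer())
    return received
```

The real pipeline layers connection multiplexing on top of this, but the bounded-buffer-plus-await pattern is the core of why the producer can't outrun the consumer.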
Resemble AI · Google & Sony backed
- Designed and shipped a live deepfake detection system for Google Meet, Teams, and Zoom.
- Engineered per-participant audio and video stream ingestion feeding proprietary deepfake models in real time.
- Wired up automated real-time host alerting in production, triggered when model confidence exceeds 90%.
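The alerting rule above reduces to a small gate. This is an illustrative stand-in (class and method names are hypothetical; the real system consumes per-stream model scores): confidence thresholding plus per-participant de-duplication, so the host is alerted once rather than on every frame.

```python
class HostAlerter:
    """Toy alerting gate: fire a host alert the first time a participant's
    deepfake confidence clears the threshold, then suppress repeats."""

    def __init__(self, threshold: float = 0.90):
        self.threshold = threshold            # matches the 90%+ figure above
        self._already_alerted: set[str] = set()

    def process(self, participant_id: str, confidence: float) -> bool:
        """Return True exactly once per participant who crosses the threshold."""
        if confidence >= self.threshold and participant_id not in self._already_alerted:
            self._already_alerted.add(participant_id)
            return True
        return False
```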
Dash Labs · Northeastern University backed
- Built a multi-agent RAG pipeline from scratch: LangChain orchestration, FAISS vector indexing, and sentence-transformer embeddings for automated essay evaluation.
- Built the entire product around it: frontend, backend, API layer, and integration points between the agent pipeline and application layer.
- Benchmarked grading accuracy across Llama 3.1 and DeepSeek-R1 using single-shot prompting, chain-of-thought, and multi-agent strategies to pick the right model for production.
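The retrieval step of that pipeline can be sketched in a few lines. This is a toy stand-in, with a bag-of-words vector playing the role of the sentence-transformer embeddings and a brute-force cosine ranking playing the role of FAISS — all function names here are illustrative:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Hypothetical stand-in for a sentence-transformer embedding:
    # a sparse bag-of-words vector instead of a dense one.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def retrieve(query: str, corpus: list[str], k: int = 2) -> list[str]:
    # Brute-force nearest-neighbor ranking; FAISS does this at index scale.
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]
```

The retrieved chunks then feed the agent prompts; the structure is the same whether the vectors are bag-of-words or 768-dimensional embeddings.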
Avataar · Sequoia & Tiger Global backed
- Engineered a JWT session governance system with server-side token rotation and invalidation. Result: +33% paid conversions among free-tier users.
- Led a platform-wide responsive redesign with a breakpoint-driven layout system, adaptive component rendering, and lazy-loaded viewport-specific assets. Result: +57% mobile signups.
- Architected event-driven microservices on Node.js and Express for social interactions with async message queues and read-replica routing. Result: -43% API response times.
- Built a multi-tier caching layer using AWS CloudFront, S3, and cache invalidation policies with origin shield and edge-optimized distribution. Result: -38% thumbnail load times globally.
Languages
AI / ML & LLMs
Full-Stack
Cloud & DevOps
Databases
I own the outcome, not the task. I don't execute tickets and hand off. I identify the problem, design the solution, build it, ship it, and monitor it. If something breaks at 2am, I already have alerts set up.
I move startup fast with production standards. Short cycles, fast feedback, aggressive iteration. But with proper observability, error handling, and architecture baked in from the start. I've seen what happens when you skip those steps. I don't skip them.
I go deep. I've debugged vLLM KV cache behavior. I've traced WebSocket backpressure to the byte level. I've profiled SIP signaling latency end-to-end. When something is slow or broken, I don't guess. I instrument, measure, and fix.
I build for scale from day one. Not premature optimization. But architecture decisions that don't require a full rewrite at 10x traffic. Concurrency models, resource management, and deployment infrastructure designed to grow.
Agile isn't a process to me. It's a mindset. Deliver incrementally. Validate constantly. Adapt quickly. Ship.
Advanced LLM inference optimization · Real-time multimodal AI systems · Large-scale DevOps and platform engineering