
Surya Tiwari
AI architect specializing in scalable model deployment and data engineering
Bengaluru, Karnataka, India
Joined March 2026
Summary
Practical expertise in scalable model deployment and inference engineering: Surya has documented and implemented production patterns (containerized model runtimes with externally mounted weights, job-queue architectures for GPU workloads, and websocket-based agent serving) that reduce deployment time, lower costs, and improve reliability for large models and multimodal services.
Contributor to applied AI research and reasoning models: associated with Fractal AI Research projects that target advanced reasoning (e.g., Fathom-R1-14B) and platformized AI products, reflecting involvement in both model development and open research releases.
Experienced data and platform engineer with an enterprise delivery background: a career progression through senior engineering and architect roles spanning streaming ingestion, enterprise data-warehouse solutions, and analytics platforms for large customers.
Technical communicator and documented practitioner: Surya authors technical write-ups that explain engineering tradeoffs and system-design choices for deploying AI systems at scale, indicating a focus on sharing operational knowledge within and beyond the company.
Writing
Containerizing AI Models: The Why and How for Scalable Deployments
May 1, 2025
Describes an approach to containerize LLMs/VLMs while decoupling model weights via mounted storage on AKS to reduce build times, lower storage costs, and enable zero-downtime rollbacks.
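The weight-decoupling idea the article describes can be sketched as follows. This is a minimal illustration, not code from the article: the container image ships only application code, while weights live on an external volume whose mount point is supplied at deploy time. The names (`MODEL_WEIGHTS_MOUNT`, `resolve_weights_path`) and the default path are assumptions for the sketch.

```python
import os

# Assumed default mount point for the external weights volume (e.g. an
# Azure Files / blob-backed volume on AKS); illustrative, not from the article.
DEFAULT_MOUNT = "/mnt/model-weights"

def resolve_weights_path(model_name: str) -> str:
    """Return the on-disk path for a model's weights on the mounted volume.

    Because weights are not baked into the image, upgrading or rolling back
    a model becomes a volume or path change rather than an image rebuild,
    which is what enables fast builds and zero-downtime rollbacks.
    """
    mount = os.environ.get("MODEL_WEIGHTS_MOUNT", DEFAULT_MOUNT)
    return os.path.join(mount, model_name)
```

At startup the serving process would call `resolve_weights_path("my-model")` and load from that path, so the same image can serve any weights the volume provides.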
The Hidden Cost of Request-Response in AI — And How We Fixed It
March 1, 2025
Explains the shift from HTTP request-response to a job-queue pattern for GPU-intensive image-generation tasks, resulting in near-zero failure rates, better GPU utilization, and improved user retention.
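The core of the job-queue pattern is simple enough to sketch in a few lines. This in-process version is an illustration under assumed names (`submit_job`, `worker`, `JOBS`); a production deployment would use a durable broker rather than an in-memory queue, but the shape is the same: the client gets an acknowledgement immediately, and a worker drains jobs at the GPU's own pace instead of holding an HTTP connection open.

```python
import queue
import threading
import uuid

JOBS: dict = {}                 # job_id -> status or result, polled by clients
_jobs_queue: queue.Queue = queue.Queue()

def submit_job(prompt: str) -> str:
    """Enqueue a request and return a job id immediately (no long-held HTTP call)."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = "queued"
    _jobs_queue.put((job_id, prompt))
    return job_id

def worker(run_inference) -> None:
    """Drain jobs one at a time so the GPU stays busy but never oversubscribed."""
    while True:
        item = _jobs_queue.get()
        if item is None:          # sentinel used to stop the worker
            _jobs_queue.task_done()
            break
        job_id, prompt = item
        JOBS[job_id] = run_inference(prompt)
        _jobs_queue.task_done()
```

Because failures no longer surface as dropped connections, a job can simply be retried from the queue, which is where the near-zero failure rate comes from.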
Scalable deployment of LangChain agents with human-in-the-loop capability
January 1, 2024
Discusses architectural patterns for deploying autonomous AI agents at scale, addressing generator-based state challenges and recommending websockets for persistent, concurrent human-in-the-loop interactions.
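The generator-state problem the post addresses can be shown with a toy example. An agent implemented as a generator pauses at each point where it needs human input and resumes via `send()`; the article recommends a websocket as the persistent channel that pumps answers back in, which the in-memory `drive` loop below stands in for. All names here are illustrative assumptions, not the article's code.

```python
def agent(task: str):
    """A toy agent as a generator: yields questions, resumes with human answers."""
    approval = yield f"Plan for '{task}' ready. Approve? (yes/no)"
    if approval != "yes":
        return "aborted"
    target = yield "Which environment should I deploy to?"
    return f"deployed '{task}' to {target}"

def drive(gen, answers):
    """Stand-in for the websocket loop: feed human answers into the paused generator."""
    transcript = [next(gen)]          # run to the first question
    try:
        for answer in answers:
            transcript.append(gen.send(answer))
    except StopIteration as stop:     # generator returned: agent finished
        transcript.append(stop.value)
    return transcript
```

The difficulty at scale, as the post frames it, is that this paused generator is in-process state tied to one server, which is why a persistent, addressable connection per session matters.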