
Surya Tiwari
AI architect specializing in scalable model deployment and data engineering
Bengaluru, Karnataka, India
Joined March 2026
Summary
Practical expertise in scalable model deployment and inference engineering: Surya has documented and implemented production patterns (containerized model runtimes with externally mounted weights, job-queue architectures for GPU workloads, and websocket-based agent serving) that reduce deployment time, lower costs, and improve reliability for large models and multimodal services.
Contributor to applied AI research and reasoning models: associated with Fractal AI Research projects that target advanced reasoning (e.g., Fathom-R1-14B) and platformized AI products, reflecting involvement in both model development and open research releases.
Experienced data and platform engineer with an enterprise delivery background: a career progression through senior engineering and architect roles spanning streaming ingestion, enterprise data-warehouse solutions, and analytics platforms for large customers.
Technical communicator and documented practitioner: Surya authors technical write-ups that explain engineering tradeoffs and system-design choices for deploying AI systems at scale, indicating a focus on sharing operational knowledge within and beyond the company.
Writing
Containerizing AI Models: The Why and How for Scalable Deployments
May 1, 2025
Describes an approach to containerize LLMs/VLMs while decoupling model weights via mounted storage on AKS to reduce build times, lower storage costs, and enable zero-downtime rollbacks.
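The weight-decoupling idea the article describes can be sketched as follows. This is a minimal illustration, not code from the article: the container image ships only application code, while weights live on an external volume whose mount point is supplied at deploy time. The names (`MODEL_WEIGHTS_MOUNT`, `resolve_weights_path`) and the default path are assumptions for the sketch.

```python
import os

# Assumed default mount point for the external weights volume (e.g. an
# Azure Files / blob-backed volume on AKS); illustrative, not from the article.
DEFAULT_MOUNT = "/mnt/model-weights"

def resolve_weights_path(model_name: str) -> str:
    """Return the on-disk path for a model's weights on the mounted volume.

    Because weights are not baked into the image, upgrading or rolling back
    a model becomes a volume or path change rather than an image rebuild,
    which is what enables fast builds and zero-downtime rollbacks.
    """
    mount = os.environ.get("MODEL_WEIGHTS_MOUNT", DEFAULT_MOUNT)
    return os.path.join(mount, model_name)
```

At startup the serving process would call `resolve_weights_path("my-model")` and load from that path, so the same image can serve any weights the volume provides.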
The Hidden Cost of Request-Response in AI — And How We Fixed It
March 1, 2025
Explains the shift from HTTP request-response to a job-queue pattern for GPU-intensive image-generation tasks, resulting in near-zero failure rates, better GPU utilization, and improved user retention.
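The core of the job-queue pattern is simple enough to sketch in a few lines. This in-process version is an illustration under assumed names (`submit_job`, `worker`, `JOBS`); a production deployment would use a durable broker rather than an in-memory queue, but the shape is the same: the client gets an acknowledgement immediately, and a worker drains jobs at the GPU's own pace instead of holding an HTTP connection open.

```python
import queue
import threading
import uuid

JOBS: dict = {}                 # job_id -> status or result, polled by clients
_jobs_queue: queue.Queue = queue.Queue()

def submit_job(prompt: str) -> str:
    """Enqueue a request and return a job id immediately (no long-held HTTP call)."""
    job_id = uuid.uuid4().hex
    JOBS[job_id] = "queued"
    _jobs_queue.put((job_id, prompt))
    return job_id

def worker(run_inference) -> None:
    """Drain jobs one at a time so the GPU stays busy but never oversubscribed."""
    while True:
        item = _jobs_queue.get()
        if item is None:          # sentinel used to stop the worker
            _jobs_queue.task_done()
            break
        job_id, prompt = item
        JOBS[job_id] = run_inference(prompt)
        _jobs_queue.task_done()
```

Because failures no longer surface as dropped connections, a job can simply be retried from the queue, which is where the near-zero failure rate comes from.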
Scalable deployment of LangChain agents with human-in-the-loop capability
January 1, 2024
Discusses architectural patterns for deploying autonomous AI agents at scale, addressing generator-based state challenges and recommending websockets for persistent, concurrent human-in-the-loop interactions.
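The generator-state problem the post addresses can be shown with a toy example. An agent implemented as a generator pauses at each point where it needs human input and resumes via `send()`; the article recommends a websocket as the persistent channel that pumps answers back in, which the in-memory `drive` loop below stands in for. All names here are illustrative assumptions, not the article's code.

```python
def agent(task: str):
    """A toy agent as a generator: yields questions, resumes with human answers."""
    approval = yield f"Plan for '{task}' ready. Approve? (yes/no)"
    if approval != "yes":
        return "aborted"
    target = yield "Which environment should I deploy to?"
    return f"deployed '{task}' to {target}"

def drive(gen, answers):
    """Stand-in for the websocket loop: feed human answers into the paused generator."""
    transcript = [next(gen)]          # run to the first question
    try:
        for answer in answers:
            transcript.append(gen.send(answer))
    except StopIteration as stop:     # generator returned: agent finished
        transcript.append(stop.value)
    return transcript
```

The difficulty at scale, as the post frames it, is that this paused generator is in-process state tied to one server, which is why a persistent, addressable connection per session matters.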