Kaustubh Sonawane

Building intelligent systems at the intersection of Data Science and Machine Learning.

Machine Learning · Python · RAG Systems · PySpark · Deep Learning · NLP · Model Context Protocol · Data Science

About Me

I am an AI/ML Engineer and Data Scientist passionate about building intelligent, scalable systems that solve complex real-world problems.

My expertise lies in bridging the gap between cutting-edge research and production-ready applications. From developing Retrieval-Augmented Generation (RAG) systems for biomedical data to engineering automated technical debt refactoring tools, I thrive on pushing the boundaries of what's possible with code and data.

When I'm not training models or optimizing data pipelines, I'm exploring the latest advancements in natural language processing, contributing to open-source, or experimenting with new generative AI paradigms.

Experience & Education

Master's Era · 2024 – 2026

Data Science Intern, NetApp

Raleigh, NC, USA

  • Increased quarterly revenue forecasting accuracy by 15% by architecting two end-to-end ETL and production forecasting pipelines (Azure ML, AbacusAI) that ingest and process more than 3M sales records.
  • Reduced manual processing time by 40% and risk estimate variance by 9% by designing automated SQL and Python data pipelines to extract 20+ features for customer lifecycle models.

North Carolina State University, Raleigh, NC

Master of Computer Science

Bachelor's Era · 2020 – 2024

Azure Solutions Intern, Hackveda

Navi Mumbai, India

  • Achieved a prediction error of 0.1397 NRMSE (normalized root-mean-square error) by deploying an automated voting ensemble model on Azure ML, integrated with custom feature engineering modules.

Data Analyst Intern, SIES GST

Navi Mumbai, India

  • Eliminated 90% of manual dashboard updates by engineering automated data ingestion pipelines that instantly standardized and fed real-time metrics into live Power BI and Tableau tracking systems.

University of Mumbai, Mumbai, India

Bachelor of Engineering, Computer Engineering

Minor: Data Science

Selected Work

AI/ML & Developer Tools

AI-Powered Technical Debt Agent

Engineered a Model Context Protocol (MCP) server and GitHub Actions workflow integrating with VS Code and Copilot. Fine-tuned GPT-4.1-mini models to automatically detect, classify, and refactor technical debt across 5 severity levels using a comprehensive 3-step workflow.

TypeScript · Python · MCP · GitHub Actions

AI/ML & Data Engineering

Biomedical Assistant RAG

Engineered a distributed PySpark data preprocessing pipeline (1M+ records) with shuffle-aware partitioning and partitioned Parquet output to power a biomedical RAG system integrating a Knowledge Graph and LLMs. Delivered >95% reliable responses across 10K+ biomedical entities and 4,920 disease-symptom relationships.

PySpark · LangChain · Knowledge Graph · Streamlit

Predictive Modeling

Customer Behavior Analysis

Built a recommendation system comparing ML and deep learning approaches on the Amazon Food Reviews dataset (568K+ reviews).

Scikit-Learn · XGBoost

GPU Computing

Triton-Based Flash Attention

Accelerated transformer inference by 1.8× for sequence lengths up to 8,192 tokens by engineering a fused, SRAM-aware FlashAttention GPU kernel with 64×64 tiling in Triton/CUDA, eliminating high-latency HBM round-trips.

Triton · CUDA · PyTorch

Computer Vision

Real-Time Object Recognition Analysis

Compared YOLO and SSD models on the PASCAL VOC dataset to identify speed-accuracy trade-offs for real-time applications.

PyTorch · TensorFlow · CUDA · YOLO & SSD

Get In Touch

Let's collaborate.

Whether you have a question, a project proposal, or just want to connect over a virtual coffee, my inbox is always open.

ksonawa@ncsu.edu · LinkedIn Profile