Data Scientist  ·  Designer  ·  Builder

Basanth Periyapatna Roopa Kumar

Building things that live at the intersection of data, design, and human experience.

About

Who I am

During my first year of engineering, two people I love survived sudden cardiac arrest. Watching a cardiologist trace every millisecond on an ECG print-out — and pinpoint the exact moment each heart faltered — was a revelation: data, interpreted with precision, saves lives.

Those late-night explorations into HMMs, SVMs, and arrhythmia detection set my compass. Coming from a family of professors, I learned that knowledge is worthless unless it's shared for social good. Today I'm a data-science practitioner who enjoys connecting the dots — ideas across disciplines, people across teams, applications across industries.

I bring 4+ years of translating complex datasets into clear actions for stakeholders in telecom, logistics, and e-commerce. I'm now pursuing an M.S. in Applied Data Science at SJSU, sharpening statistical rigor and entrepreneurial vision toward a healthcare-focused analytics startup that empowers NGOs to improve outcomes for underserved communities.

4+ Years building
10+ Projects shipped
4 Certifications
AZ-900, DP-900, PL-900, IIT-M · Master Data Science
Python · R · C++ · Go · PyTorch · Apache Kafka · AWS · Azure · Kubernetes · Next.js · SQL · Apache Spark · LangGraph · Scikit-learn · Power BI · Tableau · DAX · TypeScript · Terraform · Power Platform

Selected Experience

Jan 2025 – Present · T+16M

San Jose State University (SJSU)

Research

Architected a fault-tolerant LangGraph multi-agent pipeline with GPT-4 tool-calling for geospatial data, analyzing structural inequity in Large Language Models (LLMs) for non-English scripts. Optimized downstream causality pipelines using PyTorch and R, directly improving risk analytics precision by 15%.

"Authored comprehensive research on LLMs & engineered multi-agent risk ingestion pipelines"

Python · LangGraph · LLMs · PyTorch · OpenCV

Jun 2025 – Aug 2025 · T+3M

NASA Ames Research Center

Internship

Fine-tuned NASA Prithvi Vision Transformer via MAE pretraining in PyTorch for high-risk satellite detection. Architected scalable, cloud-native REST APIs on AWS EC2 & Lambda with Kubernetes orchestration and Prometheus observability, cutting spatial inference latency by 25%.

"Fine-tuned Vision Transformers processing 2TB daily via robust AWS/Kubernetes pipelines"

AWS · Kubernetes · PyTorch · Python · SHAP

Apr 2023 – Jun 2024 · T+15M

AtkinsRéalis

Full-Time

Developed Kafka-backed event-driven microservices on Azure Databricks with exactly-once semantics. Deployed RESTful XGBoost inference APIs via blue-green deployments, slashing infrastructure costs by $400K and reducing unplanned operational failures by 20%.

"Designed event-driven Azure tracking algorithms for 1.8M IoT signals & ML optimization"

Azure Databricks · Kafka · Data Engineering · XGBoost · Python

Sep 2021 – Sep 2022 · T+13M

6D Technologies

Full-Time

Built low-latency RESTful APIs in Python/C++ for production messaging architectures. Engineered stateful stream processing with Kafka, refactored PostgreSQL materialized views to cut query latency by 28%, and deployed zero-downtime microservices using GitLab CI/CD and Docker.

"Stabilized concurrent PostgreSQL databases handling 15k+ req/sec & modernized CI/CD"

PostgreSQL · C++ · Kafka · Docker · CI/CD

Work

Shipped

Sequential Horizon
01 F1 Prediction Engine
Data · Python / XGBoost / Monte Carlo

A triple-model ensemble (XGBoost, Monte Carlo, Bayesian) predicting the 2026 F1 era with 38.9% accuracy.

02 LLM Multilingual Deficit
Data · NLP / Tokenization / LLMs

Quantifying the structural 'Token Tax' and economic inequality disadvantaging non-English languages in global LLMs.

03 Cartograph
Data · React / Three.js / D3.js

An interactive data visualization platform that transforms raw datasets into cinematic 3D narratives.

Research & Engineering

Deep dives

AI/ML

Australian GP 2026: V1 Prediction — Russell 38.9%

Season opener under the biggest regulatory reset in F1 history. XGBoost loved Hadjar at 35.2%; Monte Carlo gave Russell 75.8% from pole physics. The ensemble resolved at 38.9% — and documented three V1 failures for V2.

AI/ML

Japanese GP 2026: Antonelli on Pole at Suzuka

Post-Australia recalibration tightened the model's variance. Antonelli's first career pole at 19, Verstappen buried in P14 from an ERS error. Mercedes 1-2 probability: 84.5%. The Suzuka circuit amplifies what qualifying says.

AI/ML

Chinese GP 2026: V2 — Sprint as Tier 0 Data

First sprint weekend of 2026. V2 rebuilt with live sprint race results as the highest-weight input layer. Hamilton re-rated to 21.2% after his V1 underestimation. The fp1_to_quali_divergence feature corrected a systematic blind spot.

Research

The Multilingual Deficit: Tokenization Inequality in LLMs

Kannada requires 3-5× more tokens than English for equivalent semantic content. Quantifying the Token Tax, the $262K-$365K annual cost premium for Hindi API services, and the three architectural bottlenecks that cascade from tokenization.
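
One contributor to the token gap can be illustrated without any tokenizer at all. Byte-level tokenizers start from UTF-8 bytes, and Kannada code points take three bytes each where ASCII letters take one. The sketch below is only a byte-level proxy under that assumption, not the measurement used in the research; the word pair and both function names are hypothetical, and real token counts depend on the specific tokenizer's learned merges.

```python
def utf8_bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per Unicode code point."""
    return len(text.encode("utf-8")) / len(text)


def byte_ratio(text: str, reference: str) -> float:
    """How many times more UTF-8 bytes `text` needs than `reference`."""
    return len(text.encode("utf-8")) / len(reference.encode("utf-8"))


# Hypothetical word pair: "water" in English and a Kannada counterpart,
# written with escapes so the example does not depend on font support.
english = "water"
kannada = "\u0ca8\u0cc0\u0cb0\u0cc1"

print(utf8_bytes_per_char(english))  # ASCII letters use one byte each
print(utf8_bytes_per_char(kannada))  # Kannada code points use three bytes each
print(round(byte_ratio(kannada, english), 2))
```

A byte ratio is a starting handicap rather than a token count: a tokenizer trained mostly on English text merges English byte sequences into few tokens and leaves other scripts closer to their raw byte length.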

AI/Health

Cognivi — Stroke Detection Under $10

A privacy-first, multimodal AI system built at TreeHacks 2026. MediaPipe arm-drift + facial asymmetry detection fused with a PubMed-grounded RAG pipeline feeding Claude 3 for neurologic speech grading — all from a 60-second smartphone video.

Academic Work

Research & Papers

Peer-Reviewed · Big Data · Healthcare · 2025

A Big Data Pipeline Approach for Predicting Real-Time Pandemic Hospitalization Risk

Vishnu S. Pendyala, Mayank Kapadia, Basanth Periyapatna Roopa Kumar, Manav Anandani, Nischitha Nagendran

A dual big-data pipeline for real-time pandemic triage combining a streaming epidemiological risk prediction system (XGBoost + Bloom filter pre-screening on 3M+ CDC COVID-19 records) with a chest X-ray classification pipeline (EfficientNet-B0 + Grad-CAM). A lightweight GPT-based reasoning layer generates auditable ALERT/FLAG/LOG triage comments. CTGAN validates streaming robustness under synthetic load. Provides scalable, explainable, near-real-time decision support for public health readiness.

XGBoost Minority F1: 0.76
Chest X-Ray Accuracy: 99.5%
CDC Records Processed: 3M+
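
For readers unfamiliar with Bloom filter pre-screening, here is a minimal sketch of the data structure in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the `BloomFilter` class, its sizes, the SHA-256 based hashing, and the record IDs are all hypothetical. The point is that membership checks are cheap and never miss an item that was added, at the cost of occasional false positives.

```python
import hashlib


class BloomFilter:
    """Fixed-size Bloom filter: no false negatives, some false positives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, item: str):
        # Derive several bit positions by salting one hash with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos] = 1

    def might_contain(self, item: str) -> bool:
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))


seen = BloomFilter()
for record_id in ["case-001", "case-002", "case-003"]:  # hypothetical IDs
    seen.add(record_id)

print(seen.might_contain("case-002"))  # True: added items are always reported
```

In a streaming setting, a filter like this can sit in front of a heavier model or lookup so that only records passing the cheap check move on, which is one plausible reading of "pre-screening" here.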

Generative AI · Healthcare · 2025

A Multimodal Generative AI Framework for Cancer Pathology Classification

Mayank Kapadia, Basanth Periyapatna Roopa Kumar, Nischitha Nagendran

A multimodal generative AI framework combining histopathology image classification, clinical note classification via ClinicalBERT with SFT/TAPT/LoRA fine-tuning, and prompt-driven clinical captioning using RAG. Establishes baselines on PatchCamelyon (262K images) and curates TCGA BRCA clinical notes into a balanced dataset of 2,380 reports for unified multimodal cancer diagnostics.

ViT-Base/16 ROC-AUC: 0.9601
ClinicalBERT LoRA F1: 0.94
PCam Test Images: 32,768

Deep Learning · Remote Sensing · 2025

FireSense: Multispectral Fire Detection with Channel Attention and Probabilistic Calibration

Basanth Periyapatna Roopa Kumar, Nischitha Nagendran, Nandhakumar Apparsamy, Nitya Rondla

A deep learning framework for automated wildfire detection in Landsat-8 satellite imagery. Combines ResNet34-UNet with Convolutional Block Attention Modules (CBAM) across all 10 spectral bands, introduces soft labels from multi-annotator consensus to capture boundary uncertainty, and applies temperature scaling for calibrated probability estimates ready for operational deployment.

Mean IoU: 69.6%
vs. Classical Baselines: +45.6%
ECE Reduction: 86.1%
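
Temperature scaling, mentioned above, can be sketched in a few lines: a single scalar T divides the model's logits before the sigmoid, and T is chosen to minimize negative log-likelihood on held-out data. The version below is a rough illustration, not FireSense's code. The logits, labels, and candidate grid are hypothetical, and a simple grid search stands in for the gradient-based optimizer that is more commonly used.

```python
import math


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def calibrated_probability(logit: float, temperature: float) -> float:
    """Divide the logit by a temperature before applying the sigmoid."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    return sigmoid(logit / temperature)


def negative_log_likelihood(logits, labels, temperature):
    """Mean binary cross-entropy; labels may be hard (0/1) or soft."""
    eps = 1e-12
    total = 0.0
    for z, y in zip(logits, labels):
        p = calibrated_probability(z, temperature)
        total -= y * math.log(p + eps) + (1 - y) * math.log(1 - p + eps)
    return total / len(logits)


def fit_temperature(logits, labels, candidates):
    """Grid search: keep the temperature with the lowest validation NLL."""
    return min(candidates, key=lambda t: negative_log_likelihood(logits, labels, t))


# Hypothetical validation pixels: confident logits, two of them wrong.
val_logits = [4.0, 3.5, -4.0, 3.0, -3.5, 4.5]
val_labels = [1.0, 0.0, 0.0, 1.0, 1.0, 1.0]
grid = [0.5 + 0.25 * i for i in range(39)]  # 0.5 through 10.0

best_t = fit_temperature(val_logits, val_labels, grid)
print(best_t, calibrated_probability(4.0, best_t))
```

Because every logit is divided by the same positive T, the ranking of predictions is unchanged, so the thresholded mask at 0.5 stays the same while the reported probabilities become less extreme when the model is overconfident.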

Knowledge Graphs · ML · 2025

Knowledge Graph-Enhanced Traffic Optimization System for the Bay Area

Anshu Reddy Dhamana, Basanth Periyapatna Roopa Kumar, Manav Rajesh Anandani, Nischitha Nagendran, Nitya Rondla, Srithareddy Devireddy, Vinuthna Papana

A knowledge graph-based traffic optimization system for the San Francisco Bay Area fusing traffic flow, weather, incidents, and events into a unified semantic graph. Node2vec embeddings power a stacking ensemble achieving 92% prediction accuracy with the connects_to relationship, delivering actionable insights into weekday vs. weekend traffic patterns for city planning and commuter support.

Prediction Accuracy: 92%
Graph Relationship Types: 6
Graph Nodes: 550+

Writing

Transmissions

[AI + GOVERNANCE]

Secure AI Governance with the Applied Intelligence Systems Club

Nov 2, 2025 · 5 MIN READ
[DATA + F1]

Why F1 Data Is the Best Playground for Learning ML

Mar 15, 2025 · 7 MIN
[MUSIC]

The GOT Score is Statistically the Greatest TV Soundtrack

Nov 20, 2024 · 6 MIN

Library

What I'm consuming

Books read & screens watched in 2025.

The Design of Everyday Things
Don Norman · Design

Sapiens
Yuval Noah Harari · History

The Creative Act
Rick Rubin · Creative

Zero to One
Peter Thiel · Business

Thinking, Fast and Slow
Daniel Kahneman · Psychology

The Almanack of Naval Ravikant
Eric Jorgenson · Philosophy

Show Your Work!
Austin Kleon · Creative

Quotes

Words I live by

Vision

The people who are crazy enough to think they can change the world are the ones who do.

Steve Jobs

Design

Simplicity is the ultimate sophistication.

Leonardo da Vinci

Growth

You don't have to be great to start, but you have to start to be great.

Zig Ziglar

Agency

The best way to predict the future is to create it.

Peter Drucker

Life

Stay hungry. Stay foolish.

Stewart Brand

Whole Earth Catalog


Get in touch

Let's make something

Have a project in mind? A collaboration idea? Or just want to say hi?
My inbox is always open.

Open to opportunities