Graduate • Open to Opportunities

Shikhar Shukla

Clinician-Turned Data Scientist | Ex-CDC | Advancing AI & Analytics for Public Health and Clinical Research

Healthcare & AI professional with 10+ years of combined clinical and data science expertise, bridging hands-on dental practice with advanced biomedical informatics. Former CDC Public Health Informatics Fellow, experienced in designing reproducible pipelines and scalable AI/ML solutions across multimodal health data - including EHRs, imaging, physiologic signals, and voice. Skilled in statistical modeling, time-series forecasting, and deep learning for risk prediction, clinical decision support, and public health surveillance. Strong track record of peer-reviewed publications, international conference presentations, and collaborative research with clinicians, public health agencies, and academic institutions. Passionate about translating AI innovations into equitable, real-world healthcare impact.


CDC Certificate of Outstanding Contribution

Advancing Artificial Intelligence in Public Health.

Ex-CDC Fellow (NCEZID)

Led 9 federal projects in surveillance AI & pipelines.

AMIA 2024 - Poster

Clinical analytics poster accepted at the American Medical Informatics Association (AMIA) Annual Symposium.

IEEE CBMS 2024 - Paper

Wound trajectory interpolation & forecasting. Presented in Guadalajara, Mexico.

Artificial Intelligence in Medicine (AIME) 2023 - AI in Radiology

Presented NSF-supported imaging AI research in Slovenia.

DHIS2 Annual Conf. 2024

Novel OpenCPU + DHIS2 tool for in-platform statistics. Presented at the University of Oslo.

DHIS2 Annual Conf. 2025

DHIS2 for U.S. Community/Public Health accepted.

Clinically-Grounded · AI-Driven

Health informatics researcher bridging clinical expertise with advanced data science to create reproducible, equitable, and deployable AI solutions for healthcare and public health.

Clinical & Public Health Insight

Clinician-turned-data scientist with a decade of clinical experience, translating patient-level complexity into structured data problems. Skilled in interpreting biomedical, epidemiologic, and physiologic data to inform outcome prediction and health equity research.

Statistical & Machine Learning Modeling

Design and validate predictive pipelines using regression, ensemble methods, and deep learning architectures, including recurrent and transformer-based models for temporal and multimodal data, focused on interpretability, calibration, and fairness.

Data Engineering & Integration

Build scalable and version-controlled data pipelines using Python, SQL, PySpark, and cloud-based platforms. Experienced in ETL design, schema mapping, and high-volume data processing that ensure reproducibility and auditability from raw data to model output.

Equity & Reproducibility

Embed fairness and transparency into model development through bias analysis, stratified evaluation, and open documentation, advancing trustworthy, generalizable AI in real-world healthcare settings.

Featured Projects

Clinical + public health data systems, human-in-the-loop AI, and scalable pipelines.

CDC

FungiSurv - AI-Assisted Schema Mapping & Data Integration

Public Health Informatics • Data Engineering

I architected the core ingestion flow for CDC’s fungal disease surveillance (FungiSurv) to handle diverse state CSV/Excel formats. The system validates uploads, uses GPT-4 to propose column-to-standard mappings, lets epidemiologists review/approve edits, then standardizes and integrates data with full versioning and audit logs. Outcome: less manual mapping, faster partner onboarding, and higher-quality, reproducible surveillance datasets.

Semi-Auto
Upload→Integrate
GPT-4
Mapping suggestion
Versioned
Configs & audit
Python Palantir's Foundry/1CDP Pipeline Builder GPT-4 APIs
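The review step described above (a model proposes column-to-standard mappings, a person approves them, and the approved mapping is versioned) can be sketched in a few lines. This is a minimal illustration, not the FungiSurv implementation: the model call is left out entirely, and `STANDARD_FIELDS`, the field names, and all function names are hypothetical.

```python
import hashlib
import json
from datetime import datetime, timezone

# Hypothetical standard schema; the real FungiSurv field list is not given in the text.
STANDARD_FIELDS = {"case_id", "specimen_date", "organism", "state"}


def validate_mapping(source_columns, proposed):
    """Split a proposed {source_column: standard_field} mapping into accepted
    entries and entries that need human review."""
    accepted, needs_review = {}, {}
    used_targets = set()
    for column in source_columns:
        target = proposed.get(column)
        if target in STANDARD_FIELDS and target not in used_targets:
            accepted[column] = target
            used_targets.add(target)
        else:
            # Unknown target, duplicate target, or no suggestion at all.
            needs_review[column] = target
    return accepted, needs_review


def audit_record(mapping, reviewer):
    """Build an audit entry whose version id is a hash of the approved mapping."""
    payload = json.dumps(mapping, sort_keys=True)
    return {
        "version": hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12],
        "reviewer": reviewer,
        "approved_at": datetime.now(timezone.utc).isoformat(),
        "mapping": mapping,
    }


def apply_mapping(rows, mapping):
    """Rename mapped columns in each row and drop unmapped ones."""
    return [{mapping[k]: v for k, v in row.items() if k in mapping} for row in rows]
```

Hashing the sorted mapping gives the same version id whenever the same mapping is approved again, which is one simple way to make standardized outputs traceable to the exact configuration that produced them.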
CDC

Facility Matching & Geocoding for AR Surveillance

Record Linkage • Geospatial Analytics

Developed a semi-automated pipeline to standardize and match healthcare facilities across state surveillance data. Combined proportional/partial/fixed string distance metrics with CMS and NPI identifiers, then applied geocoding to resolve ambiguous matches. Outcome: higher-confidence matches, reduced manual review, and more precise spatial data for antimicrobial resistance risk analysis.

Top-5
Candidate matches
CMS+NPI
ID fusion
Geo-Precise
Facility resolution
Python Fuzzy Matching String Distances Geocoding APIs Palantir's Foundry/1CDP
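The matching idea above (rank candidate facilities by name similarity, let a shared identifier override the name score, and keep the top five for review) might look roughly like this. It is a sketch under stated assumptions: `difflib.SequenceMatcher` stands in for the proportional/partial/fixed string-distance metrics named in the text, the `npi` key and record layout are hypothetical, and geocoding is not shown.

```python
from difflib import SequenceMatcher


def normalize(name):
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a, b):
    """Similarity in [0, 1] from difflib; a stand-in for the metrics in the text."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def top_candidates(record, reference, k=5):
    """Rank reference facilities for one surveillance record.

    An exact match on a shared identifier (here a hypothetical 'npi' key)
    outranks any name-only match.
    """
    scored = []
    for ref in reference:
        id_match = bool(record.get("npi")) and record.get("npi") == ref.get("npi")
        score = 1.0 if id_match else similarity(record["name"], ref["name"])
        scored.append((score, id_match, ref))
    # Sort identifier matches first, then by descending name similarity.
    scored.sort(key=lambda item: (item[1], item[0]), reverse=True)
    return scored[:k]
```

Returning a short ranked list rather than a single best match keeps a reviewer in the loop for ambiguous names, which is where a geocoding step could then break ties.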
Research

AI-Enabled Radiology in LibreHealth RIS + OHIF

Clinical AI • Human-in-the-Loop

Built an end-to-end AI assistant within LibreHealth’s open-source Radiology Information System (RIS), integrating directly with the OHIF viewer. Enabled radiologists to train, retrain, and apply models in workflow using DICOM-SR overlays, active learning (few-shot retraining on feedback), and swarm learning to share model weights across users. Demonstrated with CheXNet on chest X-rays and presented at AIME 2023 (NSF-supported). Outcome: proof of concept for scalable, standards-based AI embedded in radiology workflows.

DICOM-SR
Standards overlays
Few-Shot
Retraining loop
Swarm
Shared model weights
OHIF LibreHealth RIS DICOM REST APIs CheXNet JavaScript/TypeScript
Research

Interpolating & Forecasting Wound Trajectory

Time Series • Clinical Outcomes

Built a reproducible pipeline to impute, interpolate, and forecast chronic wound healing trajectories across 14,571 wounds from 6,171 patients. Used XGBoost for demographic imputation, AIC-driven selection across interpolation methods (Linear, Krogh, Akima, RBF), and forecasting with Holt-Winters, ARIMA, Prophet, and deep models (LSTM, BiLSTM). Integrated subgroup fairness checks by gender, race, and ethnicity. Outcome: Published as second-author IEEE CBMS 2024 paper; demonstrated interpretable, equitable wound outcome forecasting.

14,571
Wounds analyzed
98.9%
Good interpolation
Multi-Model
Forecasting
Python XGBoost ARIMA Prophet LSTM/BiLSTM AIC
DHIS2

DHIS2 + OpenCPU - In-Platform Statistics App

Open Source • R Execution

Designed and built a DHIS2 web app that runs R code inside the platform via OpenCPU. Users select DHIS2 datasets through the API, write and execute R in an editor (RStudio-like layout), and view results and plots in real time with no CSV exports, preserving data governance and reducing friction for analysts and epidemiologists.

In-app
Live R execution
API
Direct DHIS2 pulls
Secure
No downloads
DHIS2 OpenCPU R TypeScript REST APIs
Research

Predicting 30-Day Mortality from Intraoperative Hemodynamics

EHR Time Series • Surgical Outcomes

Led analysis of 42 intraoperative MAP (Mean Arterial Pressure) features from 10 years of EHR data in noncardiac surgeries. Identified dynamic blood pressure variability markers (percent shifts, prolonged hypotension, frequent changepoints, and entropy measures) as strong predictors of 30-day mortality, beyond conventional threshold methods. Outcome: Identified 20 high-value intraoperative hemodynamic predictors of mortality risk; presented at AMIA 2024 Annual Symposium.

42
TS features
20
Predictive indicators
10 yrs
EHR cohort
Python TSFRESH TSFEL Changepoint Detection (PELT) Logistic Regression EHR Data
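Three of the marker families named above (prolonged hypotension, percent shifts, and entropy) can be illustrated with plain Python. This is a simplified sketch, not the study's feature definitions: the 65 mmHg threshold, the 10% shift cutoff, the 5 mmHg bin width, and the function names are all assumptions chosen for the example, and changepoint detection is not shown.

```python
import math
from collections import Counter


def longest_run_below(values, threshold=65.0):
    """Length (in samples) of the longest consecutive run below the threshold."""
    longest = current = 0
    for value in values:
        current = current + 1 if value < threshold else 0
        longest = max(longest, current)
    return longest


def large_shift_fraction(values, pct=10.0):
    """Fraction of consecutive steps whose relative change exceeds pct percent."""
    steps = [abs(b - a) / a * 100.0 for a, b in zip(values, values[1:]) if a]
    if not steps:
        return 0.0
    return sum(step > pct for step in steps) / len(steps)


def shannon_entropy(values, bin_width=5.0):
    """Shannon entropy (bits) of the series after binning into fixed-width bins."""
    if not values:
        return 0.0
    counts = Counter(math.floor(v / bin_width) for v in values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Unlike a single "ever below threshold" flag, these features describe how the pressure behaved over time, which is the distinction the analysis draws against conventional threshold methods.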
CDC

Mining NNDSS ODSE for Candida auris Surveillance

SQL Engineering • Data Modeling

Reverse-engineered the NNDSS Operational Data Store (ODSE) to recover priority elements missing from standard views. Built a full ERD across 200+ SQL tables and a dynamic SQL scanner to locate target values across free-text columns. Mapped recovered elements to state case IDs and NNDSS identifiers, establishing a validated pathway to enrich Candida auris surveillance without waiting for upstream schema changes. Outcome: more complete surveillance reporting through validated linkage between state case IDs and national identifiers.

200+
ODSE tables profiled
Dynamic
SQL value scanner
Linkage
TX ↔ NNDSS IDs
T-SQL SSMS ERD Dynamic SQL Azure Data Studio
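The dynamic SQL scanner described above follows a general pattern: read the catalog to find every text column, generate one query per column, and record where the target value appears. The sketch below shows that pattern with Python's built-in `sqlite3` module so it runs anywhere; it is not the T-SQL used against ODSE, and the function names are hypothetical.

```python
import sqlite3


def text_columns(conn):
    """Return (table, column) pairs for text-like columns, read from the catalog."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' AND name NOT LIKE 'sqlite_%'"
    ).fetchall()
    pairs = []
    for (table,) in tables:
        # PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
        for _, column, col_type, *_ in conn.execute(f'PRAGMA table_info("{table}")').fetchall():
            if "CHAR" in col_type.upper() or "TEXT" in col_type.upper():
                pairs.append((table, column))
    return pairs


def scan_for_value(conn, needle):
    """Return {(table, column): hit_count} for every text column containing needle."""
    hits = {}
    for table, column in text_columns(conn):
        # Identifiers come from the catalog and are quoted; the search value is bound.
        query = f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" LIKE ?'
        (count,) = conn.execute(query, (f"%{needle}%",)).fetchone()
        if count:
            hits[(table, column)] = count
    return hits
```

On SQL Server the same idea would typically read column metadata from the system catalog views and build the per-column statements there; only the catalog source changes, not the loop.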
CDC

FORT - Fungal Outbreak Response Tracker

Product Analytics • UI/UX in Workshop

Co-developed and transformed CDC’s internal outbreak tracker into a fast, analyst-friendly product in Palantir Foundry (1CDP). Fixed broken filtering, added inline response toggles for rapid status updates, enabled one-click PDF & large Excel exports, and extended the ontology to include Donor-Derived Infections (DDI). Outcome: faster and easier outbreak tracking for epidemiologists, enabling CDC to respond to multi-state fungal outbreaks more quickly.

Real-time
Interactive filtering
200k
Row PDF/Excel export
DDI
Ontology expansion
Foundry Workshop Ontology UX Reporting Agile
Research

Bridge2AI - Voice-Based Detection of Parkinson’s Disease

Digital Biomarkers • Deep Learning on Audio Time-Series

Developed and validated machine learning models for detecting Parkinson’s Disease from voice-derived Mel-Frequency Cepstral Coefficient (MFCC) time-series data using the NIH Bridge2AI Voice dataset. Designed a custom 1D ResNet architecture for temporal feature learning and compared its performance with traditional and transformer-based baselines. Manuscript in progress. Outcome: The ResNet-1D model achieved AUC > 0.80, demonstrating robust discrimination of Parkinson’s Disease from control participants using non-invasive voice biomarkers.

442
Voice participants
AUC > 0.80
Detection performance
MFCC
Voice biomarker features
Python PyTorch ResNet-1D PatchTST MFCCs
Research

Multi-Domain ML Framework for Predicting Postoperative Delirium

Clinical Prediction • Perioperative Machine Learning

Developed a reproducible ML framework to predict postoperative delirium using perioperative EHR data, integrating intraoperative Mean Arterial Pressure (MAP) dynamics, frailty indicators, anesthesia exposures, surgical domain, and comorbidities. Implemented domain-specific models combined through a calibrated ensemble super-learner to balance performance and interpretability. Outcome: Achieved ROC-AUC > 0.85 with well-calibrated predictions and clinically coherent feature effects, supporting explainable and domain-aware AI for perioperative cognitive risk assessment.

ROC-AUC > 0.85
Cross-validated performance
5
Clinical feature domains
SHAP
Model interpretability
Python LightGBM XGBoost SHAP Bayesian hyperparameter tuning
Healthcare

Clinical Risk Analytics in Dentistry

Population Health • Predictive Modeling

Built one of the first integrated datasets combining clinical, behavioral, and socioeconomic indicators from a longitudinal dental population (>2,000 patients). Applied rigorous statistical and regression methods (logistic + ordinal models) to identify multifactorial determinants of caries, periodontal disease, and treatment outcomes. Designed reproducible R pipelines with validated outputs, enabling evidence-based preventive strategies and community outreach for underserved populations. Outcome: Early integration of clinical practice and data science; informed evidence-based outreach strategies for underserved populations.

2,000+
Patients studied
Validated
Regression models
Actionable
Risk patterns
R Logistic Regression Ordinal Regression Data Visualization Population Health

Professional Journey

From clinical practice to AI-driven health informatics and public health innovation

Academics

Healthcare Data Analyst Intern

Mercy Hospital | Springfield, Missouri Jun 2023 - Aug 2023

Delivered data-driven insights to optimize hospital operations, staffing, and patient care efficiency through advanced analytics and visualization projects.

Key Highlights & Projects:

  • Night Admissions Analysis: Identified trends and peak-hour distributions in nocturnist admissions, leading to recruitment of additional physicians and improved overnight patient care.
  • Hospitalist Workload & Billing Accuracy: Analyzed six months of encounter and procedure code data to balance staffing loads; uncovered charge code discrepancies that triggered billing process improvements.
  • Discharge Efficiency Study: Developed a new metric for physician discharge efficiency across Springfield & Lebanon facilities; provided leadership with actionable insights for resource allocation and process optimization.
Healthcare Analytics Python & Pandas Operational Efficiency Data Visualization Process Improvement

M.S. Health Informatics (STEM)

Indiana University – Indianapolis Aug 2022 - May 2024 GPA: 3.8 / 4.0

Graduate training in biomedical data science, clinical informatics, and public health analytics - combining coursework with applied projects in healthcare operations, prediction, and AI research.

Key Highlights & Projects:

  • Substance Use Risk Prediction: Integrated six years of NSDUH data, applied statistical tests and ML (CatBoost, Random Forest), and identified demographic/behavioral predictors to inform prevention strategies.
  • Hospital Length of Stay Modeling: Built predictive models (GLM, Random Forest, XGBoost) on hospitalization data; identified admission type, insurance, and demographics as key LOS drivers.
  • SecureData Shield 2.0: Designed a multi-layer IoT encryption framework combining AES-256, Blockchain, and deep learning for scalable and secure transmission of health device data.

Coursework: Clinical Decision Support Systems, Applied Statistical Methods, Health Informatics Standards, Clinical Information Systems, Project Management, Machine Learning in Bioinformatics, Health Information Exchange.

Biomedical Data Science Machine Learning Health Informatics Predictive Analytics Public Health Research Publications

B.D.S. – Bachelor of Dental Surgery

Devi Ahilya University - Indore, India Oct 2006 - Dec 2010 GPA: 3.3 / 4.0 (WES)

Comprehensive undergraduate training in biomedical sciences, clinical practice, and public health. Built a strong foundation in healthcare delivery and patient care, later integrating data science approaches to analyze outcomes and inform prevention strategies.

Key Highlights & Works:

  • Healthcare Domain Mastery: Gained expertise across general medicine, pathology, surgery, and community dentistry, grounding me in holistic patient care and health systems knowledge.
  • Applied Data Analytics: Conducted statistical studies (Chi-square, ANOVA, logistic regression) on >2,000 patient records to evaluate treatment outcomes, oral disease risk factors, and links to socioeconomic, hygiene, and diet variables.
  • Community Dentistry & Outreach: Designed and executed preventive programs, integrating screenings with data-driven analyses that improved treatment adherence and access to care in underserved populations.

Coursework:

General Anatomy & Physiology, Dental Materials, Pathology & Microbiology, Dental Pharmacology, Oral Anatomy & Histology, General Medicine & Surgery, Oral Pathology, Community Dentistry, Prosthodontics, Endodontics, Oral Surgery & Anesthesia, Periodontics, Orthodontics, Pedodontics, Oral Medicine & Radiology.

Clinical Training Healthcare Systems Data Analytics Public Health

Industry Experience

Senior Data Analyst

Indiana University - NIH CIMDAR-HIVE Program Aug 2025 - Present

Contributing to NIH-funded multimodal health data science initiatives, developing reproducible pipelines and benchmarking workflows for sepsis prediction, clinical time-series forecasting, and speech-based disease detection - advancing biomedical informatics through applied AI methods.

    • Sepsis Forecasting: Designed and validated pipelines on the MIMIC-IV ICU dataset (>70,000 admissions), benchmarking traditional forecasting methods against deep learning architectures, delivering clinically interpretable and reliable models for early sepsis detection.
    • Voice-Based Parkinson’s Detection Study - NIH Bridge2AI: Built reproducible pipelines to evaluate voice-derived digital biomarkers for Parkinson’s Disease using the NIH Bridge2AI Voice dataset (442 participants). Preprocessed Mel-Frequency Cepstral Coefficient (MFCC) time-series and developed a residual convolutional model (ResNet-1D) that achieved high discrimination and robust recall, demonstrating the feasibility of non-invasive, voice-based disease phenotyping.
    • Postoperative Delirium Prediction Study - Dept. of Anesthesiology & Critical Care Medicine: Developed a reproducible, multi-domain ML framework for predicting postoperative delirium using >70M perioperative EHR records, modeling intraoperative MAP dynamics, frailty, anesthesia exposure, and surgical context. The stacked LightGBM-XGBoost-logistic ensemble achieved ROC-AUC > 0.85 with strong calibration and SHAP-based interpretability, aligning model structure with clinical reasoning.
    • Training & Equity: Created applied datathon assets for HBCUs/MSIs, embedding fairness and bias-check methods into time-series pipelines to promote equitable AI adoption in clinical and public health research.
    Clinical Forecasting Deep Learning Health Equity Multimodal Data

Public Health Informatics Fellow

US Centers for Disease Control and Prevention (CDC) Aug 2024 - Jul 2025

At CDC’s Mycotic Diseases Branch, I engineered national-scale informatics systems that modernized fungal surveillance, automated data stewardship, and advanced AI evaluation. My work combined deep technical skill (Python, SQL, Palantir Foundry, GPT-4, R, geocoding) with measurable public health impact, directly strengthening outbreak response and national data infrastructure.

  • FungiSurv Infrastructure: Designed and deployed AI-assisted ingestion and schema-mapping pipelines in Palantir Foundry, integrating GPT-4-based column alignment, JSON transformations, and version-controlled ingestion. Reduced manual data harmonization by ~60% and improved timeliness of multistate fungal surveillance.
  • Facility Matching & Geocoding: Built semi-automated facility linkage pipelines using SQL string-distance methods, CMS/NPI registries, and geocoding to enhance entity resolution and spatial accuracy for antimicrobial resistance tracking.
  • NNDSS ODSE Mining: Engineered dynamic SQL workflows across >200 relational tables to recover and map Candida auris elements missing from upstream schemas, improving completeness and HL7 v2 interoperability for national case reporting.
  • FORT Outbreak Tracker: Enhanced CDC’s outbreak dashboard with filters, exports (CSV/Excel/PDF), and new Donor-Derived Infection entities. Gave epidemiologists faster, easier access to 200k+ outbreak records - enabling quicker nationwide response.
  • DAART JSON Workflow: Replaced contractor-built logic with an in-house Python pipeline that flattens, cleans, pivots, and auto-publishes JSON data for Data for Action on Antimicrobial Resistance Threats (DAART). Delivered fully automated scheduling triggers, QA-ready datasets, and scalable workflows for future production migration.
  • Daily Data Stewardship: Served as primary data steward monitoring 1CDP pipelines for build, schedule, and data-health failures. Diagnosed root causes, deployed Python/R fixes, and implemented proactive validation checks - ensuring continuity, preventing downtime, and preserving data trustworthiness.
  • Branch Capacity Building: Led GitLab support tracker for technical requests (Python, SQL, Foundry), and delivered live training sessions on 1 CDC Data Platform (1CDP) tools (Pipeline Builder, Data Lineage). Directly upskilled MDB staff and strengthened internal independence in informatics.
  • AI Tiger Team (Core Contributor): Designed and tested 2,556 GPT-4 prompts across six public health domains, ran evaluations with OpenAI’s o3 model, and co-authored analyses showing AI’s value in rapid research synthesis and planning. Manuscript in progress.
  • Awards & Recognition: Honored as a CDC AI Champion with the Generative AI Early Adopter Badge. Elected Executive Co-Chair of the CDC Fellows Collective, contributing to strategy and professional development across the agency.
Nationwide Surveillance AI-Driven Informatics SQL & Data Engineering Pipeline Reliability Palantir Foundry Public Health Impact
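The flatten and clean steps mentioned in the DAART JSON workflow above can be illustrated with a small, generic example. This is only a sketch of the technique: the record layout, the dotted-key convention, and the function names are assumptions, and the pivot and publish steps are not shown.

```python
def flatten(record, parent_key="", sep="."):
    """Flatten nested dicts and lists into a single-level dict with dotted keys."""
    flat = {}
    if isinstance(record, dict):
        items = record.items()
    elif isinstance(record, list):
        items = enumerate(record)
    else:
        return {parent_key: record}
    for key, value in items:
        full_key = f"{parent_key}{sep}{key}" if parent_key else str(key)
        if isinstance(value, (dict, list)) and value:
            # Recurse into non-empty containers; list items are keyed by position.
            flat.update(flatten(value, full_key, sep))
        else:
            flat[full_key] = value
    return flat


def clean(flat):
    """Strip whitespace from strings and turn empty strings or containers into None."""
    cleaned = {}
    for key, value in flat.items():
        if isinstance(value, str):
            value = value.strip()
        cleaned[key] = None if value in ("", [], {}) else value
    return cleaned
```

Keeping flattening and cleaning as separate pure functions makes each step easy to test on its own before it is wired into a scheduled pipeline.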

Junior Data Scientist, Dept. of Anesthesiology

Indiana University School of Medicine Jan 2024 - May 2024

Conducted advanced perioperative analytics, engineering novel hemodynamic features from intraoperative MAP time-series to uncover mortality risk signatures and inform clinical decision-making.

  • Hemodynamic Feature Engineering: Extracted 42 novel features (variability, entropy, changepoints, Poincaré metrics) from 10+ years of intraoperative MAP series covering noncardiac, non-obstetric surgeries.
  • Advanced Time-Series Methods: Applied the PELT (Pruned Exact Linear Time) changepoint detection algorithm, dynamic time warping, and complexity analysis to characterize instability patterns predictive of 30-day mortality.
  • Predictive Modeling: Integrated 20 statistically significant predictors into logistic regression and survival models, highlighting dynamic MAP instability as a critical risk marker.
  • Scientific Impact: Findings accepted for presentation at American Medical Informatics Association 2024 conference, strengthening evidence for novel intraoperative risk indicators and their role in clinical mortality prediction models.
Time-Series Analytics Change Point Detection Pruned Exact Linear Time L1 Regularization Survival Analysis Clinical Informatics

Technical Skills

Biomedical data science • anesthesiology analytics • reproducible AI pipelines

Programming & Data Engineering

Python (pandas, NumPy, PySpark, scikit-learn, PyTorch) 95%
R (tidyverse, caret, ggplot2, stats) 88%
SQL (T-SQL, PostgreSQL, Snowflake, dynamic SQL) 92%
Data Pipelines & APIs (ETL/ELT, REST, JSON, Foundry Pipeline Builder, FHIR APIs) 90%
Data Ops & Versioning (lineage, QA, CI/CD, automated testing, reproducibility) 87%

Statistical Modeling & Machine Learning

Classical Statistics (parametric/non-parametric, ANOVA, χ², regression, mixed models) 91%
Predictive Modeling (Logit, RF, XGBoost, CatBoost, LightGBM, ensemble stacking, causal inference) 93%
Time-Series & Physiologic Signals (ARIMA, Prophet, LSTM, TFT, PELT changepoints, entropy, Poincaré) 90%
Biostatistics (survival analysis, calibration, Bayesian modeling, multivariable inference) 88%
Explainability & Model Evaluation (SHAP, permutation, subgroup bias/fairness, calibration curves) 84%

Healthcare & Clinical Informatics

Clinical Data Standards (HL7 v2, ICD-10, LOINC, SNOMED, FHIR) 86%
EHR & Clinical Analytics (perioperative, sepsis, delirium, outcomes) 91%
Public Health Informatics (surveillance, DCIPHER, NNDSS ODSE, FungiSurv) 89%
Data Governance & Quality (auditability, lineage, de-identification, FAIR data principles) 88%

Imaging, Signals & Text Analytics

Medical Imaging (DICOM, OHIF, DICOM-SR, segmentation, U-Net, CNNs) 82%
Physiologic Monitoring (MAP, ECG, pulse, variability, entropy) 90%
Speech & Voice Analysis (MFCCs, LSTM, ResNet-1D, Transformer architectures) 82%
Natural Language Processing (clinical text, embeddings, topic models, transformers) 80%

Reproducibility, Infrastructure & Tools

Version Control (Git, GitLab, CI/CD, documentation) 90%
Model Deployment & MLOps (Docker, APIs, reproducible pipelines) 85%
Palantir Foundry / 1CDP (Workshop, transforms, data health) 92%
Cloud & Compute (AWS EC2/S3, GCP, HPC clusters) 78%

Visualization & Scientific Communication

Data Visualization (Tableau, Power BI, Plotly, Matplotlib) 90%
Manuscript-Ready Reporting (figures, dashboards, reproducible notebooks) 88%
Collaboration & Communication (interdisciplinary teams, PIs, clinicians) 92%

Certifications

Validations across Analytics, AI, and Clinical Research


AI Excellence

US Centers for Disease Control and Prevention

2025


Foundry and AIP Builder Foundations

Palantir Technologies

2024


CDC Epidemic Intelligence Service (EIS) Conference 2025 - CME Credits (ACCME, ACPE, ANCC)

U.S. Centers for Disease Control and Prevention (CDC)

2025


NASA Open Science 101

Issued by NASA Open Science

2024


Author - IEEE CBMS 2024 Conference (Peer-Reviewed Paper)

IEEE Computer-Based Medical Systems (CBMS), Guadalajara, Mexico

2024


CDC EIS Conference 2025 - Continuing Education Units (ANSI/IACET Accredited)

U.S. Centers for Disease Control and Prevention (CDC)

2025


Biomedical Responsible Conduct of Research

CITI Program

2023


Human Research - Biomedical Researcher

CITI Program

2022


GCP – Social and Behavioral Research Best Practices for Clinical Research

CITI Program

2023


Publications & Abstracts

Peer-reviewed papers, conference abstracts, and ongoing manuscripts in biomedical informatics & applied AI

AIME 2023 Paper

A General-Purpose AI Assistant Embedded in an Open-Source Radiology Information System

AIME 2023 (Springer LNCS) Author. Integrated an AI assistant into LibreHealth RIS with OHIF/DICOM-SR for in-workflow annotation, active learning, and few-shot retraining.

IEEE CBMS 2024

Interpolating and Forecasting Wound Trajectory using Machine Learning Approaches

IEEE CBMS 2024 Author. Built a forecasting pipeline (imputation + Prophet/LSTM/BiLSTM) for chronic wound healing trajectories using registry data.

AMIA 2024 Poster

Intraoperative MAP Time-Series Features Predictive of 30-Day Mortality

AMIA Annual Symposium 2024 Accepted poster. Engineered entropy, changepoints (PELT), and variability features to characterize hemodynamic instability linked to outcomes.

DHIS2 + OpenCPU

Enabling Data-Driven Exploration: DHIS2 + OpenCPU for In-Platform Statistics

DHIS2 Annual Conference 2024 Primary author. Delivered secure, in-app statistical testing within DHIS2 via OpenCPU, eliminating manual exports and improving reproducibility.

DHIS2 US Community Health

DHIS2 for U.S. Community & Public Health Management

DHIS2 Annual Conference 2024 Author. Demonstrated adaptation of DHIS2 for U.S. community health: surveillance, case management, and reporting pipelines.

Working Papers (In Preparation)

  • Evaluation of OpenAI “Deep Research” for Public Health - CDC AI Tiger Team (Core Contributor). Designed and refined 2,556 prompts across six domains (Research, Planning, Epidemiology, Communications, Policy, Legal); executed and troubleshot runs with OpenAI’s o3 Deep Research; supported prompt iteration, results capture, and analysis. From 207 prompts executed, 195 AI-generated reports were evaluated by 61 SMEs against 10 criteria, demonstrating promise for rapid literature review, communications, and strategic planning. Manuscript in preparation.
  • Modeling Postoperative Delirium Risk - Indiana University School of Medicine. Machine learning on perioperative EHR and intraoperative MAP time-series to identify physiologic signatures predictive of delirium.
  • Outcome Analysis of Therapeutic Intraoperative Interventions - Retrospective study quantifying associations between vasopressor/antihypertensive use, intraoperative hypotension/hypertension, and postoperative mortality risk.

Leadership & Service

    Elected, technical, and program leadership across public health, research, and clinical practice

    Executive Co-Chair, Fellows Collective

    U.S. Centers for Disease Control and Prevention · 2025–2026

    Elected to represent CDC fellows agency-wide; co-led strategy, programming, and professional development to strengthen early-career scientific impact.

    • Coordinated cross-center initiatives, mentorship, and skills programming.
    • Advocated for equitable access to training, tools, and research opportunities.
    Governance Strategy Mentorship

    Clinic Director & Head Dentist

    Family Dental Clinic, India · 2013–2022

    Directed operations across two practices; led clinical quality, training, and outreach while introducing analytics to improve outcomes and access.

    • Trained junior clinicians; standardized workflows and safety protocols.
    • Launched community screening programs informed by data analysis.
    People Ops Quality Community Health

    Informatics Lead - FungiSurv AI Schema Mapping

    CDC Mycotic Diseases Branch · 2024–2025

    Led development of an LLM-assisted schema mapping system.

    • Architected a modular system with frontend upload, AI-assisted mapping, and human-in-the-loop review.
    • Developed Python-based transforms and logging in 1CDP.
    • Engineered structured JSON prompts for consistent outputs.
    • Implemented versioning, edit/review panels, and rollback.
    • Tested chunking & validation strategies for token limits.

    Combines AI innovation with reproducible informatics engineering …

    AI Integration Public Health Informatics Foundry GPT-4

    Informatics Lead - Fungal Outbreak Response Tracker Tool (FORT)

    CDC Mycotic Diseases Branch · 2024–2025

    Led continuous enhancement of the Fungal Outbreak Response Tracker (FORT) - a national Palantir Foundry-based tool supporting real-time documentation, monitoring, and management of fungal outbreak responses. Delivered iterative improvements that enhanced usability, efficiency, and epidemiologic alignment.

    • Debugged and optimized filter logic and object-table sync for accurate data querying.
    • Added Active/Closed status columns and advanced filtering by pathogen, time, and location.
    • Implemented toggle controls for rapid response closure and re-opening within object cards.
    • Enabled PDF and Excel export for 200k+ outbreak records, improving reporting efficiency.
    • Expanded support for Donor-Derived Infection (DDI) tracking with conditional form logic.

    Practiced Agile, user-centric development with MDB epidemiologists, delivering small, testable updates informed by field feedback. Each iteration was documentation-driven and compliant with 1CDP Workshop standards.

    Product Leadership Agile Delivery UX for Epi Public Health Impact

    Data Steward - Pipeline Reliability & Data Health

    CDC Mycotic Diseases Branch · 2024–2025

    Served as data steward ensuring uninterrupted operation, validation, and quality of surveillance pipelines, critical to maintaining national data integrity and reliability.

    • Monitored and troubleshot build, schedule, and data health failures daily.
    • Performed root-cause analyses for logic, data, or platform-level errors.
    • Implemented proactive validations to detect schema drift and missing fields.
    • Reduced downtime and manual triage through systematic pipeline improvements.
    Data Reliability Stewardship Foundry Monitoring

    Organizer & Trainer - “Data Huddle”

    CDC Mycotic Diseases Branch · 2024–2025

    Designed and led hands-on sessions to upskill epidemiologists and analysts in Palantir Foundry (1CDP) tools, pipeline troubleshooting, and data lineage.

    • Delivered live demos; built internal know-how.
    • Standardized guidance via GitLab tracker for branch-wide requests.
    Enablement Training Tooling

    Program Co-Designer - HBCU/MSI Datathons

    NIH CIMDAR-HIVE (Indiana University) · 2025–Present

    Co-built datathon notebooks and benchmarking assets spanning ICU time-series and speech biomarkers, embedding fairness and reproducibility.

    • Packaged datasets, scoring, and baseline models for applied learning.
    • Integrated bias checks and documentation to promote equitable AI.
    Education Reproducibility Fairness

    Let's Collaborate

    Open to collaborations and career opportunities at the intersection of AI, health data science, and clinical research.
