CDC Certificate of Outstanding Contribution
Advancing Artificial Intelligence in Public Health.
Clinician-Turned Data Scientist | Ex-CDC | Advancing AI & Analytics for Public Health and Clinical Research
Healthcare & AI professional with 10+ years of combined clinical and data science expertise, bridging hands-on dental practice with advanced biomedical informatics. Former CDC Public Health Informatics Fellow, experienced in designing reproducible pipelines and scalable AI/ML solutions across multimodal health data, including EHRs, imaging, physiologic signals, and voice. Skilled in statistical modeling, time-series forecasting, and deep learning for risk prediction, clinical decision support, and public health surveillance. Strong track record of peer-reviewed publications, international conference presentations, and collaborative research with clinicians, public health agencies, and academic institutions. Passionate about translating AI innovations into equitable, real-world healthcare impact.
Health informatics researcher bridging clinical expertise with advanced data science to create reproducible, equitable, and deployable AI solutions for healthcare and public health.
Clinician-turned data scientist with a decade of clinical experience, translating patient-level complexity into structured data problems. Skilled in interpreting biomedical, epidemiologic, and physiologic data to inform outcome prediction and health equity research.
Design and validate predictive pipelines using regression, ensemble methods, and deep learning architectures, including recurrent and transformer-based models for temporal and multimodal data, focused on interpretability, calibration, and fairness.
Build scalable, version-controlled data pipelines using Python, SQL, PySpark, and cloud-based platforms. Experienced in ETL design, schema mapping, and high-volume data processing, ensuring reproducibility and auditability from raw data to model output.
Embed fairness and transparency into model development through bias analysis, stratified evaluation, and open documentation, advancing trustworthy, generalizable AI in real-world healthcare settings.
Clinical + public health data systems, human-in-the-loop AI, and scalable pipelines.
I architected the core ingestion flow for CDC’s fungal disease surveillance (FungiSurv) to handle diverse state CSV/Excel formats. The system validates uploads, uses GPT-4 to propose column-to-standard mappings, lets epidemiologists review/approve edits, then standardizes and integrates data with full versioning and audit logs. Outcome: less manual mapping, faster partner onboarding, and higher-quality, reproducible surveillance datasets.
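The propose, review, and version loop described above can be sketched as follows. This is a minimal illustration, not the FungiSurv implementation: the GPT-4 proposal step is replaced by a stand-in fuzzy matcher (`difflib.get_close_matches`), and `STANDARD_FIELDS`, `propose_mappings`, `review_mappings`, and `standardize` are hypothetical names chosen for this example.

```python
import difflib
from datetime import datetime, timezone

# Hypothetical target schema; the real surveillance standard is not shown here.
STANDARD_FIELDS = ["patient_age", "specimen_collection_date", "organism", "state_case_id"]


def normalize(name):
    """Lower-case a column header and replace spaces/hyphens with underscores."""
    return name.strip().lower().replace(" ", "_").replace("-", "_")


def propose_mappings(source_columns, standard_fields=STANDARD_FIELDS):
    """Stand-in for the LLM step: propose the closest standard field, or None."""
    proposals = {}
    for col in source_columns:
        match = difflib.get_close_matches(normalize(col), standard_fields, n=1, cutoff=0.6)
        proposals[col] = match[0] if match else None
    return proposals


def review_mappings(proposals, reviewer_edits, reviewer, version, audit_log):
    """Apply a reviewer's edits and append one audit entry per change."""
    approved = dict(proposals)
    for col, target in reviewer_edits.items():
        audit_log.append({
            "version": version,
            "column": col,
            "proposed": proposals.get(col),
            "approved": target,
            "reviewer": reviewer,
            "timestamp": datetime.now(timezone.utc).isoformat(),
        })
        approved[col] = target
    return approved


def standardize(rows, approved):
    """Rename columns per the approved mapping; drop unmapped columns."""
    return [{approved[k]: v for k, v in row.items() if approved.get(k)} for row in rows]
```

Keeping the reviewer's edits in an append-only log, keyed by a version number, is what makes each standardized dataset traceable back to the mapping decisions that produced it.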
Developed a semi-automated pipeline to standardize and match healthcare facilities across state surveillance data. Combined proportional/partial/fixed string distance metrics with CMS and NPI identifiers, then applied geocoding to resolve ambiguous matches. Outcome: higher-confidence matches, reduced manual review, and more precise spatial data for antimicrobial resistance risk analysis.
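A minimal sketch of identifier-first facility matching with fuzzy name scoring, assuming records are dicts with hypothetical `name`, `npi`, and `ccn` keys. The source does not define its proportional/partial/fixed metrics precisely, so this example substitutes `difflib.SequenceMatcher`-based full, partial, and token-sorted ratios. The 0.85 threshold is likewise an assumption, and geocoding of unresolved cases is only flagged, not implemented.

```python
import re
from difflib import SequenceMatcher


def norm(name):
    """Lower-case, strip punctuation and a few corporate suffixes, collapse spaces."""
    name = re.sub(r"[^a-z0-9 ]", " ", name.lower())
    name = re.sub(r"\b(inc|llc|the)\b", " ", name)
    return " ".join(name.split())


def full_ratio(a, b):
    return SequenceMatcher(None, a, b).ratio()


def partial_ratio(a, b):
    """Best full ratio of the shorter string against same-length windows of the longer."""
    short, long_ = sorted((a, b), key=len)
    if not short:
        return 0.0
    n = len(short)
    return max(full_ratio(short, long_[i:i + n]) for i in range(len(long_) - n + 1))


def token_sort_ratio(a, b):
    """Full ratio after sorting tokens, so word order does not matter."""
    return full_ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


def match_facility(record, candidates, threshold=0.85):
    """Return (candidate, score, reason): identifiers win, then the best name score."""
    for cand in candidates:
        for key in ("npi", "ccn"):
            if record.get(key) and record.get(key) == cand.get(key):
                return cand, 1.0, f"{key} match"
    best, best_score = None, 0.0
    a = norm(record["name"])
    for cand in candidates:
        b = norm(cand["name"])
        score = max(full_ratio(a, b), partial_ratio(a, b), token_sort_ratio(a, b))
        if score > best_score:
            best, best_score = cand, score
    if best_score >= threshold:
        return best, best_score, "name match"
    return None, best_score, "needs review (e.g., geocoding or manual check)"
```

Checking identifiers before names means an exact NPI or CMS match is never overridden by a lookalike name, while low-scoring names fall through to the review queue instead of being force-matched.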
Built an end-to-end AI assistant within LibreHealth’s open-source Radiology Information System (RIS), integrating directly with the OHIF viewer. Enabled radiologists to train, retrain, and apply models within their workflow using DICOM-SR overlays, active learning (few-shot retraining on feedback), and swarm learning to share model weights across users. Demonstrated with CheXNet on chest X-rays and presented at AIME 2023 (NSF-supported). Outcome: proof of concept for scalable, standards-based AI embedded in radiology workflows.
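The source does not state how shared weights are combined across users. A common choice in decentralized and federated schemes is a sample-weighted average of each peer's parameters. The sketch below assumes that rule, with parameters represented as hypothetical `{layer_name: [floats]}` dicts rather than real CheXNet tensors.

```python
def merge_peer_weights(peer_updates):
    """Sample-weighted average of parameters from several peers (assumed merge rule).

    peer_updates: list of (weights, n_samples) where weights maps a layer name
    to a flat list of floats. All peers must share the same layer shapes.
    """
    total = sum(n for _, n in peer_updates)
    if total <= 0:
        raise ValueError("at least one peer must report training samples")
    reference = peer_updates[0][0]
    merged = {}
    for layer, values in reference.items():
        merged[layer] = [
            sum(weights[layer][i] * n for weights, n in peer_updates) / total
            for i in range(len(values))
        ]
    return merged
```

Weighting by sample count keeps a user who labeled only a few studies from pulling the shared model as hard as one who labeled many.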
Built a reproducible pipeline to impute, interpolate, and forecast chronic wound healing trajectories across 14,571 wounds from 6,171 patients. Used XGBoost for demographic imputation, AIC-driven selection across interpolation methods (Linear, Krogh, Akima, RBF), and forecasting with Holt-Winters, ARIMA, Prophet, and deep models (LSTM, BiLSTM). Integrated subgroup fairness checks by gender, race, and ethnicity. Outcome: Published as a second-author paper at IEEE CBMS 2024; demonstrated interpretable, equitable wound outcome forecasting.
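One way to make AIC-driven interpolation selection concrete is to score each interpolator on held-out observations and compare a Gaussian AIC, AIC = n ln(RSS/n) + 2k. The pure-Python sketch below rests on several assumptions: it uses leave-one-out error on interior points, only two candidate interpolators (linear and last-observation-carried-forward, standing in for the Krogh, Akima, and RBF interpolators, which in practice would come from a library such as SciPy), and an assumed parameter count `k` per method.

```python
import math
from bisect import bisect_left, bisect_right


def linear_interp(xs, ys, x):
    """Piecewise-linear interpolation; xs must be sorted ascending."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_left(xs, x)
    if xs[i] == x:
        return ys[i]
    x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def locf_interp(xs, ys, x):
    """Last observation carried forward (simple baseline)."""
    i = bisect_right(xs, x) - 1
    return ys[max(i, 0)]


def loo_rss(method, xs, ys):
    """Residual sum of squares from leaving out each interior point in turn."""
    rss = 0.0
    for j in range(1, len(xs) - 1):
        xs_train = xs[:j] + xs[j + 1:]
        ys_train = ys[:j] + ys[j + 1:]
        rss += (ys[j] - method(xs_train, ys_train, xs[j])) ** 2
    return rss, len(xs) - 2


def gaussian_aic(rss, n, k):
    """AIC = n*ln(RSS/n) + 2k; RSS is floored to avoid log(0)."""
    return n * math.log(max(rss, 1e-12) / n) + 2 * k


def select_interpolator(xs, ys, candidates):
    """candidates maps name -> (interpolation function, assumed parameter count)."""
    scores = {}
    for name, (method, k) in candidates.items():
        rss, n = loo_rss(method, xs, ys)
        scores[name] = gaussian_aic(rss, n, k)
    return min(scores, key=scores.get), scores
```

Scoring on left-out points matters because an interpolator passes exactly through the points it was fit on, so in-sample residuals would be zero for every method and AIC could not distinguish them.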
Designed and built a DHIS2 web app that runs R code inside the platform via OpenCPU. Users select DHIS2 datasets through the API, write and execute R in an editor (RStudio-like layout), and view results and plots in real time without CSV exports, preserving data governance and reducing friction for analysts and epidemiologists.
Led analysis of 42 intraoperative MAP (Mean Arterial Pressure) features from 10 years of EHR data in noncardiac surgeries. Identified dynamic blood pressure variability markers (percent shifts, prolonged hypotension, frequent changepoints, and entropy measures) as strong predictors of 30-day mortality beyond conventional threshold-based methods. Outcome: 20 high-value intraoperative hemodynamic predictors of mortality risk identified; presented at the AMIA 2024 Annual Symposium.
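Three of the feature families named above can be sketched in a few lines of pure Python. These are illustrative assumptions, not the study's feature definitions: the 65 mmHg hypotension threshold, one-minute sampling, 5 mmHg entropy bins, and the function names are all chosen for this example, and changepoint detection is omitted.

```python
import math
from collections import Counter


def longest_hypotension_minutes(map_mmhg, threshold=65.0, minutes_per_sample=1.0):
    """Longest consecutive run of MAP below the threshold, in minutes."""
    longest = current = 0
    for value in map_mmhg:
        current = current + 1 if value < threshold else 0
        longest = max(longest, current)
    return longest * minutes_per_sample


def max_percent_shift(map_mmhg, baseline):
    """Largest absolute deviation from a baseline MAP, as a percentage."""
    return max(abs(v - baseline) / baseline * 100.0 for v in map_mmhg)


def binned_shannon_entropy(map_mmhg, bin_width=5.0):
    """Shannon entropy (bits) of MAP values grouped into fixed-width bins."""
    counts = Counter(math.floor(v / bin_width) for v in map_mmhg)
    n = len(map_mmhg)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Unlike a single "minutes below threshold" count, the run-length, relative-shift, and entropy features capture how a patient's pressure moves, which is the kind of dynamic signal the analysis above found predictive.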
Reverse-engineered the NNDSS Operational Data Store (ODSE) to recover priority elements missing from standard views. Built a full entity-relationship diagram (ERD) across 200+ SQL tables and a dynamic SQL scanner to locate target values across free-text columns. Mapped recovered elements to a U.S. state's case IDs and NNDSS identifiers, enriching Candida auris surveillance without waiting for upstream schema changes. Outcome: Established validated linkage pathways between state case IDs and national identifiers, enabling more complete surveillance reporting.
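The dynamic-scanner idea can be illustrated with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: ODSE itself is not SQLite, so the catalog queries (`sqlite_master`, `PRAGMA table_info`) stand in for whatever metadata views the production database exposes, and `scan_text_columns` is a hypothetical helper. Identifiers are quoted and the search term is passed as a bound parameter.

```python
import sqlite3

TEXT_TYPE_MARKERS = ("CHAR", "TEXT", "CLOB")  # SQLite text-affinity markers


def quote_ident(name):
    """Quote a table or column name so it is safe to interpolate into SQL."""
    return '"' + name.replace('"', '""') + '"'


def scan_text_columns(conn, needle):
    """Return (table, column, match_count) for every text column containing needle."""
    hits = []
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # PRAGMA table_info rows: (cid, name, type, notnull, dflt_value, pk)
        for _, column, col_type, *_ in conn.execute(
                f"PRAGMA table_info({quote_ident(table)})"):
            if not any(marker in col_type.upper() for marker in TEXT_TYPE_MARKERS):
                continue
            sql = (f"SELECT COUNT(*) FROM {quote_ident(table)} "
                   f"WHERE {quote_ident(column)} LIKE ?")
            (count,) = conn.execute(sql, (f"%{needle}%",)).fetchone()
            if count:
                hits.append((table, column, count))
    return hits
```

Because the query text is generated from the catalog rather than written by hand, the same scan covers hundreds of tables without any prior knowledge of where a value might be stored.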
Co-developed CDC’s internal outbreak tracker and transformed it into a fast, analyst-friendly product in Palantir Foundry (1CDP). Fixed broken filtering, added inline response toggles for rapid status updates, enabled one-click PDF & large Excel exports, and extended the ontology to include Donor-Derived Infections (DDI). Outcome: faster, easier outbreak tracking for epidemiologists, helping CDC respond more quickly to multi-state fungal outbreaks.
Developed and validated machine learning models for detecting Parkinson’s disease from voice-derived Mel-Frequency Cepstral Coefficient (MFCC) time-series data using the NIH Bridge2AI Voice dataset. Designed a custom 1D ResNet architecture for temporal feature learning and compared its performance with traditional and transformer-based baselines. Manuscript in progress. Outcome: The ResNet-1D model achieved AUC > 0.80, demonstrating robust discrimination of Parkinson’s disease from control participants using non-invasive voice biomarkers.
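The core idea of a 1D ResNet is the residual block, y = ReLU(x + F(x)), where F is a small stack of 1D convolutions. The pure-Python sketch below shows a single-channel block with "same" padding. It is a simplification for illustration (no batch normalization, no multiple channels, no training), and the function names are hypothetical rather than taken from the study's architecture.

```python
def conv1d_same(signal, kernel, bias=0.0):
    """1D cross-correlation (as in deep-learning conv layers) with zero 'same' padding."""
    k = len(kernel)
    left = k // 2
    padded = [0.0] * left + list(signal) + [0.0] * (k - 1 - left)
    return [
        sum(padded[i + j] * kernel[j] for j in range(k)) + bias
        for i in range(len(signal))
    ]


def relu(values):
    return [max(0.0, v) for v in values]


def residual_block(x, kernel1, kernel2):
    """y = ReLU(x + conv2(ReLU(conv1(x)))): the skip path carries x unchanged."""
    h = relu(conv1d_same(x, kernel1))
    h = conv1d_same(h, kernel2)
    return relu([a + b for a, b in zip(x, h)])
```

With all-zero kernels the block reduces to ReLU(x), which shows why residual networks are easy to optimize: each block starts close to an identity-like mapping and only has to learn a correction.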
Developed a reproducible ML framework to predict postoperative delirium using perioperative EHR data, integrating intraoperative Mean Arterial Pressure (MAP) dynamics, frailty indicators, anesthesia exposures, surgical domain, and comorbidities. Implemented domain-specific models combined through a calibrated ensemble super-learner to balance performance and interpretability. Outcome: Achieved ROC-AUC > 0.85 with well-calibrated predictions and clinically coherent feature effects, supporting explainable and domain-aware AI for perioperative cognitive risk assessment.
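The super-learner idea can be sketched as choosing a convex weighting of the base models' out-of-fold predicted probabilities that minimizes log loss. The example below is a pure-Python illustration under assumptions: the grid search over weights, the 0.1 step, the model names, and the use of the Brier score as a calibration check are choices made for this sketch. A separate calibration step (such as Platt scaling or isotonic regression) is not shown.

```python
import itertools
import math


def log_loss(y_true, probs, eps=1e-15):
    """Mean binary cross-entropy, with probabilities clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, probs):
        p = min(max(p, eps), 1 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(y_true)


def brier_score(y_true, probs):
    """Mean squared error between predicted probabilities and 0/1 outcomes."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


def convex_super_learner(oof_preds, y_true, step=0.1):
    """Grid-search non-negative weights summing to 1 that minimize out-of-fold log loss."""
    names = list(oof_preds)
    ticks = round(1 / step)
    best_weights, best_loss = None, math.inf
    for combo in itertools.product(range(ticks + 1), repeat=len(names)):
        if sum(combo) != ticks:
            continue  # keep only weight vectors on the simplex
        weights = [c / ticks for c in combo]
        blended = [
            sum(w * oof_preds[name][i] for w, name in zip(weights, names))
            for i in range(len(y_true))
        ]
        loss = log_loss(y_true, blended)
        if loss < best_loss:
            best_weights, best_loss = dict(zip(names, weights)), loss
    return best_weights, best_loss
```

Fitting the weights on out-of-fold predictions, rather than on in-sample ones, keeps the ensemble from simply rewarding whichever domain model overfits the most.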
Built one of the first integrated datasets combining clinical, behavioral, and socioeconomic indicators from a longitudinal dental population (>2,000 patients). Applied rigorous statistical and regression methods (logistic and ordinal models) to identify multifactorial determinants of caries, periodontal disease, and treatment outcomes. Designed reproducible R pipelines with validated outputs. Outcome: Early integration of clinical practice and data science; informed evidence-based preventive strategies and community outreach for underserved populations.
From clinical practice to AI-driven health informatics and public health innovation
Delivered data-driven insights to optimize hospital operations, staffing, and patient care efficiency through advanced analytics and visualization projects.
Key Highlights & Projects:
Graduate training in biomedical data science, clinical informatics, and public health analytics, combining coursework with applied projects in healthcare operations, prediction, and AI research.
Key Highlights & Projects:
Coursework: Clinical Decision Support Systems, Applied Statistical Methods, Health Informatics Standards, Clinical Information Systems, Project Management, Machine Learning in Bioinformatics, Health Information Exchange.
Comprehensive undergraduate training in biomedical sciences, clinical practice, and public health. Built a strong foundation in healthcare delivery and patient care, later integrating data science approaches to analyze outcomes and inform prevention strategies.
Key Highlights & Works:
Coursework: General Anatomy & Physiology, Dental Materials, Pathology & Microbiology, Dental Pharmacology, Oral Anatomy & Histology, General Medicine & Surgery, Oral Pathology, Community Dentistry, Prosthodontics, Endodontics, Oral Surgery & Anesthesia, Periodontics, Orthodontics, Pedodontics, Oral Medicine & Radiology.
Contributing to NIH-funded multimodal health data science initiatives, developing reproducible pipelines and benchmarking workflows for sepsis prediction, clinical time-series forecasting, and speech-based disease detection, advancing biomedical informatics through applied AI methods.
At CDC’s Mycotic Diseases Branch, I engineered national-scale informatics systems that modernized fungal surveillance, automated data stewardship, and advanced AI evaluation. My work combined deep technical skill (Python, SQL, Palantir Foundry, GPT-4, R, geocoding) with measurable public health impact, directly strengthening outbreak response and national data infrastructure.
Conducted advanced perioperative analytics, engineering novel hemodynamic features from intraoperative MAP time-series to uncover mortality risk signatures and inform clinical decision-making.
Biomedical data science • anesthesiology analytics • reproducible AI pipelines
Validations across Analytics, AI, and Clinical Research



U.S. Centers for Disease Control and Prevention (CDC)
2025

IEEE Computer-Based Medical Systems (CBMS), Guadalajara, Mexico
2024
U.S. Centers for Disease Control and Prevention (CDC)
2025


CITI Program
2023
Peer-reviewed papers, conference abstracts, and ongoing manuscripts in biomedical informatics & applied AI
AIME 2023 (Springer LNCS) Author. Integrated an AI assistant into LibreHealth RIS with OHIF/DICOM-SR for in-workflow annotation, active learning, and few-shot retraining.
IEEE CBMS 2024 Author. Built a forecasting pipeline (imputation + Prophet/LSTM/BiLSTM) for chronic wound healing trajectories using registry data.
AMIA Annual Symposium 2024 Accepted poster. Engineered entropy, changepoint (PELT), and variability features to characterize hemodynamic instability linked to outcomes.
DHIS2 Annual Conference 2024 Primary author. Delivered secure, in-app statistical testing within DHIS2 via OpenCPU, eliminating manual exports and improving reproducibility.
DHIS2 Annual Conference 2024 Author. Demonstrated adaptation of DHIS2 for U.S. community health: surveillance, case management, and reporting pipelines.
Elected, technical, and program leadership across public health, research, and clinical practice
Elected to represent CDC fellows agency-wide; co-led strategy, programming, and professional development to strengthen early-career scientific impact.
Directed operations across two practices; led clinical quality, training, and outreach while introducing analytics to improve outcomes and access.
Led development of an LLM-assisted schema mapping system
Combines AI innovation with reproducible informatics engineering …
Led continuous enhancement of the Fungal Outbreak Response Tracker (FORT), a national Palantir Foundry-based tool supporting real-time documentation, monitoring, and management of fungal outbreak responses. Delivered iterative improvements that enhanced usability, efficiency, and epidemiologic alignment.
Practiced Agile, user-centric development with MDB epidemiologists, delivering small, testable updates informed by field feedback. Each iteration was documentation-driven and compliant with 1CDP Workshop standards.
Serve as data steward ensuring uninterrupted operation, validation, and quality of surveillance pipelines, critical to maintaining national data integrity and reliability.
Designed and led hands-on sessions to upskill epidemiologists and analysts in Palantir Foundry (1CDP) tools, pipeline troubleshooting, and data lineage.
Co-built datathon notebooks and benchmarking assets spanning ICU time-series and speech biomarkers, embedding fairness and reproducibility.
Open to collaborations and career opportunities at the intersection of AI, health data science, and clinical research.