The Clinical AI Arc

Two flagship systems. One coherent clinical capability.

Read these as a sequence, not a list. Each solved a different clinical problem and transferred a different capability to what came next.

01
ASD Detection from Structural MRI
Clinical classification from NIfTI brain volumes. Learned: multi-site validation, quality gating, dual XAI, the limit of a high-AUC model with no explainability or uncertainty.
Also: established the 8-stage clinical inference pipeline used in all subsequent systems.
02
Fetal Head Circumference Measurement
Clinical measurement from ultrasound. Learned: segmentation, temporal cine reasoning, Hadlock GA estimation, clinical threshold validation. Exposed the deployment gap → led directly to CNN pruning research.
Also: Hybrid Crossover pruning (see Research section) runs in parallel, motivated by this system.
★ Origin Project · B.Tech 2023 + MS 2026 Rebuild · Neuroimaging AI
ASD Detection from Structural Brain MRI
1,067 subjects · 17 acquisition sites · ABIDE-I · NIfTI volumes
Live on HuggingFace
0.994
AUC-ROC
95.6%
Sensitivity
97.2%
Specificity
0.027
Brier Score
17
Acquisition Sites

The clinical problem: ASD diagnosis typically takes 18–24 months from referral to confirmed diagnosis — limited by specialist availability. Structural MRI is already routinely acquired. This system analyses that existing data algorithmically, flagging high-probability cases for expedited review.

In clinical terms: 95.6% sensitivity means 956 of every 1,000 children with ASD are correctly identified. 97.2% specificity means only 28 false alarms per 1,000 non-ASD children. The Brier score of 0.027 means stated confidence levels are genuinely calibrated — when the system reports 85% probability, roughly 85 of every 100 such cases really are positive.
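For concreteness, a small Python sketch of how those per-1,000 figures and the Brier score are computed; the probability vector here is an illustrative placeholder, not model output.

```python
import numpy as np

sensitivity = 0.956   # P(flagged | ASD)
specificity = 0.972   # P(cleared | non-ASD)

detected_per_1000_asd = round(sensitivity * 1000)        # 956 true positives
false_alarms_per_1000 = round((1 - specificity) * 1000)  # 28 false positives

# Brier score: mean squared error between predicted probability and outcome.
p = np.array([0.92, 0.10, 0.85, 0.03])   # predicted P(ASD), illustrative only
y = np.array([1,    0,    1,    0   ])   # true labels
brier = np.mean((p - y) ** 2)             # lower is better; 0.027 reported above
print(detected_per_1000_asd, false_alarms_per_1000, round(brier, 3))
```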

Site variance is the critical deployment finding: Sensitivity spans 88.5% (PITT) to 98.5% (UM_1) — a 10 percentage-point gap driven by scanner heterogeneity. Site-specific calibration or ComBat harmonisation would be required before production deployment. This is documented in the Model Card with explicit implications.

The 2023→2026 rebuild: B.Tech baseline had no explainability, no uncertainty, no quality gating, no clinical output. The 2026 system added GradCAM + LIME dual explanation, MC-Dropout (30 stochastic passes), 4-metric quality gate, LLM clinical report, site reliability indicators, and FDA SaMD Class II governance documentation.
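A minimal sketch of what the MC-Dropout uncertainty step looks like in PyTorch, assuming a binary classifier with standard dropout layers; the function and variable names are illustrative, not the project's actual code.

```python
import torch

def mc_dropout_predict(model, volume, passes=30):
    """Monte Carlo dropout: keep dropout active at inference and
    aggregate repeated stochastic forward passes."""
    model.eval()
    # Re-enable only the dropout layers so batch norm stays in eval mode.
    for m in model.modules():
        if isinstance(m, (torch.nn.Dropout, torch.nn.Dropout2d, torch.nn.Dropout3d)):
            m.train()
    with torch.no_grad():
        probs = torch.stack([torch.sigmoid(model(volume)) for _ in range(passes)])
    return probs.mean(dim=0), probs.std(dim=0)   # P(ASD) estimate, uncertainty sigma
```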

Clinical proposition
Pre-assessment triage layer targeting the 18–24 month diagnostic delay — surfaces high-probability cases for expedited specialist review
What a clinician receives
P(ASD) with CI, GradCAM + LIME spatial heatmaps, MC-Dropout uncertainty σ, site reliability badge, LLM-generated PDF report with regulatory framing
Documented failure mode
Confident false positives (σ ≈ 0.005) — the dangerous mode where the model is wrong and certain. Human override protocol required. Documented explicitly in Model Card.
Regulatory status
Research-grade decision support only. FDA SaMD Class II / De Novo pathway, sex and site bias audits, and known failure modes documented in full Model Card.
→ What the ASD project couldn't do — and what came next

The ASD system classified from whole-brain structural MRI but measured nothing. Clinical practice often needs a precise numerical output — not just a probability. The fetal head project moved from classification to clinical measurement, adding temporal reasoning (cine-loop sequences) and a hard deployment constraint: the model needs to run in a busy ultrasound suite on shared hardware, not a research server. That deployment constraint was what motivated the CNN pruning work running in parallel. The question "how do we make this small enough to actually deploy?" became its own research thread.

★ Capstone System · Obstetric AI · Deployed · CSCE 6260 + Post-course rebuild
Fetal Head Circumference Measurement
HC18 dataset · Static + cine-loop · ISUOG ±3mm threshold · Hadlock 1984 GA estimation
Live on HuggingFace
1.75mm
HC Error (ISUOG ≤3mm)
97.36%
Dice (Static)
2.10mm
HC Error (Cine-loop)
3.4×
Better than SOTA
153 hrs
Saved / unit / year

The clinical problem: HC measurement is mandatory at every routine antenatal scan. Manual calliper placement takes 2–4 minutes, introduces up to 7mm inter-observer variation, and accumulates to ~153 sonographer hours per year at a unit doing 20 scans per day. This system replaces that step with a reproducible automated measurement with 1.75mm mean error — about 40% inside the ISUOG ±3mm acceptability threshold.
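Since the HC18 labels are ellipse annotations (see the rebuild note below), the measurement itself can be read out as an ellipse fit to the predicted skull boundary followed by Ramanujan's perimeter approximation. A hedged OpenCV sketch, assuming a binary mask and known pixel spacing; the post-processing details are assumptions.

```python
import math
import cv2
import numpy as np

def head_circumference_mm(mask, mm_per_pixel):
    """Fit an ellipse to the predicted skull mask and return its perimeter in mm
    using Ramanujan's first approximation of the ellipse circumference."""
    contours, _ = cv2.findContours(mask.astype(np.uint8),
                                   cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    largest = max(contours, key=cv2.contourArea)
    (_, _), (axis1, axis2), _ = cv2.fitEllipse(largest)   # full axis lengths, px
    a = (axis1 / 2) * mm_per_pixel                          # semi-axes in mm
    b = (axis2 / 2) * mm_per_pixel
    # Ramanujan: C ≈ pi * [ 3(a + b) - sqrt((3a + b)(a + 3b)) ]
    return math.pi * (3 * (a + b) - math.sqrt((3 * a + b) * (a + 3 * b)))
```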

Course baseline → clinical system: The course project achieved 17.25mm MAE — 5.75× the ISUOG threshold. The post-course rebuild closed that gap to 1.75mm through three targeted changes: flood-fill correction of hollow-ellipse annotations (single biggest fix), boundary-weighted loss with distance-transform upweighting, and clinically motivated augmentation (Rician speckle, not Gaussian). No architecture change — engineering, not a better model.
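The boundary-weighted loss can be sketched as a per-pixel weight map built from a distance transform of the annotation; the Gaussian decay and constants below are assumptions, not the rebuild's exact scheme.

```python
import numpy as np
import torch
from scipy.ndimage import distance_transform_edt

def boundary_weight_map(mask, w0=5.0, sigma=5.0):
    """Per-pixel weights that peak at the annotated skull boundary.
    mask: binary numpy array (1 = skull). Constants are illustrative."""
    dist_in = distance_transform_edt(mask)         # distance to boundary, inside
    dist_out = distance_transform_edt(1 - mask)    # distance to boundary, outside
    dist_to_boundary = dist_in + dist_out          # one term is zero at every pixel
    return 1.0 + w0 * np.exp(-(dist_to_boundary ** 2) / (2 * sigma ** 2))

def boundary_weighted_bce(logits, target, weight_map):
    """BCE where pixels near the boundary are upweighted by the map above."""
    w = torch.as_tensor(weight_map, dtype=logits.dtype, device=logits.device)
    per_pixel = torch.nn.functional.binary_cross_entropy_with_logits(
        logits, target, reduction="none")
    return (per_pixel * w).mean()
```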

The cine-loop system: Sonographers assess HC over a probe sweep, not a single frozen frame. The temporal model processes 16-frame sequences via a shared 2D U-Net encoder + temporal self-attention (MAE 2.10mm, ISUOG compliant). Training data was synthesized via Pseudo-LDDM v2 (Ornstein-Uhlenbeck probe motion, per-frame skull variation, Rician speckle) because real cine acquisitions were not available.
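A minimal sketch of the shared-encoder plus temporal self-attention pattern. The real system segments every frame; here the encoder is assumed to return a flat feature per frame, and all dimensions and layer names are assumptions.

```python
import torch
import torch.nn as nn

class TemporalHCModel(nn.Module):
    """Shared 2D encoder applied per frame, then self-attention across
    the 16-frame cine sequence (sketch with assumed dimensions)."""
    def __init__(self, encoder_2d, feat_dim=256, heads=4):
        super().__init__()
        self.encoder = encoder_2d                       # shared 2D U-Net encoder
        self.attn = nn.MultiheadAttention(feat_dim, heads, batch_first=True)
        self.head = nn.Linear(feat_dim, 1)              # per-frame HC regression

    def forward(self, clip):                            # clip: (B, T, C, H, W)
        B, T, C, H, W = clip.shape
        feats = self.encoder(clip.view(B * T, C, H, W)) # assumed (B*T, feat_dim)
        feats = feats.view(B, T, -1)
        fused, _ = self.attn(feats, feats, feats)       # temporal self-attention
        return self.head(fused).squeeze(-1)             # (B, T) HC per frame
```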

Documented limitation: Third-trimester MAE 7.60mm exceeds the ISUOG threshold. Cause: acoustic shadowing from the ossified skull. Explicitly documented in Model Card with recommendation for manual verification at >30 weeks GA.

Clinical proposition
Replaces 2–4 min manual calliper placement with reproducible automated measurement. 153 sonographer hours saved per unit per year at 20 scans/day.
What a clinician receives
HC in mm, gestational age ± 2-week CI, trimester classification, GradCAM++ overlay, frame-level uncertainty map, dual-mode PDF report (LLM + template)
Deployment connection
Hybrid Crossover pruning (see Research section) delivers 2× compression + accuracy improvement for this backbone — enabling clinical hardware deployment without accuracy penalty
Governance
GA-trimester bias audit, FDA SaMD Class II + EU IVDR Class B framing, Model Card with acoustic-shadowing limitation explanation and trimester-specific reliability indicators

Research Contribution

A novel compression method — motivated by clinical deployment.

A directed study with a clear research question, a novel contribution, rigorous evaluation, and an IEEE-format report. Originated from the deployment constraint in the fetal HC system.

★ Novel Method · Directed Study · CSCE 5934 · Prof. Russel Pears
CNN Filter Pruning — Hybrid Crossover Method
VGG-16 · CelebA · CIFAR-10 · IEEE-format report · Individual contribution
2×
Channel Compression
+0.37%
Accuracy Improvement
2×
Faster Runtime
2.05×
Best Latency Speedup

The research question: When two convolutional filters are found to be redundant, standard structured pruning discards the weaker one permanently. Is it better to delete — or to synthesise a new filter that preserves the information from both?

The contribution: Hybrid Crossover replaces deletion with regression-based synthesis. Given two redundant filters A and B, a new filter is learned via least-squares regression to match the element-wise max of their activation maps — the peak activation from each parent, in a single filter. Integrated into a Global ILR scoring pipeline with hard accuracy guard rails (≤2pp overall, ≤6pp per-class on CelebA).
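A hedged sketch of the synthesis step as described: build the layer's input patch matrix on a calibration set, take the element-wise max of the two parents' responses as the regression target, and solve for the new filter by least squares. The names and the im2col formulation are assumptions.

```python
import numpy as np

def synthesise_crossover_filter(patches, w_a, w_b):
    """Hybrid Crossover synthesis (sketch).

    patches : (N, k) im2col matrix of calibration inputs to the layer,
              one row per spatial position, k = in_channels * kH * kW
    w_a, w_b: flattened weights (k,) of the two redundant parent filters
    """
    resp_a = patches @ w_a
    resp_b = patches @ w_b
    target = np.maximum(resp_a, resp_b)          # peak activation of either parent
    # Least-squares fit of a single filter that reproduces the combined response.
    w_new, *_ = np.linalg.lstsq(patches, target, rcond=None)
    return w_new
```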

The counterintuitive result: Hybrid Crossover achieves 2× more compression AND higher accuracy AND 2× faster runtime than standard drop under identical constraints. Faster because regression-synthesized filters satisfy guard rails on the first attempt — standard drop under aggressive compression triggers expensive rollback loops that dominate total runtime.

Clinical deployment relevance: The CIFAR-10 5-CNN result (2.05× latency speedup, 1.2pp accuracy cost, 75% of B4 channels removed) demonstrates the method on a realistic inference backbone. Applied to the fetal HC segmentation backbone: same clinical accuracy, half the GPU memory, twice the inference speed — the difference between requiring a dedicated GPU server and running on existing radiology workstation infrastructure.

The novel idea
Synthesis over deletion: create a new filter from two redundant parents rather than discarding one. Reframes compression as an information-preservation problem, not a removal problem.
Why it's faster AND better
Synthesized filters satisfy accuracy guard rails on the first attempt. Standard drop under aggressive compression repeatedly violates constraints and triggers rollbacks — those rollback loops dominate runtime.
Limitations
Middle layers (B2–B3) resist compression due to low filter redundancy. Evaluated on binary classification only — multi-class and medical imaging domain extension are the natural next steps.
PyTorch · VGG-16 · ILR Saliency · Guard Rails · CelebA · CIFAR-10

Clinical Systems

Deployed systems from course and team work.

Substantial team projects with real datasets and documented clinical relevance. Individual contributions are clearly marked.

Edge AI · Wearable · 2-person team
WECARE — Cardiac & Fall Detection
ECG F1: 0.9864 · 0.033ms inference · 8 missed falls / 228

Arrhythmia and fall detection run entirely on-device — no cloud round-trip in a cardiac emergency. ECG stream: 1D CNN on MIT-BIH, class-weighted sampling for 75% normal-beat imbalance, TorchScript export at 0.033ms — 1,200× below the 40ms real-time clinical threshold. Fall stream: MobiFall, threshold set at 0.65 (not 0.50) to deliberately minimise missed falls at the cost of more false alarms. Threshold as clinical safety policy, not hyperparameter.
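A sketch of the ECG front end (band-pass filter, R-peak detection, beat windowing) using scipy; the cutoffs, peak-height heuristic, and window length are assumptions rather than the project's exact values.

```python
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def segment_beats(ecg, fs=360, window=128):
    """Band-pass the raw ECG, detect R-peaks, and cut fixed-length
    beat windows for the 1D CNN (MIT-BIH records are sampled at 360 Hz)."""
    b, a = butter(3, [0.5 / (fs / 2), 40 / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, ecg)
    peaks, _ = find_peaks(filtered,
                          distance=int(0.25 * fs),                 # refractory gap
                          height=np.percentile(filtered, 90))      # crude R threshold
    half = window // 2
    beats = [filtered[p - half:p + half] for p in peaks
             if p - half >= 0 and p + half <= len(filtered)]
    return np.stack(beats) if beats else np.empty((0, window))
```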

Individual contribution: full ECG pipeline (bandpass filter, R-peak segmentation, 1D CNN, class weighting, evaluation) + TorchScript export. Teammate: full IMU pipeline.
Digital Pathology · Full-Stack · 5-person team
Histopathologic Cancer Detection
AUC 0.921 · F1 0.819 · 187K patches · CADe compliance

Binary cancer detection on PCam (Camelyon16). Key finding: 4-stage data-volume study showed non-monotonic AUC scaling — adding more data initially degraded performance before recovering. This is directly informative for clinical AI validation practice. Post-course additions: GradCAM spatial explainability (FDA CADe audit requirement), confidence-tiered decision zones (auto-clear at p<0.10 removes 24.5% of patches at 2.6% miss rate), full Model Card. Full-stack Django REST API + React frontend.
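The confidence-tiered decision zones reduce to a simple thresholding policy; the upper "priority review" cutoff below is an assumption, and only the p < 0.10 auto-clear tier comes from the project.

```python
def triage_zone(p_cancer, clear_below=0.10, review_above=0.90):
    """Confidence-tiered routing of a patch-level cancer probability."""
    if p_cancer < clear_below:
        return "auto-clear"                 # ~24.5% of patches, 2.6% miss rate
    if p_cancer > review_above:
        return "priority review"            # assumed upper tier
    return "standard pathologist review"
```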

Individual contribution: one of two model developers in team of five. Led CNN architecture, training, hyperparameter tuning, and post-course GradCAM + governance additions.

Technical Breadth

Skills that transfer to clinical AI pipelines.

Each project below built a capability used somewhere in the clinical systems above. Included for breadth, not as clinical work.

Audio · Cloud
BirdCLEF Audio Classification

107-feature librosa pipeline, hierarchical XGBoost (Order → Family), live AWS Streamlit deployment. Acoustic signal processing and cloud deployment skills — directly transferable to auscultation, phonocardiography, respiratory AI.
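A small sketch of the librosa feature-extraction pattern feeding the hierarchical XGBoost models; this covers only a handful of the 107 features, and the sample rate is an assumption.

```python
import librosa
import numpy as np

def acoustic_features(path, sr=32000):
    """Summary statistics over frame-level librosa features (illustrative subset)."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=20)
    centroid = librosa.feature.spectral_centroid(y=y, sr=sr)
    zcr = librosa.feature.zero_crossing_rate(y)
    return np.concatenate([mfcc.mean(axis=1), mfcc.std(axis=1),
                           centroid.mean(axis=1), zcr.mean(axis=1)])
```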

Case Study →
Multimodal · Fusion
Multimodal Emotion Recognition

VGG-16 + BiLSTM late fusion on MELD. Fusion underperformed individual streams — diagnosing why (temporal misalignment) taught more than a success would. Multimodal failure analysis is directly applicable to imaging + EHR pipelines.
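A minimal sketch of the late-fusion pattern, concatenating a VGG-16 feature vector with a BiLSTM summary before a shared classifier; all dimensions and the seven-class output are assumptions about this setup.

```python
import torch
import torch.nn as nn

class LateFusion(nn.Module):
    """Concatenate a visual feature vector with a BiLSTM sequence summary."""
    def __init__(self, vis_dim=4096, seq_dim=300, hidden=128, classes=7):
        super().__init__()
        self.lstm = nn.LSTM(seq_dim, hidden, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(vis_dim + 2 * hidden, classes)

    def forward(self, vis_feat, seq):        # vis_feat: (B, vis_dim), seq: (B, T, seq_dim)
        _, (h, _) = self.lstm(seq)           # h: (2, B, hidden) for a single BiLSTM layer
        seq_feat = torch.cat([h[0], h[1]], dim=-1)
        return self.classifier(torch.cat([vis_feat, seq_feat], dim=-1))
```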

Case Study →
Big Data · Cloud
Customer Segmentation — PySpark

541K transactions on AWS S3 + EMR. RFM feature engineering + K-Means / GMM. Large-scale clinical data (EHR, PACS, claims) uses the same infrastructure pattern — PySpark on distributed compute, not pandas.
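The same infrastructure pattern in a few lines of PySpark, assuming pre-computed RFM columns; the S3 path and column names are hypothetical.

```python
from pyspark.sql import SparkSession
from pyspark.ml.feature import VectorAssembler, StandardScaler
from pyspark.ml.clustering import KMeans

spark = SparkSession.builder.appName("rfm-segmentation").getOrCreate()
rfm = spark.read.parquet("s3://bucket/rfm_features/")   # hypothetical path

# Assemble and standardise the RFM columns, then cluster with K-Means.
assembler = VectorAssembler(inputCols=["recency", "frequency", "monetary"],
                            outputCol="raw_features")
scaler = StandardScaler(inputCol="raw_features", outputCol="features")
assembled = assembler.transform(rfm)
scaled = scaler.fit(assembled).transform(assembled)

model = KMeans(k=4, seed=42, featuresCol="features").fit(scaled)
segments = model.transform(scaled)    # adds a 'prediction' cluster column
```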

Case Study →
Classical Vision · MATLAB
Industrial Quality Classification

Sobel + morphological operations, Gabor + Wavelet features, mIoU evaluation in MATLAB. Classical image processing foundations remain relevant for interpretable feature engineering where training data is scarce.

Case Study →

Course Collections

Theoretical foundations and regulatory context.

Multi-project course pages — each covers the assignments within a single course that collectively built the groundwork for the clinical systems above.

Machine Learning · Fall 2024 · Prof. Russel Pears
Machine Learning — 3 Projects
Glass ID (RBF SVM), CelebA attribute recognition (VGG-16, 0.9594 acc), Bayesian network construction. Kernel methods, transfer learning, probabilistic inference.
View →
Fundamentals of AI · Spring 2025 · Prof. Russel Pears
AI Algorithms — 3 Projects
Warehouse robot (Dijkstra + A*), genetic algorithm for job-shop scheduling (makespan 298), value iteration RL (100% goal success, 22 iterations).
View →
AI in Wearables & Healthcare · Fall 2025 · Prof. Mahdi Pedram
Clinical AI Stack — 15 Activities
FDA/SaMD frameworks, Seeed XIAO nRF52840 hardware (PPG + IMU at 50Hz), YOLOv8 food detection (mAP@50 = 0.913), MIT App Inventor mobile deployment, MIMIC-IV ICU analysis. Real hardware, real clinical regulatory context.
View →
Scientific Data Visualization · Summer 2025 · Prof. Zeenat Tariq
Data Visualization — 2 Assignments
Tableau (Global Superstore), D3.js via Observable (Iris), Power BI (tech layoffs). Clinical dashboards and decision-support interfaces require these same skills.
View →
Applications of AI in Health · Spring 2026 · Prof. Haihua Chen · Health Informatics Dept.
HINF 5506 — In Progress
Clinical decision support, health informatics, AI integration into care workflows, clinical necessity vs. technical feasibility. Bridges the gap between notebook model output and clinical deployment requirements.
Prof. Chen, Health Informatics · Completing Spring 2026
In Progress
Ready to talk about clinical AI?
MS AI (Biomedical) · UNT · Available June 2026 · STEM OPT · Open to relocation
Get in Touch →