Complete Project Portfolio
Fourteen projects. Two deployed clinical AI systems that pass real clinical thresholds. One novel research contribution in model compression. A body of work that runs from a B.Tech capstone in 2023 to a complete medical imaging AI stack in 2026 — each project building something the next one needed.
The Clinical AI Arc
Read these as a sequence, not a list. Each solved a different clinical problem and transferred a different capability to what came next.
The clinical problem: ASD diagnosis typically takes 18–24 months from referral to confirmed diagnosis — limited by specialist availability. Structural MRI is already routinely acquired. This system analyses that existing data algorithmically, flagging high-probability cases for expedited review.
In clinical terms: 95.6% sensitivity means 956 of every 1,000 children with ASD are correctly identified. 97.2% specificity means only 28 false alarms per 1,000 non-ASD children. The Brier score of 0.027 means stated confidence levels are genuinely calibrated — when the system says 85% probable, 85% is what it means.
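The three figures quoted above follow directly from a confusion matrix and the predicted probabilities. A minimal sketch, using hypothetical counts chosen to match the quoted rates (956 true positives per 1,000 ASD cases, 28 false positives per 1,000 non-ASD cases — not the system's actual confusion matrix):

```python
import numpy as np

def sensitivity(tp, fn):
    # Fraction of true ASD cases the system flags.
    return tp / (tp + fn)

def specificity(tn, fp):
    # Fraction of non-ASD cases the system correctly clears.
    return tn / (tn + fp)

def brier_score(probs, labels):
    # Mean squared gap between stated probability and the 0/1 outcome;
    # lower is better, and a well-calibrated model scores low.
    probs, labels = np.asarray(probs, float), np.asarray(labels, float)
    return float(np.mean((probs - labels) ** 2))

sens = sensitivity(tp=956, fn=44)          # 0.956
spec = specificity(tn=972, fp=28)          # 0.972
bs = brier_score([0.8, 0.2], [1, 0])       # toy two-case Brier score
```

The Brier score is what backs the calibration claim: it penalises a confident wrong answer far more than an honest uncertain one.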
Site variance is the critical deployment finding: Sensitivity spans 88.5% (PITT) to 98.5% (UM_1) — a 10 percentage-point gap driven by scanner heterogeneity. Site-specific calibration or ComBat harmonisation would be required before production deployment. This is documented in the Model Card with explicit implications.
The 2023→2026 rebuild: B.Tech baseline had no explainability, no uncertainty, no quality gating, no clinical output. The 2026 system added GradCAM + LIME dual explanation, MC-Dropout (30 stochastic passes), 4-metric quality gate, LLM clinical report, site reliability indicators, and FDA SaMD Class II governance documentation.
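MC-Dropout, as named above, keeps dropout active at inference and reads the spread of repeated stochastic passes as predictive uncertainty. A minimal NumPy stand-in (the deployed system presumably uses a deep-learning framework; `toy_forward` is a hypothetical one-layer model for illustration):

```python
import numpy as np

def mc_dropout_predict(forward, x, rng, n_passes=30, p_drop=0.2):
    # Run the model n_passes times with dropout left ON; the mean is the
    # prediction, the standard deviation an uncertainty estimate.
    preds = np.array([forward(x, rng, p_drop) for _ in range(n_passes)])
    return preds.mean(axis=0), preds.std(axis=0)

def toy_forward(x, rng, p_drop):
    # Hypothetical net: inverted dropout on features, sigmoid over the sum.
    mask = rng.random(x.shape) >= p_drop
    h = x * mask / (1.0 - p_drop)
    return 1.0 / (1.0 + np.exp(-h.sum()))

rng = np.random.default_rng(0)
mean_p, std_p = mc_dropout_predict(toy_forward, np.array([0.5, -0.2, 0.8]), rng)
```

A large `std_p` on a given scan is a signal to route that case to a human rather than trust the point prediction.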
The ASD system classified from whole-brain structural MRI but measured nothing. Clinical practice often needs a precise numerical output, not just a probability. The fetal head project moved from classification to clinical measurement, adding temporal reasoning (cine-loop sequences) and a hard deployment constraint: the model must run in a busy ultrasound suite on shared hardware, not a research server. That constraint motivated the CNN pruning work running in parallel. The question "how do we make this small enough to actually deploy?" became its own research thread.

The clinical problem: HC measurement is mandatory at every routine antenatal scan. Manual calliper placement takes 2–4 minutes, introduces up to 7mm inter-observer variation, and accumulates to ~153 sonographer hours per year at a unit doing 20 scans per day. This system replaces that step with a reproducible measurement to 1.75mm — 40% inside the ISUOG ±3mm acceptability threshold.
Course baseline → clinical system: The course project achieved 17.25mm MAE — over the ISUOG threshold by 5.75×. The post-course rebuild closed that gap to 1.75mm through three targeted changes: flood-fill correction of hollow-ellipse annotations (single biggest fix), boundary-weighted loss with distance-transform upweighting, and clinically-motivated augmentation (Rician speckle, not Gaussian). No architecture change — engineering, not a better model.
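The boundary-weighted loss above can be sketched as a per-pixel weight map that upweights pixels near the segmentation boundary. A minimal NumPy illustration (the actual system would more likely use `scipy.ndimage.distance_transform_edt`; the brute-force distance here just keeps the sketch dependency-free, and `w_boundary`/`decay` are hypothetical parameters):

```python
import numpy as np

def boundary_weight_map(mask, w_boundary=5.0, decay=1.0):
    # Mark boundary pixels: any pixel whose 4-neighbour has a different label.
    m = mask.astype(bool)
    b = np.zeros_like(m)
    b[:-1, :] |= m[:-1, :] != m[1:, :]
    b[1:, :]  |= m[1:, :]  != m[:-1, :]
    b[:, :-1] |= m[:, :-1] != m[:, 1:]
    b[:, 1:]  |= m[:, 1:]  != m[:, :-1]
    # Distance of every pixel to its nearest boundary pixel (brute force).
    ys, xs = np.nonzero(b)
    yy, xx = np.mgrid[:m.shape[0], :m.shape[1]]
    d = np.min(np.hypot(yy[..., None] - ys, xx[..., None] - xs), axis=-1)
    # Weights decay exponentially away from the boundary, floor at 1.
    return 1.0 + w_boundary * np.exp(-decay * d)

mask = np.zeros((8, 8), int)
mask[2:6, 2:6] = 1                 # toy square "head" segment
w = boundary_weight_map(mask)      # multiply into a per-pixel loss
```

Multiplying this map into the per-pixel loss makes the optimiser pay most for errors exactly where HC calliper placement happens: on the skull boundary.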
The cine-loop system: Sonographers assess HC over a probe sweep, not a single frozen frame. The temporal model processes 16-frame sequences via shared 2D U-Net encoder + temporal self-attention (MAE 2.10mm, ISUOG compliant). Training data synthesized via Pseudo-LDDM v2 (Ornstein-Uhlenbeck probe motion, per-frame skull variation, Rician speckle) when real cine acquisitions were clinically held.
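The two synthesis ingredients named above are standard stochastic models; a minimal sketch of each, with all parameter values (`theta`, `sigma`, noise levels) hypothetical rather than taken from Pseudo-LDDM v2:

```python
import numpy as np

def ou_probe_motion(n_frames=16, theta=0.5, sigma=0.15, dt=1.0, seed=0):
    # Ornstein-Uhlenbeck process: mean-reverting random drift, a plausible
    # stand-in for small sonographer probe wobble across a sweep.
    rng = np.random.default_rng(seed)
    x = np.zeros(n_frames)
    for t in range(1, n_frames):
        x[t] = x[t-1] - theta * x[t-1] * dt \
               + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

def rician_speckle(img, sigma, rng):
    # Rician noise: magnitude of a signal with Gaussian noise on both the
    # real and imaginary channels — the physically right model for
    # ultrasound magnitude data, unlike additive Gaussian noise.
    n_re = sigma * rng.standard_normal(img.shape)
    n_im = sigma * rng.standard_normal(img.shape)
    return np.sqrt((img + n_re) ** 2 + n_im ** 2)

offsets = ou_probe_motion()
noisy = rician_speckle(np.ones((4, 4)), 0.1, np.random.default_rng(2))
```

The mean reversion is the point: a pure random walk would drift off-anatomy over 16 frames, while an OU process stays near the target the way a steadied probe does.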
Documented limitation: Third-trimester MAE 7.60mm exceeds the ISUOG threshold. Cause: acoustic shadowing from the ossified skull. Explicitly documented in Model Card with recommendation for manual verification at >30 weeks GA.
Research Contribution
A directed study with a clear research question, a novel contribution, rigorous evaluation, and an IEEE-format report. Originated from the deployment constraint in the fetal HC system.
The research question: When two convolutional filters are found to be redundant, standard structured pruning discards the weaker one permanently. Is it better to delete — or to synthesise a new filter that preserves the information from both?
The contribution: Hybrid Crossover replaces deletion with regression-based synthesis. Given two redundant filters A and B, a new filter is learned via least-squares regression to match the element-wise max of their activation maps — the peak activation from each parent, in a single filter. Integrated into a Global ILR scoring pipeline with hard accuracy guard rails (≤2pp overall, ≤6pp per-class on CelebA).
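The synthesis step above can be sketched in a linearised setting: treat each filter as a weight vector acting on flattened input patches, so its activation map is a matrix-vector product, and solve least squares against the element-wise max of the two parents' maps. All shapes and data here are hypothetical; this is the regression idea, not the Global ILR pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)

# Each row of X is a flattened input patch; a conv filter is a weight
# vector, and its activation map over the patches is X @ w.
X = rng.standard_normal((512, 27))   # 512 patches, 3x3x3 filter weights
w_a = rng.standard_normal(27)        # redundant parent filter A
w_b = rng.standard_normal(27)        # redundant parent filter B

# Target: the peak activation from each parent, element-wise.
target = np.maximum(X @ w_a, X @ w_b)

# Regression-based synthesis: one filter fit to match both parents' peaks.
w_c, *_ = np.linalg.lstsq(X, target, rcond=None)

# Relative residual: how much of the max-activation target one linear
# filter can capture (the max of two linear maps is not itself linear).
residual = np.linalg.norm(X @ w_c - target) / np.linalg.norm(target)
```

The residual is never zero, which is exactly why the accuracy guard rails matter: the synthesised child is an approximation, accepted only if the network's accuracy stays within the stated bounds.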
The counterintuitive result: Hybrid Crossover achieves 2× more compression AND higher accuracy AND 2× faster runtime than standard drop under identical constraints. Faster because regression-synthesized filters satisfy guard rails on the first attempt — standard drop under aggressive compression triggers expensive rollback loops that dominate total runtime.
Clinical deployment relevance: The CIFAR-10 5-CNN result (2.05× latency speedup, 1.2pp accuracy cost, 75% of B4 channels removed) demonstrates the method on a realistic inference backbone. Applied to the fetal HC segmentation backbone: same clinical accuracy, half the GPU memory, twice the inference speed — the difference between requiring a dedicated GPU server and running on existing radiology workstation infrastructure.
Clinical Systems
Substantial team projects with real datasets and documented clinical relevance. Individual contributions are clearly marked.
On-device arrhythmia and fall detection — no cloud round-trip in a cardiac emergency. ECG stream: 1D CNN on MIT-BIH, class-weighted sampling for 75% normal-beat imbalance, TorchScript export at 0.033ms — 1,200× below the 40ms real-time clinical threshold. Fall stream: MobiFall, threshold set at 0.65 (not 0.50) to deliberately minimise missed falls at the cost of more false alarms. Threshold as clinical safety policy, not hyperparameter.
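The "threshold as safety policy" point is mechanical: moving a decision threshold trades misses for false alarms monotonically. A small sketch with synthetic scores (hypothetical data and convention — alarm when the fall score clears the threshold; the deployed system's exact threshold convention is as stated above):

```python
import numpy as np

def rates_at(p_fall, labels, threshold):
    # Miss rate (falls not alarmed) and false-alarm rate (non-falls alarmed)
    # when the policy is: alarm iff p_fall >= threshold.
    alarms = np.asarray(p_fall) >= threshold
    labels = np.asarray(labels).astype(bool)
    return np.mean(~alarms[labels]), np.mean(alarms[~labels])

rng = np.random.default_rng(1)
labels = rng.random(2000) < 0.1                     # ~10% of windows are falls
p = np.clip(rng.normal(0.7, 0.2, 2000) * labels +   # falls score higher
            rng.normal(0.3, 0.2, 2000) * ~labels, 0, 1)

m_low, f_low = rates_at(p, labels, 0.3)   # permissive: fewer misses
m_mid, f_mid = rates_at(p, labels, 0.5)   # default: more misses, fewer alarms
```

Because lowering the threshold only ever adds alarms, the miss rate can only fall and the false-alarm rate can only rise — choosing the operating point is a clinical decision about which error is cheaper, not a tuning exercise.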
Binary cancer detection on PCam (Camelyon16). Key finding: 4-stage data-volume study showed non-monotonic AUC scaling — adding more data initially degraded performance before recovering. This is directly informative for clinical AI validation practice. Post-course additions: GradCAM spatial explainability (FDA CADe audit requirement), confidence-tiered decision zones (auto-clear at p<0.10 removes 24.5% of patches at 2.6% miss rate), full Model Card. Full-stack Django REST API + React frontend.
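The confidence-tiered decision zones described above amount to a three-way routing rule. A minimal sketch (the 0.10 auto-clear cut comes from the text; the 0.90 auto-flag cut and the function shape are hypothetical illustration):

```python
import numpy as np

def triage(p_tumour, auto_clear=0.10, auto_flag=0.90):
    # Three zones: auto-clear confident negatives, auto-flag confident
    # positives, and route everything in between to a pathologist.
    p = np.asarray(p_tumour)
    zones = np.full(p.shape, "review", dtype=object)
    zones[p < auto_clear] = "clear"
    zones[p >= auto_flag] = "flag"
    return zones

z = triage(np.array([0.02, 0.50, 0.95]))
```

The value of the policy is workload-shaped: per the figures above, the clear zone alone removes roughly a quarter of patches from human review at a small, explicitly stated miss rate.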
Technical Breadth
Each project below built a capability used somewhere in the clinical systems above. Included for breadth, not as clinical work.
107-feature librosa pipeline, hierarchical XGBoost (Order → Family), live AWS Streamlit deployment. Acoustic signal processing and cloud deployment skills — directly transferable to auscultation, phonocardiography, respiratory AI.
VGG-16 + BiLSTM late fusion on MELD. Fusion underperformed individual streams — diagnosing why (temporal misalignment) taught more than a success would. Multimodal failure analysis is directly applicable to imaging + EHR pipelines.
541K transactions on AWS S3 + EMR. RFM feature engineering + K-Means / GMM. Large-scale clinical data (EHR, PACS, claims) uses the same infrastructure pattern — PySpark on distributed compute, not pandas.
Sobel + morphological operations, Gabor + Wavelet features, mIoU evaluation in MATLAB. Classical image processing foundations remain relevant for interpretable feature engineering where training data is scarce.
Course Collections
Multi-project course pages — each covers the assignments within a single course that collectively built the groundwork for the clinical systems above.