Course Overview
Three projects — kernel methods, VGG-16 at scale, Bayesian networks
Machine Learning · Fall 2024 · UNT · Prof. Russel Pears · with Ramyasri Murugesan
Multi-class classification of 6 glass types from 9 chemical composition features (UCI dataset, 214 samples). Central experiment: systematic study of how RBF kernel transformation affects class separability — comparing the same classifiers before and after kernel mapping using t-SNE visualisation. Feature importance analysis via Logistic Regression coefficients identified and ablated the two least significant chemical features.
Why this matters for clinical AI: Kernel methods can separate classes whose boundaries are non-linear in the original feature space without resorting to deep networks, which matters when training data is scarce (common in rare disease classification). The RBF kernel experiment made the kernel trick concrete: comparing t-SNE plots before and after the transformation showed classes that overlap in the raw feature space becoming much easier to separate in the kernel-induced space.
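A minimal sketch of that before/after comparison, assuming a local glass.csv export of the UCI data with a Type label column; the gamma value and the use of RBFSampler (an explicit approximation of the RBF feature map, since the exact map is infinite-dimensional) are illustrative choices, not necessarily the project's exact setup.

```python
# Sketch only: file path, column names, and gamma are assumptions, not the project's code.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_approximation import RBFSampler   # explicit approximation of the RBF feature map
from sklearn.manifold import TSNE

df = pd.read_csv("glass.csv")                          # hypothetical local copy of the UCI Glass data
X = StandardScaler().fit_transform(df.drop(columns=["Type"]).values)
y = df["Type"].values

# Map the 9 standardised features through the (approximate) RBF feature map.
X_rbf = RBFSampler(gamma=0.5, n_components=300, random_state=0).fit_transform(X)

# Embed both representations with t-SNE; plotting each coloured by y shows the
# change in class separability before vs. after the kernel mapping.
emb_before = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
emb_after  = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_rbf)
```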
Key learning
Feature importance analysis beyond permutation: Using Logistic Regression coefficients to identify and ablate the two least significant features — then measuring the actual performance change — showed that "feature importance" measured statistically doesn't always match importance for model performance. A useful calibration for any feature engineering pipeline.
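A sketch of the coefficient-ranking-plus-ablation idea, assuming df, X, and y from the earlier glass-data sketch; the cross-validation setup is an illustrative choice, not the project's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

feature_names = df.drop(columns=["Type"]).columns

# Rank features by mean absolute coefficient magnitude across the classes.
clf = LogisticRegression(max_iter=5000).fit(X, y)
importance = np.abs(clf.coef_).mean(axis=0)
least_two = np.argsort(importance)[:2]
print("Least significant features:", list(feature_names[least_two]))

# Ablate those two features and measure the actual change in cross-validated accuracy.
baseline = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5).mean()
ablated = cross_val_score(LogisticRegression(max_iter=5000),
                          np.delete(X, least_two, axis=1), y, cv=5).mean()
print(f"CV accuracy with all features: {baseline:.3f}, after ablation: {ablated:.3f}")
```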
Fine-tuning VGG-16 on 162,770 training images (CelebA) for gender classification. Automated hyperparameter search via KerasTuner across neuron count (256/512), dropout (0.3/0.4/0.5), and optimiser (Adam/SGD). Best config: Dense(256), Dropout(0.5), Adam. Test accuracy: 0.9594. Confusion matrix: 10,844 TN · 565 FP · 242 FN · 8,216 TP. Additional experiment: mouth width index computed from CelebA landmark coordinates, split into quartiles, tested as predictor of smiling classification.
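The reported accuracy follows directly from those confusion-matrix counts (which gender was treated as the positive class is not restated here):

```python
# Derived metrics from the reported confusion matrix counts.
tn, fp, fn, tp = 10_844, 565, 242, 8_216
accuracy  = (tp + tn) / (tp + tn + fp + fn)   # 19,060 / 19,867 ≈ 0.9594
precision = tp / (tp + fp)                    # ≈ 0.936
recall    = tp / (tp + fn)                    # ≈ 0.971
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```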
Key learnings
Automated hyperparameter search at scale: KerasTuner on a 162K-image dataset — learning how to define a search space, run trials efficiently, and interpret results without overfitting to the validation set during search. This is the production-grade alternative to manual grid search.
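A sketch of what the KerasTuner search space could look like for this setup (frozen VGG-16 base, binary head over the stated choices); the input size, pooling layer, trial count, and the train_ds/val_ds dataset objects are assumptions, not the project's exact configuration.

```python
import keras
import keras_tuner as kt

def build_model(hp):
    # Frozen VGG-16 backbone; only the new classification head is trained.
    base = keras.applications.VGG16(include_top=False, weights="imagenet",
                                    input_shape=(224, 224, 3))   # input size assumed
    base.trainable = False
    x = keras.layers.GlobalAveragePooling2D()(base.output)
    x = keras.layers.Dense(hp.Choice("units", [256, 512]), activation="relu")(x)
    x = keras.layers.Dropout(hp.Choice("dropout", [0.3, 0.4, 0.5]))(x)
    out = keras.layers.Dense(1, activation="sigmoid")(x)          # binary gender label
    model = keras.Model(base.input, out)
    model.compile(optimizer=hp.Choice("optimizer", ["adam", "sgd"]),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                        max_trials=12, project_name="celeba_gender")
# tuner.search(train_ds, validation_data=val_ds, epochs=3)  # train_ds / val_ds: hypothetical tf.data pipelines
```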
Facial landmark analysis as continuous variable: Computing mouth width from raw pixel coordinates, converting to an index, dividing into quartiles, and testing interaction with a classification task — multi-variable experimental design directly applicable to clinical measurement-based subgroup analysis.
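A sketch of the index construction, assuming the standard CelebA annotation files (list_landmarks_align_celeba.txt, list_attr_celeba.txt); normalising mouth width by inter-ocular distance and the quartile labels are illustrative choices, not necessarily the project's exact definition.

```python
import pandas as pd

# Standard CelebA annotation files; the image filename column is absorbed as the
# index because the header row has one fewer field than each data row.
lm   = pd.read_csv("list_landmarks_align_celeba.txt", sep=r"\s+", skiprows=1)
attr = pd.read_csv("list_attr_celeba.txt", sep=r"\s+", skiprows=1)

# Continuous mouth width index: mouth-corner span normalised by eye distance (assumed normalisation).
mouth_index = (lm["rightmouth_x"] - lm["leftmouth_x"]) / (lm["righteye_x"] - lm["lefteye_x"])
quartile = pd.qcut(mouth_index, 4, labels=["Q1", "Q2", "Q3", "Q4"])

# Smiling is coded +1 / -1 in the attribute file; compare smiling rates across quartiles.
smiling = (attr["Smiling"] == 1).astype(int)
print(pd.crosstab(quartile, smiling, normalize="index"))
```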
Construction and probabilistic querying of an 11-node Bayesian network (a DAG with conditional probability tables) for car fault diagnosis. Causal structure: battery age → battery dead → battery flat → {lights, gas gauge, car won't start}; alternator broken → no charging → battery flat; fanbelt broken → no charging. Probabilistic inference via marginalisation: computing P(car won't start | +battery age) and multi-evidence queries by summing over the intermediate variables.
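Written as a parent map, the stated structure looks like the sketch below (only the edges listed above; the remaining nodes of the 11-node network are omitted).

```python
# Child -> parents, taken directly from the causal structure described above.
parents = {
    "battery_age":       [],
    "alternator_broken": [],
    "fanbelt_broken":    [],
    "battery_dead":      ["battery_age"],
    "no_charging":       ["alternator_broken", "fanbelt_broken"],
    "battery_flat":      ["battery_dead", "no_charging"],
    "lights":            ["battery_flat"],
    "gas_gauge":         ["battery_flat"],
    "car_wont_start":    ["battery_flat"],
}
```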
Key learning
Probabilistic inference through marginalisation by hand: Every other project uses gradient-based optimisation. This one required manual computation of conditional probability distributions, implementing the chain rule of probability rather than calling a library. A Bayesian network encodes conditional independence assumptions in its DAG structure; building a graph where edge direction matters gave a fundamentally different mental model of probabilistic reasoning, one that applies directly to clinical decision support (diagnosis as Bayesian inference).
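A minimal sketch of inference by enumeration on a simplified fragment of the chain (battery age → battery dead → car won't start); all probability values are invented for illustration and are not the project's CPTs.

```python
# Illustrative CPTs only: p_dead[age][dead], p_no_start[dead][no_start].
p_age      = {True: 0.3, False: 0.7}
p_dead     = {True: {True: 0.7, False: 0.3},  False: {True: 0.1,  False: 0.9}}
p_no_start = {True: {True: 0.9, False: 0.1},  False: {True: 0.05, False: 0.95}}

def joint(age, dead, no_start):
    # Chain rule over the DAG: P(age) * P(dead | age) * P(no_start | dead)
    return p_age[age] * p_dead[age][dead] * p_no_start[dead][no_start]

# P(car won't start | +battery age): sum out the hidden variable (battery dead),
# then normalise over the query variable.
unnorm = {ns: sum(joint(True, d, ns) for d in (True, False)) for ns in (True, False)}
z = sum(unnorm.values())
print({ns: round(p / z, 3) for ns, p in unnorm.items()})
```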