Course Overview
Three projects — kernel methods, VGG-16 at scale, Bayesian networks
Machine Learning · Fall 2024 · UNT · Prof. Russel Pears · with Ramyasri Murugesan
Multi-class classification of 6 glass types from 9 chemical composition features (UCI dataset, 214 samples). Central experiment: systematic study of how RBF kernel transformation affects class separability — comparing the same classifiers before and after kernel mapping using t-SNE visualisation. Feature importance analysis via Logistic Regression coefficients identified and ablated the two least significant chemical features.
Why this matters for clinical AI: Kernel methods can separate classes whose boundaries are non-linear in the original feature space without resorting to deep networks, which matters when training data is scarce (common in rare disease classification). The RBF kernel experiment made the kernel trick concrete: comparing t-SNE plots before and after the transformation showed classes that overlap in the raw feature space becoming much easier to separate in the kernel-induced space.
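A minimal sketch of that before/after comparison, assuming a local glass.csv export of the UCI data with a Type label column; the gamma value and the use of RBFSampler (an explicit approximation of the RBF feature map, since the exact map is infinite-dimensional) are illustrative choices, not necessarily the project's exact setup.

```python
# Sketch only: file path, column names, and gamma are assumptions, not the project's code.
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.kernel_approximation import RBFSampler   # explicit approximation of the RBF feature map
from sklearn.manifold import TSNE

df = pd.read_csv("glass.csv")                          # hypothetical local copy of the UCI Glass data
X = StandardScaler().fit_transform(df.drop(columns=["Type"]).values)
y = df["Type"].values

# Map the 9 standardised features through the (approximate) RBF feature map.
X_rbf = RBFSampler(gamma=0.5, n_components=300, random_state=0).fit_transform(X)

# Embed both representations with t-SNE; plotting each coloured by y shows the
# change in class separability before vs. after the kernel mapping.
emb_before = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)
emb_after  = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X_rbf)
```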
Key learning
Feature importance analysis beyond permutation: Using Logistic Regression coefficients to identify and ablate the two least significant features — then measuring the actual performance change — showed that "feature importance" measured statistically doesn't always match importance for model performance. A useful calibration for any feature engineering pipeline.
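A sketch of the coefficient-ranking-plus-ablation idea, assuming df, X, and y from the earlier glass-data sketch; the cross-validation setup is an illustrative choice, not the project's exact protocol.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

feature_names = df.drop(columns=["Type"]).columns

# Rank features by mean absolute coefficient magnitude across the classes.
clf = LogisticRegression(max_iter=5000).fit(X, y)
importance = np.abs(clf.coef_).mean(axis=0)
least_two = np.argsort(importance)[:2]
print("Least significant features:", list(feature_names[least_two]))

# Ablate those two features and measure the actual change in cross-validated accuracy.
baseline = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5).mean()
ablated = cross_val_score(LogisticRegression(max_iter=5000),
                          np.delete(X, least_two, axis=1), y, cv=5).mean()
print(f"CV accuracy with all features: {baseline:.3f}, after ablation: {ablated:.3f}")
```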
Fine-tuning VGG-16 on 162,770 training images (CelebA) for gender classification. Automated hyperparameter search via KerasTuner across neuron count (256/512), dropout (0.3/0.4/0.5), and optimiser (Adam/SGD). Best config: Dense(256), Dropout(0.5), Adam. Test accuracy: 0.9594. Confusion matrix: 10,844 TN · 565 FP · 242 FN · 8,216 TP. Additional experiment: mouth width index computed from CelebA landmark coordinates, split into quartiles, tested as predictor of smiling classification.
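The reported accuracy follows directly from those confusion-matrix counts (which gender was treated as the positive class is not restated here):

```python
# Derived metrics from the reported confusion matrix counts.
tn, fp, fn, tp = 10_844, 565, 242, 8_216
accuracy  = (tp + tn) / (tp + tn + fp + fn)   # 19,060 / 19,867 ≈ 0.9594
precision = tp / (tp + fp)                    # ≈ 0.936
recall    = tp / (tp + fn)                    # ≈ 0.971
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f}")
```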
Key learnings
Automated hyperparameter search at scale: KerasTuner on a 162K-image dataset — learning how to define a search space, run trials efficiently, and interpret results without overfitting to the validation set during search. This is the production-grade alternative to manual grid search.
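A sketch of what the KerasTuner search space could look like for this setup (frozen VGG-16 base, binary head over the stated choices); the input size, pooling layer, trial count, and the train_ds/val_ds dataset objects are assumptions, not the project's exact configuration.

```python
import keras
import keras_tuner as kt

def build_model(hp):
    # Frozen VGG-16 backbone; only the new classification head is trained.
    base = keras.applications.VGG16(include_top=False, weights="imagenet",
                                    input_shape=(224, 224, 3))   # input size assumed
    base.trainable = False
    x = keras.layers.GlobalAveragePooling2D()(base.output)
    x = keras.layers.Dense(hp.Choice("units", [256, 512]), activation="relu")(x)
    x = keras.layers.Dropout(hp.Choice("dropout", [0.3, 0.4, 0.5]))(x)
    out = keras.layers.Dense(1, activation="sigmoid")(x)          # binary gender label
    model = keras.Model(base.input, out)
    model.compile(optimizer=hp.Choice("optimizer", ["adam", "sgd"]),
                  loss="binary_crossentropy", metrics=["accuracy"])
    return model

tuner = kt.RandomSearch(build_model, objective="val_accuracy",
                        max_trials=12, project_name="celeba_gender")
# tuner.search(train_ds, validation_data=val_ds, epochs=3)  # train_ds / val_ds: hypothetical tf.data pipelines
```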
Facial landmark analysis as continuous variable: Computing mouth width from raw pixel coordinates, converting to an index, dividing into quartiles, and testing interaction with a classification task — multi-variable experimental design directly applicable to clinical measurement-based subgroup analysis.
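A sketch of the index construction, assuming the standard CelebA annotation files (list_landmarks_align_celeba.txt, list_attr_celeba.txt); normalising mouth width by inter-ocular distance and the quartile labels are illustrative choices, not necessarily the project's exact definition.

```python
import pandas as pd

# Standard CelebA annotation files; the image filename column is absorbed as the
# index because the header row has one fewer field than each data row.
lm   = pd.read_csv("list_landmarks_align_celeba.txt", sep=r"\s+", skiprows=1)
attr = pd.read_csv("list_attr_celeba.txt", sep=r"\s+", skiprows=1)

# Continuous mouth width index: mouth-corner span normalised by eye distance (assumed normalisation).
mouth_index = (lm["rightmouth_x"] - lm["leftmouth_x"]) / (lm["righteye_x"] - lm["lefteye_x"])
quartile = pd.qcut(mouth_index, 4, labels=["Q1", "Q2", "Q3", "Q4"])

# Smiling is coded +1 / -1 in the attribute file; compare smiling rates across quartiles.
smiling = (attr["Smiling"] == 1).astype(int)
print(pd.crosstab(quartile, smiling, normalize="index"))
```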
Construction and probabilistic querying of an 11-node Bayesian network (a DAG with conditional probability tables) for car fault diagnosis. Causal structure: battery age → battery dead → battery flat → {lights, gas gauge, car won't start}; alternator broken → no charging → battery flat; fanbelt broken → no charging. Probabilistic inference via marginalisation: computing P(car won't start | +battery age) and multi-evidence queries by summing over the intermediate variables.
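Written as a parent map, the stated structure looks like the sketch below (only the edges listed above; the remaining nodes of the 11-node network are omitted).

```python
# Child -> parents, taken directly from the causal structure described above.
parents = {
    "battery_age":       [],
    "alternator_broken": [],
    "fanbelt_broken":    [],
    "battery_dead":      ["battery_age"],
    "no_charging":       ["alternator_broken", "fanbelt_broken"],
    "battery_flat":      ["battery_dead", "no_charging"],
    "lights":            ["battery_flat"],
    "gas_gauge":         ["battery_flat"],
    "car_wont_start":    ["battery_flat"],
}
```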
Key learning
Probabilistic inference through marginalisation by hand: Every other project uses gradient-based optimisation. This one required manual computation of conditional probability distributions, implementing the chain rule of probability rather than calling a library. A Bayesian network encodes conditional independence assumptions in its DAG structure; building a graph where edge direction matters gave a fundamentally different mental model of probabilistic reasoning, one that applies directly to clinical decision support (diagnosis as Bayesian inference).
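A minimal sketch of inference by enumeration on a simplified fragment of the chain (battery age → battery dead → car won't start); all probability values are invented for illustration and are not the project's CPTs.

```python
# Illustrative CPTs only: p_dead[age][dead], p_no_start[dead][no_start].
p_age      = {True: 0.3, False: 0.7}
p_dead     = {True: {True: 0.7, False: 0.3},  False: {True: 0.1,  False: 0.9}}
p_no_start = {True: {True: 0.9, False: 0.1},  False: {True: 0.05, False: 0.95}}

def joint(age, dead, no_start):
    # Chain rule over the DAG: P(age) * P(dead | age) * P(no_start | dead)
    return p_age[age] * p_dead[age][dead] * p_no_start[dead][no_start]

# P(car won't start | +battery age): sum out the hidden variable (battery dead),
# then normalise over the query variable.
unnorm = {ns: sum(joint(True, d, ns) for d in (True, False)) for ns in (True, False)}
z = sum(unnorm.values())
print({ns: round(p / z, 3) for ns, p in unnorm.items()})
```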