// Audio Classification · Feature Engineering · Hierarchical ML · AWS

BirdCLEF 2022: Hierarchical Bird Species Classification

152 Hawaiian bird species from passive audio recordings. 107-feature librosa acoustic pipeline, hierarchical XGBoost classification at Order and Family taxonomic levels, and live AWS Streamlit deployment — demonstrating signal processing and cloud infrastructure transferable to clinical audio AI.

152 Species · 107 Acoustic Features · Hierarchical Classification · AWS Deployed
librosa · XGBoost · BayesSearchCV · SDAI Fall 2024

Overview

152 bird species from passive audio — hierarchical classification at Order and Family levels

A deliberate choice to use structured acoustic feature engineering rather than end-to-end deep learning — demonstrating signal processing transferable to clinical audio.

The BirdCLEF 2022 Kaggle competition presents passive audio recordings from Hawaiian bird monitoring stations under real field conditions: background noise, wind, and overlapping calls across 152 species. Rather than deep learning on raw spectrograms, this project centers on systematic acoustic feature engineering with librosa and hierarchical XGBoost classification. From 107 tabular features, the models predict taxonomic Order first (17 classes), then narrow to Family (47 classes), instead of predicting species directly.

Why hierarchical classification matters clinically: The same decomposition principle applies to clinical diagnostic AI — predicting broad clinical category first (e.g., cardiac vs. respiratory), then narrowing. It's more interpretable than a single 152-class head and makes the AI's reasoning auditable at each level. The acoustic feature engineering pipeline (MFCC, Chroma, Spectral Contrast, Spectral Rolloff) is directly transferable to clinical audio signals: auscultation, phonocardiography, lung sound analysis, and wearable biosignal processing.

152 Bird Species · Hawaiian endemic
107 Acoustic Features · Augmented librosa pipeline
AWS Live Deployment · Streamlit + confidence scores

Feature Engineering

107-feature acoustic pipeline — built with librosa from scratch

All features extracted from raw .ogg audio with a 1–8 kHz Butterworth bandpass filter to isolate typical birdsong frequencies and suppress environmental noise. Base set: 13 MFCC means, 12 Chroma pitch-class means, 7 Spectral Contrast means, Spectral Centroid, Bandwidth, Zero Crossing Rate, and geographic coordinates (encoded as Cartesian x/y/z on a unit sphere to preserve continuity at ±180°). Augmented set adds temporal derivatives (delta-MFCC), standard deviations, and additional energy/rolloff features — 107 total.
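As a rough illustration of the band-pass step, the sketch below applies a zero-phase Butterworth filter with SciPy before any librosa features would be computed. Only the 1–8 kHz band comes from the description above. The filter order (4), the zero-phase `sosfiltfilt` choice, the function name `bandpass_1_8khz`, and the 32 kHz sample rate are assumptions for illustration, not details confirmed by the project.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt


def bandpass_1_8khz(y, sr, low_hz=1000.0, high_hz=8000.0, order=4):
    """Zero-phase Butterworth band-pass filter (the order is an assumed value)."""
    if high_hz >= sr / 2:
        raise ValueError("high_hz must be below the Nyquist frequency (sr / 2)")
    # Second-order sections are numerically safer than (b, a) coefficients.
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=sr, output="sos")
    # Forward-backward filtering avoids shifting the timing of the calls.
    return sosfiltfilt(sos, y)


def rms(x):
    """Root-mean-square amplitude of a signal."""
    return float(np.sqrt(np.mean(np.square(x))))


# Synthetic check: a 3 kHz tone (in band) and a 200 Hz hum (out of band).
sr = 32000  # assumed sample rate
t = np.arange(sr) / sr
tone_in_band = np.sin(2 * np.pi * 3000 * t)
hum_out_of_band = np.sin(2 * np.pi * 200 * t)

kept = rms(bandpass_1_8khz(tone_in_band, sr)) / rms(tone_in_band)
removed = rms(bandpass_1_8khz(hum_out_of_band, sr)) / rms(hum_out_of_band)
print(f"in-band gain ~ {kept:.2f}, out-of-band gain ~ {removed:.4f}")
```

The filtered array could then be handed to librosa extractors such as `librosa.feature.mfcc(y=filtered, sr=sr, n_mfcc=13)` and averaged over frames to obtain the per-clip means listed above.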

Hierarchical classification pipeline

XGBoost with Bayesian hyperparameter search (BayesSearchCV) at both Order and Family level. SMOTE class-balancing before fitting. Stratified 70/15/15 split. Feedforward neural network comparison (grid search over depth/width/dropout). AWS Streamlit deployment with per-class confidence scores.
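The description above does not say how the Order and Family predictions are combined at inference time. One plausible way, sketched here with plain dictionaries standing in for the per-class probabilities that each XGBoost model would output, is to pick the most likely Order and then restrict the Family choice to families in that Order. The function name `hierarchical_predict`, the small taxonomy table, and the probability values are all hypothetical.

```python
from typing import Dict, Tuple

# Illustrative taxonomy fragment: family -> parent order.
FAMILY_TO_ORDER: Dict[str, str] = {
    "Fringillidae": "Passeriformes",
    "Corvidae": "Passeriformes",
    "Anatidae": "Anseriformes",
    "Laridae": "Charadriiformes",
}


def hierarchical_predict(
    order_probs: Dict[str, float],
    family_probs: Dict[str, float],
    family_to_order: Dict[str, str] = FAMILY_TO_ORDER,
) -> Tuple[str, str, float]:
    """Return (order, family, confidence) with the family constrained to the order."""
    # Stage 1: most likely Order.
    order = max(order_probs, key=order_probs.get)
    # Stage 2: keep only families that belong to the predicted Order.
    candidates = {
        fam: p for fam, p in family_probs.items() if family_to_order.get(fam) == order
    }
    if not candidates:
        raise ValueError(f"no candidate families for order {order!r}")
    family = max(candidates, key=candidates.get)
    # Renormalise within the Order so the confidence is conditional on stage 1.
    total = sum(candidates.values())
    confidence = candidates[family] / total if total > 0 else 0.0
    return order, family, confidence


# Made-up model outputs for a single clip.
order_probs = {"Passeriformes": 0.6, "Anseriformes": 0.3, "Charadriiformes": 0.1}
family_probs = {"Fringillidae": 0.3, "Corvidae": 0.1, "Anatidae": 0.5, "Laridae": 0.1}

order, family, confidence = hierarchical_predict(order_probs, family_probs)
print(order, family, f"{confidence:.2f}")  # Passeriformes Fringillidae 0.75
```

Note that the unconstrained Family model would have chosen Anatidae here. The constraint keeps the two levels taxonomically consistent, at the cost of propagating any Order-level error downstream.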

What this project built that transfers to clinical AI

Acoustic signal processing from raw audio: Understanding what MFCC, Chroma, and Spectral Rolloff physically represent, rather than just calling them as functions, is directly transferable to clinical signal analysis (auscultation, respiratory sound monitoring, and, by extension, non-acoustic waveforms such as ECG).

Hierarchical taxonomic classification design: Decomposing a 152-class problem into a two-level structure with separate models at each level introduced structured multi-stage inference, the same pattern used by clinical decision support systems that triage before diagnosing.

Spherical geographic encoding: Raw longitude values jump between +180° and −180° at the antimeridian, so two nearby recording sites can look maximally far apart to any distance-based computation. Encoding each location as Cartesian (x, y, z) coordinates on a unit sphere removes the jump. This is a feature engineering decision rooted in domain geometry, not just statistics.

AWS Streamlit deployment: First project in this portfolio with live cloud deployment. End-to-end pipeline from feature extraction to deployed inference endpoint — the deployment pattern carried into all subsequent clinical AI systems.
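The spherical geographic encoding mentioned above can be sketched with the standard latitude/longitude to Cartesian conversion. This is a minimal illustration using only the standard library; the function names and sample coordinates are hypothetical, and the project's own axis convention may differ.

```python
import math
from typing import Tuple


def latlon_to_unit_xyz(lat_deg: float, lon_deg: float) -> Tuple[float, float, float]:
    """Map latitude/longitude in degrees to a point on the unit sphere."""
    lat = math.radians(lat_deg)
    lon = math.radians(lon_deg)
    x = math.cos(lat) * math.cos(lon)
    y = math.cos(lat) * math.sin(lon)
    z = math.sin(lat)
    return x, y, z


def chord_distance(a, b) -> float:
    """Straight-line distance between two points in 3D."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


# Two sites 0.2 degrees of longitude apart, on either side of the antimeridian.
east = latlon_to_unit_xyz(20.0, 179.9)
west = latlon_to_unit_xyz(20.0, -179.9)

raw_gap = abs(179.9 - (-179.9))        # naive longitude difference in degrees
xyz_gap = chord_distance(east, west)   # stays small on the unit sphere
print(f"raw longitude gap: {raw_gap:.1f} deg, unit-sphere gap: {xyz_gap:.4f}")
```

The naive longitude difference is almost 360°, while the unit-sphere distance stays small, which is the continuity property the encoding is meant to provide.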

My contributions

librosa · XGBoost · BayesSearchCV · SMOTE · AWS Streamlit · scikit-learn · PyTorch (FNN)

View notebooks, feature datasets, and AWS demo.

EDA, feature engineering, XGBoost, FNN notebooks with reports.

GitHub → Get in Touch