// Model Compression · Clinical Deployment · Directed Study · CSCE 5934

CNN Filter Pruning —
Hybrid Crossover
vs. Standard Drop

Research question: When two neural network filters are redundant, is it better to discard the weaker one — or synthesise an entirely new filter that captures the best of both?

Structured CNN pruning under hard accuracy guard rails · Hybrid Crossover regression-based synthesis vs. standard drop · VGG-16-BN + 5-layer CNN · CelebA + CIFAR-10 · CSCE 5934 Directed Study, UNT · Prof. Russel Pears

~2×
More Channels Pruned
372 vs. 196 with standard drop

+0.37%
Accuracy Improvement
higher than standard drop

~2×
Faster Runtime
2,154 s vs. 4,060 s

2.05×
Best Latency Speedup
5-CNN on CIFAR-10

Background & Clinical Motivation

Why smaller, faster models matter for clinical deployment

A model that achieves 97% accuracy but requires a dedicated GPU server is not a clinical product. This work addresses the deployment gap directly.

🌐 For everyone

When an AI model is trained to read medical images, it develops thousands of internal components called filters — each looking for a specific pattern like an edge, a texture, or a shape. Many filters are redundant: they detect nearly the same thing as another filter. This wastes memory, slows inference, and makes it harder to deploy AI on hospital workstations or wearable devices.

The obvious fix is to find redundant filters and delete them. But deletion is permanent — any information encoded by the removed filter is lost. This project asks: what if instead of deleting, we synthesise? What if we create a new filter that captures the best of both redundant ones, replacing both with a single synthesised one? Same compression — two become one — but far less information loss.

⚙️ Technical context

This work is directly motivated by the fetal head circumference deployment challenge in this portfolio. A U-Net achieving 97.36% Dice on a research GPU needs to run on existing hospital workstation hardware — which may have a shared GPU serving PACS, reporting tools, and the AI system simultaneously. Structured pruning produces architecturally smaller networks: fewer channels, fewer multiply-accumulate operations (MACs), real latency reductions — without sparse weight matrices that hardware cannot efficiently exploit.

Origin — Ashwini Sharma's founding observation

This project is rooted in prior MS thesis work at UNT. Ashwini Sharma studied gender classification on CelebA using a custom 5-layer CNN and found that only a small subset of final-layer filters contributed meaningfully to predictions — the rest were largely redundant. That observation raised the question this directed study was designed to answer.

How the methodology evolved week by week

Week 1

From IoU to Cosine Similarity

Initial filter similarity metric — IoU on thresholded activation maps — replaced with cosine similarity. Hierarchical clustering with medoid replacement introduced as the grouping strategy.
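A minimal sketch of the Week 1 metric, assuming filters are compared via their flattened activation maps (the helper name `filter_cosine_similarity` and the toy arrays are illustrative, not the study's code). Unlike IoU on thresholded maps, cosine similarity needs no threshold and is insensitive to activation scale:

```python
import numpy as np

def filter_cosine_similarity(fm_a: np.ndarray, fm_b: np.ndarray) -> float:
    """Cosine similarity between two filters' flattened activation maps."""
    a, b = fm_a.ravel(), fm_b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

fm1 = np.array([[1.0, 0.0], [2.0, 0.0]])
fm2 = np.array([[2.0, 0.0], [4.0, 0.0]])   # same pattern, scaled
fm3 = np.array([[0.0, 3.0], [0.0, 1.0]])   # disjoint support
print(filter_cosine_similarity(fm1, fm2))  # → 1.0 (redundancy candidates)
print(filter_cosine_similarity(fm1, fm3))  # → 0.0
```

High-similarity pairs become candidates for the hierarchical clustering step; the scale-invariance is exactly what the thresholded-IoU metric lacked.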

Week 2

The Stopping Criterion Problem

Central open problem identified: how do you decide when to stop pruning? Any fixed similarity threshold is arbitrary. Establishing a principled, data-driven stopping criterion became the key research challenge.

Week 3

Correction-Probability Breakthrough

Merging decisions grounded in a new metric: does filter B actually correct the errors that filter A makes? Moving from geometric similarity to functional contribution — a fundamentally more meaningful criterion.
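One way to formalise this criterion, as a hypothetical sketch (the study's exact definition is not reproduced here): treat correction probability as the fraction of samples A gets wrong that B gets right.

```python
def correction_probability(preds_a, preds_b, labels):
    """P(B correct | A wrong): how often B fixes A's errors.

    Hypothetical formalisation of the Week 3 criterion — merge A and B
    based on functional contribution, not geometric similarity.
    """
    a_errors = [i for i, (p, y) in enumerate(zip(preds_a, labels)) if p != y]
    if not a_errors:
        return 0.0
    fixed = sum(1 for i in a_errors if preds_b[i] == labels[i])
    return fixed / len(a_errors)

labels  = [0, 1, 1, 0, 1]
preds_a = [0, 0, 1, 1, 1]   # wrong on indices 1 and 3
preds_b = [0, 1, 1, 1, 0]   # fixes index 1, not index 3
print(correction_probability(preds_a, preds_b, labels))  # → 0.5
```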

Week 4

Two Tracks Emerge

Local HC (teammate's layer-local medoid merge) and Global Scoring (cross-layer ranking) diverged into parallel approaches — enabling a genuine head-to-head comparison on identical guard rails.

Later

Hybrid Crossover Introduced

My core novel contribution: replacing the standard "drop the weaker filter" action with regression-based synthesis. ILR scoring, accuracy guard rails, and the full Global pipeline built and validated across both architectures and both datasets.


Novel Contribution

The Hybrid Crossover — synthesise, don't discard

The core idea I introduced and implemented for this directed study.

🌐 For everyone

Standard pruning finds two filters that do basically the same thing, keeps the better one, and discards the other. Simple, but permanent: the unique information in the discarded filter is gone. My approach instead creates a brand-new filter, optimised mathematically to capture the most important activations from both parents ("best of both worlds"), and replaces both originals with this single synthesised one.

⚙️ Technical detail

Given two candidate filters A and B identified as redundant by the Global scoring pipeline:

Step 01 · Feature Map Generation: Run both filters on calibration data → activation maps FM_A and FM_B
Step 02 · Hybrid Target: Target[i,j] = max(FM_A[i,j], FM_B[i,j]) — preserves peak activations from both parents
Step 03 ★ · Weight Regression: F_new = argmin_W ‖Conv(X, W) − Target‖², solved via gradient descent
Step 04 · Replacement: Replace both filters A and B with the single synthesised F_new
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hybrid target: preserve peak activations from both parents
# (fm_a, fm_b: calibration-set activation maps of the two parent filters)
target_hybrid = torch.max(fm_a, fm_b)  # element-wise max

# Solve for new filter weights via gradient descent
f_new = nn.Parameter(torch.randn_like(filter_a))
optimizer = torch.optim.Adam([f_new], lr=1e-3)
for _ in range(n_steps):
    optimizer.zero_grad()
    # add out-channel dim: assumes filter_a has shape (C_in, k, k)
    pred = F.conv2d(calibration_data, f_new.unsqueeze(0))
    loss = F.mse_loss(pred, target_hybrid)
    loss.backward()
    optimizer.step()

# Replace both originals with the synthesised filter
layer.weight.data[idx_a] = f_new.data
layer.weight.data = delete_channel(layer.weight.data, idx_b)

Why Hybrid Crossover is faster, not slower: Standard drop under aggressive compression frequently violates accuracy guard rails and triggers rollback — the algorithm undoes the change, soft-locks the channel, and searches for an alternative. This is expensive. The Hybrid Crossover avoids rollback because the synthesised filter approximates the pre-pruning representation by construction. Fewer rollbacks → 2× faster overall runtime despite the additional regression step.

The Global Scoring pipeline — ILR importance scoring

⚙️ Technical detail

Filters are scored using ILR (Inside-Layer Ranking) — a weighted fusion of three signals computed from forward passes only, making it fast, stable, and reproducible across long runs with many architectural changes:

  • Activation RMS — weight 0.6: Post-ReLU activation magnitude — highest weight because it directly measures filter contribution to downstream activations
  • BN-γ Magnitude — weight 0.4: Batch Normalisation scale parameter — reflects how much the network has learned to use this channel
  • HRank / Frobenius — weight 0.2: Feature map energy — lower Frobenius norm indicates low-rank, less informative filters

Filters ranked globally across all layers simultaneously — not layer by layer. Cross-layer ranking concentrates pruning on compute-intensive blocks (B4, B5), producing disproportionate latency speedup.
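A sketch of how the three forward-pass signals and the global ranking could fit together (hypothetical helper names and toy values; per-layer min-max normalisation is my assumption — the study's exact normalisation is not reproduced here):

```python
import numpy as np

W_RMS, W_BN, W_FROB = 0.6, 0.4, 0.2  # ILR fusion weights from the study

def ilr_scores(act_rms, bn_gamma, fmap_frob):
    """Fuse the three forward-pass signals into one per-filter score."""
    def norm(x):
        x = np.asarray(x, dtype=float)
        rng = x.max() - x.min()
        return (x - x.min()) / rng if rng > 0 else np.zeros_like(x)
    return (W_RMS * norm(act_rms)
            + W_BN * norm(np.abs(bn_gamma))
            + W_FROB * norm(fmap_frob))

# Global ranking: pool (layer, filter, score) across ALL layers at once
per_layer = {
    "conv3": ilr_scores([0.9, 0.1, 0.5], [1.2, 0.05, 0.7], [3.0, 0.2, 1.5]),
    "conv5": ilr_scores([0.8, 0.7], [0.9, 1.1], [2.5, 2.8]),
}
pool = [(layer, i, s) for layer, scores in per_layer.items()
        for i, s in enumerate(scores)]
pool.sort(key=lambda t: t[2])  # weakest filters first, regardless of layer
print(pool[0][:2])             # → ('conv3', 1): weakest (layer, filter) pair
```

Because the pool is sorted across layers, pruning naturally concentrates wherever weak filters cluster, rather than removing a fixed quota per layer.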

Accuracy guard rails — clinical-grade compression safety

🌐 For clinical context

Every structural change is validated before being committed. If the model's accuracy drops beyond the defined tolerance, the change is rolled back. This is the engineering discipline that separates research-grade compression from deployment-grade compression — in clinical AI, a compressed model that quietly loses sensitivity on a subgroup is more dangerous than a large model that runs slowly.

# After every structural change (prune or merge):
if (baseline_acc - current_acc) > Δ_max_overall: ROLLBACK
if (baseline_class_acc[c] - current_class_acc[c]) > Δ_max_class: ROLLBACK

# CelebA limits: Δ_max_overall = 2pp | Δ_max_class = 6pp
# CIFAR-10 limits: Δ_max_overall = 3pp | Δ_max_class = 9pp
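The rollback rule above can be sketched as a runnable check (hypothetical function and variable names; the thresholds are the limits stated above):

```python
DELTA_MAX_OVERALL = {"celeba": 2.0, "cifar10": 3.0}  # pp, overall accuracy
DELTA_MAX_CLASS   = {"celeba": 6.0, "cifar10": 9.0}  # pp, per-class accuracy

def passes_guard_rails(baseline_acc, current_acc,
                       baseline_class_acc, current_class_acc,
                       dataset="celeba"):
    """Validate a structural change; the caller rolls back on False."""
    if baseline_acc - current_acc > DELTA_MAX_OVERALL[dataset]:
        return False
    for base_c, cur_c in zip(baseline_class_acc, current_class_acc):
        if base_c - cur_c > DELTA_MAX_CLASS[dataset]:
            return False
    return True

# A 1.5pp overall drop is tolerated on CelebA...
print(passes_guard_rails(94.15, 92.65, [94.0, 94.3], [93.0, 92.5]))  # → True
# ...but a 7.3pp drop on one class triggers rollback even if overall is fine
print(passes_guard_rails(94.15, 93.50, [94.0, 94.3], [93.5, 87.0]))  # → False
```

The per-class limit is the clinically important one: it is what prevents a compressed model from quietly losing sensitivity on a subgroup while the headline accuracy still looks acceptable.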

Results

Head-to-head: Hybrid Crossover vs. Standard Drop

Same architecture, same dataset, identical guard rails. Every other variable held constant.

| Metric | Standard Drop (Teammate) | Hybrid Crossover (Tarun) ★ | Delta |
|---|---|---|---|
| Final Top-1 Accuracy | 92.42% | 92.79% | +0.37% |
| Total Channels Pruned | 196 | 372 | +176 (~2×) |
| Optimisation Runtime | ~4,060 s | ~2,154 s | ~2× faster |
| Layer 16 Compression | 35.5% pruned | 68.75% pruned | +33.2pp |
| Accuracy drop at Layer 16 | −0.98% | −0.87% | Less drop, more pruning |

The counterintuitive result: The Hybrid Crossover is both more aggressive (2× more channels removed) AND more accurate AND faster. Regression-synthesised filters satisfy guard rails on the first or second attempt — standard drop under aggressive compression triggers expensive rollback loops that dominate total runtime.

Full canonical results — all 6 experiments

| Approach | Architecture | Dataset | Acc. Before | Acc. After | Δ Overall | Params Pruned | Latency |
|---|---|---|---|---|---|---|---|
| Local HC | 5-CNN | CelebA | 94.15% | 92.41% | 1.74pp | 46.0% | 1.04× |
| Local HC | VGG16 | CelebA | 97.93% | 96.45% | 1.48pp | 30.5% | 1.03× |
| Global | 5-CNN | CelebA | 94.15% | 92.15% | 1.99pp | 70.1% | 1.28× |
| Global | VGG16 | CelebA | 97.93% | 95.94% | 1.99pp | 45.6% | 1.21× |
| Global | 5-CNN | CIFAR-10 | 91.45% | 90.25% | 1.20pp | 7.9% | 2.05× |
| Global | VGG16 | CIFAR-10 | 97.95% | 95.35% | 2.60pp | 5.2% | 1.52× |

All runs satisfy dataset-specific guard rails. CIFAR-10 5-CNN: 2.05× latency speedup with 1.2pp accuracy drop — by concentrating pruning on compute-intensive blocks (B4: 75% of channels removed) while leaving early, low-redundancy blocks intact.

Global beats Local HC
On every experiment, Global scoring achieves higher compression and speedup than Local HC under identical guard rails — cross-layer ranking concentrates pruning where the compute cost actually is.

🔬 Synthesis beats deletion
Hybrid Crossover achieves 2× more channel compression at higher accuracy and 2× faster runtime than Standard Drop. The regression-synthesised filter preserves information that deletion permanently loses.

🏥 Clinical deployment insight
2.05× latency speedup on a realistic classification task demonstrates that structured pruning can bring large pretrained networks into deployment-feasible territory on existing clinical hardware — without retraining from scratch.

Team Contributions

2-person directed study — clear division of work

Supervised by Prof. Russel Pears, UNT Computer Science & Engineering.

Tarun Sadarla — this portfolio

  • Hybrid Crossover (Tier 2) merging strategy — core novel contribution. Full design, implementation, and evaluation of regression-based filter synthesis.
  • Merge Strategies analysis — 30-page systematic analysis of 6 alternative merging strategies evaluated against Hybrid Crossover
  • Threshold Design Strategies — 35-page analysis of 7 principled threshold selection methods for cosine similarity-based pruning
  • Shared baseline (Phase 1–2), hard floor configuration, IEEE-format final report — jointly with teammate

Sai Naga Chaithanya Aavula — teammate

  • Full Global scoring pipeline — ILR scoring, guard rail enforcement, benefit-aware pruning loop (8,694 lines for CelebA alone)
  • Full Local HC pipeline — hierarchical clustering pruner (2,966 lines)
  • Shared baseline, hard floor design, IEEE report — jointly with Tarun

Limitations & Future Work

What remains open

Current limitations

Middle layers (B3, B2) resist compression due to low filter redundancy — Hybrid Crossover, like Standard Drop, finds few valid merge candidates there. This limits overall compression ratios on deeper architectures.

All experiments are on binary classification tasks (CelebA gender, CIFAR-10). Multi-class extension and medical imaging domain validation (histopathology, ultrasound) remain open — filter redundancy patterns may differ substantially from binary tasks.

Next steps

Phase 4 fine-tuning — short retraining after pruning to recover accuracy in middle layers that currently resist compression.

Medical imaging domain — extend evaluation to histopathology patch classification and U-Net segmentation backbones, where the deployment motivation for this work originates.

INT8 quantisation — combine structured pruning with quantisation for embedded clinical deployment, targeting both model size and inference precision simultaneously.

What this project built that transfers to clinical AI deployment

Synthesis over deletion as a compression philosophy: Reframing structured pruning as a synthesis problem — rather than a removal problem — produced empirically better results. Any compression task where information retention matters more than maximum removal benefits from this inversion. Directly applicable to compressing clinical imaging backbones for hospital hardware.

Guard-rail engineering as responsible compression: Hard accuracy limits (≤2pp overall, ≤6pp per-class) enforced clinical-grade reliability constraints during compression — every structural change validated before committed, with rollback if violated. A compressed model that quietly loses sensitivity on a subgroup is more dangerous than a large model that runs slowly.

Why the faster method is also the better method: The Hybrid Crossover is 2× faster not despite being better, but because of it. Regression-synthesised filters satisfy accuracy constraints on the first attempt; Standard Drop triggers expensive rollback loops. Quality and efficiency are not always in tension — accuracy-preserving design choices can also reduce compute cost.

ILR over Taylor saliency for production stability: Taylor-based saliency requires gradient computation — sensitive to training mode, mixed-precision settings, and compiler behaviour. ILR uses only forward-pass statistics: cheaper, reproducible, and stable across long runs. Choosing a simpler, more stable signal over a theoretically richer but fragile one is a production engineering skill that applies directly to any clinical AI deployment pipeline.

PyTorch · VGG-16-BN · 5-layer CNN · ILR Scoring · Cosine Similarity · Regression Synthesis · CelebA · CIFAR-10 · IEEE Report Format

View the full paper and implementation.

IEEE-format final report, Global + Local HC pipelines, and Hybrid Crossover code in the repository.

GitHub →