Course Overview
AI algorithms — pathfinding, evolutionary optimisation, reinforcement learning
Fundamentals of AI · Spring 2025 · UNT · Prof. Russel Pears · with Ramyasri Murugesan
Utility-based agent controlling a robot fleet in a weighted-graph warehouse (10 stations, 15 links) to service customer orders across a 1,440-tick simulation of a 24-hour day. Robot capacity: out-tray holds 5 items. A* search for robot-to-order assignment; Dijkstra for shortest-path computation. Metrics: average order-processing time and total completed orders per simulation run.
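As a pathfinding reference, here is a minimal Dijkstra sketch over an adjacency dict. The station names and edge weights are hypothetical stand-ins for the 10-station graph, and A* with a zero heuristic reduces to this same computation.

```python
import heapq

def dijkstra(graph, source):
    """Single-source shortest paths.
    graph: {node: [(neighbour, weight), ...]}"""
    dist = {source: 0}
    prev = {}
    pq = [(0, source)]  # (distance-so-far, node)
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue  # stale queue entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(pq, (nd, v))
    return dist, prev

# Hypothetical 4-station slice of the warehouse; weights are travel ticks.
warehouse = {
    "dock":    [("shelf_a", 3), ("shelf_b", 5)],
    "shelf_a": [("dock", 3), ("packing", 4)],
    "shelf_b": [("dock", 5), ("packing", 2)],
    "packing": [("shelf_a", 4), ("shelf_b", 2)],
}
dist, _ = dijkstra(warehouse, "dock")
print(dist)  # {'dock': 0, 'shelf_a': 3, 'shelf_b': 5, 'packing': 7}
```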
Key learning
Dispatch decisions under physical constraints: Deciding when a robot should break from its current path to service a higher-priority order, balancing completion of the current delivery against capturing a closer demand, is a real-world constraint-optimisation problem. The same reasoning appears in radiology AI worklist prioritisation: does the AI triage the next scan in the queue, or reorder based on an urgency signal?
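The trade-off lends itself to a simple utility comparison. The function below is a hypothetical illustration, not the project's actual dispatch rule; the priority weights and switch penalty are invented for the example.

```python
def should_divert(remaining_ticks, new_order_ticks, new_order_priority,
                  current_priority=1.0, switch_penalty=2.0):
    """Hypothetical diversion rule: divert only if the priority-weighted
    value of the new order, discounted by its travel time plus a fixed
    cost for abandoning the planned path, beats finishing the current run."""
    stay_utility = current_priority / (1 + remaining_ticks)
    divert_utility = new_order_priority / (1 + new_order_ticks + switch_penalty)
    return divert_utility > stay_utility

# A nearby urgent order (2 ticks away, priority 3) beats finishing a
# routine delivery that still needs 6 ticks.
print(should_divert(remaining_ticks=6, new_order_ticks=2,
                    new_order_priority=3.0))  # True
```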
Genetic Algorithm for minimising makespan in an NP-hard Job Shop Scheduling problem. Population: 300 permutations. Selection: roulette wheel. Crossover: interval-based (preserves feasibility on permutation chromosomes). Mutation: swap at rate 0.03. Results: makespan 30 on the 5-job, 2-machine instance (matching the known optimum); 298 on the 10-job, 3-operation, 5-machine instance, versus 302 for a random baseline.
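A sketch of the two stochastic operators named above, under one assumption: fitness for roulette-wheel selection is taken as inverse makespan, since the source does not state the exact fitness transform.

```python
import random

def roulette_select(population, makespans):
    """Roulette-wheel selection for a minimisation problem: weight each
    schedule by inverse makespan so shorter schedules are drawn more often."""
    weights = [1.0 / m for m in makespans]
    return random.choices(population, weights=weights, k=1)[0]

def swap_mutate(perm, rate=0.03):
    """Swap mutation: with probability `rate` per gene, swap it with a
    random other position. A swap always yields another valid permutation,
    so feasibility is preserved for free."""
    perm = perm[:]
    for i in range(len(perm)):
        if random.random() < rate:
            j = random.randrange(len(perm))
            perm[i], perm[j] = perm[j], perm[i]
    return perm

pop = [[0, 1, 2, 3, 4], [4, 3, 2, 1, 0], [2, 0, 4, 1, 3]]
parent = roulette_select(pop, makespans=[30, 35, 32])
print(swap_mutate(parent, rate=0.5))  # e.g. [0, 1, 4, 3, 2]
```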
Key learning
Evolutionary optimisation on discrete permutations: All other projects optimise continuous parameters via gradient descent. GAs operate on discrete permutations with no gradient; fitness is computed by simulation. Implementing interval crossover (where naive splicing produces invalid schedules) required understanding why feasibility-preserving crossover exists. This is a fundamentally different optimisation paradigm, applicable to clinical scheduling and resource-allocation problems where solutions must be discrete and feasible.
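To make the feasibility point concrete: a one-point splice of two permutations duplicates some jobs and drops others, while an interval-based operator such as order crossover (OX, shown here as one plausible reading of the "interval-based" operator above) always yields a valid permutation.

```python
import random

def order_crossover(p1, p2):
    """Order crossover (OX): copy a random interval from parent 1, then fill
    the remaining slots with parent 2's genes in order, skipping duplicates.
    The child is always a valid permutation."""
    n = len(p1)
    a, b = sorted(random.sample(range(n), 2))
    child = [None] * n
    child[a:b + 1] = p1[a:b + 1]              # inherit the interval verbatim
    fill = [g for g in p2 if g not in child]  # p2's order, minus the interval
    for i in range(n):
        if child[i] is None:
            child[i] = fill.pop(0)
    return child

p1, p2 = [0, 1, 2, 3, 4], [4, 3, 2, 1, 0]
naive = p1[:3] + p2[3:]  # one-point splice: [0, 1, 2, 1, 0] -- jobs 3, 4 lost
print(naive, order_crossover(p1, p2))
```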
Value iteration for a stochastic grid-world agent with an 80-10-10-0 transition model (80% intended move, 10% each perpendicular slip, 0% reverse) and γ=0.99. Validated on the 4×3 benchmark and a 10×10 extended grid. Default convergence: 22 iterations, with 86/86 (100%) goal success. Key experiment: a negative living reward (r=−1) accelerated convergence to 18 iterations, because the agent's incentive to reach a terminal state quickly creates sharper value gradients.
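A compact value-iteration sketch on the 4×3 benchmark. It assumes the textbook layout (wall in the centre-left cell, +1 and −1 terminals in the right column) and a −0.04 living reward for the default case; the 80-10-10-0 model appears as the (0.8, 0.1, 0.1) outcome weights.

```python
# Value iteration on the 4x3 grid world; layout assumed from the classic
# textbook instance. Coordinates are 0-indexed (col, row).
GAMMA, LIVING_REWARD, THETA = 0.99, -0.04, 1e-6
WALL = {(1, 1)}
TERMINALS = {(3, 2): 1.0, (3, 1): -1.0}
STATES = [(c, r) for c in range(4) for r in range(3) if (c, r) not in WALL]
ACTIONS = {"N": (0, 1), "S": (0, -1), "E": (1, 0), "W": (-1, 0)}
PERP = {"N": ("W", "E"), "S": ("E", "W"), "E": ("N", "S"), "W": ("S", "N")}

def step(s, a):
    # Bumping the wall or the grid edge leaves the agent where it is.
    nxt = (s[0] + ACTIONS[a][0], s[1] + ACTIONS[a][1])
    return nxt if nxt in STATES else s

def q_value(V, s, a):
    # 80% intended move, 10% each perpendicular slip, 0% reverse.
    left, right = PERP[a]
    return sum(p * (LIVING_REWARD + GAMMA * V[step(s, d)])
               for p, d in ((0.8, a), (0.1, left), (0.1, right)))

V = {s: TERMINALS.get(s, 0.0) for s in STATES}
sweeps = 0
while True:  # synchronous Bellman backups to a fixed point
    sweeps += 1
    new_V = {s: (V[s] if s in TERMINALS
                 else max(q_value(V, s, a) for a in ACTIONS))
             for s in STATES}
    done = max(abs(new_V[s] - V[s]) for s in STATES) < THETA
    V = new_V
    if done:
        break
print(f"converged after {sweeps} sweeps; V(start=(0,0)) = {V[(0, 0)]:.3f}")
```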
Key learning
The Bellman equation as a working algorithm: Every other project learns from data; there is no model of the environment, only examples. Value iteration instead works from a complete environment model and computes the optimal policy by iterating the Bellman equation to a fixed point, implemented from scratch rather than as a library call. Understanding the relationship between the Bellman equation, the discount factor, and the convergence rate is foundational for any clinical AI system that must reason about sequential decisions under uncertainty.
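For reference, the update that is iterated to a fixed point, and the standard contraction property that ties the discount factor to the convergence rate:

```latex
V_{k+1}(s) \;=\; \max_{a} \sum_{s'} P(s' \mid s, a)\,\bigl[\, R(s, a, s') + \gamma\, V_k(s') \,\bigr],
\qquad
\lVert V_{k+1} - V^{*} \rVert_{\infty} \;\le\; \gamma \,\lVert V_k - V^{*} \rVert_{\infty}.
```

Each sweep shrinks the sup-norm error by at least a factor of γ, which is the formal version of "the discount factor controls the convergence rate".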