AI/ML · 2025

model-lab

On-device audio AI — MLX, Whisper & model evaluation infrastructure

Python · MLX · Whisper · Apple Silicon · CUDA · Jupyter

Problem

Selecting the right audio AI model requires systematic evaluation across accuracy, speed, cost, and hardware — not just running a few ad-hoc tests. Teams need evaluation infrastructure to make deployment decisions backed by data.

Approach

Built model-lab as a repeatable evaluation framework: a shared test harness, identical metrics across models, and automated comparison dashboards. Benchmarked across Apple Silicon (MPS), NVIDIA CUDA, and CPU backends. Produced production-readiness grades for Whisper variants and LFM2.5-Audio.
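The core of a shared harness like this is that every model and backend runs through the same timing loop. A minimal sketch of that idea (the `pick_device`, `benchmark`, and `transcribe` names are illustrative, not model-lab's actual API):

```python
import time
from dataclasses import dataclass
from statistics import mean


def pick_device(available):
    """Choose a backend, preferring accelerators over CPU."""
    # Preference order mirrors the backends tested: MPS, CUDA, then CPU.
    for dev in ("mps", "cuda"):
        if dev in available:
            return dev
    return "cpu"


@dataclass
class RunResult:
    device: str
    mean_latency_s: float


def benchmark(transcribe, clips, device, warmup=1, runs=3):
    """Time `transcribe` over `clips`, identically for every model/backend."""
    # Warm-up runs keep one-time model-load/compile cost out of the timings.
    for _ in range(warmup):
        transcribe(clips[0], device)
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        for clip in clips:
            transcribe(clip, device)
        times.append(time.perf_counter() - start)
    return RunResult(device, mean(times))
```

Because the harness only depends on a `transcribe(clip, device)` callable, any ASR model can be dropped in and compared on identical terms.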

Result

Production-ready evaluation framework for ASR/TTS models with automated scorecards and multi-device benchmarking.
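An automated ASR scorecard typically reduces to word error rate (WER) plus a grading threshold. A sketch of both, assuming a simple word-level Levenshtein WER and hypothetical grade cutoffs (not model-lab's actual thresholds):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # Levenshtein distance over words: substitutions, insertions, deletions.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,      # deletion
                          d[i][j - 1] + 1,      # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)


def readiness_grade(wer_score: float) -> str:
    """Map a WER to a letter grade (illustrative cutoffs)."""
    if wer_score <= 0.05:
        return "A"
    if wer_score <= 0.15:
        return "B"
    return "C"
```

Running the same metric over every model and device is what makes the resulting scorecards directly comparable.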