Projects

This project implements a complete CNN inference pipeline in SystemVerilog. I built it to understand the tradeoffs of fixed-function hardware accelerators: the performance gains they deliver, and the flexibility they give up when data movement, arithmetic precision, and control timing are handled explicitly in hardware.

Compute latency (frame loaded → TX start): ~466k cycles (≈ 4.7 ms @ 100 MHz)
End-to-end latency (UART RX + compute + UART TX): ~7.28M cycles (≈ 73 ms @ 115,200 baud)
MNIST accuracy: ~92.2% (float baseline: ~94.3%, 1 epoch)
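
The latency figures can be sanity-checked with a little arithmetic. The sketch below is a rough estimate, not a description of the actual design: the clock rate, baud rate, and compute cycle count come from the numbers above, while the frame size (28×28 pixels, one byte each), the single result byte, and the 8N1 framing (10 bit times per byte) are assumptions introduced here for illustration.

```python
# Rough latency budget for a UART-fed accelerator.
# Taken from the figures above:
CLOCK_HZ = 100_000_000      # 100 MHz system clock
BAUD = 115_200              # UART baud rate
COMPUTE_CYCLES = 466_000    # frame loaded -> TX start

# Assumptions (not stated in the source):
FRAME_BYTES = 28 * 28       # one MNIST image, one byte per pixel
RESULT_BYTES = 1            # a single predicted-digit byte
BITS_PER_UART_BYTE = 10     # 8N1 framing: start + 8 data + stop


def uart_cycles(n_bytes: int) -> float:
    """Clock cycles spent moving n_bytes over the UART."""
    return n_bytes * BITS_PER_UART_BYTE * CLOCK_HZ / BAUD


def to_ms(cycles: float) -> float:
    """Convert clock cycles to milliseconds at CLOCK_HZ."""
    return cycles / CLOCK_HZ * 1e3


rx_cycles = uart_cycles(FRAME_BYTES)
tx_cycles = uart_cycles(RESULT_BYTES)
total_cycles = rx_cycles + COMPUTE_CYCLES + tx_cycles

print(f"UART RX : {rx_cycles:>12,.0f} cycles ({to_ms(rx_cycles):.2f} ms)")
print(f"Compute : {COMPUTE_CYCLES:>12,.0f} cycles ({to_ms(COMPUTE_CYCLES):.2f} ms)")
print(f"UART TX : {tx_cycles:>12,.0f} cycles ({to_ms(tx_cycles):.2f} ms)")
print(f"Total   : {total_cycles:>12,.0f} cycles ({to_ms(total_cycles):.2f} ms)")
print(f"Share of time spent on UART RX: {rx_cycles / total_cycles:.0%}")
```

Under these assumptions the total lands close to the ~7.28M cycles reported above, with the large majority of the end-to-end time spent receiving the frame over UART and only a small fraction spent in compute.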