PRJ-04
SHIPPED

GPU/ASIC Hash Optimization

Kernel-level compute optimization. Custom CUDA kernels, ASIC firmware tuning, thermal management, multi-device orchestration.

CUDA, C++, Verilog, FPGA, RTL, Assembly

THE PROBLEM

The goal was to maximize hash throughput on heterogeneous compute hardware (a mix of GPUs and ASICs) while keeping thermal profiles within safe operating bounds during sustained 24/7 operation.

OUR APPROACH

We wrote custom CUDA kernels optimized for specific algorithm memory access patterns, tuned ASIC firmware parameters at the register level, and built an orchestration layer that dynamically throttles devices based on real-time thermal telemetry.
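As a rough illustration of the throttling idea, the sketch below maps a device's latest temperature reading to a power or clock scale factor with a simple linear back-off. It is a minimal sketch under stated assumptions, not the shipped orchestration layer: the `DeviceSample` and `ThermalLimits` types, the function names, and the linear policy are all hypothetical, and real limits depend on the specific hardware.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical per-device telemetry sample; field names are illustrative.
struct DeviceSample {
    std::string id;   // e.g. "gpu0" or "asic3"
    double temp_c;    // latest temperature reading in degrees Celsius
};

// Hypothetical thermal limits; real values depend on the hardware.
struct ThermalLimits {
    double target_c;    // at or below this, run at full scale
    double critical_c;  // at or above this, drop to the floor
    double min_scale;   // lowest allowed scale factor, in (0, 1]
};

// Linear back-off (an assumed policy): the scale is 1.0 at target_c
// and falls linearly to min_scale at critical_c.
double throttle_scale(double temp_c, const ThermalLimits& lim) {
    assert(lim.critical_c > lim.target_c);
    assert(lim.min_scale > 0.0 && lim.min_scale <= 1.0);
    if (temp_c <= lim.target_c) return 1.0;
    if (temp_c >= lim.critical_c) return lim.min_scale;
    const double frac =
        (temp_c - lim.target_c) / (lim.critical_c - lim.target_c);
    return 1.0 - frac * (1.0 - lim.min_scale);
}

// One control pass: compute a scale factor for every device
// from its most recent telemetry sample.
std::vector<double> plan_throttle(const std::vector<DeviceSample>& samples,
                                  const ThermalLimits& lim) {
    std::vector<double> scales;
    scales.reserve(samples.size());
    for (const auto& s : samples) {
        scales.push_back(throttle_scale(s.temp_c, lim));
    }
    return scales;
}
```

With illustrative limits of 70 °C target, 90 °C critical, and a 0.5 floor, a device reading 80 °C would be scaled to 0.75. A production controller would likely add hysteresis or smoothing so devices do not oscillate around the target.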

TECHNICAL DEPTH

01 Custom CUDA kernels with optimized memory coalescing
02 ASIC firmware tuning at register-transfer level
03 FPGA prototyping for algorithm validation
04 Thermal-aware scheduling across heterogeneous devices
05 Assembly-level hot-path optimization for critical loops

OUTCOME

Sustained 74.5 Mh/s aggregate throughput. 0.26 Mh/s per watt power efficiency across the fleet.
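The two figures can be related with one division. The sketch below assumes the efficiency figure means Mh/s per watt and that both numbers cover the same set of devices; the function name is hypothetical. Under those assumptions, the implied aggregate power draw is throughput divided by efficiency.

```cpp
#include <cassert>

// Implied power draw in watts, given throughput in Mh/s and
// efficiency in Mh/s per watt (assumed interpretation of the unit).
double implied_power_watts(double throughput_mhs,
                           double efficiency_mhs_per_w) {
    assert(efficiency_mhs_per_w > 0.0);
    return throughput_mhs / efficiency_mhs_per_w;
}
```

Calling `implied_power_watts(74.5, 0.26)` gives roughly 287 W under these assumptions.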
