What this interview will probe
This role writes and optimizes custom GPU kernels to accelerate training and inference of Grok, integrating hand-tuned kernels into the JAX/XLA stack via pybind. Engineers profile and rewrite the hottest paths in the model to extract maximum performance from the GPU. A technical interview would probe CUDA programming and the GPU execution/memory model (warps, shared memory, coalescing, occupancy), kernel profiling and optimization, and how to fuse or replace operations in an XLA-based training pipeline for measurable speedups.
ExoForm is not affiliated with xAI. This is an independent practice page.