Что будет проверяться
Build and run the inference engine behind every Perplexity query, writing high-performance kernels and serving infrastructure that keeps answer latency low at search-engine scale. The stack is Rust, Python, CUDA, and CuTe DSL, with a focus on squeezing maximum throughput out of each GPU. A technical interview would probe GPU kernel optimization, attention and KV-cache implementation details, and how you'd profile and eliminate bottlenecks in a continuous-batching inference server.
ExoForm не аффилирован с Perplexity. Это независимая тренировочная страница.