What will be assessed
Engineers in this role build and optimize the systems that serve OpenAI's models in production, working alongside researchers to improve inference performance, throughput, and reliability for the models powering ChatGPT and the API. They introduce new techniques for low-latency, high-utilization serving of large transformers across GPU fleets. An interview would probe how inference differs from training (KV caching, batching and continuous batching, quantization), the tradeoffs between GPU memory and latency, and how to design a serving stack that maximizes tokens per second under tight tail-latency constraints.
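A common warm-up for the memory and batching questions is a back-of-envelope KV cache estimate. The sketch below is a minimal illustration, not how any particular serving stack measures memory. It assumes a decoder-only transformer with a hypothetical shape (32 layers, 8 KV heads, head dimension 128). It counts only the cached keys and values, ignoring weights, activations, and allocator fragmentation. It shows how context length, batch size, and KV precision (fp16 at 2 bytes versus an 8-bit KV cache at 1 byte) together cap the number of concurrent sequences that fit in a fixed memory budget. The names `ModelShape`, `kv_bytes_per_token`, `kv_cache_bytes`, and `max_concurrent_sequences` are illustrative, not from any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    """Hypothetical decoder-only transformer shape (illustrative numbers only)."""
    num_layers: int
    num_kv_heads: int  # fewer than query heads when grouped-query attention is used
    head_dim: int


def kv_bytes_per_token(shape: ModelShape, bytes_per_elem: int) -> int:
    # Factor 2: each layer caches one key vector and one value vector per KV head.
    return 2 * shape.num_layers * shape.num_kv_heads * shape.head_dim * bytes_per_elem


def kv_cache_bytes(shape: ModelShape, seq_len: int, batch_size: int, bytes_per_elem: int) -> int:
    # Total cache for a batch in which every sequence holds seq_len tokens.
    return kv_bytes_per_token(shape, bytes_per_elem) * seq_len * batch_size


def max_concurrent_sequences(shape: ModelShape, seq_len: int, budget_bytes: int, bytes_per_elem: int) -> int:
    # How many full-length sequences fit in the memory left over for the KV cache.
    per_seq = kv_bytes_per_token(shape, bytes_per_elem) * seq_len
    return budget_bytes // per_seq


if __name__ == "__main__":
    GiB = 1024 ** 3
    shape = ModelShape(num_layers=32, num_kv_heads=8, head_dim=128)
    print(kv_bytes_per_token(shape, 2))                          # 131072 bytes (128 KiB) per token at fp16
    print(kv_cache_bytes(shape, 4096, 16, 2) / GiB)              # 8.0 GiB for 16 x 4096 tokens
    print(max_concurrent_sequences(shape, 4096, 40 * GiB, 2))    # 80 sequences at fp16
    print(max_concurrent_sequences(shape, 4096, 40 * GiB, 1))    # 160 sequences with an 8-bit KV cache
```

The arithmetic makes the tradeoff concrete. Halving KV precision doubles the concurrent batch that fits in the same budget, which is one reason KV quantization can raise tokens per second. The accuracy and latency effects still have to be measured separately.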
ExoForm is not affiliated with OpenAI. This is an independent practice page.