Are these based on real engineering roles?
Yes. The catalog is built around real engineering role patterns so the practice round feels close to a live interview.
Role practice
Pick a role, answer follow-up questions out loud, and get a scored verdict after the interview.
Anthropic
ML / AI
This role optimizes the systems that serve Claude to millions of users, maximizing throughput and minimizing latency on large GPU clusters running frontier transformers. Work spans kernel-level optimization, batching and scheduling, and end-to-end profiling of the inference path. A technical interview would probe GPU performance fundamentals, the mechanics of LLM serving (KV cache management, speculative decoding, continuous batching), and how to find and fix the bottleneck limiting tokens-per-second in a production serving stack.
Cerebras
ML / AI
Owns quality and performance for Cerebras' inference offerings by designing automated eval suites, mining customer workload data to build representative test datasets, and forecasting how those workloads will run on wafer-scale hardware. Builds agent-in-the-loop pipelines and dashboards that consolidate quality and performance metrics across model releases. A technical interview would probe eval design for LLMs (coding, agentic, multimodal), statistical reasoning about benchmark variance, and how you architect a self-running evaluation pipeline.
Cognition
ML / AI
Design and ship the systems that power Devin's long-horizon task execution: tool use, context management, multi-step planning, subagent orchestration, and sandboxed code-execution environments. This is applied-AI systems work on getting an agent to reason reliably across thousands of lines of code, not feature plumbing. A technical interview would probe agent architecture and tool-use design, strategies for managing context over long-running tasks, and how you'd make multi-step agent behavior reliable and recoverable when individual steps fail.
Cursor
ML / AI
Train and fine-tune the proprietary models behind Cursor's autocomplete and agent features, reducing reliance on third-party APIs by improving code-completion quality, latency, and cost. You'll work across data curation, model training, and evaluation loops tied directly to product metrics. A technical interview would probe transformer internals, fine-tuning and RL techniques for code models, and how you'd design evals that correlate offline model quality with real editor acceptance rates.
Cursor
ML / AI
Own the inference and routing layer that decides which model serves each request and runs it efficiently at scale, optimizing throughput, batching, and GPU utilization across Cursor's model fleet. You'll balance quality, latency, and cost in a system serving constant high-volume LLM traffic. A technical interview would probe inference optimization (KV caching, batching, quantization), GPU performance tradeoffs, and how you'd build a routing policy that picks the cheapest model meeting a quality bar.
Databricks
ML / AI
Own end-to-end model development, from research and prototyping through deployment and monitoring, building scalable ML systems and pipelines that ship into Databricks products. The role requires fluency with modern deep-learning frameworks and the data and serving infrastructure to operationalize models in production. A technical interview would probe ML system design, feature pipelines and training/serving consistency, and practical tradeoffs in model evaluation, deployment, and monitoring at scale.
Databricks
ML / AI
Build the LLM serving infrastructure that processes trillions of tokens per week across partner models (OpenAI, Anthropic, Gemini) and self-hosted open models, improving reliability, latency, and efficiency of distributed AI workloads. The role spans scalable APIs, GPU orchestration, and real-time serving systems built with tools like vLLM, Ray, and MLflow. An interview would explore designing high-throughput low-latency serving systems, GPU resource scheduling and batching strategies, and service-oriented backend architecture for inference at massive scale.
Mercury
ML / AI
Build AI-powered features into Mercury's product, integrating large language models into workflows like transaction categorization, support automation, and financial insights while keeping outputs safe and reliable in a regulated banking context. You'd design prompts, evaluation pipelines, and the surrounding application plumbing in Haskell and TypeScript. A technical interview would probe how you structure LLM workflows, guard against hallucination and prompt injection in a fintech setting, and build evaluations to measure whether an AI feature is actually correct and trustworthy.
OpenAI
ML / AI
This role builds and optimizes the systems that serve OpenAI's models in production, working alongside researchers to improve inference performance, throughput, and reliability for models powering ChatGPT and the API. Engineers introduce new techniques for low-latency, high-utilization serving of large transformers across GPU fleets. An interview would probe how inference differs from training (KV caching, batching/continuous batching, quantization), GPU memory and latency tradeoffs, and designing a serving stack that maximizes tokens-per-second under tight tail-latency constraints.
OpenAI
ML / AI
Training Performance Engineers maximize the efficiency, speed, and hardware utilization of OpenAI's large-scale training runs, profiling and eliminating bottlenecks across compute, memory, and the network fabric. They work hands-on with communication libraries (NCCL, MPI, UCX), checkpointing, and large-scale data loading on multi-thousand-GPU clusters. A technical interview would probe GPU architecture and the memory hierarchy, profiling and roofline analysis, collective-communication patterns, and how to diagnose why a distributed run is achieving low Model FLOPs Utilization.
Palantir
ML / AI
Own generative-AI strategy and implementation with customers, building LLM-powered agent workflows that run over Palantir's object-action-link Ontology in AIP, including fine-tuning models that drive AIP actions and designing evaluation harnesses for deployments across commercial and classified environments. You'd write production Python and TypeScript while setting customers' AI direction. A technical interview would probe your grasp of the modern generative-AI landscape, how you design and evaluate LLM agent workflows, and how you decompose a real business problem into a reliable, measurable AI solution.
Perplexity
ML / AI
Build and run the inference engine behind every Perplexity query, writing high-performance kernels and serving infrastructure that keeps answer latency low at search-engine scale. The stack is Rust, Python, CUDA, and CuTe DSL, with a focus on squeezing maximum throughput out of each GPU. A technical interview would probe GPU kernel optimization, attention and KV-cache implementation details, and how you'd profile and eliminate bottlenecks in a continuous-batching inference server.
Perplexity
ML / AI
Build the agentic experiences inside Perplexity's Comet browser, creating AI agents that autonomously navigate and act on the web using context engineering, tool interfaces, and browser automation (CDP, Playwright, extensions). You'll work across AI/ML, backend, and full-stack with a high bar on both agent performance and user experience. A technical interview would probe context-window and tool-calling design for frontier models, browser-automation reliability, and how you'd evaluate and harden a web-navigating agent against flaky, adversarial pages.
Ramp
ML / AI
Embeds LLM-driven intelligence directly into Ramp's finance workflows — categorizing spend, extracting data from receipts and invoices, and automating bookkeeping inside the flow of every dollar a business spends. Builds and evaluates production AI features, prompts, and retrieval pipelines against real transaction data. A technical interview would probe practical LLM application design (retrieval, evaluation, guardrails), how you measure and improve accuracy on noisy financial documents, and integrating model outputs reliably into transactional product flows.
Replit
ML / AI
Build the platform and tooling that lets product engineers rapidly iterate on Replit Agent, the AI system that turns natural-language prompts into working software, while improving the core agent loop itself. The work spans prompt and tool orchestration, evaluation harnesses, and the feedback loops that make an autonomous coding agent more reliable. An interview would probe LLM agent architecture, designing evals and guardrails for non-deterministic systems, and reasoning about tool-use, context management, and failure recovery in long-running agent runs.
xAI
ML / AI
This role writes and optimizes custom GPU kernels to accelerate training and inference of Grok, integrating hand-tuned kernels into the JAX/XLA stack via pybind. Engineers profile and rewrite the hottest paths in the model to extract maximum performance from the GPU. A technical interview would probe CUDA programming and the GPU execution/memory model (warps, shared memory, coalescing, occupancy), kernel profiling and optimization, and how to fuse or replace operations in an XLA-based training pipeline for measurable speedups.
xAI
ML / AI
The Pre-Training team builds and scales the systems and methods that train Grok's foundation models, optimizing multi-GPU training efficiency and experimenting with architecture and data at frontier scale. The work requires deep familiarity with distributed, large-scale neural network training. A technical interview would probe distributed training parallelism strategies, optimizing Model FLOPs Utilization on large clusters, debugging unstable or diverging training runs, and tradeoffs in scaling data, model size, and compute under a fixed budget.
Yes. ExoForm runs a live voice interview, asks follow-ups, and produces structured feedback after the session.
Yes. You can start with the free interview allowance before upgrading for more practice.