ML / AI interview practice

Pick a role, answer follow-up questions out loud, and get a scored verdict after the interview.

17 roles · Python · LLMs · CUDA · C++

Roles to practice

Anthropic
ML / AI
Performance Engineer, Inference Systems
This role optimizes the systems that serve Claude to millions of users, squeezing maximum throughput and minimum latency from large GPU clusters running frontier transformers. Work spans kernel-level optimization, batching and scheduling, and end-to-end profiling of the inference path. A technical interview would probe GPU performance fundamentals, the mechanics of LLM serving (KV cache management, speculative decoding, continuous batching), and how to find and fix the bottleneck limiting tokens-per-second in a production serving stack.
CUDA, C++, Triton
Cerebras
ML / AI
AI Engineer, Model Quality and Performance
Owns quality and performance for Cerebras' inference offerings by designing automated eval suites, mining customer workload data to build representative test datasets, and forecasting how those workloads will run on wafer-scale hardware. Builds agent-in-the-loop pipelines and dashboards that consolidate quality and performance metrics across model releases. A technical interview would probe eval design for LLMs (coding, agentic, multimodal), statistical reasoning about benchmark variance, and how you architect a self-running evaluation pipeline.
Python, Docker, lm-eval-harness
Cognition
ML / AI
Software Engineer
Design and ship the systems that power Devin's long-horizon task execution: tool use, context management, multi-step planning, subagent orchestration, and sandboxed code-execution environments. This is applied-AI systems work on getting an agent to reason reliably across thousands of lines of code, not feature plumbing. A technical interview would probe agent architecture and tool-use design, strategies for managing context over long-running tasks, and how you'd make multi-step agent behavior reliable and recoverable when individual steps fail.
Python, LLMs, Agents
Cursor
ML / AI
Software Engineer, ML Research
Train and fine-tune the proprietary models behind Cursor's autocomplete and agent features, reducing reliance on third-party APIs by improving code-completion quality, latency, and cost. You'll work across data curation, model training, and evaluation loops tied directly to product metrics. A technical interview would probe transformer internals, fine-tuning and RL techniques for code models, and how you'd design evals that correlate offline model quality with real editor acceptance rates.
Python, PyTorch, LLMs
Cursor
ML / AI
Software Engineer, Model Routing & Inference
Own the inference and routing layer that decides which model serves each request and runs it efficiently at scale, optimizing throughput, batching, and GPU utilization across Cursor's model fleet. You'll balance quality, latency, and cost in a system serving constant high-volume LLM traffic. A technical interview would probe inference optimization (KV caching, batching, quantization), GPU performance tradeoffs, and how you'd build a routing policy that picks the cheapest model meeting a quality bar.
Python, CUDA, LLMs
Databricks
ML / AI
Senior Machine Learning Engineer
Own end-to-end model development, from research and prototyping through deployment and monitoring, building scalable ML systems and pipelines that ship into Databricks products. The role requires fluency with modern deep-learning frameworks and the data and serving infrastructure to operationalize models in production. A technical interview would probe ML system design, feature pipelines and training/serving consistency, and practical tradeoffs in model evaluation, deployment, and monitoring at scale.
Python, PyTorch, MLflow
Databricks
ML / AI
Staff Backend Software Engineer, LLM Infrastructure
Build the LLM serving infrastructure that processes trillions of tokens per week across partner models (OpenAI, Anthropic, Gemini) and self-hosted open models, improving reliability, latency, and efficiency of distributed AI workloads. The role spans scalable APIs, GPU orchestration, and real-time serving systems built with tools like vLLM, Ray, and MLflow. An interview would explore designing high-throughput low-latency serving systems, GPU resource scheduling and batching strategies, and service-oriented backend architecture for inference at massive scale.
Scala, Go, Python
Mercury
ML / AI
Senior Software Engineer - AI Engineering
Build AI-powered features into Mercury's product, integrating large language models into workflows like transaction categorization, support automation, and financial insights while keeping outputs safe and reliable in a regulated banking context. You'd design prompts, evaluation pipelines, and the surrounding application plumbing in Haskell and TypeScript. A technical interview would probe how you structure LLM workflows, guard against hallucination and prompt injection in a fintech setting, and build evaluations to measure whether an AI feature is actually correct and trustworthy.
LLMs, Haskell, TypeScript
OpenAI
ML / AI
Software Engineer, Model Inference
This role builds and optimizes the systems that serve OpenAI's models in production, working alongside researchers to improve inference performance, throughput, and reliability for models powering ChatGPT and the API. Engineers introduce new techniques for low-latency, high-utilization serving of large transformers across GPU fleets. An interview would probe how inference differs from training (KV caching, batching/continuous batching, quantization), GPU memory and latency tradeoffs, and designing a serving stack that maximizes tokens-per-second under tight tail-latency constraints.
Python, C++, CUDA
OpenAI
ML / AI
Training Performance Engineer
Training Performance Engineers maximize the efficiency, speed, and hardware utilization of OpenAI's large-scale training runs, profiling and eliminating bottlenecks across compute, memory, and the network fabric. They work hands-on with communication libraries (NCCL, MPI, UCX), checkpointing, and large-scale data loading on multi-thousand-GPU clusters. A technical interview would probe GPU architecture and the memory hierarchy, profiling and roofline analysis, collective-communication patterns, and how to diagnose why a distributed run is achieving low Model FLOPs Utilization.
CUDA, C++, NCCL
Palantir
ML / AI
Forward Deployed AI Engineer
Own generative-AI strategy and implementation with customers, building LLM-powered agent workflows that run over Palantir's object-action-link Ontology in AIP, including fine-tuning models that drive AIP actions and designing evaluation harnesses for deployments across commercial and classified environments. You'd write production Python and TypeScript while setting customers' AI direction. A technical interview would probe your grasp of the modern Gen AI landscape, how you design and evaluate LLM agent workflows, and how you decompose a real business problem into a reliable, measurable AI solution.
LLMs, Python, AIP
Perplexity
ML / AI
AI Inference Engineer (Member of Technical Staff)
Build and run the inference engine behind every Perplexity query, writing high-performance kernels and serving infrastructure that keeps answer latency low at search-engine scale. The stack is Rust, Python, CUDA, and CuTe DSL, with a focus on squeezing maximum throughput out of each GPU. A technical interview would probe GPU kernel optimization, attention and KV-cache implementation details, and how you'd profile and eliminate bottlenecks in a continuous-batching inference server.
Rust, CUDA, Python
Perplexity
ML / AI
AI Software Engineer - Comet Agents
Build the agentic experiences inside Perplexity's Comet browser, creating AI agents that autonomously navigate and act on the web using context engineering, tool interfaces, and browser automation (CDP, Playwright, extensions). You'll work across AI/ML, backend, and full-stack with a high bar on both agent performance and user experience. A technical interview would probe context-window and tool-calling design for frontier models, browser-automation reliability, and how you'd evaluate and harden a web-navigating agent against flaky, adversarial pages.
Python, TypeScript, Playwright
Ramp
ML / AI
Applied AI Engineer
Embeds LLM-driven intelligence directly into Ramp's finance workflows: categorizing spend, extracting data from receipts and invoices, and automating bookkeeping inside the flow of every dollar a business spends. Builds and evaluates production AI features, prompts, and retrieval pipelines against real transaction data. A technical interview would probe practical LLM application design (retrieval, evaluation, guardrails), how you measure and improve accuracy on noisy financial documents, and how you integrate model outputs reliably into transactional product flows.
Python, LLMs, RAG
Replit
ML / AI
Senior Software Engineer, Agent Platform
Build the platform and tooling that lets product engineers rapidly iterate on Replit Agent, the AI system that turns natural-language prompts into working software, while improving the core agent loop itself. The work spans prompt and tool orchestration, evaluation harnesses, and the feedback loops that make an autonomous coding agent more reliable. An interview would probe LLM agent architecture, designing evals and guardrails for non-deterministic systems, and reasoning about tool-use, context management, and failure recovery in long-running agent runs.
TypeScript, Python, LLMs
xAI
ML / AI
Member of Technical Staff, CUDA/GPU Kernel
This role writes and optimizes custom GPU kernels to accelerate training and inference of Grok, integrating hand-tuned kernels into the JAX/XLA stack via pybind. Engineers profile and rewrite the hottest paths in the model to extract maximum performance from the GPU. A technical interview would probe CUDA programming and the GPU execution/memory model (warps, shared memory, coalescing, occupancy), kernel profiling and optimization, and how to fuse or replace operations in an XLA-based training pipeline for measurable speedups.
CUDA, C++, JAX
xAI
ML / AI
Member of Technical Staff, Pre-Training
The Pre-Training team builds and scales the systems and methods that train Grok's foundation models, optimizing multi-GPU training efficiency and experimenting with architecture and data at frontier scale. The work requires deep familiarity with distributed, large-scale neural network training. A technical interview would probe distributed training parallelism strategies, optimizing Model FLOPs Utilization on large clusters, debugging unstable or diverging training runs, and tradeoffs in scaling data, model size, and compute under a fixed budget.
Python, JAX, Distributed Training

FAQ

Are these based on real engineering roles?

Yes. The catalog is built around real engineering role patterns so the practice round feels closer to a live interview.

Is the interview voice-based?

Yes. ExoForm runs a live voice interview, asks follow-ups, and produces structured feedback after the session.

Can I try it for free?

Yes. You can start with the free interview allowance before upgrading for more practice.
