ML Engineer with Large-Model Experience

Posted by Talent Search PRO

Permanent
Full Time
Other
London, United Kingdom
Requirements
  • Significant hands-on experience optimizing deep learning models
  • Proven ability to profile and debug performance bottlenecks
  • Experience with distributed or large-scale training and inference
  • Familiarity with techniques such as mixed precision, quantization, distillation, pruning, caching, and batching
  • Experience with large models (e.g., transformers)
  • Practical CUDA development experience
  • Deep understanding of at least one major deep learning framework (ideally PyTorch)
  • Experience building and operating ML systems on cloud platforms (AWS, Azure, or GCP)
  • Comfort working with experiment tracking, monitoring, and evaluation pipelines
Job Description

Own and optimize the performance of the company's AI/ML foundation model: design GPU-accelerated components, reduce latency, and work with the founders on concrete optimization goals. Requires CUDA, Python, and deep learning expertise.

- Own the performance, scalability, and reliability of the company's foundation model in both training and inference.
- Profile and optimize the end-to-end ML stack: data pipelines, training loops, inference serving, and deployment.
- Design and implement GPU-accelerated components, including custom CUDA kernels where off-the-shelf libraries are insufficient.
- Reduce latency and cost per inference token while maximizing throughput and hardware utilization.
- Work closely with the founders to translate product requirements into concrete optimization goals and technical roadmaps.
- Build internal tooling, benchmarks, and evaluation harnesses to help the team experiment, debug, and ship safely.
- Contribute to model architecture and system design where it impacts performance and robustness.