ML Engineer with Large-Model Experience
Posted 17 hours 28 minutes ago by Talent Search PRO
- Significant hands-on experience optimizing deep learning models
- Proven ability to profile and debug performance bottlenecks
- Experience with distributed or large-scale training and inference
- Familiarity with techniques such as mixed precision, quantization, distillation, pruning, caching, and batching
- Experience with large models (e.g., transformers)
- Practical CUDA development experience
- Deep understanding of at least one major deep learning framework (ideally PyTorch)
- Experience building and operating ML systems on cloud platforms (AWS, Azure, or GCP)
- Comfort working with experiment tracking, monitoring, and evaluation pipelines
Own and optimize the performance of the company's AI/ML foundation model, design GPU-accelerated components, reduce latency, and work with the founders on optimization goals. Requires CUDA, Python, and deep learning expertise.
- Own the performance, scalability, and reliability of the company's foundation model in both training and inference.
- Profile and optimize the end-to-end ML stack: data pipelines, training loops, inference serving, and deployment.
- Design and implement GPU-accelerated components, including custom CUDA kernels where off-the-shelf libraries are insufficient.
- Reduce latency and cost per inference token while maximizing throughput and hardware utilization.
- Work closely with the founders to translate product requirements into concrete optimization goals and technical roadmaps.
- Build internal tooling, benchmarks, and evaluation harnesses to help the team experiment, debug, and ship safely.
- Contribute to model architecture and system design where it impacts performance and robustness.