ML Engineer with Large-Model Experience
Posted 17 hours 28 minutes ago by Talent Search PRO
- Significant hands-on experience optimizing deep learning models
- Proven ability to profile and debug performance bottlenecks
- Experience with distributed or large-scale training and inference
- Familiarity with techniques such as mixed precision, quantization, distillation, pruning, caching, and batching
- Experience with large models (e.g., transformers)
- Practical CUDA development experience
- Deep understanding of at least one major deep learning framework (ideally PyTorch)
- Experience building and operating ML systems on cloud platforms (AWS, Azure, or GCP)
- Comfort working with experiment tracking, monitoring, and evaluation pipelines
Own and optimize the performance of the company's AI/ML foundation model, design GPU-accelerated components, reduce latency, and work with the founders on optimization goals. Requires CUDA, Python, and deep learning expertise.
- Own the performance, scalability, and reliability of the company's foundation model in both training and inference.
- Profile and optimize the end-to-end ML stack: data pipelines, training loops, inference serving, and deployment.
- Design and implement GPU-accelerated components, including custom CUDA kernels where off-the-shelf libraries are insufficient.
- Reduce latency and cost per inference token while maximizing throughput and hardware utilization.
- Work closely with the founders to translate product requirements into concrete optimization goals and technical roadmaps.
- Build internal tooling, benchmarks, and evaluation harnesses to help the team experiment, debug, and ship safely.
- Contribute to model architecture and system design where it impacts performance and robustness.