Senior Audio AI Engineer - TTS / Speech Synthesis
Posted 2 days 6 hours ago by Awarri
At Awarri, our mission is to enable the development and adoption of frontier technology across Africa, starting in Nigeria. We are building inclusive AI technologies, from LLMs to speech models, that reflect and empower African languages and cultural contexts.
Why Join Awarri?
- Be part of a pioneering initiative shaping the future of AI in Africa.
- Work on impactful projects that center real-world representation and inclusivity.
- Collaborate with a passionate, globally distributed team of engineers, linguists, and researchers.
As a Senior Audio AI Engineer at Awarri, you will play a pivotal role in advancing the naturalness and quality of our Text-to-Speech (TTS) systems, focused on African languages and accents. We're seeking an engineer who understands the intricacies of prosody, rhythm, and speech alignment and is excited to push the boundaries of audio AI in a meaningful cultural context.
This role is best suited for a specialist with deep experience in speech technologies and a passion for building expressive, production-ready TTS models. You'll be joining a collaborative, mission-driven team dedicated to shaping the future of generative audio systems in Africa.
Responsibilities
Model Development & Fine-Tuning
- Optimize neural TTS models for prosody, pacing, and expressiveness (e.g., Tacotron 2, FastSpeech 2, Glow-TTS, VITS).
- Improve duration prediction and phoneme-to-frame alignment using forced aligners or prosody-aware training.
- Incorporate punctuation and linguistic markers into the model pipeline to improve natural flow.
- Implement and fine-tune transformer-based architectures for speech synthesis.
- Evaluate and fine-tune neural vocoders (e.g., HiFi-GAN, WaveGlow) to match desired voice characteristics and audio quality.
- Identify and correct audio artifacts or inconsistencies in generated speech.
- Optimize speech processing pipelines for efficiency and real-time performance.
- Lead both objective (e.g., duration errors, pitch contours) and subjective (e.g., MOS scoring) evaluations of TTS quality.
- Collaborate with linguistic teams to benchmark pronunciation accuracy in Nigerian languages.
- Develop automated testing frameworks to validate speech synthesis quality at scale.
- Prepare the TTS system for product integration by improving inference speed and robustness.
- Support the deployment of models across various platforms (cloud, mobile, embedded).
- Optimize model inference using vLLM for efficient deployment.
- Build APIs and backend services for TTS deployment using FastAPI and Flask.
- Implement and manage data pipelines and storage solutions using MongoDB and MySQL.
Requirements
- Proficiency in Python and TypeScript for model development and backend integration.
- Experience with transformer-based models for speech synthesis and NLP.
- Strong background in machine learning frameworks such as TensorFlow or PyTorch.
- Experience in designing scalable AI-driven applications.
- Familiarity with FastAPI, Flask, and cloud-based deployment environments.
- Knowledge of database management using MongoDB and MySQL.
- 3+ years of experience developing and deploying TTS or speech generation systems (bonus for low-resource languages).
- Deep knowledge of at least one neural TTS architecture and related vocoders.
- Proficiency with PyTorch, TensorFlow, or JAX for building and training models.
- Experience with audio processing tools (e.g., librosa, Praat, torchaudio).
- Experience working with multilingual or low-resource speech data.
- Familiarity with phonetics/phonology, especially as it relates to prosody and rhythm.
- Experience building scalable training and evaluation pipelines.
- Ability to debug complex model behavior and iterate quickly toward product quality.
- Comfort working remotely and asynchronously with interdisciplinary teams.
Nice to Have
- Prior work on African language speech systems or expressive TTS in non-English languages.
- Interest in linguistic or cultural technology in the African context.
- Contributions to open-source TTS or audio AI tools.
- Experience with emotion modeling or speaker adaptation.