Senior Software Engineer (Azure HPC/AI)
Posted 1 day 14 hours ago by Microsoft Corporation
Do you want to help power the world's most demanding Artificial Intelligence (AI) and High Performance Computing (HPC) workloads on Azure? The Azure HPC/AI Software team is expanding its engineering presence in Dublin to accelerate innovation in Azure HPC/AI Images and Microsoft HPC Pack, which support workloads ranging from large-scale physics simulations, climate modeling, and computational chemistry to AI model training and inferencing on thousands of Graphics Processing Units (GPUs).
Azure HPC/AI Images are pre-configured, performance-optimized Operating System (OS) images that integrate the latest HPC and AI software stacks - including Message Passing Interface (MPI) libraries, GPU drivers, communication runtimes, container runtimes, and high-speed networking drivers - enabling customers to run cutting-edge HPC/AI workloads on Azure with minimal setup time and maximum performance. Microsoft HPC Pack provides enterprise-class cluster management, job scheduling, and monitoring for hybrid and cloud-native HPC environments, enabling customers to orchestrate workloads across on-premises and Azure resources seamlessly.
As a Senior Software Engineer, you will be responsible for building and optimizing Azure HPC/AI Images, integrating the latest MPI libraries (Intel MPI, Open MPI, MVAPICH2), GPU computing frameworks (CUDA, NCCL, ROCm, RCCL), high-speed networking (NVLink, InfiniBand, RDMA, Libfabric, UCX), and parallel file systems (Azure Managed Lustre) to deliver maximum performance and reliability. In addition, you will architect and evolve a state-of-the-art test infrastructure that ensures Azure HPC/AI Images consistently deliver peak performance and reliability across various host-level and containerized benchmarks. You will work hands-on with cutting-edge hardware and collaborate closely with industry-leading partners to integrate and optimize these technologies for Azure customers. You will also enhance Microsoft HPC Pack's job scheduling and cluster management capabilities for global enterprise customers. Through your work, you will directly contribute to the global-scale Azure HPC/AI infrastructure that powers scientific breakthroughs, drives AI innovation, and supports mission-critical workloads across industries.
As a senior member of our team, you will participate in the hiring, training, mentoring, and development of junior staff and lead virtual teams of engineers to solve complex problems. You will work closely with team leaders and other senior developers across the organization in our mission to deliver the best experience to our customers.
Microsoft's mission is to empower every person and every organization on the planet to achieve more. As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.
Qualifications
Required Qualifications:
- Bachelor's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, Bash, C, C++, C#, Python, or Go, OR equivalent experience.
- Proven experience in Linux or Windows system programming, distributed systems, or cloud service development.
- Demonstrated experience with HPC/AI clusters and workloads, MPI libraries, GPU computing, or large-scale cluster environments.
- Ability to work with global teams across multiple time zones.
Other Requirements:
- Ability to meet Microsoft, customer and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screening: Microsoft Cloud Background Check. This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
Preferred Qualifications:
- Master's Degree in Computer Science or related technical field AND technical engineering experience with coding in languages including, but not limited to, C, C++, C#, Python, or Go, OR equivalent experience.
- Experience with Microsoft Azure or other cloud platforms.
- Familiarity with kernel debugging, driver troubleshooting and performance tuning on Linux or Windows.
- Familiarity with container technologies (Docker, Kubernetes) and cloud-native HPC/AI workflows.
- Knowledge of high-speed networking concepts (InfiniBand, RDMA, NVLink) and parallel file systems (Lustre, BeeGFS).
Responsibilities:
- Collaborate with stakeholders to define requirements for Azure HPC/AI Images and Microsoft HPC Pack features, ensuring alignment with customer needs and Azure HPC/AI strategy.
- Design and implement OS-level optimizations, HPC library integrations, GPU driver updates, and InfiniBand/RDMA configurations for HPC/AI Images.
- Develop and maintain automation pipelines for building, testing, and releasing HPC/AI VM Images to the Azure Marketplace.
- Enhance Microsoft HPC Pack's cluster management, job scheduling, and monitoring capabilities to support global enterprise customers.
- Act as a Designated Responsible Individual (DRI) for HPC/AI Images and HPC Pack, diagnosing and resolving complex performance, compatibility, and reliability issues.
- Partner with hardware vendors and open-source communities to integrate and validate next-generation HPC/AI technologies in Azure.
- Mentor junior engineers, operate with high autonomy, lead virtual feature teams, and contribute to technical design reviews and architectural decisions.
Benefits:
- Industry-leading healthcare
- Educational resources
- Discounts on products and services
- Savings and investments
- Maternity and paternity leave
- Generous time away
- Giving programs
- Opportunities to network and connect
Microsoft is an equal opportunity employer. All qualified applicants will receive consideration for employment without regard to age, ancestry, citizenship, color, family or medical care leave, gender identity or expression, genetic information, immigration status, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran or military status, race, ethnicity, religion, sex (including pregnancy), sexual orientation, or any other characteristic protected by applicable local laws, regulations and ordinances. If you need assistance and/or a reasonable accommodation due to a disability during the application process, read more about requesting accommodations.
