Director, Site Reliability Engineering

Posted 21 hours ago by Scholtastic

Permanent

Full Time

Other

Not Specified, Ireland

Job Description

Our Purpose

Mastercard powers economies and empowers people in more than 200 countries and territories worldwide. We are committed to building an inclusive, digital economy that benefits everyone, everywhere, by making transactions safe, simple, smart, and accessible.

About the Role

Mastercard's Program-aligned Site Reliability Engineering (SRE) teams deliver a seamless experience for our customers by maintaining every aspect of our Programs' infrastructure and technology ecosystem to the highest standards and ensuring compliance with rigorous security requirements. In this role, you will lead a team of highly skilled SRE infrastructure engineers focused on the reliability and performance of core infrastructure supporting Payment Networks applications.

Key Responsibilities

Lead the vision, strategy, and execution of the Infrastructure SRE organization supporting mission-critical Payment Networks applications, ensuring alignment with business and platform roadmaps.
Provide strong technical leadership by driving high-level architectural discussions, influencing cross-functional engineering teams, and shaping scalable, secure, and highly available infrastructure solutions.
Mentor, develop, and support engineers across skill levels through team meetings, one-on-ones, performance management, and long-term career development plans.
Establish, track, and report on key team OKRs and KPIs that support broader business objectives, infrastructure health, and operational maturity.
Foster a culture of innovation, collaboration, and continuous improvement across engineering and operational teams.
Drive governance, enterprise standards, compliance requirements, and operational excellence to increase platform scalability, uptime, availability, and resiliency.
Advance observability and telemetry capabilities to enable proactive monitoring, intelligent alerting, automated remediation, and improved root cause analysis (RCA).
Champion reliability engineering best practices, including chaos engineering, capacity planning, incident management, and service readiness processes, to reduce operational risk and service disruption.
Partner closely with Product, Architecture, Security, and Development teams to ensure infrastructure design, operational frameworks, and runtime practices support both current and future business needs.
Own and optimize incident response frameworks, post-incident reviews, and reliability KPIs to continuously reduce incident frequency, impact, and mean time to recovery (MTTR).
Oversee budget planning, resource allocation, and vendor/technology evaluations to ensure cost-effective and scalable infrastructure investment decisions.
Participate in periodic on-call duties as required to support 24/7 operations.

All About You

5-10 years of experience as a technology leader in Site Reliability Engineering, Infrastructure Operations, or large-scale infrastructure solutions.
Strong people and performance management skills, with a track record of coaching, mentoring, and motivating high-performing technical teams.
Proven experience driving a culture of accountability, continuous improvement, and operational excellence.
Deep knowledge of core infrastructure technologies: database, compute, storage, networking, cloud platforms, virtualization, and containerization.
Strong understanding of infrastructure architecture principles, including lifecycle management, governance, and operational readiness.
Ability to lead teams through complex technical problems, with a proven history of root cause analysis across multidisciplinary engineering groups.
Strong working knowledge of ITIL best practices (Change, Incident, Problem, and Service Management) and experience improving operational processes.
Skilled in making data-driven operational decisions using SLIs/SLOs, KPIs, and service health metrics.
Knowledge of SRE principles (automation, observability, monitoring, capacity management, and resilience engineering) and experience implementing infrastructure as code and automation frameworks.
Excellent communication skills, with the ability to translate complex technical issues into clear, actionable information for senior leaders and non-technical stakeholders.
Strong collaboration mindset with a history of partnering effectively across Product, Engineering, Architecture, and Security teams.
Demonstrated success leading teams through large-scale change initiatives such as platform migrations, cloud adoption, or major service transformations.

Corporate Security Responsibility

Abide by Mastercard's security policies and practices.
Ensure the confidentiality and integrity of the information being accessed.
Report any suspected information security violation or breach.
Complete all periodic mandatory security trainings in accordance with Mastercard's guidelines.
