Site Reliability Engineer III - Support Engineering
Posted 1 day 3 hours ago by JPMorgan Chase & Co.
There's nothing more exciting than being at the center of a rapidly growing field in technology and applying your skills to drive innovation and modernize the world's most complex and mission-critical systems.
As a Site Reliability Engineer III at JPMorgan Chase within the Chief Technology Office, you will solve complex and broad business problems with simple and straightforward solutions. We are seeking a Site Reliability Engineer (SRE) to help drive reliable, scalable, and intelligent platform operations in a global financial environment. This role combines technical support, DevOps practices, and SRE principles, including on-call incident response, automation, and a customer-first mindset. You will work with modern tools to ensure our applications and services remain robust and available.
Job Responsibilities
- Collaborate with engineering, support, and operations teams to maintain and improve the reliability of mission-critical applications.
- Participate in incident management, troubleshooting, and continuous improvement.
- Help implement automation and monitoring solutions.
- Be part of an on-call rotation and take effective action during production incidents.
- Share knowledge, follow best practices, and contribute to a culture of learning and innovation.
- Communicate clearly, solve problems proactively, and focus on customer needs.
Qualifications and Skills
- Formal training or certification in SRE and application support concepts, along with proficient applied experience.
- SRE & Application Support: Experience in SRE, DevOps, or application support roles, with knowledge of SLIs/SLOs, incident response, and troubleshooting.
- Observability & Monitoring: Familiarity with monitoring and observability tools (e.g., Grafana, Prometheus, Splunk, OpenTelemetry).
- DevOps Tooling: Hands-on experience with CI/CD pipelines (Jenkins, including global libraries), infrastructure as code (Terraform), version control (Git), containerization (Docker), and orchestration (Kubernetes).
- Cloud & Automation: Exposure to cloud platforms (AWS, GCP, or Azure) and automating infrastructure and deployments.
- On-Call & Incident Management: Willingness to participate in an on-call rotation and respond to production incidents.
- Problem Solving & Communication: Ability to break down issues, document solutions, and communicate effectively with team members and customers.
- Financial/Regulated Experience: Experience in banking, fintech, or regulated environments.
- Resilience Engineering: Participation in game days or chaos engineering.
- Mentorship: Interest in sharing knowledge and best practices with peers.