Senior Site Reliability Engineer - Glasgow
Posted 3 hours 6 minutes ago by Caspian One Ltd
We're hiring several Senior Site Reliability Engineers to help shape a Centre of Excellence for SRE practices across a global tech estate. This is a high-impact, hands-on role where you'll engineer automation frameworks, elevate observability, and transform incident response at scale.
You'll be the go-to expert, guiding strategy, influencing culture, and driving adoption of SRE principles across diverse teams. From scripting to architecting resilient systems, your technical leadership will directly improve performance, scalability, and availability.
What you'll do:
System Reliability & Performance: Ensure high availability, optimal performance, and scalability of services through proactive monitoring, maintenance, and capacity planning.
Incident Response & Prevention: Lead resolution and analysis of system outages. Implement preventative measures to reduce recurrence and improve system resilience.
Automation & Tooling: Develop scripts and tools in Python or Go to automate operational processes, reduce manual effort, and enhance efficiency.
Performance Optimization: Monitor system metrics, identify bottlenecks, and apply best practices for performance tuning and resource utilization.
Cross-Team Collaboration: Partner with development and infrastructure teams to embed reliability and scalability into the software development life cycle.