Senior Site Reliability Engineer

Posted 14 hours 43 minutes ago by Stott and May

Permanent
Full Time
Other
South Glamorgan, United Kingdom
Job Description
Start: ASAP

Duration: 6+ months

Location: 3 days per week in central London

Rate: negotiable, depending on experience (inside IR35)

We're seeking a skilled Site Reliability Engineer to join a high-performing technology team within a leading professional services environment. You'll design and maintain reliable, secure, and scalable cloud infrastructure while driving automation and DevOps best practices.

Key Responsibilities
  • Build and manage CI/CD pipelines, release automation, and infrastructure as code (IaC).
  • Develop resilient cloud environments and optimize monitoring, alerting, and performance.
  • Maintain observability tools (Prometheus, Grafana, Datadog, Splunk, etc.).
  • Manage incident response, troubleshooting, and root cause analysis.
  • Collaborate with teams to improve reliability, scalability, and deployment efficiency.
  • Document and standardize technical processes and runbooks.

Skills & Experience
  • 4+ years in SRE, DevOps, or related roles.
  • Advanced Kubernetes (EKS, GKE, AKS, or RKE); proficient with kubectl and Helm.
  • Strong containerization experience (Docker; microservices with Java/Spring Boot).
  • CI/CD tools: Jenkins, GitHub Actions, Azure DevOps, Argo CD.
  • IaC: Terraform or Pulumi (module development preferred).
  • Observability & monitoring: Prometheus/Grafana, Datadog, Opsgenie, or similar.
  • Familiarity with Git workflows, Python/Go scripting, and security tools (Vault, Qualys, etc.).