Lead Cloud Infrastructure & Site Reliability Engineer

Posted 1 hour 53 minutes ago by Caspian One Ltd

£500 - £600 Daily
Contract
Not Specified
Factory Jobs
Yorkshire, Sheffield, United Kingdom, S5 9
Job Description

The Opportunity

We're partnering with a leading global organisation undergoing significant investment in its cloud and data platforms. As part of a high-performing engineering team, you'll play a key role in operating, improving, and automating a large-scale Azure-based platform that supports critical cybersecurity and analytics capabilities.

This is an excellent opportunity for a hands-on Site Reliability Engineer or Cloud Infrastructure Engineer who enjoys solving complex platform challenges, driving automation, improving reliability, and reducing operational overhead across a modern Azure estate.

You'll work alongside senior engineers and platform specialists in an environment focused on continuous improvement, cloud engineering best practice, and platform resilience.

What You'll Be Doing

  • Engineering and supporting cloud infrastructure within Microsoft Azure
  • Building and managing Infrastructure-as-Code solutions using Terraform
  • Improving platform reliability, availability, scalability, and performance
  • Automating operational processes through PowerShell, Azure CLI, and other scripting tools
  • Supporting CI/CD pipelines and deployment automation
  • Managing and troubleshooting Azure networking, connectivity, and security services
  • Supporting Kubernetes and containerised workloads
  • Monitoring platform health and driving proactive improvements
  • Working closely with development, data, and platform engineering teams
  • Reducing technical debt and improving operational efficiency
  • Providing production support and incident resolution across critical services
  • Maintaining engineering standards, documentation, and change controls

Required Experience

We're particularly interested in candidates with:

  • Strong Site Reliability Engineering (SRE) or Cloud Infrastructure Engineering experience
  • Deep Azure platform knowledge
  • Proven Terraform and Infrastructure-as-Code expertise
  • Experience with Azure DevOps and CI/CD practices
  • Strong PowerShell scripting skills
  • Experience operating and supporting production cloud environments
  • Azure networking knowledge, including security controls, routing, DNS, and connectivity
  • Experience with monitoring and observability tools
  • Troubleshooting expertise across infrastructure, platform, and application layers
  • Strong automation mindset and passion for continuous improvement

Desirable Skills

Experience with any of the following would be advantageous:

  • Kubernetes
  • Azure Data Factory
  • Databricks
  • Synapse Analytics
  • Azure Data Lake Storage
  • Python development
  • Linux administration and scripting
  • Grafana, Prometheus, Elasticsearch
  • Kafka or Event Hubs
  • Cybersecurity-focused environments
  • Financial services or other highly regulated sectors

What We're Looking For

  • A genuine SRE mindset with a focus on reliability and automation
  • Strong problem-solving and troubleshooting abilities
  • Excellent stakeholder engagement skills
  • A proactive approach to identifying and implementing improvements
  • Someone who enjoys working in a complex, enterprise-scale cloud environment