Senior Azure SaaS Reliability & Support Engineer
Posted 2 hours 37 minutes ago by Boss Professional Services
Senior Azure SaaS Reliability & Support Engineer - Hybrid (2 days a week in Kingston) - ASAP Start
You will be the bridge between support, engineering, and cloud operations, responsible for:
- Investigating and fixing complex application and infrastructure issues.
- Monitoring capacity, performance, and error budgets across all deployments.
- Designing automation and tooling to improve reliability and reduce manual work.
Your Responsibilities and Tasks
1. Environment Health & Incident Response
- Monitor single-tenant (ST) and multi-tenant (MT) environments for server performance, response times, error rates, and application health.
- Detect and resolve database issues, stalled file processing, or misplaced storage objects.
- Use Azure diagnostics and telemetry to troubleshoot and resolve complex incidents.
- Provide third-line support for escalated customer cases, collaborating with development for code-level fixes.
2. Reliability Engineering (Fleet Level)
- Maintain uptime, performance, and scalability across all ST and MT deployments.
- Define and track service-level objectives (SLOs) and error budgets for different environment types.
- Perform capacity planning for servers, databases, and storage, scaling resources before issues occur.
- Identify systemic patterns causing downtime and implement fixes at scale.
3. Automation & Tooling
- Build scripts and automation (PowerShell, C#, Azure Functions, Logic Apps) to detect and remediate common application or infrastructure issues.
- Automate environment health checks and reporting.
- Develop self-healing routines for recurring problems.
4. Monitoring & Reporting
- Implement and maintain Azure Monitor/Application Insights/Log Analytics dashboards for:
  - Environment uptime & performance
  - SLA compliance & error budget tracking
  - Incident trends and recurring issue analysis
- Provide regular reliability reports and improvement recommendations to stakeholders.
5. Continuous Improvement & Knowledge Sharing
- Feed recurring issues and systemic risks into the continuous improvement programme.
- Contribute to post-incident reviews with actionable follow-ups.
- Maintain troubleshooting guides and technical runbooks for common issues.
Success Measures (KPIs)
- Uptime: ≥ target SLO % for ST and MT environments.
- Error Budget Burn Rate: Maintain within agreed thresholds.
- Incident Metrics:
  - Reduce MTTR for P1/P2 incidents.
  - Reduce recurrence rate of common issues.
- Automation Impact:
  - Number of recurring issues automated/self-healed.
  - Hours saved through automation vs manual intervention.
- Customer Impact:
  - Reduced escalations from L1/L2 support.
  - Improved customer satisfaction for technical cases.
Your Qualifications, Technical Skills and Experience
Essential
Technical Skills
- 3+ years in third-line support, SRE, or cloud operations for enterprise SaaS.
- Proven track record in incident resolution and root cause analysis.
- Experience working with both multi-tenant and single-tenant cloud architectures.
- Strong background in supporting C#/.NET Core/MVC web applications with SQL Server backends and Azure Blob Storage.
- Advanced Azure diagnostics (Application Insights, Log Analytics, Kusto Query Language).
- Proficient in SQL for investigation and remediation.
- Scripting and automation skills in PowerShell and/or C#.
- Understanding of Azure components: App Services, VMs, SQL DB, Blob Storage, scaling strategies.
- Experience in capacity planning, SLOs, and error budget management.
- Tools: Azure Monitor, Application Insights, Log Analytics, Azure Data Explorer (KQL), Azure Functions, Logic Apps, PowerShell, C#, SQL Server Management Studio, Azure Storage Explorer, Power BI (for reporting).
Desirable
Your Personal Skills and Attributes
- Exceptional problem-solving skills with strong attention to detail.
- Ability to clearly document findings and communicate with technical and non-technical audiences.
- Calm under pressure during high-priority incidents.
- Collaborative mindset, working closely with support, dev, and ops teams.
This job description is not intended to be an exhaustive list of duties and responsibilities. You may be expected to perform different tasks as the needs of the business and your role evolve. Your job description will be reviewed and updated accordingly.
Your Benefits
- Private Medical Insurance: Your health matters, and we've got you covered.
- Birthday Off: Celebrate your day your way - it's on us.
- Holiday Purchase: Need more downtime? Purchase up to an additional 5 days of holiday.
- Employee Assistance Programme: Confidential 24/7 helpline and support for you and your immediate family.
- Time for You: We value your personal time. That's why we aim to finish work at 2pm on Fridays.
- Better Working: We embrace hybrid working and, where operationally practicable, support employees in splitting their working time between the office and home.
- Pension: Plan for tomorrow with our pension scheme via NEST.
If you are looking for your next opportunity, please contact me.