Senior Site Reliability Engineer

Posted 3 hours 14 minutes ago by Prolific - UK Job Board

Permanent
Full Time
Other
Not Specified, United Kingdom
Job Description

Prolific is not just another player in the AI space - we are the architects of the human data infrastructure that's reshaping the landscape of AI development. In a world where foundational AI technologies are increasingly commoditized, it's the quality and diversity of human generated data that truly differentiates products and models.

The role

As a Site Reliability Engineer, you will focus on keeping the Prolific platform resilient, scalable and highly performant for our customers. You'll maintain stability and reliability across the platform, keep our observability at the right standard, and dive into incident remediation where needed, in collaboration with service delivery and other teams. You will work with cross-functional teams to embed SRE principles and upskill teams in key areas such as Kubernetes and observability.

What you'll bring to the role
  • 5+ years with Google Cloud Platform, GKE and the Kubernetes ecosystem, plus experience with Terraform and Terragrunt
  • Strong programming skills in Python
  • Strong experience in observability principles and tooling
  • Experience in GitOps flows and platforms for Kubernetes, such as ArgoCD
  • Deep understanding of system architecture and scalability principles
  • Strong collaboration and communication skills to work with cross-functional teams
What you'll be doing in the role
  • Develop and maintain highly available infrastructure using modern infra as code techniques, with a focus on Terragrunt and Terraform.
  • Manage and optimise Kubernetes clusters and their workloads with a focus on reliability and performance.
  • Participate in incident response and remediation, working with relevant product teams and stakeholders to resolve production issues efficiently, including creating and maintaining runbooks.
  • Review and optimise other areas of our tooling stack, such as CI/CD or release strategies.
  • Foster a culture of continuous improvement, such as enhancing documentation and upskilling teams in cloud architecture and Kubernetes.
  • Improve observability and alerting systems across our application and infrastructure, ensuring proactive detection of system degradation.
  • Collaborate with Engineering teams to foster an SRE culture, including contributing to defining SLOs, SLAs and error budgets.
  • Design and implement automation strategies to ensure managed services remain up to date, secure and performant.
  • Lead and support initiatives that automate processes to improve system efficiency and resilience and to reduce toil.
  • Organise, support and respond to on-call incidents.
Key Technologies
  • Cloud Platforms: Google Cloud Platform and AWS
  • Frameworks: Vue.js, Django Rest Framework, container based and serverless architectures
  • Databases: MongoDB and DynamoDB
  • DevOps and Monitoring: CircleCI, GitHub Actions, Kubernetes, Celery, EventBridge and DataDog
Why Prolific is a great place to work

We've built a unique platform that connects researchers and companies with a global pool of participants, enabling the collection of high quality, ethically sourced human behavioural data and feedback. This data is the cornerstone of developing more accurate, nuanced, and aligned AI systems.

We believe that the next leap in AI capabilities won't come solely from scaling existing models, but from integrating diverse human perspectives and behaviours into AI development. By providing this crucial human data infrastructure, Prolific is positioning itself at the forefront of the next wave of AI innovation - one that reflects the breadth and best of humanity.

Working for us will place you at the forefront of AI innovation, providing access to our unique human data platform and opportunities for groundbreaking research. Join us to enjoy a competitive salary, benefits and remote working within our impactful, mission driven culture.

By submitting your application, you agree that Prolific may collect your personal data for recruiting and global organisation planning. Prolific's Candidate Privacy Notice explains what personal information Prolific may process, where Prolific may process your personal information, its purposes for processing your personal information, and the rights you can exercise over Prolific's use of your personal information.