
QA Engineer - Load Testing Specialist (2-Month Contract)

Posted 2 days 3 hours ago by Monolithai

£100,000 - £125,000 Annual
Permanent
Full Time
Other
London, United Kingdom
Job Description
Position Overview

Monolith AI is seeking an experienced QA Engineer to lead load-testing efforts for a critical system release focused on improving concurrency and high-request-load handling.

This fast-paced, short-term engagement requires someone who can quickly understand complex distributed systems, design comprehensive load tests, and work collaboratively with a rapidly growing engineering team to ensure our new environment meets performance requirements.

Primary Responsibilities
  1. Design and Implement Automated Load Testing Framework

    • Develop comprehensive load tests for FastAPI endpoints, Temporal workflows/activities, and AWS service interactions
    • Create realistic test scenarios simulating concurrent workflow execution patterns, including graph-based workflow orchestration
    • Build automated test suites that measure system behavior under varying concurrency levels and request loads
  2. Performance Analysis and Bottleneck Identification

    • Monitor and analyze system performance across the entire stack (API layer, Temporal workers, AWS services)
    • Identify concurrency limitations in Temporal workflow execution, AWS service limits (Athena, ECS), and inter-component communication
    • Document performance characteristics including response times, throughput limits, and failure modes under load
  3. Collaborate on Non-Functional Requirements (NFR) Definition

    • Work with Customer Success and Product teams to understand business requirements and translate them into measurable performance criteria
    • Iterate on acceptable concurrency thresholds, latency targets, and throughput requirements
    • Validate that proposed NFRs are realistic and achievable given architectural constraints
  4. System Documentation and Knowledge Extraction

    • Build an understanding of the existing system through code review, discussions with the development team, and exploratory testing
    • Create clear documentation of test methodologies, results, and recommendations for future testing
  5. Recommendation and Optimization Guidance

    • Provide actionable recommendations for removing identified bottlenecks
    • Suggest configuration optimizations for Temporal (worker pools, task queues) and AWS services (Athena concurrency, ECS capacity)
  6. Rapid Communication and Status Reporting

    • Maintain daily/frequent communication with the Tech Lead regarding project progress, blockers, and findings
    • Quickly escalate issues that could impact the aggressive timeline
    • Present findings and recommendations to technical and non-technical stakeholders
  7. Cross-Component Integration Testing

    • Test complex scenarios involving graph execution triggering node workflows across multiple system boundaries
    • Validate S3 read/write operations under concurrent load
    • Ensure inter-component communication (API-to-Temporal, Temporal-activity-to-API triggers) performs reliably at scale
Key Performance Indicators
  1. Test Coverage and Execution

    • Complete automated load test suite covering all critical components within first 3 weeks
    • Execute baseline and progressive load tests identifying maximum sustainable concurrency levels
  2. Bottleneck Identification and Impact

    • Identify and document the top 5-7 performance bottlenecks with clear impact analysis
    • Provide actionable remediation recommendations with estimated effort and impact for each bottleneck
  3. NFR Definition and Validation

    • Collaborate with stakeholders to define measurable NFRs within first 2 weeks
    • Validate that the system meets the agreed NFR criteria, or document any gaps, by project end
  4. Documentation and Knowledge Transfer

    • Deliver comprehensive test documentation, results analysis, and system performance characteristics
    • Conduct knowledge-transfer sessions to ensure the team can maintain and extend the testing framework
  5. Project Velocity and Communication

    • Meet weekly milestone targets in this fast-paced 2-month engagement
    • Maintain a proactive communication rhythm (daily stand-ups, weekly detailed reports to the Tech Lead)
Required Qualifications

Experience:

  • 4+ years of experience in QA/performance testing roles
  • 2+ years of hands-on experience load testing distributed systems and microservice architectures
  • Proven experience with load testing tools (e.g., k6, JMeter, Locust, Gatling, Artillery)
  • Experience testing workflow orchestration systems (Temporal, Airflow, Prefect, or similar)
  • Demonstrated ability to test systems integrating with AWS services (particularly Athena, ECS, S3)

Technical Skills:

  • Strong proficiency in Python (required for test automation and working with FastAPI, Temporal)
  • Experience with REST API testing and performance validation
  • Understanding of distributed systems concepts: concurrency, queueing, backpressure, rate limiting
  • Familiarity with AWS infrastructure and service limits
  • ateway
  • Experience with monitoring and observability tools (Prometheus, Grafana, Datadog, or similar)
  • Proficiency with Git and CI/CD pipelines
  • Ability to read and understand code in order to design effective tests

Immediate Availability:

  • Ability to start in early January 2025 and commit to a focused 3-month engagement
  • Availability for full-time contract work during the project
Preferred Qualifications
  • Direct experience with Temporal (workflows, activities, workers)
  • Experience with containerized workloads and Docker/ECS
  • Prior work in fast-paced startup or scale-up environments
  • Experience with infrastructure as code (Terraform, CloudFormation)
  • Background in Site Reliability Engineering (SRE) or DevOps practices
  • Previous contract/consulting experience with rapid knowledge acquisition
  • Experience with graph-based workflow systems or DAG execution engines
  • Knowledge of AWS service limits and optimization strategies
Essential Soft Skills
  • Self-Direction and Initiative - Ability to operate independently in an ambiguous, fast-moving environment with minimal documentation; Proactive problem-solving mindset; Comfortable making pragmatic decisions quickly in a time-constrained project
  • Communication and Collaboration - Exceptional communication skills for extracting knowledge through conversations with existing team members; Ability to translate technical findings into clear, actionable recommendations for diverse audiences; Comfortable asking clarifying questions and challenging assumptions respectfully; Strong written communication for documentation and status updates
  • Adaptability and Learning Agility - Quick learner who can rapidly understand complex, poorly documented systems; Flexible and comfortable with changing priorities in a 15-person team that is doubling in size; Thrives in fast-paced environments with aggressive timelines
  • Pragmatism and Results Orientation - Focused on delivering practical, actionable outcomes within tight timeframes; Understands the balance between thoroughness and speed in a 2-month engagement; Comfortable with "good enough" when perfection isn't achievable within constraints
  • Stakeholder Management - Skilled at managing expectations with technical leadership about realistic timelines and trade-offs; Diplomatic when delivering difficult news about performance limitations or bottlenecks; Collaborative approach when working with CS and Product on NFR definition
Key Challenges in This Role
  1. Rapid Knowledge Acquisition with Limited Documentation

    • The existing system lacks comprehensive documentation, so the role requires quickly building an understanding through code review, system exploration, and frequent discussions with the development team
    • Success requires comfort with ambiguity and strong investigative skills
  2. Aggressive Timeline with High Impact

    • A 3-month timeline to design tests, execute comprehensive load testing, identify bottlenecks, and deliver actionable recommendations is extremely tight
    • Must balance thoroughness with pragmatism; prioritize ruthlessly to ensure critical areas are covered
  3. Complex Distributed System with Multiple Integration Points

    • The system involves multiple layers (FastAPI, Temporal, AWS services) with complex inter-component communication patterns (graph-to-node workflows)
    • Must understand the entire stack to design realistic, comprehensive load tests that expose real world bottlenecks