Lead PySpark Engineer
Posted by SKILLFINDER INTERNATIONAL
Skill Profile
- PySpark - Advanced (P3)
- AWS - Advanced (P3)
- SAS - Foundational (P1)
Key Responsibilities
Technical Delivery
- Design, develop, and maintain complex PySpark solutions for ETL/ELT and data mart workloads.
- Convert and refactor legacy SAS code into optimized PySpark solutions using automated tooling and manual refactoring techniques (see the conversion sketch after this list).
- Build scalable, maintainable, and production-ready data pipelines.
- Modernize legacy data workflows into cloud-native architectures.
- Ensure data accuracy, quality, integrity, and reliability across transformation processes.
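For illustration, a minimal sketch of the kind of SAS-to-PySpark conversion involved, assuming a hypothetical legacy DATA step that filters transactions and derives a fee column; the dataset, bucket, and column names are invented for the example.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical legacy SAS being migrated:
#   data work.high_value;
#       set raw.transactions;
#       where amount > 1000;
#       fee = amount * 0.015;
#   run;

spark = SparkSession.builder.appName("sas_migration_sketch").getOrCreate()

# Equivalent PySpark: read the source, filter, derive the column, persist the result.
transactions = spark.read.parquet("s3://example-bucket/raw/transactions/")  # path is illustrative

high_value = (
    transactions
    .where(F.col("amount") > 1000)
    .withColumn("fee", F.col("amount") * F.lit(0.015))
)

high_value.write.mode("overwrite").parquet("s3://example-bucket/marts/high_value/")
```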
Cloud & Data Engineering (AWS-Focused)
- Develop and deploy data pipelines using AWS services such as EMR, Glue, S3, and Athena (a Glue job sketch follows this list).
- Optimize Spark workloads for performance, scalability, partitioning strategy, and cost efficiency.
- Implement CI/CD pipelines and Git-based version control for automated deployment.
- Collaborate with architects, engineers, and business stakeholders to deliver high-quality cloud data solutions.
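One way such a pipeline might look as an AWS Glue PySpark job, using the standard Glue job bootstrap (it only runs inside the Glue runtime); the bucket, column, and database layout are placeholders for the example.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job bootstrap; JOB_NAME is supplied by the Glue runtime.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw data from S3, aggregate it, and write a partitioned Parquet output
# that Athena can query. Bucket and column names are illustrative.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

daily_orders = (
    orders
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("order_count"))
)

(daily_orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_orders/"))

job.commit()
```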
Core Technical Skills
PySpark & Data Engineering
- 5+ years of hands-on PySpark experience (Advanced level).
- Strong ability to write production-grade, maintainable data engineering code.
- Solid understanding of:
  - ETL/ELT design patterns
  - Data modelling concepts
  - Fact and dimension modelling
  - Data marts
  - Slowly Changing Dimensions (SCDs)
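A minimal SCD Type 2 sketch in PySpark, assuming a customer dimension tracked with effective_from/effective_to dates and an is_current flag; the schema, keys, and sample data are invented for the example.

```python
import datetime

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

# Existing dimension and incoming snapshot; schemas and values are illustrative.
dim = spark.createDataFrame(
    [(1, "Alice", "Bristol", datetime.date(2023, 1, 1), None, True)],
    "customer_id INT, name STRING, city STRING, effective_from DATE, effective_to DATE, is_current BOOLEAN",
)
updates = spark.createDataFrame(
    [(1, "Alice", "London"), (2, "Bob", "Leeds")],
    "customer_id INT, name STRING, city STRING",
)

load_date = F.current_date()
current = dim.filter("is_current")

# Rows whose tracked attribute changed: close the old version and open a new one.
changed = (current.alias("d")
    .join(updates.alias("u"), "customer_id")
    .where(F.col("d.city") != F.col("u.city")))

closed = (changed.select("d.*")
    .withColumn("effective_to", load_date)
    .withColumn("is_current", F.lit(False)))

new_versions = (changed.select("u.*")
    .withColumn("effective_from", load_date)
    .withColumn("effective_to", F.lit(None).cast("date"))
    .withColumn("is_current", F.lit(True)))

# Brand-new keys are simply inserted as current rows.
new_keys = (updates.join(current, "customer_id", "left_anti")
    .withColumn("effective_from", load_date)
    .withColumn("effective_to", F.lit(None).cast("date"))
    .withColumn("is_current", F.lit(True)))

unchanged = current.join(changed.select("customer_id"), "customer_id", "left_anti")
history = dim.filter(~F.col("is_current"))

result = (history.unionByName(unchanged).unionByName(closed)
          .unionByName(new_versions).unionByName(new_keys))
result.show()
```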
Spark Performance & Optimization
- Expertise in Spark execution planning, partitioning strategies, and performance tuning.
- Experience troubleshooting distributed data pipelines at scale.
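A short tuning sketch showing the kind of levers involved: adaptive query execution, a broadcast join to avoid shuffling a large fact table, plan inspection with explain(), and repartitioning to align output files with the partition key. Paths and column names are illustrative.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("tuning_sketch")
    # Adaptive query execution coalesces shuffle partitions at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

facts = spark.read.parquet("s3://example-bucket/curated/transactions/")  # illustrative paths
dims = spark.read.parquet("s3://example-bucket/curated/merchants/")

# Broadcast the small dimension so the large fact table is not shuffled for the join.
joined = facts.join(F.broadcast(dims), "merchant_id")

# Inspect the physical plan before running the job to confirm the broadcast join.
joined.explain(mode="formatted")

# Repartition by the write key so output files align with the partitioning strategy.
(joined
    .repartition("txn_date")
    .write.mode("overwrite")
    .partitionBy("txn_date")
    .parquet("s3://example-bucket/marts/transactions_enriched/"))
```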
Python & Engineering Quality
- Strong Python programming skills with emphasis on clean, modular, and maintainable code.
- Experience applying engineering best practices, including:
  - Parameterization
  - Configuration management
  - Structured logging
  - Exception handling
  - Modular design principles
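A minimal sketch of these practices in one script: command-line parameterization, externalised configuration, structured logging, explicit exception handling, and a small, testable transform function. The config keys and paths are assumptions made for the example.

```python
import argparse
import json
import logging
import sys

from pyspark.sql import SparkSession

# Structured (JSON-style) logging so pipeline events are machine-parseable.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format='{"ts": "%(asctime)s", "level": "%(levelname)s", "msg": "%(message)s"}',
)
log = logging.getLogger("pipeline")


def load_config(path: str) -> dict:
    """Configuration lives outside the code so the same job runs in every environment."""
    with open(path) as fh:
        return json.load(fh)


def transform(spark: SparkSession, source: str, target: str) -> None:
    """Single-responsibility transform step that is easy to unit test in isolation."""
    df = spark.read.parquet(source)
    df.write.mode("overwrite").parquet(target)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True, help="Path to a JSON config file")
    args = parser.parse_args()

    config = load_config(args.config)
    spark = SparkSession.builder.appName(config.get("app_name", "pipeline")).getOrCreate()

    try:
        transform(spark, config["source_path"], config["target_path"])
        log.info("pipeline completed")
    except Exception:
        # Fail loudly with context rather than swallowing the error.
        log.exception("pipeline failed")
        raise
    finally:
        spark.stop()


if __name__ == "__main__":
    main()
```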
SAS & Legacy Analytics (Foundational)
- Working knowledge of Base SAS, Macros, and DI Studio.
- Ability to interpret and analyze legacy SAS code for migration to PySpark.
Data Engineering & Testing
- Understanding of end-to-end data flows, orchestration frameworks, pipelines, and change data capture (CDC).
- Experience creating ETL test cases, unit tests, and data comparison/validation frameworks.
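A minimal pytest sketch of the unit-testing and data-comparison approach, assuming a local Spark session; the add_fee transform and its test data are invented for the example.

```python
import pytest
from pyspark.sql import SparkSession, functions as F


@pytest.fixture(scope="session")
def spark():
    # A local Spark session is enough for unit-testing transformation logic.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()


def add_fee(df):
    """Transformation under test; in a real project this would be imported from the pipeline code."""
    return df.withColumn("fee", F.col("amount") * 0.015)


def test_add_fee(spark):
    source = spark.createDataFrame([(1, 100.0), (2, 200.0)], ["id", "amount"])
    expected = spark.createDataFrame([(1, 100.0, 1.5), (2, 200.0, 3.0)], ["id", "amount", "fee"])

    actual = add_fee(source)

    # Simple data-comparison check: identical column types and no differing rows in either direction.
    assert actual.dtypes == expected.dtypes
    assert actual.exceptAll(expected).count() == 0
    assert expected.exceptAll(actual).count() == 0
```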
Engineering Practices
- Proficient in Git workflows, branching strategies, pull requests, and code reviews.
- Ability to document technical decisions, architecture, and data flows.
- Experience with CI/CD tooling for data engineering pipelines.
AWS & Platform Expertise (Advanced)
Strong hands-on experience with:
- Amazon S3
- EMR and AWS Glue
- Glue Workflows
- Amazon Athena
- IAM
- Solid understanding of distributed computing and big data processing in AWS environments.
- Experience deploying and operating large-scale data pipelines in the cloud.
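A minimal boto3 sketch of validating a curated table through Athena, assuming credentials and IAM permissions are already in place; the region, database, table, and bucket names are placeholders.

```python
import time

import boto3

# Assumes an AWS role or credentials with Athena and S3 permissions (IAM) are configured.
athena = boto3.client("athena", region_name="eu-west-2")  # region is illustrative

# Check a curated table that the Spark pipeline wrote to S3; names are placeholders.
response = athena.start_query_execution(
    QueryString="SELECT order_date, count(*) AS row_count FROM daily_orders GROUP BY order_date",
    QueryExecutionContext={"Database": "curated_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
query_id = response["QueryExecutionId"]

# Athena query execution is asynchronous, so poll until it finishes.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows[1:]:  # first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])
```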
Desirable Experience
- Experience within banking, financial services, or other regulated industries.
- Background in SAS modernization or cloud migration programs.
- Familiarity with DevOps practices and infrastructure-as-code tools such as Terraform or CloudFormation.
- Experience working in Agile or Scrum delivery environments.