Lead PySpark Engineer
Posted by SKILLFINDER INTERNATIONAL
Skill Profile
- PySpark - Advanced (P3)
- AWS - Advanced (P3)
- SAS - Foundational (P1)
Key Responsibilities
Technical Delivery
- Design, develop, and maintain complex PySpark solutions for ETL/ELT and data mart workloads.
- Convert and refactor legacy SAS code into optimized PySpark solutions using automated tooling and manual refactoring techniques (see the conversion sketch after this list).
- Build scalable, maintainable, and production-ready data pipelines.
- Modernize legacy data workflows into cloud-native architectures.
- Ensure data accuracy, quality, integrity, and reliability across transformation processes.
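For illustration, a minimal sketch of the kind of SAS-to-PySpark conversion involved, assuming a hypothetical legacy DATA step that filters transactions and derives a fee column; the dataset, bucket, and column names are invented for the example.

```python
from pyspark.sql import SparkSession, functions as F

# Hypothetical legacy SAS being migrated:
#   data work.high_value;
#       set raw.transactions;
#       where amount > 1000;
#       fee = amount * 0.015;
#   run;

spark = SparkSession.builder.appName("sas_migration_sketch").getOrCreate()

# Equivalent PySpark: read the source, filter, derive the column, persist the result.
transactions = spark.read.parquet("s3://example-bucket/raw/transactions/")  # path is illustrative

high_value = (
    transactions
    .where(F.col("amount") > 1000)
    .withColumn("fee", F.col("amount") * F.lit(0.015))
)

high_value.write.mode("overwrite").parquet("s3://example-bucket/marts/high_value/")
```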
Cloud & Data Engineering (AWS-Focused)
- Develop and deploy data pipelines using AWS services such as EMR, Glue, S3, and Athena (a Glue job sketch follows this list).
- Optimize Spark workloads for performance, scalability, partitioning strategy, and cost efficiency.
- Implement CI/CD pipelines and Git-based version control for automated deployment.
- Collaborate with architects, engineers, and business stakeholders to deliver high-quality cloud data solutions.
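One way such a pipeline might look as an AWS Glue PySpark job, using the standard Glue job bootstrap (it only runs inside the Glue runtime); the bucket, column, and database layout are placeholders for the example.

```python
import sys

from awsglue.context import GlueContext
from awsglue.job import Job
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import functions as F

# Standard Glue job bootstrap; JOB_NAME is supplied by the Glue runtime.
args = getResolvedOptions(sys.argv, ["JOB_NAME"])
sc = SparkContext()
glue_context = GlueContext(sc)
spark = glue_context.spark_session
job = Job(glue_context)
job.init(args["JOB_NAME"], args)

# Read raw data from S3, aggregate it, and write a partitioned Parquet output
# that Athena can query. Bucket and column names are illustrative.
orders = spark.read.parquet("s3://example-bucket/raw/orders/")

daily_orders = (
    orders
    .withColumn("order_date", F.to_date("order_timestamp"))
    .groupBy("order_date", "region")
    .agg(F.sum("amount").alias("total_amount"), F.count("*").alias("order_count"))
)

(daily_orders.write
    .mode("overwrite")
    .partitionBy("order_date")
    .parquet("s3://example-bucket/curated/daily_orders/"))

job.commit()
```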
Core Technical Skills
PySpark & Data Engineering
- 5+ years of hands-on PySpark experience (Advanced level).
- Strong ability to write production-grade, maintainable data engineering code.
- Solid understanding of:
  - ETL/ELT design patterns
  - Data modelling concepts
  - Fact and dimension modelling
  - Data marts
  - Slowly Changing Dimensions (SCDs)
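A minimal SCD Type 2 sketch in PySpark, assuming a customer dimension tracked with effective_from/effective_to dates and an is_current flag; the schema, keys, and sample data are invented for the example.

```python
import datetime

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("scd2_sketch").getOrCreate()

# Existing dimension and incoming snapshot; schemas and values are illustrative.
dim = spark.createDataFrame(
    [(1, "Alice", "Bristol", datetime.date(2023, 1, 1), None, True)],
    "customer_id INT, name STRING, city STRING, effective_from DATE, effective_to DATE, is_current BOOLEAN",
)
updates = spark.createDataFrame(
    [(1, "Alice", "London"), (2, "Bob", "Leeds")],
    "customer_id INT, name STRING, city STRING",
)

load_date = F.current_date()
current = dim.filter("is_current")

# Rows whose tracked attribute changed: close the old version and open a new one.
changed = (current.alias("d")
    .join(updates.alias("u"), "customer_id")
    .where(F.col("d.city") != F.col("u.city")))

closed = (changed.select("d.*")
    .withColumn("effective_to", load_date)
    .withColumn("is_current", F.lit(False)))

new_versions = (changed.select("u.*")
    .withColumn("effective_from", load_date)
    .withColumn("effective_to", F.lit(None).cast("date"))
    .withColumn("is_current", F.lit(True)))

# Brand-new keys are simply inserted as current rows.
new_keys = (updates.join(current, "customer_id", "left_anti")
    .withColumn("effective_from", load_date)
    .withColumn("effective_to", F.lit(None).cast("date"))
    .withColumn("is_current", F.lit(True)))

unchanged = current.join(changed.select("customer_id"), "customer_id", "left_anti")
history = dim.filter(~F.col("is_current"))

result = (history.unionByName(unchanged).unionByName(closed)
          .unionByName(new_versions).unionByName(new_keys))
result.show()
```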
Spark Performance & Optimization
- Expertise in Spark execution planning, partitioning strategies, and performance tuning.
- Experience troubleshooting distributed data pipelines at scale.
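A short tuning sketch showing the kind of levers involved: adaptive query execution, a broadcast join to avoid shuffling a large fact table, plan inspection with explain(), and repartitioning to align output files with the partition key. Paths and column names are illustrative.

```python
from pyspark.sql import SparkSession, functions as F

spark = (
    SparkSession.builder
    .appName("tuning_sketch")
    # Adaptive query execution coalesces shuffle partitions at runtime.
    .config("spark.sql.adaptive.enabled", "true")
    .config("spark.sql.shuffle.partitions", "200")
    .getOrCreate()
)

facts = spark.read.parquet("s3://example-bucket/curated/transactions/")  # illustrative paths
dims = spark.read.parquet("s3://example-bucket/curated/merchants/")

# Broadcast the small dimension so the large fact table is not shuffled for the join.
joined = facts.join(F.broadcast(dims), "merchant_id")

# Inspect the physical plan before running the job to confirm the broadcast join.
joined.explain(mode="formatted")

# Repartition by the write key so output files align with the partitioning strategy.
(joined
    .repartition("txn_date")
    .write.mode("overwrite")
    .partitionBy("txn_date")
    .parquet("s3://example-bucket/marts/transactions_enriched/"))
```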
Python & Engineering Quality
- Strong Python programming skills with emphasis on clean, modular, and maintainable code.
- Experience applying engineering best practices, including:
  - Parameterization
  - Configuration management
  - Structured logging
  - Exception handling
  - Modular design principles
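A minimal sketch of these practices in one script: command-line parameterization, externalised configuration, structured logging, explicit exception handling, and a small, testable transform function. The config keys and paths are assumptions made for the example.

```python
import argparse
import json
import logging
import sys

from pyspark.sql import SparkSession

# Structured (JSON-style) logging so pipeline events are machine-parseable.
logging.basicConfig(
    stream=sys.stdout,
    level=logging.INFO,
    format='{"ts": "%(asctime)s", "level": "%(levelname)s", "msg": "%(message)s"}',
)
log = logging.getLogger("pipeline")


def load_config(path: str) -> dict:
    """Configuration lives outside the code so the same job runs in every environment."""
    with open(path) as fh:
        return json.load(fh)


def transform(spark: SparkSession, source: str, target: str) -> None:
    """Single-responsibility transform step that is easy to unit test in isolation."""
    df = spark.read.parquet(source)
    df.write.mode("overwrite").parquet(target)


def main() -> None:
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True, help="Path to a JSON config file")
    args = parser.parse_args()

    config = load_config(args.config)
    spark = SparkSession.builder.appName(config.get("app_name", "pipeline")).getOrCreate()

    try:
        transform(spark, config["source_path"], config["target_path"])
        log.info("pipeline completed")
    except Exception:
        # Fail loudly with context rather than swallowing the error.
        log.exception("pipeline failed")
        raise
    finally:
        spark.stop()


if __name__ == "__main__":
    main()
```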
SAS & Legacy Analytics (Foundational)
- Working knowledge of Base SAS, Macros, and DI Studio.
- Ability to interpret and analyze legacy SAS code for migration to PySpark.
Data Engineering & Testing
- Understanding of end-to-end data flows, orchestration frameworks, pipelines, and change data capture (CDC).
- Experience creating ETL test cases, unit tests, and data comparison/validation frameworks.
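A minimal pytest sketch of the unit-testing and data-comparison approach, assuming a local Spark session; the add_fee transform and its test data are invented for the example.

```python
import pytest
from pyspark.sql import SparkSession, functions as F


@pytest.fixture(scope="session")
def spark():
    # A local Spark session is enough for unit-testing transformation logic.
    return SparkSession.builder.master("local[2]").appName("tests").getOrCreate()


def add_fee(df):
    """Transformation under test; in a real project this would be imported from the pipeline code."""
    return df.withColumn("fee", F.col("amount") * 0.015)


def test_add_fee(spark):
    source = spark.createDataFrame([(1, 100.0), (2, 200.0)], ["id", "amount"])
    expected = spark.createDataFrame([(1, 100.0, 1.5), (2, 200.0, 3.0)], ["id", "amount", "fee"])

    actual = add_fee(source)

    # Simple data-comparison check: identical column types and no differing rows in either direction.
    assert actual.dtypes == expected.dtypes
    assert actual.exceptAll(expected).count() == 0
    assert expected.exceptAll(actual).count() == 0
```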
Engineering Practices
- Proficient in Git workflows, branching strategies, pull requests, and code reviews.
- Ability to document technical decisions, architecture, and data flows.
- Experience with CI/CD tooling for data engineering pipelines.
AWS & Platform Expertise (Advanced)
Strong hands-on experience with:
- Amazon S3
- EMR and AWS Glue
- Glue Workflows
- Amazon Athena
- IAM
- Solid understanding of distributed computing and big data processing in AWS environments.
- Experience deploying and operating large-scale data pipelines in the cloud.
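A minimal boto3 sketch of validating a curated table through Athena, assuming credentials and IAM permissions are already in place; the region, database, table, and bucket names are placeholders.

```python
import time

import boto3

# Assumes an AWS role or credentials with Athena and S3 permissions (IAM) are configured.
athena = boto3.client("athena", region_name="eu-west-2")  # region is illustrative

# Check a curated table that the Spark pipeline wrote to S3; names are placeholders.
response = athena.start_query_execution(
    QueryString="SELECT order_date, count(*) AS row_count FROM daily_orders GROUP BY order_date",
    QueryExecutionContext={"Database": "curated_db"},
    ResultConfiguration={"OutputLocation": "s3://example-bucket/athena-results/"},
)
query_id = response["QueryExecutionId"]

# Athena query execution is asynchronous, so poll until it finishes.
while True:
    state = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]["Status"]["State"]
    if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
        break
    time.sleep(2)

if state == "SUCCEEDED":
    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    for row in rows[1:]:  # first row is the header
        print([col.get("VarCharValue") for col in row["Data"]])
```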
Desirable Experience
- Experience within banking, financial services, or other regulated industries.
- Background in SAS modernization or cloud migration programs.
- Familiarity with DevOps practices and infrastructure-as-code tools such as Terraform or CloudFormation.
- Experience working in Agile or Scrum delivery environments.