Location
Gurgaon | India
Job description
ETL Data Engineer
Responsibilities
Design, develop, and maintain ETL pipelines in Azure Databricks using PySpark and Delta tables (see the sketch after this list).
Create build pipelines from GitHub and release pipelines for ingestion and Databricks using Azure DevOps / Harness.
Monitor the performance of ETL jobs, resolve any issues that arise, and improve performance metrics as needed.
Diagnose system performance issues related to data processing and implement solutions to address them.
Collaborate with other teams to ensure successful integration of data pipelines into the larger system architecture.
Maintain data integrity and quality across all pipelines and environments.
Understand and follow secure coding practices to ensure code is not vulnerable.
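A minimal sketch of such a pipeline, assuming a hypothetical ADLS landing path and target Delta table (names, schema, and paths are illustrative, not from this posting):

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("etl-sketch").getOrCreate()

# Ingest: read raw JSON landed in ADLS (path is an assumption)
raw = spark.read.json("abfss://raw@example.dfs.core.windows.net/orders/")

# Transform: deduplicate, derive a date column, drop invalid rows
orders = (
    raw.dropDuplicates(["order_id"])
       .withColumn("order_date", F.to_date("order_ts"))
       .filter(F.col("amount") > 0)
)

# Store: append to a Delta table, partitioned by date
(orders.write.format("delta")
       .mode("append")
       .partitionBy("order_date")
       .saveAsTable("analytics.orders"))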
Skills
Strong background in software development, with experience ingesting, transforming, and storing data from large datasets using PySpark in Azure Databricks, and strong knowledge of distributed computing concepts.
Hands-on experience designing and developing ETL pipelines with PySpark in Azure Databricks, with strong Python scripting exposure (e.g., list comprehensions and dictionaries).
Minimum of 5 years of experience and good proficiency in data warehousing concepts.
Proficiency in SQL and database design concepts.
Hands-on experience and good proficiency with Delta table and Delta file operations such as merge, insert overwrite, and partition overwrite (illustrated in the sketch after this list).
Hands-on experience with CI/CD in Azure DevOps/Harness, and with ADF/Stonebranch for orchestration.
Knowledge of the Azure cloud platform, including Azure Synapse and ADLS.
Knowledge of GitHub and build management.
Passion for data and experience working within a data-driven organization.
Excellent presentation, communication (oral and written), and relationship-building skills across all levels of management.
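A hedged illustration of the Delta operations and Python idioms named above (the table name, staging path, join key, and partition value are assumptions):

from delta.tables import DeltaTable
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("delta-ops-sketch").getOrCreate()

# Hypothetical staged updates to apply to the target table
updates = spark.read.format("delta").load("/tmp/staged_orders")

# Dictionary comprehension builds an explicit column-update map
update_map = {c: f"s.{c}" for c in updates.columns}

# Merge (upsert): update matching rows, insert new ones
target = DeltaTable.forName(spark, "analytics.orders")
(target.alias("t")
       .merge(updates.alias("s"), "t.order_id = s.order_id")
       .whenMatchedUpdate(set=update_map)
       .whenNotMatchedInsertAll()
       .execute())

# Insert overwrite of a single partition via replaceWhere
(updates.filter(F.col("order_date") == "2024-01-01")
        .write.format("delta")
        .mode("overwrite")
        .option("replaceWhere", "order_date = '2024-01-01'")
        .saveAsTable("analytics.orders"))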
Salary
Rs 22 - 28 lakhs p.a.