EXL IT service management
Location: Gurgaon, India
Job description
Role: Data Engineer - Databricks
Job Location: Bangalore/Gurgaon (hybrid)
Shift Timing: 12:00 PM - 10:30 PM IST
Experience: 3+ years
Job Summary:
The Data Engineer (DE) is responsible for designing, developing, and maintaining data assets and data-related products, liaising with multiple stakeholders.
Responsibilities:
- Collaborate with project stakeholders (client) to identify product and technical requirements. Conduct analysis to determine integration needs.
- Apply data warehousing concepts to build a data warehouse for reporting purposes.
- Build data pipelines to ingest data into our data platform and transform it there.
- Apply proven approaches for large-scale data movement, change data capture, and incremental data load strategies (a minimal sketch follows this list).
- Develop, implement, and tune large-scale distributed systems and pipelines that process large volumes of data.
- Assist Data Science / Modelling teams in setting up data pipelines & monitoring daily jobs.
- Develop and test ETL components to high standards of data quality and act as hands-on development lead.
- Oversee and contribute to the creation and maintenance of relevant data artifacts (data lineage, source-to-target mappings, high-level designs, interface agreements, etc.).
- Ensure developer responsibilities are met by mentoring, reviewing code and test plans, and verifying adherence to design best practices, coding and architectural guidelines, standards, and frameworks.
- Work with stakeholders to understand data requirements, and design, develop, and maintain complex ETL processes.
- Create data integration and data diagram documentation.
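For illustration, a minimal PySpark sketch of the kind of incremental load described above; the table names, watermark column, and audit table are hypothetical, and Delta Lake MERGE is one common way to apply changes on Databricks:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("incremental_orders_load").getOrCreate()

# Read only records newer than the last processed watermark.
# 'etl.load_audit', 'raw.orders', and 'updated_at' are hypothetical names.
last_watermark = (
    spark.table("etl.load_audit")
    .agg(F.max("loaded_until"))
    .first()[0]
)

changes = spark.table("raw.orders").filter(F.col("updated_at") > last_watermark)
changes.createOrReplaceTempView("order_changes")

# On Databricks, Delta Lake's MERGE applies updates and inserts in one
# atomic operation, a common incremental-load pattern.
spark.sql("""
    MERGE INTO curated.orders AS t
    USING order_changes AS s
    ON t.order_id = s.order_id
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```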
Qualifications (Must have):
- 3+ years as a Data Engineer with proficiency in SQL, Python, and PySpark programming.
- Strong knowledge of Databricks and its related services and functionality, and how to utilize them across the DE & Analytics spectrum.
- Strong knowledge of Hadoop, Hive, Databricks, and RDBMSs like Oracle, Teradata, SQL Server, etc.
- Ability to write SQL to query metadata and tables from different data management systems such as Oracle, Hive, Databricks, and Greenplum (see the first sketch after this list).
- Familiarity with big data technologies like Hadoop, Spark, and distributed computing frameworks.
- Ability to use Hue to run Hive SQL queries and to schedule Apache Oozie jobs that automate data workflows.
- Degree in Data Science, Statistics, Computer Science, or a related field, or an equivalent combination of education and experience.
- Proficiency in at least one cloud platform (AWS, Azure, GCP), including developing ETL processes with Azure Data Factory and big data processing and analytics with Databricks.
- Expertise in building data pipelines on big data platforms.
- Good understanding of data warehousing concepts.
- Good working experience communicating with stakeholders and collaborating effectively with the business team on data testing.
- Strong problem-solving and troubleshooting skills.
- Ability to establish comprehensive data quality test cases and procedures, and to implement automated data validation processes (see the second sketch after this list).
- Strong communication, problem-solving, and analytical skills, with the ability to manage time and multi-task with attention to detail and accuracy.
- Strong business acumen and a demonstrated aptitude for analytics that drive action.
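For illustration, a minimal sketch of the kind of metadata SQL referenced above; the catalog, schema, and table names are hypothetical:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("metadata_queries").getOrCreate()

# Databricks with Unity Catalog exposes ANSI information_schema views;
# 'main' and 'sales' are hypothetical catalog/schema names.
spark.sql("""
    SELECT table_name, column_name, data_type
    FROM main.information_schema.columns
    WHERE table_schema = 'sales'
""").show()

# Hive metadata (e.g., from Hue) is commonly inspected with
# SHOW and DESCRIBE statements.
spark.sql("SHOW TABLES IN sales").show()
spark.sql("DESCRIBE TABLE EXTENDED sales.orders").show(truncate=False)
```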
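And a minimal sketch of automated data validation in PySpark, as referenced above; the table, columns, and checks are hypothetical and would be tailored to the data at hand:

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("data_quality_checks").getOrCreate()

# 'curated.orders' and its columns are hypothetical.
df = spark.table("curated.orders")

checks = {
    # Primary key must never be NULL.
    "order_id_not_null": df.filter(F.col("order_id").isNull()).count() == 0,
    # Primary key must be unique.
    "order_id_unique": df.count() == df.select("order_id").distinct().count(),
    # Amounts must be non-negative.
    "amount_non_negative": df.filter(F.col("amount") < 0).count() == 0,
}

failed = [name for name, passed in checks.items() if not passed]
if failed:
    # Raising fails the scheduled job, surfacing the issue to monitoring.
    raise ValueError(f"Data quality checks failed: {failed}")
```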