Review, design, and develop ETL jobs to ingest data into the Data Lake and load data to data marts;
extract data to integrate with various business applications.
Parse unstructured and semi-structured data such as XML.
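A minimal sketch of this kind of parsing with the standard library, assuming a hypothetical feed of `<record>` elements (real feeds vary by source system):

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input for illustration only.
XML_DOC = """
<records>
  <record id="1"><name>alpha</name><value>10</value></record>
  <record id="2"><name>beta</name><value>20</value></record>
</records>
"""

def parse_records(xml_text):
    """Flatten <record> elements into plain dicts ready for loading."""
    root = ET.fromstring(xml_text)
    rows = []
    for rec in root.iter("record"):
        rows.append({
            "id": int(rec.get("id")),
            "name": rec.findtext("name"),
            "value": int(rec.findtext("value")),
        })
    return rows

rows = parse_records(XML_DOC)
```

Each row can then be staged into a data-mart table by the downstream load step.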
Design and develop efficient mappings and workflows to load data into data marts.
Map XML DTD schemas to customized table definitions in Python.
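One way to sketch this mapping, assuming a hypothetical column dictionary that in practice would be derived from the DTD/XSD:

```python
# Hypothetical mapping of XML element names to SQL column types;
# a real implementation would derive this from the DTD or XSD.
SCHEMA = {"id": "BIGINT", "name": "STRING", "value": "DOUBLE"}

def ddl_for(table_name, schema):
    """Render a Hive-style CREATE TABLE statement from a column mapping."""
    cols = ",\n  ".join(f"{col} {typ}" for col, typ in schema.items())
    return f"CREATE TABLE {table_name} (\n  {cols}\n)"

ddl = ddl_for("stg_records", SCHEMA)
```

Generating the DDL from the schema keeps the table definitions in sync with the incoming XML structure.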
Write efficient queries and reports in Hive or Impala to extract data on an ad hoc basis for data analysis.
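A small sketch of composing such an ad hoc extract in Python; the table and column names are hypothetical, and filtering on a partition column is what lets Hive/Impala prune partitions:

```python
def adhoc_extract(table, columns, day):
    """Build a simple partition-pruned HiveQL extract for ad hoc analysis."""
    col_list = ", ".join(columns)
    # Assumes the table is partitioned by a ds (date string) column,
    # so the WHERE clause restricts the scan to one partition.
    return f"SELECT {col_list} FROM {table} WHERE ds = '{day}'"

q = adhoc_extract("dm_sales", ["order_id", "amount"], "2020-01-15")
```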
Identify performance bottlenecks in ETL jobs and tune them by enhancing or redesigning the jobs.
Responsible for performance tuning of ETL mappings and queries.
Import source tables and all necessary lookup tables to support the ETL process for daily XML files, in addition to processing very large (multi-terabyte) historical XML data files.
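For files too large to fit in memory, a streaming parse is the usual approach; a minimal sketch with `xml.etree.ElementTree.iterparse`, using a small in-memory feed to stand in for the hypothetical multi-terabyte history files:

```python
import io
import xml.etree.ElementTree as ET

# Stand-in for a very large XML file streamed from disk.
FEED = io.BytesIO(b"""<records>
  <record><id>1</id></record>
  <record><id>2</id></record>
</records>""")

def stream_ids(source):
    """Process <record> elements one at a time without loading the whole file."""
    ids = []
    for event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == "record":
            ids.append(int(elem.findtext("id")))
            elem.clear()  # free the finished subtree so memory stays bounded
    return ids

ids = stream_ids(FEED)
```

Clearing each element after it is processed keeps memory use roughly constant regardless of file size.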