Location
Secunderabad | India
Job description
- Building and implementing data ingestion and curation processes using big data tools such as Spark (Scala/Python), Databricks, Delta Lake, Hive, Pig, HDFS, Oozie, Sqoop, Flume, ZooKeeper, Kerberos, Sentry, Impala, etc.
- Ingesting huge volumes of data from various platforms for analytics needs and writing high-performance, reliable, and maintainable ETL code (see the sketch after this list).
- Monitoring performance and advising on any necessary infrastructure changes.
- Defining data security principles and policies using Ranger and Kerberos.
- Assisting application developers and advising on efficient big data application development using cutting-edge technologies.
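As a rough illustration of the ingestion and curation work described above, the following is a minimal sketch of a Spark (Scala) batch job that ingests raw CSV data and curates it into a Delta Lake table. The paths, schema, and object names are hypothetical placeholders, not part of any actual MetLife codebase.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object IngestAndCurate {
  def main(args: Array[String]): Unit = {
    // Delta Lake support assumes the delta-spark package is on the classpath.
    val spark = SparkSession.builder()
      .appName("ingest-and-curate")
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
              "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()

    // Hypothetical landing zone; a real job might read from HDFS, Sqoop output, Kafka, etc.
    val raw = spark.read
      .option("header", "true")
      .csv("hdfs:///landing/events/")

    // Basic curation: drop malformed rows, normalize types, stamp the load date.
    val curated = raw
      .filter(col("event_id").isNotNull)
      .withColumn("event_ts", to_timestamp(col("event_ts")))
      .withColumn("load_date", current_date())

    // Write to a Delta table, partitioned by load date for downstream pruning.
    curated.write
      .format("delta")
      .mode("append")
      .partitionBy("load_date")
      .save("hdfs:///curated/events_delta")

    spark.stop()
  }
}
```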
Knowledge, Skills and Abilities
Education
- Bachelor's degree in Computer Science, Engineering, or a related discipline
Experience
- 4 years of solutions development experience
- Proficiency and extensive experience with Spark & Scala, Python, and performance tuning is a MUST
- Hive database management and performance tuning (partitioning/bucketing) is a MUST
- Strong SQL knowledge and data analysis skills for data anomaly detection and data quality assurance.
- Strong analytic skills related to working with unstructured datasets.
- Experience building stream-processing systems using solutions such as Storm or Spark Streaming (see the streaming sketch after this list)
- Experience in any model management methodologies.
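To ground the stream-processing item above, here is a minimal Spark Structured Streaming (Scala) sketch that reads from a Kafka topic and appends to a Delta sink. The broker address, topic, and paths are illustrative assumptions; Structured Streaming is used here as the modern successor to the DStream-based Spark Streaming API named above.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object StreamEvents {
  def main(args: Array[String]): Unit = {
    // Requires the spark-sql-kafka and delta-spark packages on the classpath.
    val spark = SparkSession.builder()
      .appName("stream-events")
      .getOrCreate()

    // Hypothetical Kafka source; broker and topic are placeholders.
    val stream = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092")
      .option("subscribe", "events")
      .load()

    // Kafka delivers key/value as binary; cast the payload to a string column.
    val parsed = stream.selectExpr("CAST(value AS STRING) AS payload")
      .withColumn("ingest_ts", current_timestamp())

    // Append continuously to a Delta sink, with checkpointing for fault tolerance.
    val query = parsed.writeStream
      .format("delta")
      .option("checkpointLocation", "hdfs:///checkpoints/events")
      .outputMode("append")
      .start("hdfs:///curated/events_stream")

    query.awaitTermination()
  }
}
```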
Knowledge and skills (general and technical)
Required:
- Proficiency and extensive experience in HDFS, Hive, Spark, Scala, Python, Databricks/Delta Lake, Flume, Kafka, etc.
- Analytical skills to assess situations and arrive at an optimal, efficient solution based on requirements.
- Performance tuning and problem-solving skills are a must
- Hive database management and performance tuning (partitioning/bucketing) is a MUST (see the Hive sketch after this list)
- Hands-on development experience and high proficiency in Java, Python, Scala, and SQL
- Experience designing multi-tenant, containerized Hadoop architectures for memory/CPU management and sharing across different lines of business (LOBs)
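Because Hive partitioning and bucketing are repeatedly called out as a MUST, here is a small illustrative sketch, via Spark SQL with Hive support, of a partitioned and bucketed table; the database, table, and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

object HiveLayoutExample {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() lets Spark SQL manage tables in the Hive metastore.
    val spark = SparkSession.builder()
      .appName("hive-layout-example")
      .enableHiveSupport()
      .getOrCreate()

    // Partition by a low-cardinality column so queries can prune whole directories;
    // bucket by a high-cardinality key to speed up joins and sampling.
    spark.sql("""
      CREATE TABLE IF NOT EXISTS analytics.events (
        event_id   STRING,
        user_id    STRING,
        event_type STRING
      )
      PARTITIONED BY (load_date DATE)
      CLUSTERED BY (user_id) INTO 32 BUCKETS
      STORED AS ORC
    """)

    spark.stop()
  }
}
```

Partitioning prunes I/O at the directory level, while bucketing fixes the hash layout of files within each partition, which Hive and Spark can exploit for bucketed joins and sampling.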
Preferred:
- Knowledge in data science is a plus
- Experience with Informatica PC/BDM 10 and implementing pushdown processing into the Hadoop platform is a huge plus.
- Proficiency in using Git, Bamboo, and other continuous integration and deployment tools
- Exposure to data governance principles such as metadata and lineage (Collibra/Atlas)
Other Requirements (licenses, certifications, specialized training - if required)
- Experience in any model management methodologies.
- Experience working on data science projects and exposure to data science practices.
Working Relationships
Internal Contacts (and purpose of relationship): MetLife GTO Engineering, Digital Office, and Global Data and Analytics
External Contacts (and purpose of relationship): If applicable