
Data Engineer: Spark/Scala/Python


Marc Ellis


Location

UAE | United Kingdom


Job description

Job Title: Data Engineer: Spark/Scala/Python

Job Location: Dubai – UAE

Job Duration: 12 months extendable

Job Description:
We are looking for a highly skilled and motivated Spark Data Engineer to join our team. The ideal
candidate will have a strong background in Apache Spark, data ingestion, data processing, and data
integration, and will be responsible for developing and maintaining our dynamic data ingestion framework
built on Spark. The candidate should have expertise in building scalable, high-performance,
and fault-tolerant data processing pipelines using Spark, and be able to optimize Spark jobs for
performance and scalability. The candidate should also have experience in designing and implementing
data models, handling data errors, implementing data quality and validation processes, and integrating
Spark applications with other big data technologies in the Hadoop ecosystem.
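For illustration only, a minimal Scala sketch of the kind of batch ingestion and validation pipeline described above, built on Spark's DataFrame API, is shown below. The schema, file paths, and object names are hypothetical placeholders, not references to an existing codebase.

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object BatchIngestionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-ingestion-sketch")
      .getOrCreate()

    // Hypothetical source schema; a configurable framework would load this from metadata.
    val ordersSchema = StructType(Seq(
      StructField("order_id", StringType, nullable = false),
      StructField("customer_id", StringType),
      StructField("amount", DoubleType),
      StructField("order_ts", TimestampType)
    ))

    // Batch ingestion with an explicit schema (paths are placeholders).
    val raw: DataFrame = spark.read
      .schema(ordersSchema)
      .option("header", "true")
      .csv("/data/landing/orders/*.csv")

    // Simple validation rules: drop records missing keys, flag suspect amounts.
    val validated = raw
      .filter(col("order_id").isNotNull && col("customer_id").isNotNull)
      .withColumn("is_valid_amount", col("amount").isNotNull && col("amount") >= 0)

    // Persist the curated model, partitioned by ingestion date (placeholder path).
    validated
      .withColumn("ingest_date", to_date(col("order_ts")))
      .write
      .mode("overwrite")
      .partitionBy("ingest_date")
      .parquet("/data/curated/orders")

    spark.stop()
  }
}

In a real dynamic ingestion framework, the schema, source paths, and validation rules would typically be driven by configuration rather than hard-coded.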


Responsibilities:
• Develop and maintain a dynamic data ingestion framework using Apache Spark.
• Implement data ingestion pipelines for batch processing and real-time streaming using Spark's data ingestion APIs (a brief Scala sketch follows this list).
• Design and implement data models using Spark's DataFrame and Dataset APIs.
• Optimize Spark jobs for performance and scalability, including caching, broadcasting, and data partitioning techniques.
• Implement error handling and fault tolerance mechanisms to handle data errors, processing failures, and system failures in Spark applications.
• Implement data quality and validation processes, including data profiling, data cleansing, and data validation rules using Spark's data processing and data validation APIs.
• Integrate Spark applications with other big data technologies in the Hadoop ecosystem, such as Hadoop, Hive, HBase, Kafka, and others.
• Ensure data security by implementing data encryption, data masking, and data access controls in Spark applications.
• Use version control systems, such as Git, for source code management, and implement DevOps practices, such as continuous integration, continuous delivery, and automated deployments, in Spark application development workflows.
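As referenced in the ingestion pipelines item above, the following Scala sketch combines several of these responsibilities in one place: Structured Streaming ingestion from Kafka, a broadcast join against a small reference dataset to avoid a shuffle, and checkpointing for fault tolerance. Broker addresses, topic names, schemas, and paths are hypothetical placeholders, and the Kafka source assumes the spark-sql-kafka connector is on the classpath.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

object StreamingIngestionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("streaming-ingestion-sketch")
      .getOrCreate()

    // Small, static reference dataset; broadcasting it keeps the join shuffle-free.
    val customers = spark.read.parquet("/data/reference/customers") // placeholder path

    // Expected shape of each Kafka message (hypothetical).
    val eventSchema = StructType(Seq(
      StructField("order_id", StringType),
      StructField("customer_id", StringType),
      StructField("amount", DoubleType)
    ))

    // Read the event stream from a hypothetical Kafka topic.
    val events = spark.readStream
      .format("kafka")
      .option("kafka.bootstrap.servers", "broker:9092") // placeholder brokers
      .option("subscribe", "orders-events")             // placeholder topic
      .load()
      .select(from_json(col("value").cast("string"), eventSchema).as("e"))
      .select("e.*")
      .filter(col("order_id").isNotNull) // drop malformed records instead of failing the job

    // Stream-static broadcast join to enrich events with reference attributes.
    val enriched = events.join(broadcast(customers), Seq("customer_id"), "left")

    // Checkpointing lets the query recover its progress after task or driver failures.
    val query = enriched.writeStream
      .format("parquet")
      .option("path", "/data/curated/orders_stream")        // placeholder path
      .option("checkpointLocation", "/checkpoints/orders")   // placeholder path
      .outputMode("append")
      .start()

    query.awaitTermination()
  }
}

Caching and explicit repartitioning would be applied at the same layer if the enriched output fed multiple downstream writes.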


Qualifications:
• Bachelor's or Master's degree in Computer Science, Data Engineering, or a related field.
• Strong proficiency in Apache Spark, including Spark Core, Spark SQL, Spark Streaming, and Spark MLlib, with experience in multiple production developments and deployments.
• Proficiency in either Scala or Python, with knowledge of functional programming concepts.
• Experience in developing and maintaining dynamic data ingestion frameworks using Spark.
• Experience in data processing, data integration, and data modeling using Spark's DataFrame and Dataset APIs.
• Knowledge of performance optimization techniques in Spark, including caching, broadcasting, and data partitioning.
• Experience in implementing error handling and fault tolerance mechanisms in Spark applications.
• Knowledge of data quality and validation techniques using Spark's data processing and data validation APIs.
• Familiarity with other big data technologies in the Hadoop ecosystem, such as Hadoop, Hive, HBase, and Kafka.
• Experience in implementing data security measures in Spark applications, such as data encryption, data masking, and data access controls (illustrated in the sketch after this list).
• Strong problem-solving skills and the ability to troubleshoot and resolve issues related to Spark applications.
• Proficiency in using version control systems, such as Git, and in implementing DevOps practices in Spark application development workflows.
• Excellent communication and collaboration skills, with the ability to work effectively in a team-oriented environment.
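As a small illustration of the data masking qualification above, the sketch below hashes a PII column before publishing a dataset; the column name and paths are hypothetical placeholders. Encryption at rest and fine-grained access controls would normally be handled in the storage and catalog layers rather than in the Spark job itself.

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object MaskingSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("masking-sketch")
      .getOrCreate()

    // Placeholder curated dataset containing a PII column named "email".
    val orders = spark.read.parquet("/data/curated/orders")

    // One-way hash the PII column and drop the raw value before publishing.
    val masked = orders
      .withColumn("email_hash", sha2(col("email"), 256))
      .drop("email")

    masked.write.mode("overwrite").parquet("/data/published/orders") // placeholder path

    spark.stop()
  }
}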


Job tags

Full time

