She/He should be a good problem solver and proficient in at least one of the following languages: Java, Scala, or Python.
She/He should have strong, hands-on experience with at least one relational database system. Experience with NoSQL, Hive, etc. is an added advantage.
She/He also contributes to the elaboration of the data policy and the structuring of its life cycle within the regulatory framework in force, in collaboration with the Chief Data Officer.
Her/His scope of work centers on application systems in the data management and processing domain, and on platforms such as Big Data, IoT, etc.
She/He is responsible for overseeing and integrating data of various types originating from these different sources, and for confirming the quality of the data entering the Data Lake (she/he receives data, deletes duplicates, etc.).
Captures the structured and unstructured data produced within different applications or outside the entity
Integrates the components
Structures the data (semantics, etc.)
Maps the available components
Cleans up the data (deleting duplicates, etc.)
Validates the data
Where appropriate, she/he creates the data repository.
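For candidates who want a concrete picture of the cleaning and validation duties listed above, here is a minimal sketch in plain Python. It is illustrative only: the record fields ("id", "email") and the validation rule are assumptions made for this example, not part of any actual pipeline used in the role.

```python
def clean_records(records):
    """Split records into valid and rejected, dropping duplicate ids.

    Hypothetical rule for illustration: a record is valid when it has a
    non-null "id" and an "email" string containing "@".
    """
    seen_ids = set()
    valid, rejected = [], []
    for rec in records:
        key = rec.get("id")
        email = rec.get("email")
        # Validation step: reject records that fail the basic checks.
        if key is None or not isinstance(email, str) or "@" not in email:
            rejected.append(rec)
            continue
        # Cleaning step: keep only the first occurrence of each id.
        if key in seen_ids:
            continue
        seen_ids.add(key)
        valid.append(rec)
    return valid, rejected


sample = [
    {"id": 1, "email": "a@example.org"},
    {"id": 1, "email": "a@example.org"},  # duplicate of the first record
    {"id": 2, "email": "not-an-email"},   # fails validation
]
valid, rejected = clean_records(sample)
```

In practice the same steps would more likely be written with Spark or Pandas; the plain-Python version is used here only to keep the sketch self-contained.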
SKILLS PREFERRED :-
Technical :-
Technical expertise related to data.
Expert in writing Spark code with Python.
Experience in using Python modules such as Pandas, NumPy, and scikit-learn.
Expert in writing SQL queries.
Specialist in data coming from a database management system.
Technical expertise in building ETL data pipelines, with extraction from different sources such as Azure storage, HDFS, Kafka topics, structured and unstructured files, and Hive.
A Data Engineer understands how to apply technologies to solve big data problems and to develop innovative big data solutions. To do this, the Data Engineer should have extensive knowledge of different programming or scripting languages [Python].
Good understanding of streaming (Kafka/Storm/Kinesis)
Good understanding of Big Data components (HDFS, YARN, MapReduce, Spark, Oozie)
Good understanding of Azure components (ADF, ADB, ADLS)
Good understanding of version control (Git, GitHub, Azure DevOps).
Competency in a cloud environment is a must [Azure preferred]
Should have experience in transforming raw/unstructured data into clean data [Data Quality]