Degree in computer science, engineering, or similar fields
Skill Set: Python, PySpark
Primary Responsibilities
* Design, develop, test, and support data pipelines and applications
* Industrialize data feeds
* Create data pipelines into existing systems
* Improve data cleansing and facilitate connectivity of data and applied technologies across both external and internal data sources
* Collaborate with Data Scientists
* Establish a continuous quality-improvement process to systematically optimize data quality
* Translate data requirements from data users into ingestion activities
Qualifications
* B.Tech/B.Sc./M.Sc. in Computer Science or a related field and 3+ years of relevant industry experience
* Agile mindset and a spirit of initiative
* Interest in solving challenging technical problems
* Experience with test-driven development and CI/CD workflows
* Knowledge of version control software such as Git and experience working with major hosting services (e.g., Azure DevOps, GitHub, Bitbucket, GitLab)
* Experience working with cloud environments such as AWS, especially creating serverless architectures and using infrastructure-as-code tools such as CloudFormation/CDK, Terraform, or ARM templates
* Hands-on experience with various frontend and backend languages (e.g., Python, R, Java, Scala, C/C++, Rust, TypeScript, …)
* Knowledge of container technologies such as Docker and Kubernetes
* Experience with and understanding of Apache Spark and the Hadoop ecosystem
* Knowledge of workflow engines such as Apache Airflow, Oozie, or Kubeflow
* Knowledge of authentication protocols such as SAML and OAuth2
* Experience in creating and interfacing with RESTful APIs
* Experience in building robust, production-ready ETL pipelines for both batch and streaming ingestion (see the sketch below)
* Basic knowledge of statistics and machine learning is a plus
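
As a rough illustration of the kind of batch ingestion work referenced above, a minimal PySpark sketch might look like the following; all paths, column names, and table layouts here are hypothetical placeholders rather than details of any actual system.

```python
# Minimal PySpark batch-ingestion sketch.
# All paths and column names below are hypothetical examples.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("example-batch-ingest").getOrCreate()

# Read a raw CSV feed (hypothetical source path).
raw = (
    spark.read
    .option("header", True)
    .csv("s3://example-bucket/raw/orders/")
)

# Basic cleansing: drop duplicate records, cast types, filter out bad rows.
cleaned = (
    raw.dropDuplicates(["order_id"])
    .withColumn("order_ts", F.to_timestamp("order_ts"))
    .withColumn("amount", F.col("amount").cast("double"))
    .filter(F.col("order_id").isNotNull())
)

# Write the curated output as Parquet, partitioned by ingestion date
# (hypothetical target path).
(
    cleaned
    .withColumn("ingest_date", F.current_date())
    .write
    .mode("overwrite")
    .partitionBy("ingest_date")
    .parquet("s3://example-bucket/curated/orders/")
)

spark.stop()
```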