
JobNob

Your Career. Our Passion.

Data Scientist


Numentica LLC


Location

Coimbatore | India


Job description

This is a remote position.

Position Summary:

As a lead engineer, you will be responsible for engineering efficient, adaptable, and scalable data pipelines that process structured and unstructured data for cloud-based machine learning applications that enhance our constituent experience. You will have a unique opportunity to use your quantitative skills in statistics and machine learning to answer high-impact open questions, prototype cutting-edge algorithms, engage in large-scale experimentation, and drive business insights through data. You will also need to understand the company's data strategy, business priorities, and success measures to provide context for designing and building models.

Essential Functions/Responsibility:

Hands-on development on large language models, particularly Flan-T5.
Design machine learning pipelines that curate datasets; train, test, and validate models; compare model performance to existing or previously cataloged models; and then package the models and either deploy them to a real-time serving layer or handle batch scoring activities.
Conduct data profiling, cataloging, and data mapping for technical design, using a use case-based approach that drives the construction of technical data flows.
Assess the effectiveness and accuracy of data sources and data gathering techniques.
Design and develop machine learning and statistical models (preferably using R and Python), and apply other data analysis techniques to collect, explore, and extract insights from structured and unstructured data.
Collaborate with Microsoft Cloud Solution Architects and Data Platform Engineers in developing complex end-to-end enterprise solutions on the Microsoft Azure platform.

Minimum Qualifications

Bachelor’s or MS degree in Computer Science, Informatics, Statistics, Applied Mathematics, Data Science, Machine Learning, or equivalent. PhD preferred.

Hands-on experience with large language models, particularly Flan-T5, is mandatory. Extensive experience with NLP data processing, Flan-T5 fine-tuning, PEFT/LoRA, and Azure ML is mandatory. At least 7 years of hands-on experience with data science, AI, and big data. Experience with data engineering.

The candidate should be skilled in working with AI models and able to perform parameter-efficient fine-tuning (PEFT) with LoRA, preferably with a background in Azure ML. This project also requires an overlap of 4 hours with PST working hours.

Experience with one or more data science/machine learning tools and frameworks (e.g., Python, scikit-learn, NLTK, NumPy, PyMongo, Pandas, TensorFlow, R, Spark).
Machine learning experience with k-NN, Naive Bayes, decision trees, and SVM is required.
Knowledge of and experience with model management, ideally using Azure ML service and/or MLflow, as well as deployment of models using Azure Kubernetes Service.
Deep understanding of statistical and machine learning modeling, with experience applying these modeling techniques to business problems.
Experience using data mining and statistical tools.
Solid pattern recognition and predictive modeling skills.
Knowledge of recommendation engines, scoring systems, and A/B testing.
Experience leveraging a variety of services as data sources, such as Azure Data Lake, Azure Synapse Analytics, Azure SQL, Azure Event Hubs/IoT Hub, etc.
Knowledge of using automated machine learning (AutoML) frameworks to enhance productivity.
Extensive experience connecting to data platforms, including data lakes, data warehouses, NoSQL databases, and APIs.
Ability to set up data and experimental platforms.
Knowledge of data query and data processing tools (T-SQL).
Communication is essential for this role: you must be able to listen to and understand the question, then develop and deliver clear insights.
Outstanding team player.
Azure certifications (DP-100 would be a huge plus).







