Data Sourcing & Curation Associate
Location
Bangalore | India
Job description
As part of an ambitious India-wide program, you will help create unique, high-quality open-source speech and text datasets spanning every district to advance the state of the art in Natural Language Processing (NLP). You will be part of the quality program, working closely with leading NLP researchers at IISc and the leadership of ARTPARK, as well as with NLP researchers at the world's top tech companies. Specifically, you will:
- Search for and list all relevant contacts (e.g., NGOs and local institutes) in an assigned set of districts, and broadcast the audio and transcription validation task to them.
- Design task flyers and identify every channel for reaching local individuals and language experts in each district who may be interested in the task.
- Contact applicants, as well as people who were reached but did not apply, by phone and WhatsApp.
- Explain the task to applicants through webinars and calls.
- Assign a pilot task to each applicant and have it reviewed anonymously.
- Select the final set of applicants for both the individual QC and language expert roles, and assign them tasks.
- Follow up with both individual QC reviewers and language experts to meet the QC target for each language.
Skills and background:
- Should be a native speaker of the local language of one of the following sets of districts:
- Bengali (primarily from Paschim Medinipur, Malda, Jalpaiguri, Purulia, Kolkata, Dakshin Dinajpur)
- Bhojpuri (primarily from Saran, East Champaran, Sitamarhi, Varanasi, Muzaffarnagar, Etah, Nagaur)
- Hindi (primarily from Muzaffarnagar, Etah, Nagaur)
- Santali, Surjapuri (primarily from Sahebganj, Jamtara, Kishanganj, Purnia)
- Bajjika, Angika (primarily from Bhagalpur, Vaishali, Lakhisarai, Supaul)
- Maithili, Magahi (primarily from Gaya, Jahanabad, Darbhanga, Madhubani)
- Marathi (primarily from Sindhudurga, Dhule, Nagpur, Pune, Aurangabad, Chandrapur, Solapur)
- Kannada (primarily from Mangalore, Gulbarga, Dharwad, Bellary, Mysore, Bijapur, Belgaum, Raichur)
- Telugu (primarily from Guntur, Chittoor, Vizag, Krishna, Anantapur, Srikakulam, Karimnagar, Nalgonda)
- Konkani (from Goa)
- Should have strong verbal and written communication skills in both English and the local language.
- Should be good at managing multiple people working remotely and ensuring they complete their tasks.
Technical skills:
- Microsoft Office (Excel, Word, PowerPoint) and Google Workspace (Docs, Sheets, Slides)
- Google Drive, Google Forms, Google Apps Script, and API usage