In this role, you will work with other data scientists and engineers to help build our unstructured data extraction pipeline. You will be responsible for applying the latest machine learning and natural language processing technology to structure and normalize data from medical records. The ideal candidate is comfortable digging into messy data and working with clinical subject matter experts to develop annotation guidelines that lead to high-quality machine learning models. As a data scientist at Ciitizen, you will have the opportunity to touch all parts of the machine learning project lifecycle, from dataset curation to model deployment.
Requirements:
- 2+ years of experience prototyping and deploying production NLP / machine learning models
- 3+ years of experience with Python and solid software development skills
- Familiarity with common NLP/ML frameworks (spaCy, PyTorch, TensorFlow, Keras)
- Fluency in state-of-the-art machine learning techniques, including Transformers, CNNs, and RNNs, applied to NLP tasks such as Named Entity Recognition, Information Extraction, and Named Entity Linking
- Excellent ability to communicate technical information to non-technical audiences
Nice to have:
- Experience with medical ontologies (SNOMED CT, LOINC, RxNorm, etc.)
- Experience working with clinical data
- Experience with Kubernetes and cloud-based microservices
- Experience working with document images (OCR)
- Familiarity with Java
Ciitizen is a health technology platform that enables patients with cancer and rare neurologic disorders to collect, digitize, and share their health information. Funded by Andreessen Horowitz and founded by serial entrepreneur Anil Sethi, former Head of Health at Apple, Ciitizen began with the simple idea that anyone should be able to manage their own medical records, and it was driven by the harder reality of helping Anil's own sister fight her battle with cancer. Our mission is for patients to control their healthcare data so that they can control their healthcare options.