(Associate) Data Engineer

Engineering · Full-time · New York, United States

Job description

The New York Stem Cell Foundation (NYSCF) Research Institute is a rapidly growing and highly successful nonprofit whose mission is to accelerate cures through stem cell research.

We are seeking an (Associate) Data Engineer who will be responsible for optimizing our in-house data processing, handling, and storage workflows. Through the optimization and continued development of centralized workflows for data retrieval, this role will build and deploy custom pipelines to ingest and process biological data generated by teams within the NYSCF Research Institute.

You describe yourself as a skilled data engineer with experience working with large datasets in Python and broad experience with databases, including SQL. You will report directly to the Principal Scientist, AI and Data Science, and though you’ll primarily interact with our data science and software engineering teams, you’ll be part of a larger team composed of hardware engineers and biologists. Level will be commensurate with experience.

What you'll do:

  • Develop, deploy, and document software that supports the analysis, annotation, and quality control pipelines for data
  • Work with both data science and software engineering teams, as well as end-user biologists, on requirements for processing and analyzing data and generating appropriate logs and reports
  • Ingest, process, and perform initial quality and balance controls on large datasets of microscopy images, both on premise and on cloud servers
  • Deploy existing image processing pipelines for data standardization, quality control, characterization, and feature extraction, both on premise and on cloud servers
  • Develop and implement novel data visualization strategies to summarize results and QC features
  • Ingest data from screens and process it to identify initial trends, outliers, and nuances
  • Optimize pipelines to run on different clusters and virtual machines
  • Centralize and optimize our existing workflows, and manage data migration and distribution

What we're looking for:

  • B.S. or M.S. in computer science, engineering, data science, or mathematics
  • 2+ years of experience developing data models and implementing data mining, migration, and management
  • Experience with pipeline deployment and resource optimization
  • Strong database experience, must know SQL
  • Strong programming experience, must know Python
  • Strong experience with cloud computing infrastructure such as AWS or Google Cloud (AWS preferred)
  • Familiarity with Python libraries for data frames and visualization (e.g., Pandas, Seaborn, pyplot)
  • Familiarity with GPU programming, resource optimization, and parallel computing
  • Experience with Git repository systems
  • Knowledge of image processing techniques (preferred)
  • Experience with microscopy images and fluorescence images (preferred)
