(Associate) Data Engineer

Engineering · Full-time · New York, United States

Job description

The New York Stem Cell Foundation (NYSCF) Research Institute is a rapidly growing and highly successful nonprofit whose mission is to accelerate cures through stem cell research.

We are seeking an (Associate) Data Engineer who will be responsible for optimizing our in-house data processing, handling, and storage workflows. Through the optimization and continued development of centralized workflows for data retrieval, this role will build and deploy custom pipelines to ingest and process biological data generated by teams within the NYSCF Research Institute.

You describe yourself as a skilled data engineer with experience working with large datasets in Python and broad experience with databases, including SQL. You will report directly to the Principal Scientist, AI and Data Science, and though you’ll primarily interact with our data science and software engineering teams, you’ll be part of a larger team composed of hardware engineers and biologists. Level will be commensurate with experience.

What you'll do:

  • Develop, deploy, and document software that supports the analysis, annotation, and quality control pipelines for data
  • Work with both data science and software engineering teams, as well as end-user biologists, on requirements for processing and analyzing data and generating appropriate logs and reports
  • Ingest, process, and perform initial quality and balance controls on large datasets of microscopy images, both on premise and on cloud servers
  • Deploy existing image processing pipelines for data standardization, quality control, characterization, and feature extraction, both on premise and on cloud servers
  • Develop and implement novel data visualization strategies to summarize results and QC features
  • Ingest data from screens and process it to identify initial trends, outliers, and nuances
  • Optimize pipelines to run on different clusters and virtual machines
  • Centralize and optimize our existing workflows, and manage data migration and distribution

What we're looking for:

  • B.S. or M.S. in computer science, engineering, data science, or mathematics
  • 2+ years of experience developing data models and implementing data mining, migration, and management
  • Experience with pipeline deployment and resource optimization
  • Strong database experience, must know SQL
  • Strong programming experience, must know Python
  • Strong experience with cloud computing infrastructure such as AWS or Google Cloud (AWS preferred)
  • Familiarity with Python libraries for data frames and visualization (e.g., Pandas, Seaborn, pyplot)
  • Familiarity with GPU programming, resource optimization, and parallel computing
  • Experience with Git repository systems
  • Knowledge of image processing techniques (preferred)
  • Experience with microscopy images and fluorescence images (preferred)
