Machine Learning Engineer

Engineering · Palo Alto, US

Job description

About Arc Institute

The Arc Institute is a new scientific institution that conducts curiosity-driven basic science and technology development to understand and treat complex human diseases. Headquartered in Palo Alto, California, Arc is an independent research organization founded on the belief that many important research programs will be enabled by new institutional models. Arc operates in partnership with Stanford University, UCSF, and UC Berkeley.

While the prevailing university research model has yielded many tremendous successes, we believe in the importance of institutional experimentation as a way to make progress. Arc's experiments with the institutional model include:

  • Funding: Arc will fully fund Core Investigators’ (PIs’) research groups, liberating scientists from the typical constraints of project-based external grants.
  • Technology: Biomedical research has become increasingly dependent on complex tooling. Arc Technology Centers develop, optimize and deploy rapidly advancing experimental and computational technologies in collaboration with Core Investigators. 
  • Support: Arc aims to provide first-class operational, financial, and scientific support that will enable scientists to pursue long-term, high-risk, high-reward research that can meaningfully advance progress toward cures for diseases including neurodegeneration, cancer, and immune dysfunction.
  • Culture: We believe that culture matters enormously in science and that excellence is difficult to sustain. We aim to create a culture that is focused on scientific curiosity, a deep commitment to truth, broad ambition, and selfless collaboration.

Arc scaled to nearly 100 people in its first year. With $650M+ in committed funding and a state-of-the-art new lab facility in Palo Alto, Arc will continue to grow quickly to several hundred people in the coming years.

 

About the position

We are searching for an experienced and collaborative machine learning research engineer focused on building biological foundation models. This role will contribute to the development and application of Arc’s frontier DNA foundation model (Evo); to Arc’s Virtual Cell Initiative, which focuses on developing models of cell biology capable of predicting the impact of perturbations and stimuli; and to other projects within Institute-wide machine learning efforts.

About you

- You are an innovative machine learning engineer with experience in training and evaluating large deep learning models.

- You are excited about working closely with a multidisciplinary team of computational and experimental biologists at Arc to achieve breakthrough capabilities in biological prediction and design tasks.

- You are a strong communicator, capable of translating complex technical concepts to researchers outside of your domain.

- You are a continuous learner and are enthusiastic about developing and evaluating models that impact many biological disciplines.

In this position, you will

- Contribute to the optimization and scaling of state-of-the-art foundation models, developed in collaboration with other ML researchers and scientists at Arc, with the goal of understanding and designing complex biological systems.

- Engineer large-scale distributed model pretraining and pipelines for efficient model inference. 

- Enable robust systematic evaluation of trained models.

- Stay up-to-date with the latest advancements in technologies for large-scale sequence modeling and alignment, and implement the most promising strategies to ensure the underlying models remain state-of-the-art.

- Work with experimental biologists to ensure that the developed models are grounded in biologically meaningful problems and evaluations.

- Publish findings through journal publications, white papers, and presentations (both internal to Arc and external).

- Foster internal and external collaborations centered on generative design of biological systems at Arc Institute.

- Commit to a collaborative and inclusive team environment, sharing expertise and mentoring others.

Job Requirements

- BS, MS, or PhD in Computer Science, Machine Learning, or a related field.

- 5-8+ years of relevant experience in machine learning research or ML engineering in an academic (e.g., PhD) or industry research lab.

- Well-versed in machine learning frameworks such as PyTorch or JAX. 

- Experience with developing distributed training tools such as FSDP, DeepSpeed, or Megatron-LM.

- Excellent communication skills, both written and verbal, with a strong track record of presentations and publications.

- Ability to communicate and collaborate successfully with biologists and software/infrastructure engineers.

- Motivated to work in a fast-paced, ambitious, multi-disciplinary, and highly collaborative research environment.


The base salary range for this position is $163,950 to $234,550. These amounts reflect the range of base salary that the Institute reasonably would expect to pay a new hire or internal candidate for this position. The actual base compensation paid to any individual for this position may vary depending on factors such as experience, market conditions, education/training, skill level, and whether the compensation is internally equitable. It does not include bonuses, commissions, differential pay, other forms of compensation, or benefits. This position is also eligible to receive an annual discretionary bonus, with the amount dependent on individual and Institute performance factors.