Head of Data Engineering (Clinical)

Engineering · Full-time · CA, United States of America · Remote possible

Job description

Deep 6 AI is a fast-growing tech startup headquartered in Los Angeles, California, looking for talented, dynamic team members who want to help shape our groundbreaking artificial intelligence platform. We are transforming and accelerating clinical trials to help get life-saving treatments to patients faster and speed innovation in healthcare. To that end, we build a cutting-edge software suite that connects all clinical research stakeholders, from research teams to treating physicians, patients, and study sponsors, on a real-time, real-world data SaaS platform powered by AI.

At Deep 6 AI, we are on a mission to revolutionize healthcare’s clinical trials process through innovative AI and ML solutions. Our software mines real-time clinical data to precision-match patients to clinical trials using cutting-edge AI and ML techniques. We are looking for a Head of Data Engineering to spearhead the continued development and application of our data engineering platform. This is both a visionary and a hands-on role.

This role presents an opportunity for an exceptional hands-on data engineering leader to build a team of data and platform engineers, while overseeing the use of AI and ML to drive strategic initiatives.

As the Head of Data Engineering, you will work closely with stakeholders to continuously innovate our platform, ensuring that it stays ahead of the curve in supporting the mission of the organization. You will be tasked with scaling the data ingestion and comprehension of one of the largest and densest sets of rich, unstructured clinical data.

What You'll Do

  • Lead the continued development and enhancement of our clinical data ingestion and comprehension pipeline.
  • Drive the adoption of core engineering principles in development, including observability, scalability, and end-to-end control.
  • Support advanced AI/ML research by exposing raw, canonicalized, and comprehended data in analytics platforms.
  • Establish and maintain key performance metrics to track the effectiveness of data engineering initiatives.
  • Foster collaboration with internal and external stakeholders to gather feedback and drive continuous improvement.
  • Enhance Deep 6 AI's reputation as a leader in AI-driven clinical trials acceleration through thought leadership and industry recognition.

About You

  • Strong player-coach mentality, with an ability to balance the hands-on needs of leading two groups: data platform (data engineering) and search/enrichment (ML engineering).
  • Proven track record of leading successful large-scale clinical data initiatives within a product-focused environment.
  • Strong background in applied data engineering and data pipelines, with hands-on experience developing and deploying production-ready models.
  • Understanding of the software development life cycle and data product development.
  • Experience working with healthcare data, especially HL7 and FHIR.
  • Deep understanding of streaming data ingestion and ETL processes.
  • Conceptual understanding of ML techniques, particularly NLP (e.g., NER, BERT).
  • Conceptual understanding of AI techniques such as large language models (LLMs), small language models (SLMs), and other state-of-the-art approaches.
  • Experience with database technologies, especially Elasticsearch, PostgreSQL, Amazon Aurora, and DynamoDB.
  • Demonstrated passion for staying up to date with the latest data engineering and pipeline trends, along with a track record of driving innovation.

Preferred Qualifications

  • Cloud Services: Experience with cloud-based data processing and storage services (AWS).
  • Infrastructure as Code: Proficiency with infrastructure as code tools (CDK).

Technologies We Use: While specific expertise in our tech stack is beneficial, we value adaptability and a willingness to learn. Our current stack includes:

  • AWS Cloud Services (e.g., EC2, ECS, RDS, Aurora, DynamoDB, Lambda)
  • Java (Kotlin), Python, TypeScript
  • Kubernetes, Docker
  • FHIR Servers (e.g., HAPI, Health Samurai Aidbox)
  • Elasticsearch and Elastic Cloud
  • CI/CD: GitHub Actions
  • Monitoring: OpenTelemetry, AWS X-Ray, AWS CloudWatch, Datadog, Pendo