Senior Data Engineer

Engineering · Full-time · Lima Metropolitan Area, Peru

Job description

We're opening eyes, hearts and minds to the impact that a pharmacy team can have in changing lives. As part of Catalyst Health Group, Stellus Rx improves ease and outcomes in every moment that matters, along every health journey.

Join our group of talented, committed team members (pharmacists, pharmacy care coordinators, technologists, product strategists and more) to create and expand the delivery of personalized health support that people didn't even know was possible.

The Senior Data Engineer will help our communities thrive by transforming data into a format that can be easily analyzed in our Cloud Analytics Data Platform. Our culture is unabashedly driven by purpose. We are making a difference for our patients and providers while growing at an accelerated rate.

Accountabilities:

  • Develop, construct, and maintain large-scale data processing systems that collect data from a variety of structured and unstructured data sources.

  • Store data in a scale-out data lake.

  • Prepare data using ELT (Extract, Load, Transform) techniques for data visualization, exploration and analytic modeling.

Role and Responsibilities:

  • Develop a strong understanding of company domains, strategic direction, and user needs.

  • Set up data pipelines from a multitude of systems into our cloud platform.

  • Collaborate on requirements and determine the appropriate approach to dimensional modeling, ETL pipelines (including large-scale distributed pipelines), migration of data across multiple data repositories, and enablement of large-scale machine learning.

  • Create and maintain analytics pipelines that generate data and insights to power business decision-making.

  • Evaluate, compare, and improve different approaches, including innovation of design patterns, data lifecycle design, data ontology alignment, annotated datasets, and elastic search approaches.

  • Prepare data for data scientists' exploration and discovery process.

  • Perform data wrangling/munging for downstream analytics.

  • Assemble large, complex data sets that meet functional and non-functional business requirements.

  • Identify, design, and implement internal process improvements: automate manual processes, optimize data delivery, re-design infrastructure for greater scalability, etc.

  • Build the infrastructure required for optimal extraction, transformation, and loading of data from various sources.

  • Work with data and analytics experts to strive for greater functionality in our data systems.

Qualifications and Requirements:

  • Advanced working knowledge of SQL, including query authoring, and experience with relational databases, as well as working familiarity with a variety of databases.

  • Experience building and optimizing data pipelines, architectures, and data sets.

  • Experience performing root cause analysis on internal and external data and processes to answer specific business questions and identify opportunities for improvement.

  • Strong analytic skills related to working with unstructured datasets.

  • A successful history of manipulating, processing, and extracting value from large, disconnected datasets.

  • Working knowledge of message queuing, stream processing, and highly scalable data stores.

  • Experience supporting and working with cross-functional teams in a dynamic environment.

  • 4+ years of experience in a Data Engineer role.

  • A graduate degree in Computer Science, Statistics, Informatics, Information Systems, or another quantitative field.

  • Experience using the following software/tools:

    • Experience with big data tools: Hadoop, Spark, Kafka, etc.
    • Experience with relational SQL and NoSQL databases, including Postgres and Cassandra.
    • Experience with data pipeline and workflow management tools: Azkaban, Luigi, Airflow, etc.
    • Experience with AWS cloud services: EC2, EMR, RDS, Redshift.
    • Experience with stream-processing systems: Storm, Spark Streaming, etc.
    • Experience with object-oriented and functional programming/scripting languages: Python, Java, C++, Scala, etc.
  • High level of English proficiency.