Senior Data Engineer

Role

We're seeking a Senior Data Engineer to join our team. In this role, you will play a pivotal part in shaping our product and business operations, contributing to our core culture, and expanding our team. Your primary responsibilities will involve addressing significant challenges across the following areas:

Data Pipeline Development

We need you to contribute to some of the following:

Make our existing schemata and surrounding pipelines more homogeneous and DRY
Improve the quality of existing visibility and instrumentation to waste less developer time
Improve the quality and comprehensiveness of CI/CD to require less manual intervention
Reduce excess flexibility, verbosity, and configuration throughout to improve maintainability
Generally be on the lookout for ways to reduce complexity and improve reusability

Data Science Infrastructure Growth

We want your help to:

Improve execution environments to handle bulky models with large resource needs, especially pre-trained models with large inference-time demands
Improve oversize models to reduce their resource demands, especially training demands
Develop conventions, software, and tooling, to make EDA and training less tedious
Generally improve the developer experience to reduce friction, confusion, and ambiguity

Your Qualifications

As an engineer:

You have 3+ years of data engineering experience, including 1+ year specifically working with clinical and/or claims data
You have developed production software in at least one modern language (preferably Python)
You’re totally ready to pick up new languages and frameworks, maybe several
You’re familiar enough with Linux to develop with Docker images and EC2 instances

As a person:

You own tasks and take responsibility for them from start to finish.
When confronted with ambiguity, you react with entrepreneurial spirit and an open mind to hypothesize possible solutions, experiment, pivot, and iterate.
You are pragmatic and choose the least complex solution for the task at hand.
You are open to advice and to giving advice, focusing on positive team-wide outcomes.

Our Tech Stack

We have a very diverse tech stack, and open minds for new tools and approaches. What we have now consists of the following (plus many other parts not directly relevant to this position):

Databases (from most to least use): Snowflake, Postgres, Elasticsearch, DynamoDB; plus SQLite for niche use cases. CDC is predictably a big component.
Cloud Infrastructure: AWS CDK (Cloud Development Kit) is used extensively, integrated with AWS services like ECS, Lambda, and Step Functions (state machines).
Languages: Data Science and Data Engineering are done almost exclusively in Python; other apps are in TypeScript, JavaScript, and Rust, so some familiarity with those would be helpful.
Libraries: pandas and Polars throughout our existing stack; dbt and Snowflake for our newer stack; Airflow for orchestration.

Compensation

$170K to $200K base + equity

Benefits

Medical/dental/vision benefits
401(k)
Free One Medical membership
Parental leave
Remote first
Minimal bureaucracy
Incredible teammates!

from Remotive Remote Jobs RSS Feed https://ift.tt/touIEcd