Role
We're seeking a Senior Data Engineer to join our team. In this role, you will play a pivotal part in shaping our product and business operations, contributing to our core culture, and helping the team grow. Your primary responsibilities will involve tackling significant challenges in the following areas:
Data Pipeline Development
We need you to contribute to some of the following:
- Make our existing schemata and surrounding pipelines more homogeneous and DRY
- Improve the quality of existing visibility and instrumentation to reduce wasted developer time
- Improve the quality and comprehensiveness of CI/CD so it requires less manual intervention
- Reduce excess flexibility, verbosity, and configuration throughout to improve maintainability
- Generally be on the lookout for ways to reduce complexity and improve reusability
Data Science Infrastructure Growth
We want your help to:
- Improve execution environments to handle bulky models with large resource needs, especially pre-trained models with heavy inference-time demands
- Optimize oversized models to reduce their resource demands, especially during training
- Develop conventions, software, and tooling to make exploratory data analysis (EDA) and training less tedious
- Generally improve the developer experience to reduce friction, confusion, and ambiguity
Your Qualifications
As an engineer:
- You have 3+ years of data engineering experience, including 1+ year specifically working with clinical and/or claims data
- You have developed production software in at least one modern language (preferably Python)
- You’re eager to pick up new languages and frameworks, possibly several
- You’re familiar enough with Linux to develop with Docker images and EC2 instances
As a person:
- You own tasks and take responsibility for them from start to finish.
- When confronted with ambiguity, you react with entrepreneurial spirit and an open mind to hypothesize possible solutions, experiment, pivot, and iterate.
- You are pragmatic and choose the least complex solution for the task at hand.
- You are open to both receiving and giving advice, focusing on positive team-wide outcomes.
Our Tech Stack
We have a diverse tech stack and are open to new tools and approaches. It currently includes the following (plus many other components not directly relevant to this position):
- Databases (from most to least used): Snowflake, Postgres, Elasticsearch, DynamoDB; plus SQLite for niche use cases. Change data capture (CDC) is, predictably, a big component.
- Cloud Infrastructure: AWS CDK (Cloud Development Kit) is used extensively, integrated with AWS services such as ECS, Lambda, and Step Functions state machines.
- Languages: Data science and data engineering are done almost exclusively in Python; other apps are in TypeScript, JavaScript, and Rust, so some familiarity with these would be helpful.
- Libraries: pandas and Polars throughout our existing stack; dbt and Snowflake in our newer stack; Airflow for orchestration.
Compensation
- $170K to $200K base salary, plus equity
Benefits
- Medical/dental/vision benefits
- 401(k)
- Free One Medical membership
- Parental leave
- Remote-first
- Minimal bureaucracy
- Incredible teammates!