Data Curation Engineer
| Company | Crexi |
|---|---|
| Location | Culver City, CA, USA |
| Salary | $101,000 – $136,000 |
| Type | Full-Time |
| Degrees | Bachelor’s |
| Experience Level | Mid Level, Senior |
Requirements
- 3+ years of proven experience in data management, data curation, or a related role.
- BS degree in Computer Science or relevant work experience.
- Experience building machine learning labeling systems
- Working knowledge of SQL, including query authoring, and experience with relational databases, plus familiarity with a variety of SQL/NoSQL databases and data stores such as MSSQL and DynamoDB.
- Proficient in Python
- Experience with Data Engineering best practices including Test Automation, Quality Controls, Reconciliation, Documenting data flows, etc.
- Thorough knowledge of data structures and algorithms
- Experience working with customers to understand and capture requirements
- Experience with data stores like Snowflake, Postgres, Feast
- Experience with LabelStudio
- Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores using Hadoop, Spark, Kafka, etc.
- Experience labeling video or speech content
- Experience with cloud service providers such as AWS, Azure, or Google Cloud
- Knowledge of data service and deployment frameworks, including Docker and Kubernetes
- Experience with data compliance and governance tools like OneTrust or DataGrail
- Experience with Kafka
Responsibilities
- Works with the data curation lead and data science team to create labeling experiments in LabelStudio on a variety of datasets, including image, web, text, and PDF modalities.
- Creates and maintains datasets that are used in these labeling experiments.
- Builds and optimizes ‘big data’ data pipelines, architectures, and data sets using AWS cloud services and Snowflake.
- Works with data engineers and the data science team on engineering systems/designs/architecture.
- Builds processes supporting data transformation, data structures, metadata, dependency management, and workload management using dbt and Airflow or similar tools.
- Creates a strong test suite, alerting, monitoring and documentation for all the pipelines.
Preferred Qualifications
- Experience labeling video or speech content
- Experience with data compliance and governance tools like OneTrust or DataGrail