Data Curation Engineer

Company: Crexi
Location: Culver City, CA, USA
Salary: $101,000 – $136,000
Type: Full-Time
Degrees: Bachelor’s
Experience Level: Mid Level, Senior

Requirements

  • 3+ years proven experience in data management, data curation, or a related role.
  • BS degree in Computer Science or relevant work experience
  • Experience building machine learning labeling systems
  • Working knowledge of SQL, including query authoring and experience with relational databases, as well as familiarity with a variety of SQL/NoSQL databases and data stores such as MSSQL and DynamoDB.
  • Proficient in Python
  • Experience with Data Engineering best practices including Test Automation, Quality Controls, Reconciliation, Documenting data flows, etc.
  • Have thorough knowledge of data structures and algorithms
  • Have experience working with customers to understand and capture requirements
  • Experience with data stores like Snowflake, Postgres, Feast
  • Experience with LabelStudio
  • Working knowledge of message queuing, stream processing, and highly scalable ‘big data’ data stores using Hadoop, Spark, Kafka, etc.
  • Experience labeling video or speech content
  • Experience with cloud service providers, including AWS, Azure, or Google
  • Knowledge of data service and deployment frameworks, including Docker and Kubernetes
  • Experience with data compliance and governance tools like OneTrust or DataGrail
  • Experience with Kafka

Responsibilities

  • Works with the data curation lead and data science team to create labeling experiments in LabelStudio on a variety of datasets, including image, web, text, and PDF modalities.
  • Creates and maintains datasets that are used in these labeling experiments.
  • Builds and optimizes ‘big data’ data pipelines, architectures, and data sets using AWS cloud services and Snowflake.
  • Works with data engineers and the data science team on engineering systems/designs/architecture.
  • Builds processes supporting data transformation, data structures, metadata, dependency management, and workload management using dbt and Airflow or similar tools.
  • Creates a strong test suite, alerting, monitoring, and documentation for all pipelines.

Preferred Qualifications

  • Experience labeling video or speech content
  • Experience with data compliance and governance tools like OneTrust or DataGrail