Data Engineer – Applied AI

Company: Flagship Pioneering
Location: Cambridge, MA, USA
Salary: Not provided
Type: Full-Time
Degrees: Bachelor’s, Master’s
Experience Level: Mid Level, Senior

Requirements

  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
  • 3+ years of experience building and maintaining production-grade data pipelines or distributed systems.
  • Strong Python skills with a solid grasp of object-oriented programming principles and common data engineering libraries/frameworks.
  • Fluency in relational database usage (e.g., PostgreSQL) for schema design, query optimization, and data governance.
  • Hands-on experience with AWS cloud services for data ingestion, storage, and processing; comfortable designing and deploying infrastructure-as-code solutions.
  • Demonstrated ability to implement and manage distributed data-processing systems (e.g., Spark, Kafka, or similar).
  • Exceptional communication skills with the ability to explain complex technical concepts to both technical and non-technical stakeholders.

Responsibilities

  • Design and implement robust, scalable data pipelines to support machine learning and generative AI workflows, including Retrieval-Augmented Generation (RAG).
  • Architect and manage distributed data-processing systems that handle large volumes of structured and unstructured data in real time.
  • Leverage AWS services (e.g., S3, EC2, and Lambda) to build highly available, fault-tolerant data solutions; use Kubernetes for container orchestration and scalability.
  • Work cross-functionally with scientists, engineers, and product managers to define platform requirements, integrate new data sources, and ensure seamless data flow into AI/ML pipelines.
  • Establish best practices for data security, compliance, and quality assurance, ensuring the reliability and integrity of all datasets used in production.
  • Monitor and optimize data workflows for throughput, fault tolerance, and cost efficiency; implement robust logging, monitoring, and alerting for production readiness.

Preferred Qualifications

  • Prior work on data pipelines specifically supporting ML or generative AI models; familiarity with the MLOps lifecycle.
  • Hands-on experience with RAG techniques and knowledge bases for AI systems.
  • Comfort with container orchestration and scaling using Kubernetes.
  • Exposure to, or experience building, agent-driven platforms where AI systems autonomously execute complex tasks.
  • Experience adapting quickly and delivering results in a fast-paced, evolving environment.
  • Exposure to life sciences, materials science, or related fields.