Data Engineer – Applied AI
Company | Flagship Pioneering |
---|---|
Location | Cambridge, MA, USA |
Salary | Not provided |
Type | Full-Time |
Degrees | Bachelor’s, Master’s |
Experience Level | Mid Level, Senior |
Requirements
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
- 3+ years of experience building and maintaining production-grade data pipelines or distributed systems.
- Strong Python skills with a solid grasp of object-oriented programming principles and common data engineering libraries/frameworks.
- Fluency with relational databases (e.g., PostgreSQL), including schema design, query optimization, and data governance.
- Hands-on experience with AWS cloud services for data ingestion, storage, and processing; comfortable designing and deploying infrastructure-as-code solutions.
- Demonstrated ability to implement and manage distributed data-processing systems (e.g., Spark, Kafka, or similar).
- Exceptional communication skills with the ability to explain complex technical concepts to both technical and non-technical stakeholders.
Responsibilities
- Design and implement robust, scalable data pipelines to support machine learning and generative AI workflows, including Retrieval-Augmented Generation (RAG).
- Architect and manage distributed data-processing systems that handle large volumes of structured and unstructured data in real time.
- Leverage AWS services (e.g., S3, EC2, and Lambda) to build highly available, fault-tolerant data solutions; use Kubernetes for container orchestration and scalability.
- Work cross-functionally with scientists, engineers, and product managers to define platform requirements, integrate new data sources, and ensure seamless data flow into AI/ML pipelines.
- Establish best practices for data security, compliance, and quality assurance, ensuring the reliability and integrity of all datasets used in production.
- Monitor and optimize data workflows for throughput, fault tolerance, and cost efficiency; implement comprehensive logging, monitoring, and alerting for production readiness.
Preferred Qualifications
- Prior work on data pipelines specifically supporting ML or generative AI models; familiarity with the MLOps lifecycle.
- Hands-on experience with RAG techniques and knowledge bases for AI systems.
- Comfort with container orchestration and scaling using Kubernetes.
- Exposure to or experience building agent-driven platforms where AI systems autonomously execute complex tasks.
- Experience adapting quickly and delivering results in a fast-paced, evolving environment.
- Exposure to life sciences, materials science, or related fields.