Data Engineer - Applied AI

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related field.
3+ years of experience building and maintaining production-grade data pipelines or distributed systems.
Strong Python skills with a solid grasp of object-oriented programming principles and common data engineering libraries/frameworks.
Fluency with relational databases (e.g., PostgreSQL), including schema design, query optimization, and data governance.
Hands-on experience with AWS cloud services for data ingestion, storage, and processing; comfortable designing and deploying infrastructure-as-code solutions.
Demonstrated ability to implement and manage distributed data-processing systems (e.g., Spark, Kafka, or similar).
Exceptional communication skills with the ability to explain complex technical concepts to both technical and non-technical stakeholders.

Design and implement robust, scalable data pipelines to support machine learning and generative AI workflows, including Retrieval-Augmented Generation (RAG).
Architect and manage distributed data-processing systems that handle large volumes of structured and unstructured data in real time.
Leverage AWS services (e.g., S3, EC2, Lambda, and others) to build highly available, fault-tolerant data solutions; utilize Kubernetes for container orchestration and scalability.
Work cross-functionally with scientists, engineers, and product managers to define platform requirements, integrate new data sources, and ensure seamless data flow into AI/ML pipelines.
Establish best practices for data security, compliance, and quality assurance, ensuring the reliability and integrity of all datasets used in production.
Monitor and optimize data workflows for throughput, fault tolerance, and cost efficiency; implement robust logging, monitoring, and alerting for production readiness.

Prior work on data pipelines specifically supporting ML or generative AI models; familiarity with the MLOps lifecycle.
Hands-on experience with RAG techniques and knowledge bases for AI systems.
Comfort with container orchestration and scaling using Kubernetes.
Exposure to or experience building agent-driven platforms where AI systems autonomously execute complex tasks.
Experience adapting quickly and delivering results in a fast-paced, evolving environment.
Exposure to life sciences, materials science, or related fields.