Data Engineer

Bachelor’s degree in Computer Science, Data Science, Bioinformatics, or a related field (or equivalent practical experience)
Proven experience as a data engineer or in a similar data-intensive role, preferably supporting analytics or research teams
Strong proficiency in Python and SQL for data manipulation and scripting
Hands-on experience building ETL processes or data pipelines to handle large datasets
Familiarity with big data processing frameworks (e.g. Spark/PySpark) for scalable data transformations (required)
Solid understanding of data modeling and database concepts
Ability to work with complex, multi-modal datasets (structured and unstructured) and optimize data workflows for performance
Knowledge of software engineering and data engineering best practices: version control (Git), code review, testing, and documentation
Experience ensuring data quality and using data lineage or provenance tracking to audit data flow
Excellent problem-solving skills and the ability to communicate effectively with both technical and non-technical stakeholders
Interest in biomedical science and healthcare data
Ability to quickly learn domain-specific concepts and to handle sensitive research data in compliance with regulatory and privacy requirements

Design, build, and maintain data pipelines in Palantir Foundry to ingest, transform, and integrate data from diverse biomedical sources (e.g. clinical, genomic, and experimental data) for analysis
Develop transformations and workflows using Foundry’s tools (Pipeline Builder, Code Workbooks, etc.) to prepare high-quality data for researchers
Define and manage the Foundry Ontology and object models to represent biomedical entities and relationships
Work closely with data scientists, bioinformaticians, and research teams to gather requirements and deliver data solutions
Implement data validation checks and follow best practices for data governance
Build or support interactive Foundry dashboards that let researchers visualize and explore data

Hands-on experience with Palantir Foundry is a strong plus
Familiarity with Foundry components such as Ontology, Code Workbooks, Functions, Foundry Pipelines (Pipeline Builder), Foundry Dashboarding, or Object Builder is advantageous
Knowledge of Palantir’s developer tools and APIs
Experience using Python SDKs and libraries for Foundry (e.g. foundry-dev-tools, Foundry Transforms API, titanium-sdk) to develop pipeline code or automate tasks
Experience with PySpark in Foundry or similar big data platforms for data transformations
Deep understanding of Foundry’s Data Lineage capabilities and how to use them for impact analysis and audit trails
Experience integrating Foundry with external systems via REST APIs or building custom applications that connect to Foundry data
Experience linking Foundry outputs to public-facing web pages or external dashboards is a plus
Previous experience in biomedical research, healthcare analytics, or pharmaceutical R&D projects
Familiarity with biomedical datasets (e.g. clinical trial data, clinical imaging, transcriptomic or genomic data) and data standards (e.g. HL7/FHIR, CDISC), along with an understanding of the scientific research process, will help you excel in this role