Senior Software Engineer - Polaris & Data Lake Catalog

8+ years of experience designing and building scalable, distributed systems.
Strong programming skills in Java, Scala, or C++ with an emphasis on performance and reliability.
Deep understanding of distributed transaction processing, concurrency control, and high-performance query engines.
Experience with open-source data lake formats (e.g., Apache Iceberg, Parquet, Delta Lake) and the challenges associated with multi-engine interoperability.
Experience building cloud-native services and working with public cloud providers like AWS, Azure, or GCP.
A passion for open-source software and community engagement, particularly in the data ecosystem.
Familiarity with data governance, security, and access control models in distributed data systems.

Design and implement scalable, distributed systems that support Iceberg DML/DDL transactions, schema evolution, partitioning, time travel, and related table features.
Architect and build systems that integrate Snowflake queries with external Iceberg catalogs (e.g., AWS Glue, Databricks Unity Catalog) and various data lake architectures, enabling seamless interoperability across cloud providers.
Develop high-performance, low-latency solutions for catalog federation, allowing customers to manage and query their data lake assets across multiple catalogs from a single interface.
Collaborate with Snowflake’s open-source team and the Apache Iceberg community to contribute new features and enhance the Iceberg REST specification.
Work on core data access control and governance features for Polaris, including fine-grained permissions such as row-level security, column masking, and multi-cloud federated access control.
Contribute to our managed Polaris service, ensuring that external query engines like Spark and Trino can read from and write to Iceberg tables through Polaris in a way that’s decoupled from Snowflake’s core data platform.
Build tooling and services that automate data lake table maintenance, including compaction, clustering, and data retention for enhanced query performance and efficiency.

Contributing to open-source projects, especially in the data infrastructure space.
Designing or implementing REST APIs, particularly in the context of distributed systems.
Managing large-scale data lakes or data catalogs in production environments.
Working on highly performant and scalable query engines such as Spark, Flink, or Trino.