Senior Site Reliability Engineer – Infrastructure
Company | NVIDIA |
---|---|
Location | Austin, TX, USA; Santa Clara, CA, USA; Durham, NC, USA; Westford, MA, USA |
Salary | $148,000 – $287,500 |
Type | Full-Time |
Degrees | Bachelor’s, Master’s |
Experience Level | Senior |
Requirements
- Experience with automation tools and workflows such as Ansible and Jenkins.
- UNIX systems programming and automation using industry-standard languages, and familiarity with API calls. Python experience preferred.
- Expert-level use of UNIX and UNIX CLI utilities such as sed, awk, and grep.
- Hands-on experience making architectural decisions in the technologies our chip engineers depend on (storage, networking, compute).
- Understanding of distributed UNIX system concepts such as NFS, autofs, DNS, LDAP and/or NIS.
- Excellent planning and communication skills and a passion for improving the productivity and efficiency of other specialists.
- Strong experience investigating and debugging complex, multi-discipline problems in a UNIX environment.
- 5+ years of experience in a large, distributed UNIX environment.
- A track record of applying data analysis principles and influencing data-driven decisions.
- MS (preferred) or BS in Computer Science, a similar degree, or equivalent experience.
Responsibilities
- Develop automation to scale infrastructure easily and reliably.
- Use broad IT infrastructure skills to implement infrastructure innovations which accelerate chip development.
- Design and implement network architecture, storage solutions, virtualization, and services specific to EDA workflows.
- Work closely with EDA teams to understand their requirements and translate them into infrastructure solutions.
- Work in a diverse team performing fast-paced investigations to empower engineers to develop at the speed of light.
- Collaborate to improve how our chip development process utilizes our infrastructure.
- Directly contribute to improving the overall quality and time to market of our next-generation chips.
Preferred Qualifications
- Extensive knowledge of job schedulers (in particular IBM Spectrum LSF and/or SLURM).
- Experience with Perl.
- Deep understanding of distributed system principles.
- Experience with chip design workflows, such as front-end verification, back-end workflows, or mixed-signal workflows.
- Experience crafting solutions that balance security with end-user productivity.