Compute Operations Lead
| Company | Leidos |
|---|---|
| Location | Washington, DC, USA |
| Salary | $104,650 – $189,175 |
| Type | Full-Time |
| Degrees | Bachelor’s |
| Experience Level | Senior, Expert or higher |
Requirements
- Bachelor's degree with 8+ years of prior relevant experience; additional years of experience will be considered in lieu of a degree
- 2+ years of formal or informal leadership experience, including project management; experience supervising and mentoring System Administrators preferred
- Foundational knowledge of Windows, Red Hat, and storage platforms
- Experience building new servers (Physical and Virtual)
- Experience troubleshooting issues in a growing, fast-paced environment
- Experience with log reviews, incident analysis, and identification of issue trends
- Knowledge of ITSM systems (ServiceNow)
- Experience with server patch management methodologies
- Time management skills
- Strong oral and written communication skills
- Track record of working effectively within a team and supporting peers toward improved processes and results
- Candidate must, at a minimum, be able to meet IAT Level II certification requirements (currently Security+ CE, CCNA-Security, GSEC, or SSCP)
- Experience supporting Windows 2019 and later
Responsibilities
- Lead and manage daily operations for a 24/7/365 enterprise server and storage infrastructure, on-premises and in the cloud, including incident response and maintenance
- Lead and manage compute team members and contractors.
- Maintain positive, constructive, and professional communication with the customer, which includes considering and executing their requests in the name of customer service
- Effectively manage ServiceNow queues for the compute team, ensuring that all tickets are assigned in a timely manner and all SLA requirements for ticket handling are followed.
- Ensure comprehensive documentation is created and maintained for infrastructure, processes, and system configurations.
- Ensure that all compute-related vulnerabilities are mitigated within timeframes established by SLAs and that the environment is configured to comply with FTC standards and regulations (e.g., CIS 3.0)
- Maintain all operating systems and databases at version N or N-1 and ensure that future resource needs related to capacity or EOS/EOL status are communicated to the customer.
- Address any technical issues or escalations, ensuring that any critical incidents related to compute resources are resolved quickly and efficiently.
- Proactively collaborate with other teams (Network, Security) and maintain an open line of communication.
- Work with customers to define project requirements, deliverables, and timelines, and ensure the team stays aligned with these objectives. Ensure project schedules are maintained and up to date.
- Track individual and team performance, provide constructive feedback, and provide general system administration supervision and mentoring.
- Effectively manage and project compute capacity and provide recommendations for management of both compute and storage resources.
- Manage team workload and resources (personnel and hardware/software) to ensure operational support requirements are met and project timelines are adhered to.
- Coordinate with senior leadership, customers, and stakeholders to collect data, conduct analysis, and develop and implement solutions associated with incident tickets and requirements.
- Maintain consistent backups for servers and storage and participate in COOP/DR exercises as needed or required by the agency.
- Develop solutions to complex technical issues.
- Provide follow-up reports (technical findings, feedback, resolution steps taken) for root cause analysis, engineering technical assessments, and process improvement initiatives.
Preferred Qualifications
- Experience with Ivanti Patch Management
- Experience with NetApp
- Experience with RHEL 8/9
- Experience with Windows 2019 and later
- Experience with Azure GovCloud