HPC Systems Site Lead

Job#: 2027037

Job Description:

Own your opportunity to work alongside federal civilian agencies. Make an impact by providing services that help the government ensure the well-being of U.S. citizens.

Our work depends on an HPC Systems Site Lead joining our team to support the National Oceanic and Atmospheric Administration (NOAA) Weather and Climate Operational Supercomputer System (WCOSS). This position is on-site at a data center in the Manassas area.

WCOSS provides NOAA the operational High Performance Computing (HPC) resources essential to process sophisticated numerical models used to predict and understand atmospheric and oceanic phenomena for weather and climate operational use. Operating 24/7, the next 10-year WCOSS program will deliver significant computational capability that will evolve over time to keep pace with NOAA’s growing environmental modeling needs.

We are looking for individuals to deploy, operate, and support leading-edge technology for WCOSS. Specific technology training will be provided. CANDIDATES MUST HAVE AN ACTIVE PUBLIC TRUST CLEARANCE OR ABOVE TO BE CONSIDERED.

We think. We act. We deliver. There is no challenge we can’t turn into opportunity.

In this role, a typical day will include:

  • Applying current HPC systems administration skills, with a desire to learn and deploy new technologies.
  • Developing and deploying monitoring capabilities.
  • Developing and implementing tools for cluster administration.
  • Providing technical support with a team of HPC System & Storage Administrators to resolve operational issues.
  • Providing off-hour on-call support on a rotating basis.
  • Managing, planning, and reporting for on-site vendor/subcontractor activities.
  • Working on-site at a Manassas data center.
  • Managing on-site office and access for vendors and subcontractors.
  • Contributing to planning for software and hardware upgrades along with future installations.

REQUIRED QUALIFICATIONS

  • Bachelor’s degree or equivalent and 10+ years of experience with HPC systems operations.
  • Experience working in a 24/7 operational environment.

DESIRED QUALIFICATIONS

  • Demonstrated experience deploying and managing large-scale HPC systems using OS provisioning tools (e.g., xCAT, HPCM).
  • Demonstrated experience using configuration management tools (e.g., Ansible, Puppet).
  • Linux system administration experience (e.g., SLES, Red Hat, or CentOS).
  • Batch management/scheduling experience, PBS Pro preferred.
  • Parallel filesystem configuration and monitoring experience (e.g., Lustre, NFS).
  • Network interconnect configuration and monitoring experience (e.g., InfiniBand, Ethernet).
  • Programming or scripting in at least two languages (e.g., Bash, Perl, Python, C).
  • Strong writing skills for technical documents, system procedures, user wikis, and FAQs.
  • Ability to work both independently and as part of a team.
  • Knowledge of or experience managing subcontractors or vendors under Service Level Agreements (SLAs).
  • Knowledge of computer system power and cooling.
  • Experience managing, maintaining, and repairing HPC hardware.


EEO Employer

Apex Systems is an equal opportunity employer. We do not discriminate or allow discrimination on the basis of race, color, religion, creed, sex (including pregnancy, childbirth, breastfeeding, or related medical conditions), age, sexual orientation, gender identity, national origin, ancestry, citizenship, genetic information, registered domestic partner status, marital status, disability, status as a crime victim, protected veteran status, political affiliation, union membership, or any other characteristic protected by law. Apex will consider qualified applicants with criminal histories in a manner consistent with the requirements of applicable law. If you have visited our website in search of information on employment opportunities or to apply for a position, and you require an accommodation in using our website for a search or application, please contact our Employee Services Department at [email protected] or 844-463-6178.

Apex Systems is a world-class IT services company that serves thousands of clients across the globe. When you join Apex, you become part of a team that values innovation, collaboration, and continuous learning. We offer quality career resources, training, certifications, development opportunities, and a comprehensive benefits package. Our commitment to excellence is reflected in many awards, including ClearlyRated's Best of Staffing® in Talent Satisfaction in the United States and Great Place to Work® in the United Kingdom and Mexico.

Employee Type:
Contract

Location:
Manassas, VA, US

Job Type:

Date Posted:
April 30, 2024