
IT Operations Engineer – HPC Engineering
Where you fit in
High Performance Computing (HPC) in Shell is used extensively in the Upstream part of the business (seismic processing, reservoir modeling, and R&D). Shell is actively exploring for hydrocarbons across the globe. The HPC & TI team brings business value with innovative, agile, reliable and secure operational HPC and Technical Infrastructure services.
What’s the role
The IT Operations Engineer in HPC Engineering focuses on administering the hardware and systems software used for High Performance Computing. This includes keeping the existing hardware and middleware ecosystem running reliably and securely, either directly or by working with other teams; testing and installing new hardware and system software; and decommissioning old hardware and software. In this role, you will work with colleagues and users across Shell globally.
In addition to the above, your responsibilities include the following:
- Provide support for HPC systems, storage, networks, tools and applications based on the assigned duties
- Support and maintain the simulation software ecosystem and tooling, including middleware, systems software and operating systems; administer the High Performance Computing software and middleware ecosystem
- Comply with information risk management policies and change control policies and procedures in a distributed computing environment
- Monitor and tune HPC environments, focusing on HPC scheduling, queueing configurations, workflow optimization, etc.
- Produce and maintain technical documentation for all HPC components, including portals and dashboards
- Take responsibility for HPC service health and SLA metrics monitored through ServiceNow
- Provide application support and troubleshooting for HPC regional compute for Shell globally
- Work as a team player with HPC technical SMEs maintaining Shell's HPC ecosystem, sharing and maintaining knowledge with other staff to ensure business continuity
What we need from you
- Bachelor’s degree or equivalent, and around 5-9 years of experience in IT with relevant experience in the High Performance Computing industry
- At least 4 years of experience in a large Linux HPC environment with hands-on expertise in supporting the HPC middleware environment for scientific research
- Significant HPC middleware experience and knowledge of HPC systems, network, and storage; HPC application administration, configuration, patching and troubleshooting
- Strong technical skills in the following HPC areas: programming and scripting (shell scripting, Perl, Python, SQL, etc.)
- Administration and configuration of High Performance Computing schedulers such as SLURM and IBM LSF; coding and automation of middleware tasks in a globally distributed HPC ecosystem
- Addressing customer requirements related to simulation processes, workflow, and accounting in HPC, with a good understanding of the cost and chargeback model
- Configuration and maintenance of HPC portals, monitoring dashboards, HPC accounting, etc.; ability to understand and deploy monitoring for HPC components
- Information Risk Management compliance checking in a Linux environment
- Good relationship skills; ability to work well with multiple stakeholders across the organization
- Experience in contributing to Agile-based projects and activities