Head of Performance Intensive Computing Engineering

Remote
  • Reno, Nevada, United States

Job description

CIQ OVERVIEW

CIQ believes in helping people do great things by providing world-class software infrastructure for others to build value on top of. This includes working closely with open-source communities, securing the software infrastructure, and driving performance throughout the entire stack. We love our customers, whose work ranges from running traditional IT infrastructure to building the future of GenAI and leading major research and scientific initiatives such as curing cancer.


We are looking for individuals who strive to work on teams empowered by ownership and diversity of thought, who push the limits of what is possible, and who want to help others.

POSITION SUMMARY

As the Head of Performance Intensive Computing Engineering, you will be responsible for overseeing the development of our computing offerings, including traditional HPC and next-generation computing infrastructures to support GenAI, ML, and compute- and data-driven analytics. Additional responsibilities include, but are not limited to:

  • Implementing the strategic vision and direction for our HPC, GenAI, and ML computing platform.

  • Driving continuous improvements in business processes, and managing the implications of security and compliance guidelines.

  • Building and maintaining strong relationships with leaders, customers, and partners, and participating in technology initiatives to understand current and future architecture and infrastructure needs.

  • Developing team members, leading by example, and fostering an inclusive environment in support of our corporate values.

  • Administering the department budget: creating, planning, monitoring, and reconciling it, and directing resources.

  • Leading the engineering for all performance-intensive computing initiatives at CIQ.

  • Coordinating groups and teams of engineers to create the next-generation global computing infrastructure.

Job requirements

NEEDED TO SUCCEED

Successful candidates will have team management and leadership experience as well as hands-on architecture design experience with GenAI, ML, and HPC use cases, workflows, and infrastructure, including storage, file system, InfiniBand, security, authentication, and compute architectures. Candidates should also have:

  • Experience with compute job scheduling, training, learning, and inference.

  • An understanding of computing algorithms and parallelization.

  • Experience using Git to manage shared software configuration code bases.

  • Hands-on experience with cloud-based services (e.g. Azure, AWS, GCP) as well as experience with Linux systems administration, optimization, and debugging.

  • Proven experience with orchestration technologies such as Kubernetes and with container technologies such as Apptainer, Docker, and Podman.

  • Experience with DevOps or DevSecOps methodologies, such as automation and configuration management.

  • Experience configuring and using monitoring systems for cloud-native and HPC infrastructure.

  • A good understanding of fundamental networking concepts and their practical applications.

  • The ability to determine meaningful metrics and usage data for monthly status reports and health dashboards.

  • Strong troubleshooting skills.

  • A friendly, collaborative, humble, and honest attitude, always striving to be better.


EDUCATION AND EXPERIENCE

A minimum of 10 years in leadership roles, managing people and reporting to and working with VP and C-level executives. At least 5 years of combined experience in HPC, GenAI, ML, and other performance-intensive computing environments. A minimum of 5 years of experience as an engineer or architect with HPC technologies. At least 3 years of experience as an engineer or architect with AI/ML technologies.

or