AI & HPC Application Performance Engineer

Engineering · Full-time · Piedmont, Italy

Job description

Cornelis Networks is a technology leader delivering purpose-built, high-performance fabrics accelerating High Performance Computing, High Performance Data Analytics, and Artificial Intelligence workloads in the Cloud and in the Data Center.

The company’s products enable scientific, academic, governmental, and commercial customers to solve some of the world’s toughest challenges by efficiently focusing the computational power of many processing devices at scale on a single problem, simultaneously improving both result accuracy and time-to-solution for their most complex application workloads. Cornelis Networks delivers its end-to-end interconnect solutions worldwide through an established set of server OEM and channel partners.

We are seeking a highly skilled Senior Artificial Intelligence/Machine Learning & High-Performance Computing (HPC) Application Performance Engineer to join our team.

Key Responsibilities

  • Benchmark and optimize open source and industry-standard AI/ML & HPC applications on current and future HPC hardware.

  • Develop, execute, and maintain software required to run AI/ML & HPC applications and benchmarks.

  • Participate in the development of supporting libraries and middleware.

  • Assist sales and marketing teams by delivering proof points and performance benchmarking comparisons between Cornelis Omni-Path and competing interconnects.

  • Collect and analyze performance data, identify performance limitations, and determine the best approaches and techniques to improve performance.

  • Present research findings both within the company and to external stakeholders.

  • Collaborate with cross-functional teams at all levels of the company to evangelize the capabilities and performance advantages of Cornelis products.

Minimum Qualifications

  • Bachelor’s Degree in Computer Science, Engineering, Math, or related technical discipline.

  • Ability to set up, run, and analyze AI/ML & HPC application benchmarks, with a proficient understanding of message passing, scaling optimization, and performance bottleneck identification.

  • 5+ years' experience with:

    • Message Passing Interface (MPI) and compiling software with a variety of compilers (Intel, GCC, etc.) and libraries.
    • Python and shell scripting.
    • HPC network architectures such as Omni-Path, InfiniBand, or Ethernet.
    • Operating in a Linux computing environment.
  • Excellent written and verbal communication skills.

Preferred Qualifications

  • Master’s Degree in Computer Science, Engineering, Math, or related technical discipline.

  • Experience with MLPerf (https://mlcommons.org/en/) benchmarking deployment, policies, and best known methods.

  • Knowledge of HPC resource management and job scheduling systems (e.g., SLURM, PBS).

  • Hands-on experience analyzing and optimizing networks to improve scale-out performance using a range of profiling tools.

  • Basic understanding of Linux system administration.

Location

For this position, Cornelis Networks fully supports remote employees who live within the United States and are able to travel periodically to our corporate offices in Chesterbrook, PA, for in-person collaboration.

Immigration Information

To qualify for this position, candidates must be located in the United States, be legally authorized to work in the U.S., and not need U.S. visa sponsorship now or in the future.

Cornelis Networks is an equal opportunity employer, and all qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity or expression, pregnancy, age, national origin, disability status, genetic information, protected veteran status, or any other characteristic protected by law.

Cornelis Networks does not accept unsolicited resumes from headhunters, recruitment agencies, or fee-based recruitment services.