For organizations with complex, large-scale computational requirements, Oakwood’s Azure HPC MAX PoC delivers a highly scalable and sophisticated environment on Azure. Transitioning from on-premises infrastructure to Azure allows clients to overcome hardware limitations and gain access to advanced networking options like Infiniband and ExpressRoute, as well as high-throughput storage solutions like Weka.io or Hammerspace.
This setup, combined with Slurm’s advanced job scheduling, provides the computational power necessary for high-priority tasks, including simulations and AI/ML applications. Azure’s on-demand resources and built-in security enable organizations to handle peak workloads efficiently, improve cost-effectiveness, and focus on strategic HPC outcomes rather than infrastructure maintenance.
Delivery Approach:
- Assessment: Conduct a thorough review of current HPC requirements to align Azure resources with target performance.
- Design: Architect a large-scale environment with specialized storage, compute, and networking configurations.
- Deployment: Establish CycleCloud and Slurm with advanced configurations to accommodate large workloads, while integrating tools such as Infiniband, Weka.io, and Hammerspace for high throughput.
- Testing: Execute extensive performance testing across complex workloads.
- Review and Report: Provide an in-depth report on performance metrics, cost analysis, and production recommendations.
Deliverables:
- Comprehensive CycleCloud and Slurm setup tailored for high-performance workloads.
- Deployment of Weka.io or Hammerspace for advanced storage.
- High-speed connectivity through Infiniband or ExpressRoute.
- Performance testing of large-scale HPC tasks, such as simulations and AI/ML applications.
Notes:
- In addition to the 12 weeks, the Oakwood Team will devote an additional 2 weeks to work with the organization/user(s) to further optimize and manage the environment to ensure optimal performance.
- The organization/user(s) will be limited by the number of jobs that can be ran during this PoC.
- At this point a discussion will be had around moving forward with a full production environment or decommissioning the infrastructure that was stood up during the PoC.
Is Oakwood’s Azure HPC MAX PoC right for you?
- Targeted Use: High-complexity, large-scale workloads, including AI/ML applications and simulations.
- Complexity & Level of Effort: High complexity and intensive setup, featuring advanced Slurm job scheduling, high-speed networking with Infiniband or ExpressRoute, and high-throughput storage solutions like Weka.io or Hammerspace.
- Scalability: Designed for maximum scalability, this PoC can handle peak computational demands and complex workloads efficiently.
Recommendation Guide:
- If you need: A robust, highly scalable HPC environment capable of handling high-priority tasks and maximizing compute resources for large-scale applications.
- Consider HPC PRO or CORE if: Your workload doesn’t require ultra-low latency networking or high-throughput storage and can be managed within a moderate or lightweight setup.