How Cost Efficient is HPC in the Cloud?

A Cost Model for In-House Versus In-Cloud High Performance Computing
An article by Wolfgang Gentzsch, The UberCloud, December 17, 2014
The benefits of using HPC technology in design and development processes can be huge for small and medium-sized enterprises (SMEs): substantial cost savings, e.g. by catching product failures early during design, development, and production; higher-quality products through more simulations; and shorter time to market through more computing power. Together, these can lead to increased competitiveness and more innovation.
However, today, fewer than 10% of manufacturers use HPC servers for computer simulations to design and develop their products, according to two studies, ‘Reflect’ and ‘Reveal’, from the US Council on Competitiveness. The vast majority (over 90%) of companies still perform virtual prototyping and large-scale data modeling within the limits of their desktop computers (workstations or laptops). But 57% of these companies said that they have application problems they can’t solve with their existing desktop computers, either because the desktops are too slow for the problems they want to solve, or because the geometry or physics is too complex and needs more memory than the desktop provides. Therefore, most of these companies have a real need for high performance computing.
There are two realistic options today for acquiring additional HPC computing power beyond what the desktop system offers. One option (which is widely proven to work) is buying an HPC server that is many times faster and more capable than what engineers currently have on their desks. However, for many companies, especially SMEs, buying an HPC server is often not viable. One reason is the high Total Cost of Ownership (TCO), as demonstrated by IDC [1] in the following figure:
In addition, there are often long and painful internal procurement and approval processes, and additional skills and manpower are needed to operate and maintain such a system. Still, buying an HPC server is one possible option, and it was the only one until recently.
However, with the advent of cloud computing, a second viable option arose for SMEs to experience the benefits from HPC without having to buy and operate their own HPC system. HPC in the Cloud allows the SME engineers to continue using their own desktop system for daily design and development work, and to submit (burst) the (sometimes much) larger, more complex, more time-consuming jobs into the cloud. Additional benefits of the HPC Cloud solution are on-demand access to ‘infinite’ resources, pay per use, reduced capital expenditure (CAPEX), greater business agility, higher-quality results, lower risk, lower product failure rate, and dynamically scaling resources up and down as needed.
In the following, we offer an analysis that considers only the total cost of an HPC system. In real-life scenarios, additional requirements certainly have to be considered: for example, you need access to application software licenses on a pay-per-use basis; your application and data must not be so confidential that they cannot leave the company, and you have to trust your service provider; and you must be able to handle the data transfer and a few other bottlenecks.
The real cost of an HPC system
The IDC study found (see the figure above) that only 7% of the total cost of acquiring and operating an HPC system over three years comes from the hardware. The much bigger portion of the pie comes from the high cost of expertise (staffing), equipment, maintenance, middleware, and training. The following analysis focuses only on the system and does not include the application software, which should be analyzed in a similar way.
To estimate the cost for a realistic SME scenario, we assume a company needs to run a mix of simulation jobs, most of them running on 32 cores, and some larger jobs with more fine-grained geometry and more sophisticated physics on 256 cores. Therefore, the company decides to buy a typical 16-node HPC system, each node having 2 CPUs and 16 cores (256 cores in total), to be able to perform the 32-core runs as well as the 256-core runs. A reasonable price for such a system (estimated, and rounded for the sake of simple math) would be, say, $70K.
Now, according to IDC, hardware accounts for only 7% of the total cost, so the Total Cost of Ownership of such a $70K system is $1M over three years, i.e. $333K per year for 256 cores, or $1,302 per core per year, or $0.149 per core per hour (core hour), provided the system is 100% utilized. Searching Google for ‘average server utilization’, however, mostly turns up utilization rates between 5% and 20%, usually due to a peak-and-valley pattern caused by workloads that vary with demand and time of day (you will certainly also find utilization of 90%, e.g. when Grid Engine is used for workload management, but you will find 5% at the other end as well). The following table shows the total cost of one core hour for a 16-node (256-core) in-house HPC cluster depending on the % utilization (or ‘number of busy nodes’) of this cluster. The real cost per core hour is the cost at 100% utilization divided by the real utilization, e.g. for a utilization of 20%: ($0.149 / 20) * 100 = $0.75.


For this example, with an average utilization of say 20%, the real cost for one core hour is $0.75, i.e. 5 times higher than for a 100% utilized cluster. For 40% utilization, which is not uncommon in HPC, the cost of one core hour is still $0.37. A general formula for X% of cluster utilization reads:
Cost per core/h ($) = { Cluster price ($) * TCO factor for 1 year (100/7/3) } / { # cores * 365 * 24 * utilization (%) / 100 }
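The formula can be tried out with a few lines of Python. This is only a sketch: the function and parameter names are ours, while the $70K price, the 256 cores, the 7% hardware share, and the three-year period are the values used in this article.

```python
def cost_per_core_hour(cluster_price, cores, utilization_pct,
                       hardware_share=0.07, years=3):
    """Cost of one core hour in dollars, following the formula above."""
    # TCO for one year: hardware price scaled up by IDC's 7% share, spread over 3 years.
    tco_per_year = cluster_price / hardware_share / years
    # Core hours actually used per year at the given utilization.
    busy_core_hours = cores * 365 * 24 * utilization_pct / 100
    return tco_per_year / busy_core_hours


# Article values: $70K cluster with 256 cores, at a few sample utilizations.
for pct in (5, 10, 20, 40, 75, 100):
    cost = cost_per_core_hour(70_000, 256, pct)
    print(f"{pct:>3}% utilization: ${cost:.2f} per core hour")
```

Because the sketch does not round the $0.149 intermediate value, the 20% figure comes out at roughly $0.74 rather than the $0.75 quoted above.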
According to the above table, the in-cloud solution (at $0.20 per core hour) becomes more expensive than the in-house solution only when the in-house HPC cluster averages about 75% utilization or more, a situation we have only heard of at academic or big-industry supercomputing centers serving hundreds or even thousands of users.
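The break-even point of about 75% can be checked directly: it is the utilization at which the in-house cost per core hour equals the cloud price. A minimal sketch, with a function name of our own choosing and the two prices taken from this article:

```python
def break_even_utilization(in_house_cost_at_full, cloud_cost):
    """Utilization (%) at which in-house and cloud cost per core hour are equal."""
    # In-house cost at X% utilization is in_house_cost_at_full * 100 / X;
    # setting that equal to cloud_cost and solving for X gives:
    return 100 * in_house_cost_at_full / cloud_cost


# $0.149 per core hour in-house at 100% utilization, $0.20 per core hour in the cloud.
threshold = break_even_utilization(0.149, 0.20)
print(f"In-house is cheaper only above about {threshold:.1f}% utilization")
```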
The real cost of HPC in the Cloud
Now, that’s an easy one, because you just ask your HPC Cloud provider how much one core costs per hour. At the time of this writing (February 2014), a reasonable price for one core hour on a similarly powerful HPC system in the cloud is about $0.20 (not including additional services and application software, of course). The yearly workload of a 20% utilized cluster is equivalent to 256 cores * 24 hours * 365 days * 20% = 448,512 core hours, which costs $89,702 when moved to the cloud. Compared to the $333K per year above, the cloud option is economically very attractive (in this case 3.7 times cheaper than the in-house option).
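The same arithmetic as a short Python sketch. The variable and function names are ours; the numbers (256 cores, 20% utilization, $0.20 per core hour, $1M TCO over three years) are the ones used in this article.

```python
HOURS_PER_YEAR = 24 * 365


def yearly_core_hours(cores, utilization_pct):
    """Core hours consumed per year at the given average utilization."""
    return cores * HOURS_PER_YEAR * utilization_pct / 100


core_hours = yearly_core_hours(256, 20)   # workload of the 20% utilized cluster
cloud_cost = core_hours * 0.20            # at $0.20 per core hour
in_house_cost = 1_000_000 / 3             # yearly TCO of the 256-core cluster

print(f"Core hours per year: {core_hours:,.0f}")
print(f"Cloud cost per year: ${cloud_cost:,.0f}")
print(f"In-house / cloud:    {in_house_cost / cloud_cost:.1f}x")
```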

The real cost of a hybrid solution
Now let’s go back to the above scenario of a company that needs to run a mix of 32-core and 256-core simulation jobs, at an average utilization of 20% of the 256-core system. We assume the 32-core workload runs on a small in-house HPC cluster with two 16-core nodes, sized to accommodate all the 32-core jobs. This cluster is just 12.5% of the size of the big 256-core cluster with its $1M 3-year TCO, so its TCO is roughly $42K per year (keeping in mind that IDC’s TCO model becomes less accurate for very small clusters). With an excellent utilization of, for example, 92% of this small in-house cluster (chosen to match the 20% utilization of the big cluster mentioned above), the core hour on the small cluster comes to about $0.16 ($42K / (32 cores * 365 * 24 hours * 0.92)). The remaining big 256-core (16-node) jobs run in the cloud for about one month per year (one month is what it takes to reach the above 20% utilization of the big 256-core, 16-node in-house cluster), and the core hour in the cloud is again $0.20, resulting in $37K for that month.
The in-house cost of $42K (for the small jobs) and the in-cloud cost of $37K (for the big jobs) add up to $79K per year for the hybrid solution, compared with $90K for the full HPC in the Cloud service, and $333K per year for the large 256-core in-house cluster. In summary, if only the total cost matters and nothing else, the conclusion (in our example, for an average utilization of 20%) is clear: the hybrid and the cloud solutions beat the in-house HPC cluster by a factor of 4.2 and 3.7, respectively.
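The three options can be put side by side in a short Python sketch. The variable names are ours, and we assume a 30-day month for the cloud portion, which reproduces the $37K figure; all other numbers come from this article.

```python
CLOUD_PRICE = 0.20                       # $ per core hour in the cloud

# Option 1: large 256-core in-house cluster, $1M TCO over three years.
in_house_large = 1_000_000 / 3

# Option 2: the whole 20% workload of the 256-core cluster in the cloud.
full_cloud = 256 * 24 * 365 * 0.20 * CLOUD_PRICE

# Option 3: hybrid. A 32-core in-house cluster (12.5% of the large one)
# plus one month of 256 cores in the cloud (30-day month is our assumption).
small_in_house = in_house_large * 32 / 256
cloud_month = 256 * 24 * 30 * CLOUD_PRICE
hybrid = small_in_house + cloud_month

for name, cost in (("in-house", in_house_large),
                   ("cloud", full_cloud),
                   ("hybrid", hybrid)):
    factor = in_house_large / cost
    print(f"{name:>8}: ${cost / 1000:,.0f}K per year, {factor:.1f}x vs in-house")
```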
[1] Randy Perry and Al Gillen. Demonstrating business value: Selling to your C-level executives. IDC White Paper, April 2007.