Better Hardware for Solving Mathematical Optimization Problems? The Future is Looking Bright

Author: Edward Rothberg, PhD
Date: 12/7/2020

At Gurobi, we’re heavy consumers of processing cycles. Every candidate change to our product, the Gurobi Optimizer, goes through rigorous testing: we run the modified code on thousands of models to make sure the change has no unintended consequences. Because we’re constantly evaluating possible improvements, we keep hundreds of machines busy nearly all the time.

Every year, we look at how we can enhance our computing infrastructure. Unfortunately, for the past four years our conclusion has always been that the machines available at the time were not significantly better than the machines we already had. At one point, we actually asked our vendor to build more of the exact same machines we bought two years earlier – so we could have a larger set of identical machines. At no point in the past four years have we gotten really excited about our future upgrade options. Fortunately, it looks like that’s finally changing, thanks to new systems from AMD and Apple.

New, Cutting-Edge Hardware Systems

On the AMD side, we’ve been testing their latest CPU, Rome. Gurobi performance on a single core is strong, and parallel scaling is much better than anything we’ve seen before. For most of the algorithms in Gurobi, performance is ultimately limited by the rate at which the cores can pull data from memory. On the machines we’re used to, it takes only a few cores to saturate the memory system. The new AMD machines provide more memory bandwidth, which leads to better parallel scaling. There are still limits to how much parallelism you can exploit, but on these new systems you hit those limits at much higher core counts.
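To make the memory-bandwidth argument concrete, here is a minimal sketch of a bandwidth-capped scaling model. It is an illustration only, not a description of how Gurobi works internally: the function name, the assumption that every core demands the same fixed memory traffic, and all of the numbers are hypothetical and are not measurements from any machine.

```python
def bandwidth_limited_speedup(cores, per_core_demand_gbs, memory_bandwidth_gbs):
    """Speedup over one core in a toy model where memory traffic is the only limit.

    Assumption (hypothetical): each busy core wants `per_core_demand_gbs` of
    memory traffic, and the memory system delivers at most
    `memory_bandwidth_gbs` in total.
    """
    if cores < 1:
        raise ValueError("cores must be at least 1")

    def delivered(n):
        # Traffic actually served: n cores ask for n * demand, capped by the memory system.
        return min(n * per_core_demand_gbs, memory_bandwidth_gbs)

    return delivered(cores) / delivered(1)


# Illustrative numbers only: the same per-core demand on two memory systems.
for bandwidth in (40.0, 160.0):
    speedups = [bandwidth_limited_speedup(n, 10.0, bandwidth) for n in (1, 2, 4, 8, 16)]
    print(f"{bandwidth:>5} GB/s:", speedups)
```

In this toy model the speedup curve is flat once the cores together demand more than the memory system can deliver, so raising the bandwidth moves the plateau to a higher core count without removing it.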

On the Apple front, the new M1 chip is showing impressive performance. Running a recompiled Gurobi binary, it gives the best single-core performance of any system we’ve ever tried. The chip has eight cores: four “high-performance” cores and four “high-efficiency” (i.e., slower) cores. Parallel scaling is strong up to four cores and limited beyond that. The chip is very new and the software infrastructure around it isn’t complete yet, so unfortunately it is still too early for us to release a native binary with official support. In the meantime, anecdotal testing suggests that our existing macOS release gets over 80% of the performance of a “native” port on this new chip, which is remarkable for emulation.
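One way to see why scaling flattens past four cores on a chip with four fast and four slower cores is a simple throughput count. The sketch below is a back-of-the-envelope model under stated assumptions, not a measurement: the function name is hypothetical, the relative speed of 0.5 for a high-efficiency core is a made-up placeholder, and the model assumes the work divides perfectly and that threads land on the fast cores first.

```python
def ideal_heterogeneous_speedup(threads, fast_cores=4, slow_cores=4,
                                slow_relative_speed=0.5):
    """Best-case speedup over one fast core on a chip with two kinds of cores.

    Assumptions (hypothetical): work is perfectly divisible, fast cores are
    filled first, and a slow core runs at `slow_relative_speed` times the
    speed of a fast core. The default of 0.5 is a placeholder, not a benchmark.
    """
    if threads < 0:
        raise ValueError("threads must be non-negative")
    on_fast = min(threads, fast_cores)
    on_slow = min(max(threads - fast_cores, 0), slow_cores)
    return on_fast + on_slow * slow_relative_speed


for t in (1, 2, 4, 6, 8):
    print(t, "threads ->", ideal_heterogeneous_speedup(t))
```

Even in this best case, each thread beyond the fourth adds only a fraction of a fast core. Real parallel algorithms that synchronize between threads may gain less, since faster threads can end up waiting on slower ones.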

Looking to the Future

There are still several reasons why we can’t yet replace our current computing resources with these systems. However, after several years of waiting, it appears that anyone looking to solve optimization problems faster will soon see a significant boost from new hardware.