Documentation

Distributed Algorithms

Gurobi Remote Services allow you to perform distributed optimization. All you need is a cluster with more than one node. The nodes can be either Compute Server or distributed worker nodes. Ideally, these nodes should all give very similar performance; identical performance is best, especially for distributed tuning, but small variations won't hurt overall results too much.

Choosing an Appropriate Cluster

Before launching a distributed optimization job, you should run the grbcluster nodes command to make sure the cluster contains more than one live machine:

> grbcluster nodes --server=server1:port --password=pass

If you see multiple live nodes, then that cluster is good to go:

ADDRESS       STATUS TYPE    LICENSE #Q #R JL IDLE  %MEM  %CPU
server1:61000 ALIVE  COMPUTE VALID   0  0  1  43m0s 42.67 2.53
server2:61000 ALIVE  WORKER  N/A     0  0  1  <1s   42.67 1.76
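The live-node check can also be scripted. Here is a minimal sketch, assuming the tabular output format shown above (one header row, STATUS in the second column); the helper function itself is hypothetical, not part of the Gurobi tools:

```python
def count_live_nodes(nodes_output: str) -> int:
    """Count nodes reporting ALIVE in `grbcluster nodes` output.

    Assumes the tabular format shown above: a header row followed by
    one row per node, with STATUS as the second column.
    """
    count = 0
    for line in nodes_output.strip().splitlines()[1:]:  # skip header row
        fields = line.split()
        if len(fields) > 1 and fields[1] == "ALIVE":
            count += 1
    return count

sample = """\
ADDRESS       STATUS TYPE    LICENSE #Q #R JL IDLE  %MEM  %CPU
server1:61000 ALIVE  COMPUTE VALID   0  0  1  43m0s 42.67 2.53
server2:61000 ALIVE  WORKER  N/A     0  0  1  <1s   42.67 1.76"""

if count_live_nodes(sample) > 1:
    print("cluster is good to go")
```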

We should reiterate a point that was raised earlier: you do not need a Gurobi license to run Gurobi Remote Services on a machine. While some services are only available with a license, any machine that is running Gurobi Remote Services will provide the Distributed Worker service.

Running a Distributed Algorithm

Running a distributed algorithm is simply a matter of setting the appropriate Gurobi parameter. Gurobi supports distributed MIP, concurrent LP and MIP, and distributed tuning. These are controlled with three parameters: DistributedMIPJobs, ConcurrentJobs, and TuneJobs, respectively. These parameters indicate how many distinct distributed worker jobs you would like to start. Keep in mind that the initial Compute Server job will act as the first worker.
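The parameter-to-algorithm mapping can be captured in a small helper that assembles the command lines used in the examples below. This is an illustrative sketch: the parameter names (DistributedMIPJobs, ConcurrentJobs, TuneJobs) are the real Gurobi parameters, but the helper function is hypothetical:

```python
# The actual Gurobi parameter controlling each distributed algorithm.
JOB_PARAM = {
    "mip": "DistributedMIPJobs",
    "concurrent": "ConcurrentJobs",
    "tune": "TuneJobs",
}

def distributed_command(algorithm, server, password, jobs, model_file):
    """Build a command line for a distributed run (hypothetical helper).

    Tuning runs use grbtune; MIP and concurrent runs use gurobi_cl.
    Note that the first worker is the initial Compute Server job itself.
    """
    tool = "grbtune" if algorithm == "tune" else "gurobi_cl"
    return [
        tool,
        f"--server={server}",
        f"--password={password}",
        f"{JOB_PARAM[algorithm]}={jobs}",
        model_file,
    ]

print(" ".join(distributed_command("tune", "server1:61000", "passwd", 2, "misc07.mps")))
```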

To give an example, if you have a cluster consisting of two machines (server1 and server2), and if you set TuneJobs to 2 in grbtune...

> grbtune --server=server1:61000 --password=passwd TuneJobs=2 misc07.mps
...you should see output that looks like the following...
Capacity available on 'server1:61000' - connecting...
...
Using Compute Server as first worker
Started distributed worker on server2:61000

Distributed tuning: launched 2 distributed worker jobs

This output indicates that two worker jobs have been launched, one on machine server1 and the other on machine server2. These two jobs will continue to run until your tuning run completes.

Similarly, if you launch distributed MIP...

> gurobi_cl --server=server1:61000 --password=passwd DistributedMIPJobs=2 misc07.mps
...you should see the following output in the log...
Using Compute Server as first worker
Started distributed worker on server2:61000

Distributed MIP job count: 2

Note that distributed workers are allocated on a first-come, first-served basis, so if multiple users are sharing a cluster, you should be prepared for the possibility that some or all of your distributed workers may be busy when you request them. Your program will grab as many as it can, up to the requested count. If none are available, it will return an error.
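The allocation behavior just described can be modeled with a short sketch. This is a hypothetical simulation of the documented behavior, not Gurobi code: grab as many idle workers as possible, up to the requested count, and fail only when none are available:

```python
def acquire_workers(idle_workers, requested):
    """Hypothetical model of first-come, first-served worker allocation.

    Grants up to `requested` workers from those currently idle; raises
    an error only if no workers are available at all.
    """
    if not idle_workers:
        raise RuntimeError("no distributed workers available")
    return idle_workers[:requested]

# Two idle workers, three requested: the job proceeds with two.
print(acquire_workers(["server1:61000", "server2:61000"], 3))
```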

Using a Separate Manager

While distributed workers always need to be part of a Remote Services cluster, the manager itself does not. Any machine that is licensed to run distributed algorithms can act as the manager. You simply need to set the WorkerPool and WorkerPassword parameters to point to the Remote Services cluster that contains your distributed workers. To give an example:

> gurobi_cl WorkerPool=server1:61000 WorkerPassword=passwd DistributedMIPJobs=2 misc07.mps
...you should see the following output in the log...
Started distributed worker on server1:61000
Started distributed worker on server2:61000

Distributed MIP job count: 2

In this case, the distributed computation is managed by the machine where you launched this command, and the two distributed workers come from your Remote Services cluster.
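The two launch modes can be contrasted in a sketch. The parameter names (WorkerPool, WorkerPassword, DistributedMIPJobs) are real Gurobi parameters; the helper function is hypothetical:

```python
def distributed_mip_command(pool, password, jobs, model_file, local_manager):
    """Hypothetical helper contrasting the two distributed MIP launch modes.

    local_manager=True: this machine manages the run, and all workers
    come from the Remote Services cluster at `pool`.
    local_manager=False: a Compute Server job on `pool` manages the run
    and also acts as the first worker.
    """
    if local_manager:
        return ["gurobi_cl", f"WorkerPool={pool}", f"WorkerPassword={password}",
                f"DistributedMIPJobs={jobs}", model_file]
    return ["gurobi_cl", f"--server={pool}", f"--password={password}",
            f"DistributedMIPJobs={jobs}", model_file]

print(" ".join(distributed_mip_command("server1:61000", "passwd", 2, "misc07.mps", True)))
```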

Compute Server Considerations

As noted earlier, Gurobi Compute Servers can be used for distributed optimization as well. Compute Servers offer a lot more flexibility than distributed workers, though, so they require a bit of additional explanation.

The first point you should be aware of is that one Compute Server node can actually host multiple distributed worker jobs. Compute Server nodes allow you to set a limit on the number of jobs that can run simultaneously. Each of those jobs can be a distributed worker. For example, if you have a cluster that contains a pair of Compute Server nodes, each with a job limit of 2, then issuing the command...

> gurobi_cl --server=server1:61000 --password=passwd DistributedMIPJobs=3 misc07.mps
...would produce the following output...
Capacity available on 'server1:61000' - connecting...
...
Using Compute Server as first worker
Started distributed worker on server2:61000
Started distributed worker on server1:61000

Compute Server assigns a new job to the machine with the most available capacity, so assuming that the two servers are otherwise idle, the first distributed worker job would be assigned to server1, the second to server2, and the third to server1.
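This assignment policy can be sketched as a simple simulation. It is a hypothetical model of the documented behavior (most free capacity wins, ties broken in listing order), not the actual scheduler:

```python
def assign_workers(job_limits, num_jobs):
    """Hypothetical model of capacity-based worker assignment.

    job_limits maps each node to its simultaneous-job limit; all nodes
    start idle. Each worker job goes to the node with the most remaining
    capacity, with ties broken by listing order.
    """
    load = {node: 0 for node in job_limits}
    assignments = []
    for _ in range(num_jobs):
        node = max(load, key=lambda n: job_limits[n] - load[n])
        if job_limits[node] - load[node] <= 0:
            break  # every node is at its job limit
        load[node] += 1
        assignments.append(node)
    return assignments

# Two nodes, each with a job limit of 2, and three worker jobs requested.
print(assign_workers({"server1:61000": 2, "server2:61000": 2}, 3))
```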