Configuring a Distributed Worker Pool
Before your program can perform a distributed optimization task, you'll need to identify a set of machines to use as your distributed workers. Ideally, these machines should have very similar performance. Identical machines are best, especially for distributed tuning, but small variations in performance won't hurt your overall results too much.
Starting Gurobi Remote Services
Once you've identified your distributed worker machines, you'll need to start Gurobi Remote Services on these machines. Instructions for setting up Gurobi Remote Services can be found in the Gurobi Quick Start Guide. As noted in the Quick Start Guide, run the following command to make sure a machine is available to be used as a distributed worker:
> gurobi_cl --server=machine --status
(replace machine with the name or IP address of your machine). If you see Distributed Worker listed among the set of available services...
Gurobi Remote Services (version 7.0.0) functioning normally

Available services:
  Distributed Worker
...then that machine is good to go.
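If you are scripting this availability check across many machines, one option is to capture the status command's output and scan it for the service name. The helper below is a minimal illustrative sketch; worker_available is a hypothetical function (not part of the Gurobi API), and the exact status text may vary between versions:

```python
def worker_available(status_text):
    """Return True if gurobi_cl --status output lists the
    Distributed Worker service (illustrative helper only)."""
    marker = "Available services:"
    idx = status_text.find(marker)
    # The service list follows the "Available services:" header.
    return idx >= 0 and "Distributed Worker" in status_text[idx:]
```

You might feed this the captured stdout of `gurobi_cl --server=machine --status` for each candidate machine before adding it to your pool.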
We should reiterate a point that is raised in the Quick Start Guide: you do not need a Gurobi license to run Gurobi Remote Services on a machine. Some services are only available with a license (e.g., Compute Server). However, any machine that is running Gurobi Remote Services will provide the Distributed Worker service.
The Distributed Manager Machine
Once you have identified a set of distributed worker machines, you'll need to choose a manager machine. This is the machine where your application actually runs. In addition to building the optimization model, your manager machine will coordinate the efforts of the distributed workers during the execution of the distributed algorithm.
You'll need to choose a manager machine that is licensed to run the
distributed algorithms. You'll see a
DISTRIBUTED= line in your
license file if distributed algorithms are enabled.
Note that, by default, the manager does not participate in the distributed optimization. It simply coordinates the efforts of the distributed workers. If you would like the manager to also act as one of the workers, you'll need to start Gurobi Remote Services on the manager machine as well.
Note that a machine can act as the manager for only one distributed job at a time. If you want to run multiple distributed jobs simultaneously, you'll need multiple manager machines.
Specifying the Distributed Worker Pool
If you'd like to invoke a distributed algorithm from your application,
you'll need to provide the names of the distributed worker machines.
You do this by setting the
WorkerPool parameter (refer to
the Gurobi Parameter section for
information on how to set a parameter). The parameter should be set
to a string that contains a comma-separated list of either machine
names or IP addresses. For example, you might use the following on the
gurobi_cl command line:
> gurobi_cl WorkerPool=server1,server2,server3 ...
If you have set up an access password on the distributed worker machines, you'll need to provide it through the WorkerPassword parameter. All machines in the worker pool must have the same access password.
Note that providing a list of available workers is strictly a configuration step. Your program won't actually use any of the distributed algorithms unless it specifically requests them. Instructions for doing so are next.
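When the worker names come from configuration or discovery code rather than being typed by hand, it can help to assemble the comma-separated value programmatically before passing it to the WorkerPool parameter. A small sketch, where make_worker_pool is a hypothetical helper and not part of Gurobi:

```python
def make_worker_pool(hosts):
    """Join machine names or IP addresses into the comma-separated
    string expected by the WorkerPool parameter (illustrative only)."""
    cleaned = [h.strip() for h in hosts if h.strip()]
    if not cleaned:
        raise ValueError("worker pool needs at least one machine")
    return ",".join(cleaned)

# The result can be passed on the command line, e.g.:
#   gurobi_cl WorkerPool=server1,server2,server3 ...
pool = make_worker_pool(["server1", "server2", "server3"])
```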
Requesting A Distributed Algorithm
Once you've set the
WorkerPool parameter to
the appropriate value, your final step is to set either the
DistributedMIPJobs parameter or the
TuneJobs parameter. These
parameters indicate how many distinct distributed worker
jobs you would like to start.
For example, if you set TuneJobs to 2 in...
> grbtune WorkerPool=server1,server2 TuneJobs=2 misc07.mps
...you should see the following output in the log...
Started distributed worker on server1
Started distributed worker on server2
Distributed tuning: launched 2 distributed worker jobs
This output indicates that two jobs have been launched, one on machine server1 and the other on machine server2. These two jobs will continue to run until your tuning run completes.
Similarly, if you launch distributed MIP...
> gurobi_cl WorkerPool=server1,server2 DistributedMIPJobs=2 misc07.mps
...you should see the following output in the log...
Started distributed worker on server1
Started distributed worker on server2
Distributed MIP job count: 2
Note that, in most cases, each machine runs one distributed worker job at a time. Distributed workers are allocated on a first-come, first-served basis, so if multiple users are sharing a set of distributed worker machines, you should be prepared for the possibility that some or all of them may be busy when the manager requests them. The manager will grab as many as it can, up to the requested count. If none are available, it will return an error.
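Because workers are handed out first-come, first-served, an application sharing a pool with other users may want to wait and retry when the manager reports that no workers are available. The generic retry sketch below makes some assumptions: the launch callable and the RuntimeError exception type are stand-ins for however your application starts the distributed job and reports the failure:

```python
import time

def launch_with_retries(launch, attempts=3, wait_seconds=30):
    """Call launch() until it succeeds or attempts run out,
    sleeping between tries (illustrative sketch only)."""
    for attempt in range(1, attempts + 1):
        try:
            return launch()
        except RuntimeError:
            if attempt == attempts:
                raise  # give up after the last attempt
            time.sleep(wait_seconds)
```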
Compute Server Considerations
If you have one or more Gurobi Compute Servers, you can use them for distributed optimization as well. Compute Servers offer a lot more flexibility than distributed workers, though, so they require a bit of additional explanation.
The first point you should be aware of is that one Compute Server can actually host multiple distributed worker jobs. Compute Servers allow you to set a limit on the number of jobs that can run simultaneously. Each of those jobs can be a distributed worker. For example, if you have a pair of Compute Servers, each with a job limit of 2, then issuing the command...
> gurobi_cl DistributedMIPJobs=3 WorkerPool=server1,server2 misc07.mps
...would produce the following output...
Started distributed worker on server1
Started distributed worker on server2
Started distributed worker on server1
Compute Server assigns a new job to the machine with the most available capacity, so assuming that the two servers are otherwise idle, the first distributed worker job would be assigned to
server1, the second to
server2, and the third to
server1 again.
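The capacity-based placement just described can be mimicked with a toy model: each new job goes to the server with the most free job slots. This is only an illustration of the rule, not Gurobi's actual scheduler:

```python
def assign_jobs(capacity, njobs):
    """Place njobs one at a time on the server with the most
    remaining capacity (ties broken by listed order)."""
    load = {server: 0 for server in capacity}
    placements = []
    for _ in range(njobs):
        # Pick the server with the most free slots right now.
        server = max(load, key=lambda s: capacity[s] - load[s])
        load[server] += 1
        placements.append(server)
    return placements
```

With two servers that each have a job limit of 2, three jobs land on server1, then server2, then server1, matching the output shown above.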
Another point to note is that, if you are working in a Compute Server environment, it is often better to use the Compute Server itself as the distributed manager, rather than the client machine. This is particularly true if the Compute Server and the workers are physically close to each other, but physically distant from the client machine. In a typical environment, the client machine will offload the Gurobi computations onto the Compute Server, and the Compute Server will then act as the manager for the distributed computation.
To give an example, running the following command on machine client1...
> gurobi_cl --server=server1 WorkerPool=server1,server2 DistributedMIPJobs=2 misc07.mps
...will lead to the following sequence of events...
- The model will be read from the disk on client1 and passed to Compute Server server1.
- server1 will act as the manager of the distributed optimization.
- server1 will start two distributed worker jobs, one that also runs on server1 and another that runs on server2.
Compute Server provides load balancing among multiple machines, so it
is common for the user to provide a list of available servers when a
Gurobi application starts. We'll automatically copy this list into the
WorkerPool parameter. Of course, you can change the value of
this parameter in your program, but the default behavior is to draw
from the same set of machines for the distributed workers. Thus,
the following command would be equivalent to the previous command:
> gurobi_cl --server=server1,server2 DistributedMIPJobs=2 misc07.mps
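The default just described amounts to a simple fallback: if WorkerPool isn't set explicitly, the server list given at startup is reused as the worker pool. A sketch of that precedence, where effective_worker_pool is a hypothetical illustration and not Gurobi code:

```python
def effective_worker_pool(server_list, worker_pool=None):
    """Return the explicit WorkerPool value if one was set,
    otherwise fall back to the --server list (illustrative only)."""
    return worker_pool if worker_pool is not None else server_list
```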
Please refer to the next section for more information on using a Gurobi Compute Server.