Compute Servers and Distributed Workers

A Remote Services cluster is a collection of nodes of two different types:

COMPUTE
A Compute Server node supports the offloading of optimization jobs. Features include load balancing, queueing and concurrent execution of jobs. A Compute Server license is required on the node. A Compute Server node can also act as a Distributed Worker.

WORKER
A Distributed Worker node can be used to execute part of a distributed algorithm. A license is not necessary to run a Distributed Worker, because it is always used in conjunction with a manager (another node or a client program) that requires a license. A Distributed Worker node can only be used by one manager at a time (i.e., the job limit is always set to 1).

By default, grb_rs tries to start a node in Compute Server mode; if no license is found, the node license status will be INVALID. To start a Distributed Worker instead, set the WORKER property in the grb_rs.cnf configuration file (or pass the --worker command-line flag):

WORKER=true

Once you form your cluster, the node type will be displayed in the TYPE column of the output of grbcluster nodes:

> grbcluster nodes
ID       ADDRESS       STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE %MEM  %CPU
b7d037db server1:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 19m  15.30 5.64
735c595f server2:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 19m  10.45 8.01
eb07fe16 server3:61000 ALIVE  WORKER  VALID   ACCEPTING  0  0  1  <1s  11.44 2.33
4f14a532 server4:61000 ALIVE  WORKER  VALID   ACCEPTING  0  0  1  <1s  12.20 5.60
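When a cluster has many nodes, it can help to summarize this listing by node type. The following is a minimal sketch, not part of the Gurobi tooling: it assumes the output of grbcluster nodes has been captured as text (here the sample above is pasted in), and that no column value contains whitespace. The helper names parse_nodes and count_by_type are hypothetical.

```python
from collections import Counter

# Sample output copied from the listing above; in practice this text would be
# captured from the `grbcluster nodes` command.
SAMPLE_OUTPUT = """\
ID       ADDRESS       STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE %MEM  %CPU
b7d037db server1:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 19m  15.30 5.64
735c595f server2:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 19m  10.45 8.01
eb07fe16 server3:61000 ALIVE  WORKER  VALID   ACCEPTING  0  0  1  <1s  11.44 2.33
4f14a532 server4:61000 ALIVE  WORKER  VALID   ACCEPTING  0  0  1  <1s  12.20 5.60
"""


def parse_nodes(text):
    """Turn the whitespace-aligned table into a list of dicts keyed by column name."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    # Assumption: every row has one whitespace-free token per column.
    return [dict(zip(header, line.split())) for line in lines[1:]]


def count_by_type(nodes):
    """Count nodes per value of the TYPE column (COMPUTE or WORKER)."""
    return Counter(node["TYPE"] for node in nodes)


nodes = parse_nodes(SAMPLE_OUTPUT)
counts = count_by_type(nodes)
print(dict(counts))  # {'COMPUTE': 2, 'WORKER': 2}
```

Note that the JL (job limit) column is 1 for both WORKER nodes, which matches the rule that a Distributed Worker serves one manager at a time.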

The node type cannot be changed once grb_rs has started. If you wish to change the node type, you need to stop the node, change the configuration, and restart the node. You may have to update your license as well.

Distributed Optimization

When using distributed optimization, distributed workers are controlled by a manager. There are two ways to set up the manager:

  • The manager can be a job running on a Compute Server. In this case, a job is submitted to the cluster and executes on one of the COMPUTE nodes as usual. When the job reaches the point where distributed optimization is requested, it will also request some number of workers (see parameters DistributedMIPJobs, ConcurrentJobs, or TuneJobs). The cluster assigns WORKER nodes first and falls back to COMPUTE nodes if not enough WORKER nodes are available. The workload associated with managing the distributed algorithm is quite light, so the initial job will act as both the manager and the first worker.

  • The manager can be the client program itself. The manager does not participate in the distributed optimization. It simply coordinates the efforts of the distributed workers. The manager will request distributed workers (using the WorkerPool parameter), and the cluster will first select the WORKER nodes. If not enough are available, it will use COMPUTE nodes as well.

In both cases, the machine where the manager runs must be licensed to run distributed algorithms (you should see a DISTRIBUTED= line in your license file).
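The selection order described in both setups above (WORKER nodes first, then COMPUTE nodes) can be illustrated with a small sketch. This is only a model of the stated preference, not the actual Remote Services scheduler: it ignores load, queueing, and job limits, and the function select_workers and the node dictionaries are hypothetical.

```python
def select_workers(nodes, requested):
    """Pick up to `requested` nodes, preferring WORKER nodes over COMPUTE nodes.

    Illustrative only: the real cluster also considers availability and
    job limits, which this sketch deliberately leaves out.
    """
    workers = [node for node in nodes if node["type"] == "WORKER"]
    compute = [node for node in nodes if node["type"] == "COMPUTE"]
    # WORKER nodes come first; COMPUTE nodes only fill the remaining slots.
    return (workers + compute)[:requested]


cluster = [
    {"address": "server1:61000", "type": "COMPUTE"},
    {"address": "server2:61000", "type": "COMPUTE"},
    {"address": "server3:61000", "type": "WORKER"},
    {"address": "server4:61000", "type": "WORKER"},
]

chosen = select_workers(cluster, 3)
print([node["address"] for node in chosen])
# ['server3:61000', 'server4:61000', 'server1:61000']
```

With three workers requested and only two WORKER nodes in the cluster, the third slot falls to a COMPUTE node.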

It is typically better to use the Compute Server itself as the distributed manager, rather than the client machine. This is particularly true if the Compute Server and the workers are physically close to each other, but physically distant from the client machine. In a typical environment, the client machine will offload the Gurobi computations onto the Compute Server, and the Compute Server will then act as the manager for the distributed computation.
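From the client's point of view, the two setups differ mainly in which parameters are set. The sketch below assumes the gurobipy interface and the ComputeServer parameter (not named on this page) alongside the WorkerPool and DistributedMIPJobs parameters mentioned above. The helpers manager_params and solve_distributed and the address are hypothetical, and passwords that a cluster may require are omitted.

```python
def manager_params(manager, address, jobs):
    """Return the parameter settings for one of the two manager setups.

    manager: "compute_server" offloads the job, and the Compute Server then
             manages the distributed workers; "client" keeps the manager in
             the client program and only borrows workers from the cluster.
    """
    if manager == "compute_server":
        return {"ComputeServer": address, "DistributedMIPJobs": jobs}
    if manager == "client":
        return {"WorkerPool": address, "DistributedMIPJobs": jobs}
    raise ValueError(f"unknown manager setup: {manager!r}")


def solve_distributed(model_path, params):
    """Solve a model file with the given settings (needs gurobipy and a cluster)."""
    import gurobipy as gp  # imported lazily so the helper above works without it

    # Env(params=...) applies the settings before the environment starts.
    with gp.Env(params=params) as env, gp.read(model_path, env=env) as model:
        model.optimize()
        return model.Status


# Recommended setup: the Compute Server acts as the distributed manager.
recommended = manager_params("compute_server", "server1:61000", 2)
print(recommended)
```

Calling solve_distributed with the dictionary returned for "compute_server" corresponds to the typical environment described above, where the client offloads the computation and the Compute Server coordinates the workers.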