Compute Servers and Distributed Workers

A Remote Services cluster is a collection of nodes of two different types:

COMPUTE
A Compute Server node supports the offloading of optimization jobs. Features include load balancing, queueing, and concurrent execution of jobs. A Compute Server license is required on the node. A Compute Server node can also act as a distributed worker.

WORKER
A distributed worker node can be used to execute part of a distributed algorithm. A license is not necessary to run a distributed worker, because it is always used in conjunction with a manager (another node or a client program) that requires a license. A distributed worker node can only be used by one manager at a time (i.e., the job limit is always set to 1).

By default, grb_rs tries to start a node in Compute Server mode, and the node license status will be INVALID if no license is found. To start a distributed worker instead, set the WORKER property in the grb_rs.cnf configuration file (or pass the --worker command-line flag):

WORKER=true

Once you have formed your cluster, the node type is displayed in the TYPE column of the output of grbcluster nodes:

> grbcluster --server=server1 --password=pass nodes --long
ADDRESS STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE  %MEM %CPU STARTED             RUNTIMES      VERSION
server1 ALIVE  COMPUTE VALID   ACCEPTING   0  0  2 2h28m 2.67 2.03 2019-04-07 11:41:25 [8.0.1 8.1.1] 8.1.1-v8.1.1rc0
server2 ALIVE  COMPUTE VALID   ACCEPTING   0  0  2 2h28m 3.47 0.83 2019-04-07 11:41:33 [8.0.1 8.1.1] 8.1.1-v8.1.1rc0
server3 ALIVE  WORKER  N/A     ACCEPTING   0  0  1 <1s   0.69 1.13 2019-04-07 14:09:37 [8.0.1 8.1.1] 8.1.1-v8.1.1rc0
server4 ALIVE  WORKER  N/A     ACCEPTING   0  0  1 <1s   1.17 1.05 2019-04-07 14:09:24 [8.0.1 8.1.1] 8.1.1-v8.1.1rc0

The node type cannot be changed once grb_rs has started. If you wish to change the node type, you need to stop the node, change the configuration, and restart the node. You may have to update your license as well.

Distributed Optimization

When using distributed optimization, distributed workers are controlled by a manager. There are two ways to set up the manager:

  • The manager can be a job running on a Compute Server. In this case, the manager job is first submitted to the cluster and executes on one of the COMPUTE nodes as usual. When this job starts, it also requests some number of distributed workers (see the DistributedMIPJobs, ConcurrentJobs, or TuneJobs parameters). WORKER nodes are selected first; if not enough are available, COMPUTE nodes are used as well. The workload associated with managing the distributed algorithm is quite light, so the initial job acts as both the manager and the first worker.

  • The manager can be the client program itself. The manager does not participate in the distributed optimization; it simply coordinates the efforts of the distributed workers. The manager requests distributed workers (using the WorkerPool parameter), and the cluster first selects WORKER nodes; if not enough are available, it uses COMPUTE nodes as well.

In both cases, the machine where the manager runs must be licensed to run distributed algorithms (you should see a DISTRIBUTED= line in your license file).

It is typically better to use the Compute Server itself as the distributed manager, rather than the client machine. This is particularly true if the Compute Server and the workers are physically close to each other, but physically distant from the client machine. In a typical environment, the client machine will offload the Gurobi computations onto the Compute Server, and the Compute Server will then act as the manager for the distributed computation.