Compute Servers and Distributed Workers
A Remote Services cluster is a collection of nodes of two different types:
- A Compute Server node supports the offloading of
optimization jobs. Features include load balancing, queueing and
concurrent execution of jobs. A Compute Server license is required
on the node. A Compute Server node can also act as a distributed
worker.
- A distributed worker node can be used to execute part of
a distributed algorithm. A license is not necessary to run a
distributed worker, because it is always used in conjunction with a
manager (another node or a client program) that requires a license.
A distributed worker node can only be used by one manager at a time
(i.e., the job limit is always set to 1).
By default, grb_rs tries to start a node in Compute Server mode; if no license is found, the node license status will be INVALID. To start a distributed worker instead, set the WORKER property in the grb_rs.cnf configuration file (or use the --worker command-line flag).
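As an illustration, a worker node's configuration file might contain a line like the one below. The property name WORKER comes from the text above; the `true` value and the `#` comment syntax are assumptions about the grb_rs.cnf format and should be checked against the configuration reference for your version.

```
# grb_rs.cnf (sketch; value syntax is an assumption)
WORKER=true
```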
Once you form your cluster, the node type will be displayed in the TYPE column of the output of grbcluster nodes:
> grbcluster --server=server1 --password=pass nodes --long
ADDRESS STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE  %MEM %CPU STARTED             RUNTIMES      VERSION
server1 ALIVE  COMPUTE VALID   ACCEPTING  0  0  2  2h28m 2.67 2.03 2019-04-07 11:41:25 [8.0.1 8.1.1] 8.1.1-v8.1.1rc0
server2 ALIVE  COMPUTE VALID   ACCEPTING  0  0  2  2h28m 3.47 0.83 2019-04-07 11:41:33 [8.0.1 8.1.1] 8.1.1-v8.1.1rc0
server3 ALIVE  WORKER  N/A     ACCEPTING  0  0  1  <1s   0.69 1.13 2019-04-07 14:09:37 [8.0.1 8.1.1] 8.1.1-v8.1.1rc0
server4 ALIVE  WORKER  N/A     ACCEPTING  0  0  1  <1s   1.17 1.05 2019-04-07 14:09:24 [8.0.1 8.1.1] 8.1.1-v8.1.1rc0
The node type cannot be changed once grb_rs has started. If you wish to change the node type, you need to stop the node, change the configuration, and restart the node. You may have to update your license as well.
When using distributed optimization, distributed workers are controlled by a manager. There are two ways to set up the manager:
- The manager can be a job running on a Compute Server. In this
case, the manager job is first submitted to the cluster and executes
on one of the COMPUTE nodes as usual. When this job starts,
it will also request some number of workers (see
parameters DistributedMIPJobs, ConcurrentJobs, or
TuneJobs). The first choice will be WORKER nodes. If
not enough are available, it will use COMPUTE nodes. The
workload associated with managing the distributed algorithm is quite
light, so the initial job will act as both the manager and the first
worker.
- The manager can be the client program itself. In this case, the manager does not participate in the distributed optimization; it simply coordinates the efforts of the distributed workers. The manager requests distributed workers (using the WorkerPool parameter), and the cluster selects WORKER nodes first, then COMPUTE nodes if not enough WORKER nodes are available.
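In both setups the cluster prefers WORKER nodes and falls back to COMPUTE nodes. The following Python sketch models that selection rule only; it is not the actual grb_rs implementation. The `Node` class, the `select_workers` function, and the behavior when too few nodes are available (return whatever is available) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Node:
    """Hypothetical view of a cluster node, mirroring the TYPE column."""
    address: str
    node_type: str          # "WORKER" or "COMPUTE"
    available: bool = True  # assumption: node can accept a worker job now


def select_workers(nodes: List[Node], requested: int) -> List[str]:
    """Pick up to `requested` nodes, WORKER nodes first, then COMPUTE nodes.

    Assumption: if fewer nodes are available than requested, the
    available ones are returned rather than raising an error.
    """
    candidates = [n for n in nodes if n.available]
    # sorted() is stable, so nodes keep their original order within each type.
    ordered = sorted(candidates, key=lambda n: n.node_type != "WORKER")
    return [n.address for n in ordered[:requested]]


cluster = [
    Node("server1", "COMPUTE"),
    Node("server2", "COMPUTE"),
    Node("server3", "WORKER"),
    Node("server4", "WORKER"),
]
chosen = select_workers(cluster, 3)
```

With the four-node cluster from the listing above, a request for three workers would take both WORKER nodes before using any COMPUTE node.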
It is typically better to use the Compute Server itself as the distributed manager, rather than the client machine. This is particularly true if the Compute Server and the workers are physically close to each other, but physically distant from the client machine. In a typical environment, the client machine will offload the Gurobi computations onto the Compute Server, and the Compute Server will then act as the manager for the distributed computation.
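From the client's point of view, the two manager setups differ mainly in which parameters are set. The sketch below collects those settings in a plain dictionary so the difference is easy to compare. DistributedMIPJobs and WorkerPool are named in the text above; ComputeServer, ServerPassword, and WorkerPassword are Gurobi parameters not mentioned in this section, and the helper `manager_params` is hypothetical. Treat this as an outline under those assumptions, not as the required way to configure a client.

```python
from typing import Dict, Union

ParamValue = Union[str, int]


def manager_params(mode: str, server: str, workers: int,
                   password: str = "") -> Dict[str, ParamValue]:
    """Return parameter settings for one of the two manager setups.

    mode == "compute_server": the job is offloaded to the Compute Server,
        which then manages the distributed workers.
    mode == "client": the client program is the manager and requests
        workers from the cluster directly.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if mode == "compute_server":
        return {
            "ComputeServer": server,        # assumption: not named in the text
            "ServerPassword": password,     # assumption: not named in the text
            "DistributedMIPJobs": workers,
        }
    if mode == "client":
        return {
            "WorkerPool": server,
            "WorkerPassword": password,     # assumption: not named in the text
            "DistributedMIPJobs": workers,
        }
    raise ValueError(f"unknown mode: {mode!r}")


recommended = manager_params("compute_server", "server1", 2, "pass")
```

Each key/value pair would then be applied through the parameter-setting call of whichever Gurobi API the client uses. ConcurrentJobs or TuneJobs could replace DistributedMIPJobs for the other distributed algorithms.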