Compute Servers and Distributed Workers
A Remote Services cluster is a collection of nodes of two different types:
- A Compute Server node supports the offloading of
optimization jobs. Features include load balancing, queueing and
concurrent execution of jobs. A Compute Server license is required
on the node. A Compute Server node can also act as a Distributed Worker.
- A Distributed Worker node can be used to execute part of
a distributed algorithm. A license is not necessary to run a
Distributed Worker, because it is always used in conjunction with a
manager (another node or a client program) that requires a license.
A Distributed Worker node can only be used by one manager at a time
(i.e., the job limit is always set to 1).
By default, grb_rs will try to start a node in Compute Server mode, and the node license status will be INVALID if no license is found. To start a Distributed Worker instead, set the WORKER property in the grb_rs.cnf configuration file (or pass the --worker command-line flag).
Once you form your cluster, the node type will be displayed in the TYPE column of the output of grbcluster nodes:
    > grbcluster nodes
    ID       ADDRESS       STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE %MEM  %CPU
    b7d037db server1:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 19m  15.30 5.64
    735c595f server2:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 19m  10.45 8.01
    eb07fe16 server3:61000 ALIVE  WORKER  VALID   ACCEPTING  0  0  1  <1s  11.44 2.33
    4f14a532 server4:61000 ALIVE  WORKER  VALID   ACCEPTING  0  0  1  <1s  12.20 5.60
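If you want to check node types from a script, the listing above can be split into records. The following is a minimal sketch, assuming the output is whitespace-separated with one header row and no spaces inside a column value. The helper names parse_nodes and addresses_by_type are hypothetical and are not part of any Gurobi tool.

```python
from collections import defaultdict

# Sample text copied from the listing above (header row plus four nodes).
SAMPLE = """\
ID       ADDRESS       STATUS TYPE    LICENSE PROCESSING #Q #R JL IDLE %MEM  %CPU
b7d037db server1:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 19m  15.30 5.64
735c595f server2:61000 ALIVE  COMPUTE VALID   ACCEPTING  0  0  10 19m  10.45 8.01
eb07fe16 server3:61000 ALIVE  WORKER  VALID   ACCEPTING  0  0  1  <1s  11.44 2.33
4f14a532 server4:61000 ALIVE  WORKER  VALID   ACCEPTING  0  0  1  <1s  12.20 5.60
"""


def parse_nodes(text):
    """Turn the listing into a list of dicts keyed by column name.

    Assumption: columns are separated by whitespace and values contain
    no spaces. Prompt lines starting with '>' are skipped.
    """
    lines = [
        ln for ln in text.splitlines()
        if ln.strip() and not ln.lstrip().startswith(">")
    ]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def addresses_by_type(nodes):
    """Group node addresses by the value of the TYPE column."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["TYPE"]].append(node["ADDRESS"])
    return dict(groups)


if __name__ == "__main__":
    for node_type, addresses in addresses_by_type(parse_nodes(SAMPLE)).items():
        print(node_type, ", ".join(addresses))
```

All values are kept as strings, so a column such as JL needs an explicit int() conversion before it can be compared numerically.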
The node type cannot be changed once grb_rs has started. If you wish to change the node type, you need to stop the node, change the configuration, and restart the node. You may have to update your license as well.
When using distributed optimization, distributed workers are controlled by a manager. There are two ways to set up the manager:
- The manager can be a job running on a Compute Server. In this
case, a job is submitted to the cluster and executes on one of the
COMPUTE nodes as usual. When the job reaches the point
where distributed optimization is requested, it will also request
some number of workers (see parameters DistributedMIPJobs, ConcurrentJobs, or TuneJobs). The first choice will be WORKER nodes.
If not enough are available, it will use COMPUTE nodes.
The workload associated with managing the distributed algorithm is
quite light, so the initial job will act as both the manager and the first worker.
- The manager can be the client program itself. The manager does not participate in the distributed optimization. It simply coordinates the efforts of the distributed workers. The manager will request distributed workers (using the WorkerPool parameter), and the cluster will first select the WORKER nodes. If not enough are available, it will use COMPUTE nodes as well.
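In both setups the selection order is the same: WORKER nodes first, then COMPUTE nodes if more workers are still needed. The sketch below illustrates that order only. It is not the cluster's actual scheduler, and the node fields (address, type, running, job_limit) and the rule that a node is free while running is below job_limit are assumptions made for the example.

```python
def select_workers(nodes, requested):
    """Return up to `requested` free nodes, WORKER nodes before COMPUTE nodes.

    Assumption: a node is free when its running job count is below its
    job limit. WORKER nodes have a job limit of 1, so a busy worker is
    never selected a second time.
    """
    free = [n for n in nodes if n["running"] < n["job_limit"]]
    workers = [n for n in free if n["type"] == "WORKER"]
    compute = [n for n in free if n["type"] == "COMPUTE"]
    # WORKER nodes come first; COMPUTE nodes only fill the remainder.
    return (workers + compute)[:requested]


# Hypothetical cluster state modelled on the four-node example above.
CLUSTER = [
    {"address": "server1:61000", "type": "COMPUTE", "running": 0, "job_limit": 10},
    {"address": "server2:61000", "type": "COMPUTE", "running": 0, "job_limit": 10},
    {"address": "server3:61000", "type": "WORKER", "running": 0, "job_limit": 1},
    {"address": "server4:61000", "type": "WORKER", "running": 0, "job_limit": 1},
]

if __name__ == "__main__":
    for node in select_workers(CLUSTER, 3):
        print(node["type"], node["address"])
```

Requesting three workers from this cluster yields both WORKER nodes and one COMPUTE node, because only two WORKER nodes exist.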
It is typically better to use the Compute Server itself as the distributed manager, rather than the client machine. This is particularly true if the Compute Server and the workers are physically close to each other, but physically distant from the client machine. In a typical environment, the client machine will offload the Gurobi computations onto the Compute Server, and the Compute Server will then act as the manager for the distributed computation.
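From the client's point of view, the two setups differ mainly in which address parameter is set. The following sketch builds the parameter settings for a distributed MIP run in each case. WorkerPool and DistributedMIPJobs are the parameters named above. ComputeServer, the Gurobi parameter for the Compute Server address, is not named in this section and is included here as an assumption about how the client connects. The function name and the mode strings are hypothetical.

```python
def distributed_mip_params(manager, address, workers):
    """Build parameter settings for one of the two manager setups.

    manager: "compute_server" (the job on the Compute Server manages the
             workers) or "client" (the client program manages them).
    address: "host:port" of a node in the cluster.
    workers: number of distributed workers to request.
    """
    if workers < 1:
        raise ValueError("workers must be at least 1")
    if manager == "compute_server":
        # The client offloads the job; the job then requests workers itself.
        return {"ComputeServer": address, "DistributedMIPJobs": workers}
    if manager == "client":
        # The client coordinates the workers directly through the pool.
        return {"WorkerPool": address, "DistributedMIPJobs": workers}
    raise ValueError(f"unknown manager setup: {manager!r}")


if __name__ == "__main__":
    params = distributed_mip_params("compute_server", "server1:61000", 2)
    for name, value in params.items():
        print(f"{name}={value}")
    # With gurobipy installed, these settings could be applied to an empty
    # environment before it is started, for example:
    #   env = gurobipy.Env(empty=True)
    #   for name, value in params.items():
    #       env.setParam(name, value)
    #   env.start()
```

Keeping the settings in a plain dictionary makes it easy to switch between the two setups without changing the modelling code. A cluster protected by a password would need additional parameters that this sketch leaves out.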