
Running simulations that require multiple nodes #299

@titoiride

Description


I see that the signature for the Evaluator includes the number of processes and/or GPUs, but it seems it is not possible to pass the number of nodes required to launch a given simulation.
Looking at libEnsemble, this appears to be supported; are there any plans to implement it?
My problem is that my scan requires spawning MPI jobs across many nodes. If I pass extra_args="--nodes 2 --ntasks-per-node 16 --cpu-bind=none --exclusive" to a TemplateEvaluator, libEnsemble correctly sees that I'm requesting two nodes; however, it then fails with the following backtrace:

[0]  2026-04-09 12:32:33,173 libensemble.manager (ERROR): ---- Received error message from worker 1 ----
[0]  2026-04-09 12:32:33,173 libensemble.manager (ERROR): Message: libensemble.resources.mpi_resources.MPIResourcesException: Not enough nodes to honor arguments. Requested 2. Only 1 available
[0]  2026-04-09 12:32:33,173 libensemble.manager (ERROR): Traceback (most recent call last):
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/worker.py", line 418, in run
    response = self._handle(Work)
               ^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/worker.py", line 361, in _handle
    calc_out, persis_info, calc_status = self._handle_calc(Work, calc_in)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/worker.py", line 279, in _handle_calc
    out = calc(calc_in, Work)
          ^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/utils/runners.py", line 54, in run
    out = self._result(calc_in, Work["persis_info"], Work["libE_info"])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/utils/runners.py", line 46, in _result
    return self.f(*args)
           ^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/optimas/sim_functions.py", line 68, in run_template_simulation
    task = Executor.executor.submit(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/executors/mpi_executor.py", line 341, in submit
    mpi_specs = mpi_runner_obj.get_mpi_specs(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/executors/mpi_runner.py", line 338, in get_mpi_specs
    nprocs, nnodes, ppn = mpi_resources.get_resources(resources, nprocs, nnodes, ppn, hyperthreads)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/resources/mpi_resources.py", line 203, in get_resources
    rassert(
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/resources/mpi_resources.py", line 24, in rassert
    raise MPIResourcesException(*args)
libensemble.resources.mpi_resources.MPIResourcesException: Not enough nodes to honor arguments. Requested 2. Only 1 available
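To narrow down whether the allocation is visible at all, one can check what Slurm exports to the process. A minimal diagnostic sketch, assuming salloc sets SLURM_JOB_NUM_NODES (note that libEnsemble's own node detection is a separate mechanism and may read different variables, e.g. the node list):

```python
import os

def slurm_nodes_visible():
    """Return the node count Slurm advertises to this process, or None
    if the variable is not set (assumption: salloc exports
    SLURM_JOB_NUM_NODES in the allocation's environment)."""
    val = os.environ.get("SLURM_JOB_NUM_NODES")
    return int(val) if val is not None else None

print("Slurm reports nodes:", slurm_nodes_visible())
```

If this prints 4 inside the allocation but libEnsemble still reports only one node available, the mismatch is in how the resources are being partitioned among workers rather than in the allocation itself.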

I'm running from an allocation of 4 nodes on Perlmutter requested interactively via salloc, but somehow the script doesn't see that.
I wonder whether hacking --nodes 2 into extra_args is the correct way of doing this. Since the Evaluator already supports the number of processes and GPUs, would it be feasible to implement direct support for the number of nodes?
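For reference, the workaround described above amounts to assembling an srun flag string and handing it to the evaluator. A sketch (the helper name is hypothetical; only extra_args and the flag values come from the report above, and the TemplateEvaluator call is illustrative, not a verified signature):

```python
# Hypothetical helper to assemble the srun flags used in the report.
def make_srun_extra_args(nodes: int, ntasks_per_node: int) -> str:
    """Build the string passed verbatim as extra_args to a TemplateEvaluator."""
    return (
        f"--nodes {nodes} --ntasks-per-node {ntasks_per_node} "
        "--cpu-bind=none --exclusive"
    )

# Illustrative usage, mirroring the report:
# ev = TemplateEvaluator(..., extra_args=make_srun_extra_args(2, 16))
```

Direct support would presumably replace this string-building with an explicit n_nodes-style parameter, analogous to the existing process/GPU counts.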
