I see that the signature of the Evaluator includes the number of processes and/or GPUs, but it does not seem possible to pass the number of nodes required to launch a given simulation.
Looking at libEnsemble, it seems this would be supported; are there any plans to implement it?
My problem is that my scan requires spawning MPI jobs across many nodes. If I pass `extra_args="--nodes 2 --ntasks-per-node 16 --cpu-bind=none --exclusive"` to a `TemplateEvaluator`, libEnsemble correctly sees that I'm requesting two nodes; however, it then fails with the following traceback:
```
[0] 2026-04-09 12:32:33,173 libensemble.manager (ERROR): ---- Received error message from worker 1 ----
[0] 2026-04-09 12:32:33,173 libensemble.manager (ERROR): Message: libensemble.resources.mpi_resources.MPIResourcesException: Not enough nodes to honor arguments. Requested 2. Only 1 available
[0] 2026-04-09 12:32:33,173 libensemble.manager (ERROR): Traceback (most recent call last):
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/worker.py", line 418, in run
    response = self._handle(Work)
               ^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/worker.py", line 361, in _handle
    calc_out, persis_info, calc_status = self._handle_calc(Work, calc_in)
                                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/worker.py", line 279, in _handle_calc
    out = calc(calc_in, Work)
          ^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/utils/runners.py", line 54, in run
    out = self._result(calc_in, Work["persis_info"], Work["libE_info"])
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/utils/runners.py", line 46, in _result
    return self.f(*args)
           ^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/optimas/sim_functions.py", line 68, in run_template_simulation
    task = Executor.executor.submit(
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/executors/mpi_executor.py", line 341, in submit
    mpi_specs = mpi_runner_obj.get_mpi_specs(
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/executors/mpi_runner.py", line 338, in get_mpi_specs
    nprocs, nnodes, ppn = mpi_resources.get_resources(resources, nprocs, nnodes, ppn, hyperthreads)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/resources/mpi_resources.py", line 203, in get_resources
    rassert(
  File "/global/cfs/cdirs/m558/terzani/sw/perlmutter/gpu/venvs/hipace-gpu/lib/python3.11/site-packages/libensemble/resources/mpi_resources.py", line 24, in rassert
    raise MPIResourcesException(*args)
libensemble.resources.mpi_resources.MPIResourcesException: Not enough nodes to honor arguments. Requested 2. Only 1 available
```
I'm running from an allocation of 4 nodes on Perlmutter, requested interactively via `salloc`, but somehow the script doesn't see that.
I wonder whether hacking the `--nodes 2` into `extra_args` is the correct way of doing this. Since the Evaluator already supports the number of processes and GPUs, would it be feasible to implement direct support for the number of nodes?
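To make the mismatch concrete, here is a small self-contained sketch (my own helper, not libEnsemble internals) showing the node count the `extra_args` string requests versus the single node the worker apparently has, which reproduces the shape of the error above:

```python
import shlex


def requested_nodes(extra_args: str) -> int:
    """Extract the --nodes value from srun-style extra args (defaults to 1)."""
    tokens = shlex.split(extra_args)
    for i, tok in enumerate(tokens):
        if tok == "--nodes":
            return int(tokens[i + 1])
        if tok.startswith("--nodes="):
            return int(tok.split("=", 1)[1])
    return 1


extra_args = "--nodes 2 --ntasks-per-node 16 --cpu-bind=none --exclusive"
requested = requested_nodes(extra_args)  # 2, as in the traceback
available = 1  # what libEnsemble reports for the worker, despite the 4-node allocation
if requested > available:
    print(f"Not enough nodes to honor arguments. Requested {requested}. Only {available} available")
```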
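For reference, the setup I'm describing looks roughly like this (a sketch, not verbatim from my script; `analyze_simulation` and the template filename are placeholders, and I'm assuming `TemplateEvaluator` forwards `extra_args` to libEnsemble's MPI executor as the traceback suggests):

```python
from optimas.evaluators import TemplateEvaluator

ev = TemplateEvaluator(
    sim_template="template_simulation_script.py",  # placeholder name
    analysis_func=analyze_simulation,              # placeholder function
    n_procs=32,  # number of processes: supported directly
    n_gpus=4,    # number of GPUs: supported directly
    # The node count has to be smuggled in through srun flags instead
    # of a dedicated parameter, which is what triggers the error above:
    extra_args="--nodes 2 --ntasks-per-node 16 --cpu-bind=none --exclusive",
)
```

Direct support would presumably look like an `n_nodes` parameter alongside `n_procs` and `n_gpus`.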