[Feature] Adding test-deployment-baremetal to the vllm_performance actuator #784
DanteNiewenhuis wants to merge 5 commits into IBM:main
Conversation
Started with creating a bare-metal experiment. At the moment, it correctly checks if an endpoint is available and otherwise throws a VLLMBenchmarkError. However, when trying to serve a model, it is giving a lot of errors.
…en not running already. Currently it is not yet correctly checking if serving is done.
… after runs benchmarks. Still not working:
- No alignment with input parameters.
- No checking if the correct model is being served (only if any is being served).
- vLLM instance is not yet correctly released after the benchmark.
…te to the output metrics of vllm bench
…rving and stopping models, as well as handling failing models. The baremetal deployment seems to function well, but some cleanup is still needed.
@DanteNiewenhuis please let me know when you want me to have another review pass on this PR.
christian-pinto
left a comment
@DanteNiewenhuis I gave a first pass to the PR. I think you are trying to fork the logic of Environments/Environment manager too much for the two supported modes (k8s and baremetal).
We should get to a point where everything is abstracted in the Env and EnvManager, and the logic of handling the envs should be the one in run_resource_and_workload_experiment.
In the end a vLLM deployment is the same regardless of whether it's on baremetal or kubernetes. You might want to think of also limiting the maximum number of possible parallel environments at the source, e.g., when baremetal is set in the actuator config a validator forces the number of maximum replicas to 1 and prints a warning if something else was set.
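A validator along these lines could enforce the replica limit at the source. This is only a sketch: the `ActuatorConfig` class and the `max_replicas` field name are assumptions, not the actual model from the PR.

```python
import logging

import pydantic

logger = logging.getLogger(__name__)


# Hypothetical sketch of the actuator configuration; the class and field
# names are illustrative, not the actual model from the PR.
class ActuatorConfig(pydantic.BaseModel):
    baremetal: bool = False
    max_replicas: int = 1

    @pydantic.model_validator(mode="after")
    def limit_baremetal_replicas(self) -> "ActuatorConfig":
        # Bare-metal mode cannot run parallel environments, so cap the
        # replica count at 1 and warn if the user asked for more.
        if self.baremetal and self.max_replicas > 1:
            logger.warning(
                "baremetal mode supports at most 1 replica; overriding "
                "max_replicas=%d to 1",
                self.max_replicas,
            )
            self.max_replicas = 1
        return self
```

With this in place, `ActuatorConfig(baremetal=True, max_replicas=4)` would silently come back with `max_replicas == 1` plus a warning, instead of failing later at deployment time.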
n_gpus: Annotated[int, pydantic.Field(gt=0)] = 1
gpu_type: Annotated[str, pydantic.Field()]
These are not benchmark parameters, they are parameters of the deployment. Why do you need to have them here in this model?
tensor_parallel_size: Annotated[int, pydantic.Field()] = 1
max_model_len: Annotated[int, pydantic.Field()] = -1
max_num_batched_tokens: Annotated[int, pydantic.Field()] = -1
max_num_seqs: Annotated[int, pydantic.Field()] = 256
Same as previous comment
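To make the suggested separation concrete, one way to split the two concerns could look like this. The deployment fields come from the diff above; the class names and the benchmark field are assumptions for illustration only.

```python
from typing import Annotated

import pydantic


# Illustrative split of the two concerns; the class names and the
# benchmark field are assumptions, while the deployment fields are
# taken from the diff above.
class DeploymentParameters(pydantic.BaseModel):
    # Everything that describes how vLLM is deployed lives here.
    n_gpus: Annotated[int, pydantic.Field(gt=0)] = 1
    gpu_type: str
    tensor_parallel_size: int = 1
    max_model_len: int = -1
    max_num_batched_tokens: int = -1
    max_num_seqs: int = 256


class BenchmarkParameters(pydantic.BaseModel):
    # Only workload-side knobs remain here (hypothetical example field).
    num_prompts: int = 100
```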
# Remove fields not needed for BenchmarkResult
del results["date"]
# del results["date"]
"No namespace set in actuator configuration - will not be able to create deployments using k8s"
)

if self.actuator_parameters.baremetal:
If one sets the namespace, this would first initialize the K8s env manager and then, if baremetal is also set, it will overwrite the env manager with the baremetal one.
This should be something along the lines of
if self.actuator_parameters.namespace:
    # Init K8s env
elif self.actuator_parameters.baremetal:
    # init baremetal
else:
    # something along these lines
    self.log.warning(
        "No namespace set in actuator and not running in baremetal mode. "
        "This actuator will not be able to create either kubernetes or baremetal deployments"
    )

I would also add a validator to the actuator configuration model to make sure that at most one between baremetal and namespace is set.
else:
    # add to clean up
    try:
        cleaner_handle = ray.get_actor(
            name=CLEANER_ACTOR, namespace=queue.ray_namespace()
        )
        cleaner_handle.add_to_cleanup.remote(handle=self.env_manager)
    except Exception as e:
        logger.warning(
            f"Failed to register custom actors for clean up {e}. Make sure you clean it up"
        )
Looks like the cleanup handle is set only for the baremetal env manager case?
return env
return None

def _generate_identifier(self, configuration: dict[str, str]) -> str:
What do you need the identifier for? All envs are "indexed" by their configuration, which is independent of baremetal/k8s.
return json.dumps(env_values)

def get_environment(self, model: str, configuration: dict[str, str]) -> Environment | None:
This should be the same regardless of the environment manager used. This is because changing this would change the logic of how the actuator works, and we cannot guarantee the same behavior for different types of envs.
return False

def _serve_vLLM(values: dict[str, str],
This should be handled in the get_environment() from the baremetal env manager.
logger = logging.getLogger(__name__)

def create_test_environment(
This function is never used in the code.
@ray.remote
def run_serve_and_workload_experiment(
The logic should be the same as in run_resource_and_workload_experiment.
Also, you need to install the pre-commit hooks and sign off all your commits. Have a read at https://github.com/IBM/ado/blob/main/CONTRIBUTING.md