
[Feature] Adding test-deployment-baremetal to the vllm_performance actuator #784

Draft
DanteNiewenhuis wants to merge 5 commits into IBM:main from DanteNiewenhuis:baremetal-experiment

Conversation

@DanteNiewenhuis

Started with creating a bare-metal experiment. At the moment, it correctly checks if an endpoint is available and otherwise throws a VLLMBenchmarkError. However, when trying to serve a model, it gives a lot of errors.

…ctly checks if an endpoint is available, and otherwise throws a VLLMBenchmarkError. However, when trying to serve a model, it gives a lot of errors.
@christian-pinto christian-pinto marked this pull request as draft April 2, 2026 14:33
…en not running already. Currently it is not yet correctly checking if serving is done.
… after runs benchmarks. Still not working:

- No alignment with input parameters.
- No checking if the correct model is being served (only if any is being served).
- vLLM instance is not yet correctly released after the benchmark.
…rving and stopping models as well as handle failing models. The Baremetal deployment seems to function well, but some cleanup is still needed.
@christian-pinto
Member

@DanteNiewenhuis please, let me know when you want me to have another review pass on this PR.

@christian-pinto christian-pinto linked an issue Apr 13, 2026 that may be closed by this pull request

@christian-pinto christian-pinto left a comment


@DanteNiewenhuis I gave a first pass to the PR. I think you are trying to fork the logic of Environments/Environment manager too much for the two supported deployments (k8s and baremetal).

We should get to a point where everything is abstracted in the Env and EnvManager, and the logic for handling the envs should be the one in run_resource_and_workload_experiment.

In the end, a vLLM deployment is the same regardless of whether it's on baremetal or kubernetes. You might also want to think of limiting the maximum number of possible parallel environments at the source, e.g., when baremetal is set in the actuator config, a validator forces the maximum number of replicas to 1 and prints a warning if something else was set.
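The suggested guard could be sketched as a pydantic validator. Note that the class and field names here (`ActuatorConfig`, `max_replicas`) are illustrative, not the actual actuator configuration model from the repository:

```python
import warnings

import pydantic


class ActuatorConfig(pydantic.BaseModel):
    # Illustrative fields only -- the real actuator configuration
    # model in the repository may use different names.
    baremetal: bool = False
    max_replicas: int = 1

    @pydantic.model_validator(mode="after")
    def _cap_replicas_for_baremetal(self) -> "ActuatorConfig":
        # Baremetal can host a single vLLM instance at a time, so
        # clamp max_replicas to 1 and warn if the user asked for more.
        if self.baremetal and self.max_replicas > 1:
            warnings.warn(
                f"baremetal mode supports a single replica; "
                f"overriding max_replicas={self.max_replicas} to 1"
            )
            self.max_replicas = 1
        return self
```

With this in place, `ActuatorConfig(baremetal=True, max_replicas=4)` validates successfully but comes back with `max_replicas == 1` and a warning, instead of failing later at deployment time.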

Comment on lines +28 to +30
n_gpus: Annotated[int, pydantic.Field(gt=0)] = 1
gpu_type: Annotated[str, pydantic.Field()]


These are not benchmark parameters; they are parameters of the deployment. Why do you need to have them here in this model?

Comment on lines +45 to +48
tensor_parallel_size: Annotated[int, pydantic.Field()] = 1
max_model_len: Annotated[int, pydantic.Field()] = -1
max_num_batched_tokens: Annotated[int, pydantic.Field()] = -1
max_num_seqs: Annotated[int, pydantic.Field()] = 256

Same as previous comment
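One way to act on the two comments above is to move these fields out of the benchmark model into a dedicated deployment-parameters model. A rough sketch, where the split and the class names are only a suggestion (the deployment field names come from the diff; `num_prompts` is a placeholder):

```python
from typing import Annotated

import pydantic


class DeploymentParameters(pydantic.BaseModel):
    # Settings that describe the vLLM deployment itself,
    # taken from the fields flagged in the review.
    n_gpus: Annotated[int, pydantic.Field(gt=0)] = 1
    gpu_type: str
    tensor_parallel_size: int = 1
    max_model_len: int = -1
    max_num_batched_tokens: int = -1
    max_num_seqs: int = 256


class BenchmarkParameters(pydantic.BaseModel):
    # Settings that only affect how the benchmark is driven.
    # `num_prompts` is a hypothetical example field, not from the PR.
    num_prompts: int = 100
```

The benchmark model then stays free of deployment concerns, and the deployment model can be handed to whichever env manager (k8s or baremetal) actually creates the vLLM instance.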


# Remove fields not needed for BenchmarkResult
del results["date"]
# del results["date"]

Suggested change
# del results["date"]

"No namespace set in acutator configuration - will not be able to create deployments using k8s"
)

if self.actuator_parameters.baremetal:

If one sets the namespace, this would first initialize the K8s env manager and then, if baremetal is also set, it would overwrite the env manager with the baremetal one.

This should be something along the lines of

if self.actuator_parameters.namespace:
    ...  # init the K8s env manager
elif self.actuator_parameters.baremetal:
    ...  # init the baremetal env manager
else:
    # something along these lines
    self.log.warning(
        "No namespace set in actuator and not running in baremetal mode. "
        "This actuator will not be able to create either kubernetes or baremetal deployments"
    )

I would also add a validator to the actuator configuration model to make sure that at most one of baremetal and namespace is set.

Comment on lines +156 to +166
else:
    # add to clean up
    try:
        cleaner_handle = ray.get_actor(
            name=CLEANER_ACTOR, namespace=queue.ray_namespace()
        )
        cleaner_handle.add_to_cleanup.remote(handle=self.env_manager)
    except Exception as e:
        logger.warning(
            f"Failed to register custom actors for clean up {e}. Make sure you clean it up"
        )

Looks like the cleanup handle is set only for the baremetal env manager case?

return env
return None

def _generate_identifier(self, configuration: dict[str, str]) -> str:

What do you need the identifier for? All envs are "indexed" depending on the configuration, which is independent of baremetal/k8s.


return json.dumps(env_values)

def get_environment(self, model: str, configuration: dict[str, str]) -> Environment | None:

This should be the same regardless of the environment manager used. This is because changing it would alter the logic of how the actuator works, and we cannot guarantee the same behavior for different types of envs.


return False

def _serve_vLLM(values: dict[str, str],

This should be handled in the get_environment() from the baremetal env manager.

logger = logging.getLogger(__name__)


def create_test_environment(

This function is never used in the code.



@ray.remote
def run_serve_and_workload_experiment(

The logic should be the same as in run_resource_and_workload_experiment.

@christian-pinto
Member

Also, you need to install the pre-commit hooks and sign-off all your commits.

Have a read at https://github.com/IBM/ado/blob/main/CONTRIBUTING.md


Development

Successfully merging this pull request may close these issues.

feat: Check for condition before starting experiment

2 participants