[Feature] Adding test-deployment-baremetal to the vllm_performance actuator #784
DanteNiewenhuis wants to merge 5 commits into IBM:main
Conversation
Started with creating a bare-metal experiment. At the moment, it correctly checks if an endpoint is available and otherwise throws a VLLMBenchmarkError. However, when trying to serve a model, it is giving a lot of errors.
…en not running already. Currently it is not yet correctly checking if serving is done.
… after runs benchmarks. Still not working:
- No alignment with input parameters.
- No checking if the correct model is being served (only if any is being served).
- vLLM instance is not yet correctly released after the benchmark.
…te to the output metrics of vllm bench
…rving and stopping models, as well as handling failing models. The baremetal deployment seems to function well, but some cleanup is still needed.
@DanteNiewenhuis please let me know when you want me to have another review pass on this PR.
christian-pinto
left a comment
@DanteNiewenhuis I gave a first pass to the PR. I think you are trying to fork the logic of Environments/Environment manager too much for the two supported modes (k8s and baremetal).
We should get to a point where everything is abstracted in the Env and EnvManager, and the logic of handling the envs should be the one in run_resource_and_workload_experiment.
In the end a vLLM deployment is the same regardless of whether it's on baremetal or kubernetes. You might want to think of also limiting the maximum number of possible parallel environments at the source, e.g., when baremetal is set in the actuator config a validator forces the number of maximum replicas to 1 and prints a warning if something else was set.
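A validator along these lines could enforce the replica limit at the source. This is only a sketch: the `ActuatorConfig` class and the `max_replicas` field name are assumptions, not the actual model from the PR.

```python
import logging

import pydantic

logger = logging.getLogger(__name__)


# Hypothetical sketch of the actuator configuration; the class and field
# names are illustrative, not the actual model from the PR.
class ActuatorConfig(pydantic.BaseModel):
    baremetal: bool = False
    max_replicas: int = 1

    @pydantic.model_validator(mode="after")
    def limit_baremetal_replicas(self) -> "ActuatorConfig":
        # Bare-metal mode cannot run parallel environments, so cap the
        # replica count at 1 and warn if the user asked for more.
        if self.baremetal and self.max_replicas > 1:
            logger.warning(
                "baremetal mode supports at most 1 replica; overriding "
                "max_replicas=%d to 1",
                self.max_replicas,
            )
            self.max_replicas = 1
        return self
```

With this in place, `ActuatorConfig(baremetal=True, max_replicas=4)` would silently come back with `max_replicas == 1` plus a warning, instead of failing later at deployment time.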
n_gpus: Annotated[int, pydantic.Field(gt=0)] = 1
gpu_type: Annotated[str, pydantic.Field()]
These are not benchmark parameters, they are parameters of the deployment. Why do you need to have them here in this model?
tensor_parallel_size: Annotated[int, pydantic.Field()] = 1
max_model_len: Annotated[int, pydantic.Field()] = -1
max_num_batched_tokens: Annotated[int, pydantic.Field()] = -1
max_num_seqs: Annotated[int, pydantic.Field()] = 256
Same as previous comment
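To make the suggested separation concrete, one way to split the two concerns could look like this. The deployment fields come from the diff above; the class names and the benchmark field are assumptions for illustration only.

```python
from typing import Annotated

import pydantic


# Illustrative split of the two concerns; the class names and the
# benchmark field are assumptions, while the deployment fields are
# taken from the diff above.
class DeploymentParameters(pydantic.BaseModel):
    # Everything that describes how vLLM is deployed lives here.
    n_gpus: Annotated[int, pydantic.Field(gt=0)] = 1
    gpu_type: str
    tensor_parallel_size: int = 1
    max_model_len: int = -1
    max_num_batched_tokens: int = -1
    max_num_seqs: int = 256


class BenchmarkParameters(pydantic.BaseModel):
    # Only workload-side knobs remain here (hypothetical example field).
    num_prompts: int = 100
```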
# Remove fields not needed for BenchmarkResult
del results["date"]
# del results["date"]
"No namespace set in actuator configuration - will not be able to create deployments using k8s"
)

if self.actuator_parameters.baremetal:
If one sets the namespace, this would first initialize the K8s env manager and then, if baremetal is also set, it will overwrite the env manager with the baremetal one.
This should be something along the lines of
if self.actuator_parameters.namespace:
    # Init K8s env
elif self.actuator_parameters.baremetal:
    # init baremetal
else:
    # something along these lines
    self.log.warning(
        "No namespace set in actuator and not running in baremetal mode. "
        "This actuator will not be able to create either kubernetes or baremetal deployments"
    )

I would also add a validator to the actuator configuration model to make sure that at most one between baremetal and namespace is set.
else:
    # add to clean up
    try:
        cleaner_handle = ray.get_actor(
            name=CLEANER_ACTOR, namespace=queue.ray_namespace()
        )
        cleaner_handle.add_to_cleanup.remote(handle=self.env_manager)
    except Exception as e:
        logger.warning(
            f"Failed to register custom actors for clean up {e}. Make sure you clean it up"
        )
Looks like the cleanup handle is set only for the baremetal env manager case?
return env
return None

def _generate_identifier(self, configuration: dict[str, str]) -> str:
What do you need the identifier for? All envs are "indexed" by their configuration, which is independent of baremetal/k8s.
return json.dumps(env_values)

def get_environment(self, model: str, configuration: dict[str, str]) -> Environment | None:
This should be the same regardless of the environment manager used. This is because changing this would change the logic of how the actuator works, and we cannot guarantee the same behavior for different types of envs.
return False

def _serve_vLLM(values: dict[str, str],
This should be handled in the get_environment() from the baremetal env manager.
logger = logging.getLogger(__name__)

def create_test_environment(
This function is never used in the code.
@ray.remote
def run_serve_and_workload_experiment(
The logic should be the same as in run_resource_and_workload_experiment.
Also, you need to install the pre-commit hooks and sign off all your commits. Have a read at https://github.com/IBM/ado/blob/main/CONTRIBUTING.md