feat(llama2-70b): Add multinode to SUT_API.py for the offline scenario #2391
Open
mrzzy wants to merge 24 commits into mlcommons:master from
Conversation
This reverts commit 325526f6aed7b2c18d93c19e561d743d3246a3b2.
This reverts commit b10eceddd3b9ecb824cd6d7248df8ada786ef132.
…of opening 1 request per sample
…rompt load over servers". This reverts commit f6769548b45a5872dd20a5c0a329a005c05a8664.
Motivation
The LLaMA-2-70B benchmark (Offline Scenario) currently does not have multinode support.
Contents
This PR adds multinode inference support to the LLaMA-2-70B benchmark (Offline Scenario) by enabling `SUT_API.py` to issue requests to multiple OpenAI-compatible endpoints (e.g., vLLM, TensorRT-LLM) simultaneously. Prompts are (mostly) evenly partitioned across servers.

- Extends the existing API inference mode (`--vllm`) with even prompt distribution across multiple OpenAI-compatible endpoints.
- Implements the dispatch logic in dedicated helpers (`query_batch` and `query_servers`); a rough sketch of the scheme follows below.
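As a rough illustration of how prompts could be partitioned and dispatched, here is a minimal sketch. The helper names `query_batch` and `query_servers` come from this PR, but everything else (the request payload, the round-robin split, and the threading) is an assumption about one plausible implementation, not the PR's actual code; it assumes each endpoint serves the OpenAI-compatible `/v1/completions` API, as vLLM does.

```python
# Hypothetical sketch of the multinode dispatch described above; the actual
# SUT_API.py implementation may differ.
from concurrent.futures import ThreadPoolExecutor

import requests


def query_batch(server: str, model: str, prompts: list[str]) -> list[str]:
    """Send one shard of prompts to a single OpenAI-compatible endpoint."""
    outputs = []
    for prompt in prompts:
        resp = requests.post(
            f"{server}/v1/completions",
            json={"model": model, "prompt": prompt, "max_tokens": 1024},
        )
        resp.raise_for_status()
        outputs.append(resp.json()["choices"][0]["text"])
    return outputs


def query_servers(servers: list[str], model: str, prompts: list[str]) -> list[str]:
    """Partition prompts (mostly) evenly across servers and query them in parallel."""
    n = len(servers)
    # Round-robin split: server i gets prompts[i::n], so shard sizes
    # differ by at most one prompt.
    shards = [prompts[i::n] for i in range(n)]
    with ThreadPoolExecutor(max_workers=n) as pool:
        shard_outputs = list(
            pool.map(
                lambda pair: query_batch(pair[0], model, pair[1]),
                zip(servers, shards),
            )
        )
    # Reassemble outputs in the original prompt order: shard i's j-th
    # output corresponds to prompts[i + j * n].
    merged = [None] * len(prompts)
    for i, outputs in enumerate(shard_outputs):
        for j, text in enumerate(outputs):
            merged[i + j * n] = text
    return merged
```

The round-robin split (`prompts[i::n]`) keeps shard sizes within one prompt of each other, matching the "(mostly) evenly partitioned" behavior described above.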
User-facing Changes

Usage Example (Offline + Multinode API mode)
```
python3 -u main.py --scenario Offline \
    --vllm \
    --api-model-name ${MODEL_NAME} \
    --api-server http://node1:8000 \
    --api-server http://node2:8000 \
    --api-server http://node3:8000 \
    --model-path ${CHECKPOINT_PATH} \
    --user-conf user.conf \
    --total-sample-count 24576 \
    --dataset-path ${DATASET_PATH} \
    --output-log-dir offline-logs
```

Each `--api-server` argument registers an endpoint; `SUT_API` distributes prompts across them automatically.
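For reference, repeated flags like `--api-server` are typically collected with `argparse`'s `action="append"`. The snippet below is an illustrative sketch: the flag name matches the usage example above, but the parsing code itself is an assumption, not necessarily how `main.py` implements it.

```python
# Illustrative sketch of collecting repeated --api-server flags; an assumption,
# not necessarily the PR's actual argument parsing.
import argparse

parser = argparse.ArgumentParser()
parser.add_argument(
    "--api-server",
    action="append",  # each occurrence appends another endpoint to the list
    dest="api_servers",
    help="OpenAI-compatible endpoint; pass multiple times for multinode",
)
args = parser.parse_args(
    ["--api-server", "http://node1:8000", "--api-server", "http://node2:8000"]
)
print(args.api_servers)  # ['http://node1:8000', 'http://node2:8000']
```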