feat(serve): add SageMaker GenAI inference benchmarking and recommendation #5874

Open
ZealSV wants to merge 4 commits into aws:master from ZealSV:feature/lumen-ai-inference-recommender

ZealSV (Contributor) commented May 19, 2026

Adds sagemaker.serve.ai_inference_recommender, a thin ergonomic layer
over sagemaker-core's AIBenchmarkJob, AIRecommendationJob, and
AIWorkloadConfig resources.

ModelBuilder gains a new entry point and extends two existing verbs:

  # Benchmark a deployed endpoint
  job = mb.start_benchmark(endpoint=ep, workload=Workload.synthetic(...))
  result = BenchmarkResult.from_job(job)

  # Recommendation flow extends optimize() and deploy()
  mb.optimize(workload=..., performance_target="throughput",
              instance_types=["ml.g6.12xlarge"])
  endpoint = mb.deploy(role=role)              # top recommendation
  endpoint = mb.deploy(role=role, recommendation_index=2)  # alternative

print(result) and print(mb.recommendations[0]) render their data as
tables.
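
For reviewers who want a feel for the table rendering, here is a minimal sketch of how a result object can print as a table via `__str__`. It is illustrative only: `MetricRow`, `Result`, `render_table`, and the sample metric are hypothetical names and made-up values, not code or data from this PR.

```python
from dataclasses import dataclass


@dataclass
class MetricRow:
    """Hypothetical stand-in for one benchmark metric."""
    name: str
    value: float
    unit: str


def render_table(rows):
    """Render rows as a fixed-width text table (hypothetical helper)."""
    headers = ("metric", "value", "unit")
    cells = [(r.name, f"{r.value:.2f}", r.unit) for r in rows]
    # Each column is as wide as its longest cell, header included.
    widths = [
        max([len(h)] + [len(c[i]) for c in cells])
        for i, h in enumerate(headers)
    ]

    def fmt(row):
        return " | ".join(cell.ljust(w) for cell, w in zip(row, widths)).rstrip()

    rule = "-+-".join("-" * w for w in widths)
    return "\n".join([fmt(headers), rule, *(fmt(c) for c in cells)])


@dataclass
class Result:
    """Hypothetical result wrapper whose print() output is a table."""
    rows: list

    def __str__(self):
        return render_table(self.rows)


if __name__ == "__main__":
    # Made-up sample value, for layout only.
    print(Result([MetricRow("latency_p50", 12.5, "ms")]))
```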

Public surface added under sagemaker.serve:

  • Workload -- typed factory; extras pass through **params, validated
    server-side.
  • BenchmarkResult / BenchmarkMetrics / BenchmarkMetric -- parses the
    AIPerf output.tar.gz from S3.
  • Secret -- opt-in helper for tokens >512 chars (Secrets Manager).
  • BenchmarkJob, RecommendationJob -- re-exports without the AI prefix.
  • FeatureGatedError, WorkloadValidationError -- typed exceptions.
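
As a rough illustration of the parsing step behind `BenchmarkResult`, the sketch below reads one JSON member out of a `.tar.gz` held in memory using only the standard library. The member name `metrics.json` and the payload are assumptions made for the example; the actual layout of the AIPerf archive is not described here, and the S3 download is left out on purpose.

```python
import io
import json
import tarfile


def read_json_member(archive_bytes, member_name):
    """Return the parsed JSON of one member of an in-memory .tar.gz."""
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        # extractfile raises KeyError if the member is missing and
        # returns None for members that are not regular files.
        member = tar.extractfile(member_name)
        if member is None:
            raise ValueError(f"{member_name!r} is not a regular file")
        return json.load(member)


def build_sample_archive(payload, member_name):
    """Build a tiny .tar.gz in memory so the sketch runs without S3."""
    data = json.dumps(payload).encode("utf-8")
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        info = tarfile.TarInfo(name=member_name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


if __name__ == "__main__":
    # "metrics.json" and its content are hypothetical.
    archive = build_sample_archive({"request_count": 3}, "metrics.json")
    print(read_json_member(archive, "metrics.json"))
```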

Pin-mode and workload-mode optimize() kwargs are mutually exclusive.
Recommendation deploy uses the ModelPackage path and auto-approves the
package that the recommendation job publishes.
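
The mutual-exclusion rule can be pictured with a small validator like the one below. This is a sketch under assumptions, not the check implemented in this PR: the workload-mode key set is inferred from the example call above, and `pinned_option` is a placeholder because the pin-mode kwargs are not listed in this description.

```python
# Assumed workload-mode kwargs, inferred from the optimize() example.
WORKLOAD_KEYS = frozenset({"workload", "performance_target", "instance_types"})
# Placeholder: the real pin-mode kwargs are not named in this description.
PIN_KEYS = frozenset({"pinned_option"})


def check_mode(kwargs):
    """Return "workload", "pin", or None; raise if both groups are used."""
    given = {key for key, value in kwargs.items() if value is not None}
    workload_given = sorted(given & WORKLOAD_KEYS)
    pin_given = sorted(given & PIN_KEYS)
    if workload_given and pin_given:
        raise ValueError(
            "workload-mode kwargs "
            f"{workload_given} cannot be combined with pin-mode kwargs {pin_given}"
        )
    if workload_given:
        return "workload"
    if pin_given:
        return "pin"
    return None


if __name__ == "__main__":
    print(check_mode({"workload": "w", "performance_target": "throughput"}))
```

Failing fast on the client side keeps the error next to the call that caused it instead of surfacing later from the service.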

Includes 51 unit tests and 2 slow_test integ tests
(tests/integ/test_ai_inference_recommender_integration.py) verified
end-to-end against real AWS.

Rebased onto upstream to pick up #5860 (preserve falsy values in
sagemaker-core serialize), required so optimize_model=False reaches
the wire.
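
To show why the falsy-value fix matters, here is a generic sketch of the bug class. It is not sagemaker-core's serializer: both functions are hypothetical, and `optimize_model` is used only as a sample dictionary key.

```python
def serialize_dropping_falsy(fields):
    """Buggy pattern: a truthiness test also drops False, 0, and ""."""
    return {key: value for key, value in fields.items() if value}


def serialize_preserving_falsy(fields):
    """Safer pattern: only unset (None) fields are omitted."""
    return {key: value for key, value in fields.items() if value is not None}


if __name__ == "__main__":
    request = {"optimize_model": False, "unset_field": None}
    print(serialize_dropping_falsy(request))
    print(serialize_preserving_falsy(request))
```

With the first pattern an explicit `False` is indistinguishable from "not provided", so the receiving side would apply its own default.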

Issue #, if available:

Description of changes:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

ZealSV changed the title from "feat(serve): add SageMaker GenAI inference benchmarking and recommend…" to "feat(serve): add SageMaker GenAI inference benchmarking and recommendation" May 19, 2026
ZealSV force-pushed the feature/lumen-ai-inference-recommender branch from c0cfc77 to 747baeb May 20, 2026 18:58
ZealSV force-pushed the feature/lumen-ai-inference-recommender branch from 747baeb to bb8c26a May 20, 2026 20:34
ZealSV force-pushed the feature/lumen-ai-inference-recommender branch from bb8c26a to 1d9b769 May 21, 2026 19:29
ZealSV force-pushed the feature/lumen-ai-inference-recommender branch from 1d9b769 to 31f40bd May 21, 2026 19:32
ZealSV temporarily deployed to manual-approval with GitHub Actions (Inactive) May 21, 2026 19:34
ZealSV requested a deployment to manual-approval with GitHub Actions (Waiting) May 22, 2026 22:49
ZealSV deployed to manual-approval with GitHub Actions (Active) May 22, 2026 22:49