Finalize the platform/hardware that the reference will be implemented on, and add it to the approved list
Finalize the reference precision and generate initial loss curves to understand training behavior. At this stage, decide the rough benchmarking region and whether the benchmark should start training from randomly initialized weights or from a previously trained checkpoint. If training from random weights shows a lot of instability in the losses, it may be better to train for a few hundred steps and generate a checkpoint so that the benchmarking region is more stable and smooth.
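One way to make the random-init-vs-checkpoint call concrete is a simple smoothness heuristic on the tail of the loss curve. The function and threshold below are illustrative assumptions, not part of any MLC tooling:

```python
import numpy as np

def is_benchmark_region_stable(losses, window=50, max_rel_std=0.05):
    """Heuristic: is the tail of the loss curve smooth enough to serve
    as the benchmarking region? window/max_rel_std are illustrative."""
    tail = np.asarray(losses[-window:])
    return tail.std() / abs(tail.mean()) <= max_rel_std

# A smoothly decaying curve passes; a noisy one suggests warming up
# for a few hundred steps and saving a checkpoint first.
smooth = 2.0 * np.exp(-0.01 * np.arange(500)) + 1.0
noisy = smooth + np.random.default_rng(0).normal(0, 0.5, size=500)
print(is_benchmark_region_stable(smooth))  # True
print(is_benchmark_region_stable(noisy))   # False
```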
Finalize batch sizes for RCPs and find good hyperparameters at these batch sizes (at least 3 batch sizes are needed). We typically choose one small batch size, one very large one, and one in the middle to cover a reasonable range. Ask task force members and Training WG members for batch-size range suggestions so that the chosen options cover what submitters are targeting.
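For the "small / middle / large" pattern, a geometric midpoint (rounded to a power of two) is a natural choice. This is a sketch under assumed range values; the real endpoints should come from the Training WG discussion above:

```python
import math

def pick_rcp_batch_sizes(min_bs, max_bs):
    """Pick three RCP batch sizes: the small end, the large end, and a
    power-of-two geometric midpoint. Endpoints here are placeholders."""
    mid = 2 ** round((math.log2(min_bs) + math.log2(max_bs)) / 2)
    return [min_bs, mid, max_bs]

print(pick_rcp_batch_sizes(256, 8192))  # [256, 1024, 8192]
```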
Finalize the evaluation metric and dataset. Understand how much the initial dataset can be reduced; ideally we want the smallest possible dataset (both training and evaluation) that still allows for a reasonable benchmark.
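When shrinking a labeled dataset, a stratified subsample keeps label proportions intact while you test how small the data can get. This is a generic sketch (names and the 10% fraction are hypothetical); the reduced set still needs to be validated for stable convergence and eval scores:

```python
import random
from collections import defaultdict

def stratified_subsample(examples, labels, fraction, seed=0):
    """Shrink a labeled dataset while preserving label proportions.
    Generic sketch, not tied to any specific benchmark dataset."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex, y in zip(examples, labels):
        by_label[y].append(ex)
    kept = []
    for y, group in by_label.items():
        k = max(1, round(len(group) * fraction))  # keep >=1 per label
        kept.extend((ex, y) for ex in rng.sample(group, k))
    rng.shuffle(kept)
    return kept

data = list(range(100))
labels = [i % 2 for i in range(100)]  # two balanced classes
small = stratified_subsample(data, labels, 0.1)
print(len(small))  # 10, with 5 examples per label
```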
Finalize which hyperparameters are unconstrained, i.e., free for submitters to modify
Finalize code
Upload the processed dataset to the MLC bucket by reaching out to support@mlcommons.org, who will give you write access and download instructions
Create a PR with the initial codebase to https://github.com/mlcommons/training. It should include instructions on how to run the code, and the README needs to follow the benchmark README template. Note that you need to follow the Contribution guidelines to ensure that your GitHub handle can contribute to MLC repositories.
Create a new benchmark presentation deck and present it to the Training WG so everyone is aware of the benchmark and its technical details
Create an initial blog write-up with technical details. Once it's ready, MLC will bring in a technical writer who can polish it and publish the blog before the first round in which this benchmark is introduced. Sample blogs: flux.1, llama31_405b.
New benchmark task force leads should complete this checklist before the benchmark is finalized.
Initial reference code: roadmap
Finalize code
Generate RCPs
Add new benchmark details