Finalize the platform/hardware that the reference will be implemented on, and add it to the approved list
Finalize the reference precision and generate initial loss curves to understand training behavior. At this stage, decide the rough benchmarking region and whether the benchmark should start training from randomly initialized weights or from a previously trained checkpoint. If training from random weights shows a lot of instability in the losses, it may be better to train for a few hundred steps and generate a checkpoint so that the benchmarking region is more stable and smooth.
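One way to make the random-init-vs-checkpoint call concrete is a simple smoothness heuristic on the tail of the loss curve. The function and threshold below are illustrative assumptions, not part of any MLC tooling:

```python
import numpy as np

def is_benchmark_region_stable(losses, window=50, max_rel_std=0.05):
    """Heuristic: is the tail of the loss curve smooth enough to serve
    as the benchmarking region? window/max_rel_std are illustrative."""
    tail = np.asarray(losses[-window:])
    return tail.std() / abs(tail.mean()) <= max_rel_std

# A smoothly decaying curve passes; a noisy one suggests warming up
# for a few hundred steps and saving a checkpoint first.
smooth = 2.0 * np.exp(-0.01 * np.arange(500)) + 1.0
noisy = smooth + np.random.default_rng(0).normal(0, 0.5, size=500)
print(is_benchmark_region_stable(smooth))  # True
print(is_benchmark_region_stable(noisy))   # False
```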
Finalize batch sizes for RCPs and find good hyperparameters at these batch sizes (at least 3 batch sizes are needed). We typically choose one small batch size, one very large one, and one in the middle to cover a reasonable range. Ask task force members and Training WG members for batch-size range suggestions so that the chosen options cover what submitters are targeting.
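For the "small / middle / large" pattern, a geometric midpoint (rounded to a power of two) is a natural choice. This is a sketch under assumed range values; the real endpoints should come from the Training WG discussion above:

```python
import math

def pick_rcp_batch_sizes(min_bs, max_bs):
    """Pick three RCP batch sizes: the small end, the large end, and a
    power-of-two geometric midpoint. Endpoints here are placeholders."""
    mid = 2 ** round((math.log2(min_bs) + math.log2(max_bs)) / 2)
    return [min_bs, mid, max_bs]

print(pick_rcp_batch_sizes(256, 8192))  # [256, 1024, 8192]
```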
Finalize the evaluation metric and dataset. Understand how much the initial dataset can be reduced; ideally we want the smallest possible dataset (both training and evaluation) that still allows for a reasonable benchmark.
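When shrinking a labeled dataset, a stratified subsample keeps label proportions intact while you test how small the data can get. This is a generic sketch (names and the 10% fraction are hypothetical); the reduced set still needs to be validated for stable convergence and eval scores:

```python
import random
from collections import defaultdict

def stratified_subsample(examples, labels, fraction, seed=0):
    """Shrink a labeled dataset while preserving label proportions.
    Generic sketch, not tied to any specific benchmark dataset."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex, y in zip(examples, labels):
        by_label[y].append(ex)
    kept = []
    for y, group in by_label.items():
        k = max(1, round(len(group) * fraction))  # keep >=1 per label
        kept.extend((ex, y) for ex in rng.sample(group, k))
    rng.shuffle(kept)
    return kept

data = list(range(100))
labels = [i % 2 for i in range(100)]  # two balanced classes
small = stratified_subsample(data, labels, 0.1)
print(len(small))  # 10, with 5 examples per label
```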
Finalize which hyperparameters are unconstrained, i.e., free for submitters to modify
Finalize code
Upload the processed dataset to the MLC bucket by reaching out to support@mlcommons.org, who will give you write access and download instructions
Create a PR with the initial codebase to https://github.com/mlcommons/training. It should include instructions on how to run the code, and the README needs to follow the benchmark README template. Note that you need to follow the Contribution guidelines to ensure that your GitHub handle can contribute to MLC repositories.
Create a new benchmark presentation deck and present it to the Training WG so everyone is aware of the benchmark and its technical details
Create an initial blog write-up with technical details. Once it's ready, MLC will bring in a technical writer who can polish it and publish the blog before the first round in which this benchmark is introduced. Sample blogs: flux.1, llama31_405b.
New benchmark task force leads should complete this checklist before the benchmark is finalized.
Initial reference code: roadmap
Finalize code
Generate RCPs
Add new benchmark details