
streaming aggregation and progress bar for _run_task_get_arffcontent #1720

Open
yashikavishwakarma wants to merge 2 commits into openml:main from
yashikavishwakarma:doc/improve-list-runs-docstring


@yashikavishwakarma

The current implementation runs all folds in parallel and holds every fold's
results in memory before any aggregation happens. For large tasks this can
use a lot of memory.

This PR streams results as each fold finishes instead of waiting for all of
them. I also split the big function into smaller ones, since it was getting
hard to follow, and added a tqdm progress bar so you can actually see what's
happening while it runs.

Changes:

  • use return_as="generator" in joblib so results are aggregated on the fly
  • add a tqdm progress bar (gracefully skipped if tqdm is not installed)
  • extract classification, regression, and evaluation aggregation into
    separate helper functions
  • remove the noqa complexity suppressions, since the function is simpler now
