
build: limit sdist contents to source and metadata files #1890

Merged: vdusek merged 2 commits into master from build/limit-sdist-contents, May 12, 2026

vdusek (Collaborator) commented May 11, 2026

Summary

The latest beta release (run 25675751322) failed when uploading the sdist:

400 Bad Request — Project size too large. Limit for project 'crawlee' total size is 10 GB.

pyproject.toml only configured the wheel target, so hatchling's default sdist bundled the entire repo, including website/ (42 MB of Docusaurus demo MP4s/GIFs and versioned docs). Each released sdist was ~24.7 MB instead of ~280 KB. Because a beta release is published on every commit to master that touches src/, these oversized uploads quickly pushed the project's cumulative PyPI storage to the 10 GB cap.
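To see why a per-commit beta cadence exhausts a cumulative cap so quickly, here is a rough Python estimate of how many releases fit under 10 GB. It assumes decimal units (treating the `ls -lh` sizes from later in this thread as KB), exactly one sdist plus one 359 KB wheel per release, and nothing else stored under the project. Real usage also counts every earlier version's files, so treat the results as orders of magnitude only.

```python
# Back-of-the-envelope estimate of releases that fit under PyPI's cumulative
# per-project storage cap. Assumptions (not from the PR): decimal units
# (1 GB = 1_000_000 KB) and exactly one sdist + one wheel per release.

QUOTA_KB = 10 * 1_000_000  # 10 GB cap from the 400 Bad Request message
WHEEL_KB = 359             # wheel size from the local `ls -lh dist/` listing
OLD_SDIST_KB = 24_700      # ~24.7 MB sdist before the fix
NEW_SDIST_KB = 280         # ~280 KB sdist after the fix


def releases_until_cap(sdist_kb: int, wheel_kb: int, quota_kb: int = QUOTA_KB) -> int:
    """Number of complete releases (sdist + wheel) that fit in the quota."""
    return quota_kb // (sdist_kb + wheel_kb)


if __name__ == "__main__":
    before = releases_until_cap(OLD_SDIST_KB, WHEEL_KB)
    after = releases_until_cap(NEW_SDIST_KB, WHEEL_KB)
    print(f"before: ~{before} releases, after: ~{after} releases")
```

Under those assumptions the old layout allows about 400 releases and the trimmed sdist about 15,600, which shows why the fix buys headroom for future releases but does not recover space that is already used.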

This PR adds an explicit [tool.hatch.build.targets.sdist] that ships only src/crawlee and standard metadata files (LICENSE, README.md, CHANGELOG.md, CONTRIBUTING.md, pyproject.toml). Verified locally: sdist drops from 24.7 MB → ~280 KB (~88×).
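A simplified Python model of root-anchored path selection, as done by hatch's `only-include` option (which the PR ends up using), can help when reasoning about which repository files land in the sdist. The entries mirror the files named in this PR, but the exact `pyproject.toml` contents are not shown in this thread, so the TOML in the comment is an assumed shape. The model deliberately ignores VCS ignore rules, `exclude` patterns, and files hatch adds on its own, and the repository listing is hypothetical.

```python
# Simplified model of hatch's `only-include` selection for the sdist target.
# NOT hatch's implementation: VCS ignore files, `exclude` patterns, and files
# hatch always adds are ignored. It only shows that entries are literal paths
# anchored at the project root. Assumed pyproject.toml shape:
#
#   [tool.hatch.build.targets.sdist]
#   only-include = ["src/crawlee", "CHANGELOG.md", "CONTRIBUTING.md",
#                   "LICENSE", "README.md", "pyproject.toml"]
from pathlib import PurePosixPath

ONLY_INCLUDE = [
    "src/crawlee",
    "CHANGELOG.md",
    "CONTRIBUTING.md",
    "LICENSE",
    "README.md",
    "pyproject.toml",
]


def is_selected(path: str, only_include: list[str] = ONLY_INCLUDE) -> bool:
    """True if `path` equals an entry or lies inside an entry directory."""
    p = PurePosixPath(path)
    for entry in only_include:
        e = PurePosixPath(entry)
        # Compare whole path components, so "src/crawlee_extra" does not match.
        if p == e or e in p.parents:
            return True
    return False


def select(paths: list[str], only_include: list[str] = ONLY_INCLUDE) -> list[str]:
    """Keep only the paths the model would ship, preserving input order."""
    return [p for p in paths if is_selected(p, only_include)]


if __name__ == "__main__":
    repo = [  # hypothetical repository listing
        "pyproject.toml",
        "README.md",
        "src/crawlee/__init__.py",
        "tests/unit/README.md",
        "docs/pyproject.toml",
        "website/static/demo.mp4",
    ]
    print(select(repo))
```

Comparing path components rather than string prefixes is the design choice that matters here: a plain `startswith("src/crawlee")` check would also accept a sibling such as `src/crawlee_extra/`.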

Tests are intentionally excluded — they need dev-only deps (playwright, fakeredis, proxy-py, apify-cli) that aren't installable from a plain sdist anyway.

Note

PyPI's cap is cumulative across all uploaded files, so this PR doesn't unblock the pending beta on its own — we still need to request a project size limit increase. It just stops future releases from chewing through quota.

The default hatchling sdist included the entire repo (notably `website/` with demo videos and versioned docs), producing a 24.7 MB sdist that pushed the project over PyPI's 10 GB cumulative storage limit and broke the latest beta release. Restrict the sdist to `src/`, license, readme, changelog, contributing, and pyproject.toml — final sdist is ~280 KB.
vdusek added the t-tooling and adhoc labels May 11, 2026
vdusek self-assigned this May 11, 2026
vdusek requested a review from janbuchar May 11, 2026 18:52
github-actions Bot added this to the 140th sprint - Tooling team milestone May 11, 2026
codecov Bot commented May 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 92.88%. Comparing base (da84db1) to head (ebacd91).
⚠️ Report is 3 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1890      +/-   ##
==========================================
+ Coverage   92.86%   92.88%   +0.02%     
==========================================
  Files         167      167              
  Lines       11699    11699              
==========================================
+ Hits        10864    10867       +3     
+ Misses        835      832       -3     
Flag Coverage Δ
unit 92.88% <ø> (+0.02%) ⬆️

Flags with carried forward coverage won't be shown.

janbuchar (Collaborator) commented

Did you try actually installing the package?

vdusek (Collaborator, Author) commented May 12, 2026

> Did you try actually installing the package?

% poe build
...
% ls -lh dist/
total 640K
-rw-r--r--. 1 vdusek vdusek 359K 12. kvě 08.31 crawlee-1.6.4-py3-none-any.whl
-rw-r--r--. 1 vdusek vdusek 280K 12. kvě 08.31 crawlee-1.6.4.tar.gz

So it works.

But I just double-checked its contents, and it also contained crawlee-1.6.4/tests/unit/README.md and crawlee-1.6.4/docs/pyproject.toml, probably because of some auto-inclusion of "well-known" metadata files. Weird. So I switched from the include directive to only-include 😄.
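One plausible explanation for the stray files, offered as an assumption the thread does not confirm: hatch's `include` option takes gitignore-style patterns, and in gitignore syntax a pattern with no slash, such as `README.md`, matches a file of that name in any directory, while a pattern with a leading or middle slash is anchored at the root. If the first revision listed bare names like `README.md` and `pyproject.toml`, they would also have picked up `tests/unit/README.md` and `docs/pyproject.toml`. The hypothetical Python helper below models only that anchoring rule; a library such as `pathspec` implements the full syntax.

```python
# Hypothetical helper modelling ONE gitignore rule: whether a pattern is
# anchored at the project root. Not a full gitignore matcher: fnmatch's "*"
# also crosses "/", "**" and "!" are not special, and trailing-slash
# (directory-only) patterns are not supported.
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def _self_and_parents(path: str) -> list[PurePosixPath]:
    """The path itself plus each ancestor directory, excluding '.'."""
    p = PurePosixPath(path)
    return [p, *(q for q in p.parents if str(q) != ".")]


def gitignore_like_match(pattern: str, path: str) -> bool:
    """Approximate whether a gitignore-style pattern selects `path`."""
    candidates = _self_and_parents(path)
    if "/" in pattern:
        # A slash at the start or in the middle anchors the pattern at the root.
        anchored = pattern.lstrip("/")
        return any(fnmatchcase(str(c), anchored) for c in candidates)
    # No slash: the pattern matches a file or directory name at any depth.
    return any(fnmatchcase(c.name, pattern) for c in candidates)


if __name__ == "__main__":
    paths = ["README.md", "tests/unit/README.md", "docs/pyproject.toml"]
    for pattern in ("README.md", "/README.md", "pyproject.toml"):
        hits = [p for p in paths if gitignore_like_match(pattern, p)]
        print(f"{pattern!r} -> {hits}")
```

Under this model, anchoring the patterns with a leading slash (`/README.md`) would be an alternative fix; `only-include` sidesteps pattern matching entirely by taking literal root-relative paths.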

vdusek merged commit 0e7402f into master May 12, 2026 (32 checks passed)
vdusek deleted the build/limit-sdist-contents branch May 12, 2026 07:41
vdusek added a commit that referenced this pull request May 12, 2026
Guards against shipping a wheel or sdist that builds but crashes when
installed - the silent failure mode behind PR #1890 - by adding a
verification script and wiring it into PR CI and the release workflows
as a pre-publish gate. Checks artifact contents, fresh-venv install of
both wheel and sdist, core imports, and `crawlee create` scaffolding.
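The commit above lists artifact contents as one of several pre-publish checks. Below is a minimal Python sketch of that single check, not the actual script from the commit: it lists the regular files in a `.tar.gz` sdist with the standard-library `tarfile` module and reports anything outside an allowlist. The allowlist and function names are hypothetical; `PKG-INFO` is included because every sdist carries one, and the list may need extending for other files a given build adds.

```python
# Hypothetical pre-publish gate: report files in an sdist that fall outside
# an allowlist. Names and the allowlist are illustrative, not crawlee's script.
import sys
import tarfile
from pathlib import PurePosixPath

# Paths relative to the sdist root (after stripping "<name>-<version>/").
ALLOWED_DIRS = ("src/crawlee",)
ALLOWED_FILES = {
    "CHANGELOG.md",
    "CONTRIBUTING.md",
    "LICENSE",
    "PKG-INFO",  # generated metadata present in every sdist
    "README.md",
    "pyproject.toml",
}


def strip_root(member_name: str) -> str:
    """Drop the leading '<name>-<version>/' directory from a tar member name."""
    parts = PurePosixPath(member_name).parts
    return str(PurePosixPath(*parts[1:])) if len(parts) > 1 else ""


def unexpected_members(names: list[str]) -> list[str]:
    """Return root-relative file paths that the allowlist does not cover."""
    bad = []
    for name in names:
        rel = strip_root(name)
        if not rel:
            continue  # a bare top-level entry, nothing to check
        path = PurePosixPath(rel)
        in_allowed_dir = any(PurePosixPath(d) in path.parents for d in ALLOWED_DIRS)
        if not in_allowed_dir and rel not in ALLOWED_FILES:
            bad.append(rel)
    return bad


def sdist_file_names(sdist_path: str) -> list[str]:
    """List regular-file members of a .tar.gz sdist; directories are skipped."""
    with tarfile.open(sdist_path, mode="r:gz") as tar:
        return [m.name for m in tar.getmembers() if m.isfile()]


def main(argv: list[str]) -> int:
    """Return 0 if the sdist is clean, 1 if it has extra files, 2 on bad usage."""
    if len(argv) != 2:
        print("usage: check_sdist.py <sdist.tar.gz>", file=sys.stderr)
        return 2
    bad = unexpected_members(sdist_file_names(argv[1]))
    for rel in bad:
        print(f"unexpected file in sdist: {rel}", file=sys.stderr)
    return 1 if bad else 0
```

A script entry point would call `sys.exit(main(sys.argv))` with a path such as `dist/crawlee-1.6.4.tar.gz`, so a non-zero exit code fails the CI step before anything is uploaded.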
vdusek added a commit to apify/apify-shared-python that referenced this pull request May 12, 2026
## Summary

Mirrors apify/crawlee-python#1890.

The latest crawlee-python beta release failed when uploading the sdist:

```
400 Bad Request — Project size too large. Limit for project 'crawlee' total size is 10 GB.
```

`pyproject.toml` only configured the wheel target, so hatchling's
default sdist bundled the entire repo. Each released sdist was much
larger than the actual source. Combined with the fact that a beta
release is published on every src-touching commit to master, the
cumulative storage quota on PyPI hit the cap fast.

This PR applies the same fix here: an explicit
`[tool.hatch.build.targets.sdist]` that ships only `src/apify_shared`
and standard metadata files (`CHANGELOG.md`, `CONTRIBUTING.md`,
`LICENSE`, `README.md`, `pyproject.toml`). Verified locally: built sdist
is ~17 KB and contains only those files.

Tests are intentionally excluded — they need dev-only deps that aren't
installable from a plain sdist anyway.

## Note

PyPI's cap is cumulative across all uploaded files, so the eventual
mitigation requires a [project size limit
increase](https://docs.pypi.org/project-management/storage-limits#requesting-a-project-size-limit-increase)
once a project hits its quota. This PR just keeps future releases from
chewing through quota.
vdusek added a commit to apify/apify-sdk-python that referenced this pull request May 12, 2026
## Summary

Mirrors apify/crawlee-python#1890.

The latest crawlee-python beta release failed when uploading the sdist:

```
400 Bad Request — Project size too large. Limit for project 'crawlee' total size is 10 GB.
```

`pyproject.toml` only configured the wheel target, so hatchling's
default sdist bundled the entire repo. Each released sdist was much
larger than the actual source. Combined with the fact that a beta
release is published on every src-touching commit to master, the
cumulative storage quota on PyPI hit the cap fast.

This PR applies the same fix here: an explicit
`[tool.hatch.build.targets.sdist]` that ships only `src/apify` and
standard metadata files (`CHANGELOG.md`, `CONTRIBUTING.md`, `LICENSE`,
`README.md`, `pyproject.toml`). Verified locally: built sdist is ~96 KB
and contains only those files.

Tests are intentionally excluded — they need dev-only deps that aren't
installable from a plain sdist anyway.

## Note

PyPI's cap is cumulative across all uploaded files, so the eventual
mitigation requires a [project size limit
increase](https://docs.pypi.org/project-management/storage-limits#requesting-a-project-size-limit-increase)
once a project hits its quota. This PR just keeps future releases from
chewing through quota.
vdusek added a commit to apify/apify-client-python that referenced this pull request May 12, 2026
## Summary

Mirrors apify/crawlee-python#1890.

The latest crawlee-python beta release failed when uploading the sdist:

```
400 Bad Request — Project size too large. Limit for project 'crawlee' total size is 10 GB.
```

`pyproject.toml` only configured the wheel target, so hatchling's
default sdist bundled the entire repo. Each released sdist was much
larger than the actual source. Combined with the fact that a beta
release is published on every src-touching commit to master, the
cumulative storage quota on PyPI hit the cap fast.

This PR applies the same fix here: an explicit
`[tool.hatch.build.targets.sdist]` that ships only `src/apify_client`
and standard metadata files (`CHANGELOG.md`, `CONTRIBUTING.md`,
`LICENSE`, `README.md`, `pyproject.toml`). Verified locally: built sdist
is ~117 KB and contains only those files.

Tests are intentionally excluded — they need dev-only deps that aren't
installable from a plain sdist anyway.

## Note

PyPI's cap is cumulative across all uploaded files, so the eventual
mitigation requires a [project size limit
increase](https://docs.pypi.org/project-management/storage-limits#requesting-a-project-size-limit-increase)
once a project hits its quota. This PR just keeps future releases from
chewing through quota.