build: limit sdist contents to source and metadata files #1890
Merged
The default hatchling sdist included the entire repo (notably `website/` with demo videos and versioned docs), producing a 24.7 MB sdist on every release. With a beta published on every src-touching commit, these uploads pushed the project over PyPI's 10 GB cumulative storage limit and broke the latest beta release. This PR restricts the sdist to `src/`, the license, readme, changelog, contributing guide, and `pyproject.toml`; the final sdist is ~280 KB.
Codecov Report
✅ All modified and coverable lines are covered by tests.
@@ Coverage Diff @@
## master #1890 +/- ##
==========================================
+ Coverage 92.86% 92.88% +0.02%
==========================================
Files 167 167
Lines 11699 11699
==========================================
+ Hits 10864 10867 +3
+ Misses 835 832 -3
Collaborator

Did you try actually installing the package?
Collaborator
Author
% poe build
...
% ls -lh dist/
total 640K
-rw-r--r--. 1 vdusek vdusek 359K 12. kvě 08.31 crawlee-1.6.4-py3-none-any.whl
-rw-r--r--. 1 vdusek vdusek 280K 12. kvě 08.31 crawlee-1.6.4.tar.gz

So it works. But I just double checked its content, and it also contained
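Double-checking what actually landed in a built sdist can be done with the standard library alone. The sketch below is not the tooling used in this PR; `list_sdist_members` is a hypothetical helper, and it assumes the standard sdist layout: a `.tar.gz` whose entries all sit under a single top-level `<name>-<version>/` directory.

```python
import tarfile
from pathlib import PurePosixPath


def list_sdist_members(sdist_path: str) -> list[str]:
    """Return the regular files inside an sdist, relative to its top-level directory.

    Assumes the usual sdist layout, where every entry lives under one
    `<name>-<version>/` directory; that first path component is dropped.
    """
    with tarfile.open(sdist_path, mode="r:gz") as tar:
        names: list[str] = []
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directory entries and links
            parts = PurePosixPath(member.name).parts
            names.append(str(PurePosixPath(*parts[1:])))
        return sorted(names)
```

For example, after `poe build`, calling `list_sdist_members("dist/crawlee-1.6.4.tar.gz")` prints a flat list that makes stray directories such as `website/` or `tests/` easy to spot.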
janbuchar approved these changes on May 12, 2026
This was referenced May 12, 2026
vdusek added a commit that referenced this pull request on May 12, 2026
Guards against shipping a wheel or sdist that builds but crashes when installed (the silent failure mode behind PR #1890) by adding a verification script and wiring it into PR CI and the release workflows as a pre-publish gate. The script checks artifact contents, fresh-venv installs of both the wheel and the sdist, core imports, and `crawlee create` scaffolding.
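The artifact-contents part of such a gate can be sketched as an allowlist check over the sdist's member paths. This is a minimal illustration, not the verification script from that commit: `unexpected_members`, `ALLOWED_FILES`, and `ALLOWED_DIRS` are hypothetical names, the allowlist is assumed from the files this PR keeps, and `PKG-INFO` is added because every sdist carries that generated metadata file.

```python
from pathlib import PurePosixPath

# Hypothetical allowlist, assumed from the files this PR keeps in the sdist.
# PKG-INFO is included because every sdist ships this generated metadata file.
ALLOWED_FILES = frozenset(
    {"PKG-INFO", "pyproject.toml", "LICENSE", "README.md", "CHANGELOG.md", "CONTRIBUTING.md"}
)
ALLOWED_DIRS = (PurePosixPath("src/crawlee"),)


def unexpected_members(members: list[str]) -> list[str]:
    """Return sdist member paths (relative to the top-level dir) outside the allowlist."""
    unexpected = []
    for name in members:
        if name in ALLOWED_FILES:
            continue
        path = PurePosixPath(name)
        # Compare whole path components, so "src/crawlee_extra/x.py" is rejected.
        if any(path.is_relative_to(allowed) for allowed in ALLOWED_DIRS):
            continue
        unexpected.append(name)
    return unexpected
```

Matching on path components with `PurePosixPath.is_relative_to` (Python 3.9+) rather than plain string prefixes avoids accepting a sibling directory such as `src/crawlee_extra/`. A CI gate built this way would fail the job whenever the returned list is non-empty.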
vdusek added a commit to apify/apify-shared-python that referenced this pull request on May 12, 2026

## Summary

Mirrors apify/crawlee-python#1890. The latest crawlee-python beta release failed when uploading the sdist:

```
400 Bad Request — Project size too large. Limit for project 'crawlee' total size is 10 GB.
```

`pyproject.toml` only configured the wheel target, so hatchling's default sdist bundled the entire repo. Each released sdist was much larger than the actual source. Combined with the fact that a beta release is published on every src-touching commit to master, the cumulative storage quota on PyPI hit the cap fast.

This PR applies the same fix here: an explicit `[tool.hatch.build.targets.sdist]` that ships only `src/apify_shared` and standard metadata files (`CHANGELOG.md`, `CONTRIBUTING.md`, `LICENSE`, `README.md`, `pyproject.toml`). Verified locally: the built sdist is ~17 KB and contains only those files.

Tests are intentionally excluded: they need dev-only deps that aren't installable from a plain sdist anyway.

## Note

PyPI's cap is cumulative across all uploaded files, so once a project hits its quota the eventual mitigation requires a [project size limit increase](https://docs.pypi.org/project-management/storage-limits#requesting-a-project-size-limit-increase). This PR just keeps future releases from chewing through quota.
vdusek added a commit to apify/apify-sdk-python that referenced this pull request on May 12, 2026

## Summary

Mirrors apify/crawlee-python#1890. The latest crawlee-python beta release failed when uploading the sdist:

```
400 Bad Request — Project size too large. Limit for project 'crawlee' total size is 10 GB.
```

`pyproject.toml` only configured the wheel target, so hatchling's default sdist bundled the entire repo. Each released sdist was much larger than the actual source. Combined with the fact that a beta release is published on every src-touching commit to master, the cumulative storage quota on PyPI hit the cap fast.

This PR applies the same fix here: an explicit `[tool.hatch.build.targets.sdist]` that ships only `src/apify` and standard metadata files (`CHANGELOG.md`, `CONTRIBUTING.md`, `LICENSE`, `README.md`, `pyproject.toml`). Verified locally: the built sdist is ~96 KB and contains only those files.

Tests are intentionally excluded: they need dev-only deps that aren't installable from a plain sdist anyway.

## Note

PyPI's cap is cumulative across all uploaded files, so once a project hits its quota the eventual mitigation requires a [project size limit increase](https://docs.pypi.org/project-management/storage-limits#requesting-a-project-size-limit-increase). This PR just keeps future releases from chewing through quota.
vdusek added a commit to apify/apify-client-python that referenced this pull request on May 12, 2026

## Summary

Mirrors apify/crawlee-python#1890. The latest crawlee-python beta release failed when uploading the sdist:

```
400 Bad Request — Project size too large. Limit for project 'crawlee' total size is 10 GB.
```

`pyproject.toml` only configured the wheel target, so hatchling's default sdist bundled the entire repo. Each released sdist was much larger than the actual source. Combined with the fact that a beta release is published on every src-touching commit to master, the cumulative storage quota on PyPI hit the cap fast.

This PR applies the same fix here: an explicit `[tool.hatch.build.targets.sdist]` that ships only `src/apify_client` and standard metadata files (`CHANGELOG.md`, `CONTRIBUTING.md`, `LICENSE`, `README.md`, `pyproject.toml`). Verified locally: the built sdist is ~117 KB and contains only those files.

Tests are intentionally excluded: they need dev-only deps that aren't installable from a plain sdist anyway.

## Note

PyPI's cap is cumulative across all uploaded files, so once a project hits its quota the eventual mitigation requires a [project size limit increase](https://docs.pypi.org/project-management/storage-limits#requesting-a-project-size-limit-increase). This PR just keeps future releases from chewing through quota.
Summary
The latest beta release (run 25675751322) failed when uploading the sdist:

`400 Bad Request — Project size too large. Limit for project 'crawlee' total size is 10 GB.`
`pyproject.toml` only configured the wheel target, so hatchling's default sdist bundled the entire repo, including `website/` (42 MB of Docusaurus demo MP4s/GIFs and versioned docs). Each released sdist was ~24.7 MB instead of ~280 KB. Combined with the fact that a beta release is published on every src-touching commit to master, the cumulative storage quota on PyPI hit the 10 GB cap fast.

This PR adds an explicit `[tool.hatch.build.targets.sdist]` that ships only `src/crawlee` and standard metadata files (`LICENSE`, `README.md`, `CHANGELOG.md`, `CONTRIBUTING.md`, `pyproject.toml`). Verified locally: the sdist drops from 24.7 MB → ~280 KB (~88×).

Tests are intentionally excluded: they need dev-only deps (playwright, fakeredis, proxy-py, apify-cli) that aren't installable from a plain sdist anyway.
Note
PyPI's cap is cumulative across all uploaded files, so this PR doesn't unblock the pending beta on its own — we still need to request a project size limit increase. It just stops future releases from chewing through quota.
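A back-of-the-envelope calculation shows why per-commit betas used up the cumulative quota so quickly. The sketch assumes decimal units (treating the `ls -h` sizes above as decimal kilobytes), exactly one sdist and one wheel per release, identical sizes for every release, and an otherwise empty 10 GB quota. It ignores all earlier uploads, so it is an upper bound rather than a measurement.

```python
# Rough quota arithmetic. Assumptions: decimal units (1 KB = 1000 bytes,
# 1 GB = 10**9 bytes), one sdist + one wheel per release, same size each time,
# and an otherwise empty project quota.
KB = 1000
GB = 1000**3

PROJECT_QUOTA = 10 * GB
WHEEL_SIZE = 359 * KB  # crawlee-1.6.4-py3-none-any.whl (ls -lh shows 359K)
SDIST_BEFORE = 24_700 * KB  # default hatchling sdist, ~24.7 MB
SDIST_AFTER = 280 * KB  # restricted sdist, ~280 KB


def releases_until_quota(sdist_size: int, wheel_size: int = WHEEL_SIZE, quota: int = PROJECT_QUOTA) -> int:
    """Whole releases (sdist + wheel) that fit before the cumulative cap is reached."""
    return quota // (sdist_size + wheel_size)


before = releases_until_quota(SDIST_BEFORE)
after = releases_until_quota(SDIST_AFTER)
print(f"before: {before} releases, after: {after} releases")
# prints: before: 399 releases, after: 15649 releases
```

Under these assumptions the old sdist size allows only about 400 releases in a fresh 10 GB quota, while the restricted one allows over 15,000. The ratio 24,700 / 280 ≈ 88 also matches the ~88× reduction reported in the summary.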