Add .tar.gz / .tgz archive support for extension, preset, and workflow installation #2395

Open
Copilot wants to merge 7 commits into main from copilot/add-tar-gz-support

Copilot AI commented Apr 28, 2026

The extension, preset, and workflow download pipelines only accepted ZIP archives, blocking use of npm registries and CI artifact stores that serve tarballs natively.

Core utilities (extensions.py)

  • _detect_archive_format(url, content_type="") — infers format from URL path extension (.zip, .tar.gz, .tgz) with Content-Type header fallback (application/gzip, application/x-gzip, application/x-tar+gzip)
  • _safe_extract_tarball(archive_path, dest_dir, error_class) — safe extraction with:
    • Pre-extraction validation of all members: rejects absolute paths, .. traversal, symlinks, hard links, and special files
    • Python 3.11: passes pre-validated safe_members list to extractall()
    • Python 3.12+: uses tarfile.data_filter for additional OS-level protection
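The validation steps listed above can be sketched roughly as follows. This is an illustrative sketch only, not the PR's actual `_safe_extract_tarball`: the name `safe_extract_tarball`, the `ArchiveError` class, and the exact error messages are assumptions, and the `hasattr` check stands in for the Python-version branch described in the bullets.

```python
import tarfile
from pathlib import Path, PurePosixPath


class ArchiveError(Exception):
    """Hypothetical stand-in for the caller-supplied error_class."""


def safe_extract_tarball(archive_path, dest_dir, error_class=ArchiveError):
    """Validate every member first, then extract only the vetted list."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)

    with tarfile.open(archive_path, "r:gz") as tar:
        safe_members = []
        for member in tar.getmembers():
            name = member.name
            # Reject symlinks and hard links outright.
            if member.issym() or member.islnk():
                raise error_class(f"Links are not allowed: {name}")
            # Reject devices, FIFOs, and other special files.
            if not (member.isfile() or member.isdir()):
                raise error_class(f"Special files are not allowed: {name}")
            # Reject absolute paths and '..' traversal components.
            posix = PurePosixPath(name)
            if posix.is_absolute() or ".." in posix.parts:
                raise error_class(f"Unsafe path in archive: {name}")
            # Belt and braces: the resolved target must stay inside dest.
            if not (dest / name).resolve().is_relative_to(dest):
                raise error_class(f"Path escapes destination: {name}")
            safe_members.append(member)

        if hasattr(tarfile, "data_filter"):
            # Interpreters that ship extraction filters get the extra layer.
            tar.extractall(dest, members=safe_members, filter="data")
        else:
            tar.extractall(dest, members=safe_members)
```

Validating the whole archive before extracting anything means a malicious member late in the archive cannot leave a partially extracted tree behind.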

Extensions & presets

  • install_from_zip() on both managers now detects archive format from the file extension and dispatches to ZIP or tarball extraction accordingly — existing callers are unaffected
  • download_extension() / download_pack() detect format from the download URL (or Content-Type fallback) and persist the archive with the correct extension (.zip or .tar.gz)
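A minimal sketch of the detection order described above (URL path extension first, then the Content-Type fallback) might look like this. The function name `detect_archive_format` and the choice to lower-case the path and strip Content-Type parameters are assumptions; only the three gzip media types named in the description are mapped, and the empty-string return for an unknown format mirrors the behaviour the review mentions.

```python
from urllib.parse import urlparse

# Media types listed in the PR description as the Content-Type fallback.
GZIP_CONTENT_TYPES = {
    "application/gzip",
    "application/x-gzip",
    "application/x-tar+gzip",
}


def detect_archive_format(url, content_type=""):
    """Return "zip", "tar.gz", or "" when the format cannot be inferred."""
    # Look only at the URL path so query strings do not hide the extension.
    path = urlparse(url).path.lower()
    if path.endswith(".zip"):
        return "zip"
    if path.endswith((".tar.gz", ".tgz")):
        return "tar.gz"

    # Fall back to the media type, ignoring parameters such as charset.
    media_type = content_type.split(";", 1)[0].strip().lower()
    if media_type in GZIP_CONTENT_TYPES:
        return "tar.gz"
    return ""
```

Callers would then pick the on-disk suffix from the returned value rather than from the URL directly.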

__init__.py call sites

| Path | Change |
| --- | --- |
| extension add --from | Detects format from URL/Content-Type before saving |
| preset add --from | Same |
| extension update | Inline manifest peek handles both ZIP and tar.gz |
| workflow add (URL) | Extracts workflow.yml from archive when URL points to one |
| workflow add (local) | Accepts local .tar.gz/.tgz/.zip archive files |
| workflow add (catalog) | Same archive detection for catalog-sourced URLs |

A shared _extract_workflow_yml(archive_path, fmt) helper handles root-level and single-nested-directory layouts for both formats.
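One way such a helper could handle both layouts is sketched below. It is an assumption-laden illustration rather than the PR's code: the names `pick_workflow_member` and `extract_workflow_yml`, the `None` return for a missing file, and the rule that a nested `workflow.yml` only counts when the archive has a single top-level directory are all choices made for this example.

```python
import tarfile
import zipfile
from pathlib import PurePosixPath


def pick_workflow_member(names):
    """Choose workflow.yml at the root, else inside a single top-level dir."""
    paths = {n: PurePosixPath(n).parts for n in names}
    # Root-level layout: workflow.yml
    for name, parts in paths.items():
        if parts == ("workflow.yml",):
            return name
    # Nested layout: <single-dir>/workflow.yml
    top_level = {parts[0] for parts in paths.values() if parts}
    if len(top_level) == 1:
        for name, parts in paths.items():
            if len(parts) == 2 and parts[1] == "workflow.yml":
                return name
    return None


def extract_workflow_yml(archive_path, fmt):
    """Return the bytes of workflow.yml, or None if the archive lacks one."""
    if fmt == "zip":
        with zipfile.ZipFile(archive_path) as zf:
            name = pick_workflow_member(zf.namelist())
            return zf.read(name) if name else None
    if fmt == "tar.gz":
        with tarfile.open(archive_path, "r:gz") as tar:
            # Only regular files are candidates; links are never followed.
            files = {m.name: m for m in tar.getmembers() if m.isfile()}
            name = pick_workflow_member(list(files))
            if name is None:
                return None
            with tar.extractfile(files[name]) as fh:
                return fh.read()
    raise ValueError(f"Unsupported archive format: {fmt!r}")
```

Reading the single file into memory avoids extracting the rest of the archive to disk when only the workflow definition is needed.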

Tests

21 new tests across test_extensions.py and test_presets.py covering: format detection (URL + Content-Type), flat and nested tarball install, missing manifest errors, path traversal rejection, and symlink rejection.

Copilot AI requested review from Copilot and removed request for Copilot April 28, 2026 18:04
Copilot AI requested review from Copilot and removed request for Copilot April 28, 2026 18:06
Copilot AI requested review from Copilot and removed request for Copilot April 28, 2026 18:09
Copilot AI requested review from Copilot and removed request for Copilot April 28, 2026 18:12
Copilot AI requested review from Copilot and removed request for Copilot April 28, 2026 18:14
Copilot AI changed the title [WIP] Add support for .tar.gz and .tgz archives Add .tar.gz / .tgz archive support for extension, preset, and workflow installation Apr 28, 2026
Copilot AI requested a review from mnriem April 28, 2026 18:16
Comment thread src/specify_cli/__init__.py Fixed
@mnriem mnriem marked this pull request as ready for review May 5, 2026 22:58
@mnriem mnriem requested review from Copilot and removed request for Copilot May 5, 2026 22:58
Copilot AI left a comment

Pull request overview

Adds support for installing extensions, presets, and workflows from gzipped tarballs (.tar.gz / .tgz) in addition to ZIPs, unblocking sources like npm registries and artifact stores that serve tarballs by default.

Changes:

  • Introduces archive format detection (zip vs tar.gz) and safe tarball extraction (path traversal + link/special-file rejection).
  • Updates extension/preset download + install pipelines to persist the correct archive extension and extract accordingly.
  • Extends workflow add to accept archives from URL/catalog/local paths by extracting workflow.yml from ZIP/tar.gz.
Show a summary per file
| File | Description |
| --- | --- |
| src/specify_cli/extensions.py | Adds archive-format detection + safe tarball extraction; updates extension install/download to support tarballs. |
| src/specify_cli/presets.py | Updates preset install/download to support tarballs via shared helpers from extensions.py. |
| src/specify_cli/__init__.py | Extends CLI flows (preset add --from, extension add --from, extension update, workflow add) to handle tarball archives. |
| tests/test_extensions.py | Adds tests for archive format detection + tarball install and safety checks for extensions. |
| tests/test_presets.py | Adds tarball install and safety tests for presets. |

Copilot's findings

Comments suppressed due to low confidence (1)

src/specify_cli/__init__.py:3661

  • In extension add --from, if _detect_archive_format() still returns empty after the Content-Type fallback, the code defaults to saving as .zip and proceeds. This can break tarball installs when servers omit/override Content-Type (e.g., application/octet-stream) and yield a confusing zip parsing error. Consider failing fast with a clear message when the format is unknown (or sniff the first bytes to distinguish ZIP vs gzip).
                # Download archive to temp location; detect format from URL or Content-Type.
                download_dir = project_root / ".specify" / "extensions" / ".cache" / "downloads"
                download_dir.mkdir(parents=True, exist_ok=True)
                archive_fmt = _detect_archive_format(from_url)
                archive_path = None

                try:
                    with urllib.request.urlopen(from_url, timeout=60) as response:
                        if not archive_fmt:
                            content_type = response.headers.get("Content-Type", "")
                            archive_fmt = _detect_archive_format(from_url, content_type)
                        archive_data = response.read()

                    suffix = ".tar.gz" if archive_fmt == "tar.gz" else ".zip"
                    archive_path = download_dir / f"{extension}-url-download{suffix}"
                    archive_path.write_bytes(archive_data)

                    # Install from downloaded archive
                    manifest = manager.install_from_zip(archive_path, speckit_version, priority=priority)
                except urllib.error.URLError as e:
  • Files reviewed: 5/5 changed files
  • Comments generated: 4
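The byte-sniffing alternative suggested in the suppressed comment could be sketched along these lines. The function name `sniff_archive_format` is hypothetical, and the sketch assumes that a gzip stream served by an archive endpoint wraps a tarball; the gzip magic number alone only proves gzip compression, not that a tar archive is inside.

```python
# Leading signatures: ZIP local file header, empty-archive end record,
# and spanned-archive marker; gzip streams always start with 1f 8b.
ZIP_MAGICS = (b"PK\x03\x04", b"PK\x05\x06", b"PK\x07\x08")
GZIP_MAGIC = b"\x1f\x8b"


def sniff_archive_format(data):
    """Guess "zip" or "tar.gz" from the first bytes; "" if neither matches."""
    if data.startswith(GZIP_MAGIC):
        return "tar.gz"  # assumption: gzip payloads here are tarballs
    if data.startswith(ZIP_MAGICS):
        return "zip"
    return ""
```

A caller could apply this to `archive_data` only when both the URL and the Content-Type header fail to identify the format, and raise a clear error if the sniff also returns an empty string.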

Comment on lines +5101 to +5106
elif source_path.is_file() and (
    source.endswith(".tar.gz") or source.endswith(".tgz") or source.endswith(".zip")
):
    # Local archive file containing workflow.yml
    from .extensions import _detect_archive_format
    local_fmt = _detect_archive_format(source)
raise typer.Exit(1)

manifest = manager.install_from_zip(zip_path, speckit_version, priority)
suffix = ".tar.gz" if archive_fmt == "tar.gz" else ".zip"
Comment on lines +2153 to +2158
# Choose file extension based on detected format.
if archive_fmt == "tar.gz":
    archive_filename = f"{extension_id}-{version}.tar.gz"
else:
    archive_filename = f"{extension_id}-{version}.zip"

Comment on lines +2333 to +2338
# Choose file extension based on detected format.
if archive_fmt == "tar.gz":
    archive_filename = f"{pack_id}-{version}.tar.gz"
else:
    archive_filename = f"{pack_id}-{version}.zip"

…ensitive check

- Add explanatory comment to empty except KeyError block in _extract_workflow_yml
- Use case-insensitive extension matching for local archive detection in workflow add
- Reject unknown archive formats with clear error messages instead of silently
  defaulting to ZIP in preset add --from, extension add --from, download_extension(),
  and download_pack()
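
The two behavioural changes in this commit message (case-insensitive matching for local archives, and rejecting unknown formats instead of defaulting to ZIP) might be sketched as below. The helper names `is_local_archive` and `archive_filename`, the `UnknownArchiveFormatError` class, and the error wording are hypothetical; the actual commit may structure this differently.

```python
class UnknownArchiveFormatError(ValueError):
    """Hypothetical error raised when the archive format is not recognised."""


ARCHIVE_SUFFIXES = {"zip": ".zip", "tar.gz": ".tar.gz"}


def is_local_archive(source):
    """Case-insensitive check so names like PACK.TGZ or Demo.Zip also match."""
    return source.lower().endswith((".tar.gz", ".tgz", ".zip"))


def archive_filename(item_id, version, archive_fmt):
    """Build the cache filename, failing fast on an unknown format."""
    try:
        suffix = ARCHIVE_SUFFIXES[archive_fmt]
    except KeyError:
        raise UnknownArchiveFormatError(
            f"Could not determine archive format for {item_id} {version}; "
            "expected a .zip, .tar.gz, or .tgz download"
        ) from None
    return f"{item_id}-{version}{suffix}"
```

Failing at filename selection surfaces the problem before any bytes are written under a misleading `.zip` name.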
Development

Successfully merging this pull request may close these issues.

[Enhancement] Support .tar.gz / .tgz archives for extension, preset, and workflow installation

3 participants