High-performance sort/merge Python extension backed by a C engine. Uses an internal memory pool for fast sorting of fixed-length, variable-length, and CSV files, plus merge of sorted inputs.
- Sort — Order records by text or numeric keys, ascending or descending
- Merge — Combine multiple sorted files into one stream
- Install with pip from PyPI
- `hsort` CLI and `import hsort` Python API
- Windows, Linux, macOS
- Unix-style `-` / `--` options
In-memory sort — When enough memory is available for the dataset, sorting avoids temp files.
External sort — When memory is insufficient, data is split, sorted in chunks, and merged using temporary files.
If -W is omitted, the engine estimates memory from input size; if allocation fails, it falls back to external sort.
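The two modes above can be illustrated with a minimal pure-Python sketch. It is not the C engine's implementation: the function name `external_sort`, the line-count budget `max_lines_in_memory`, and the temp-file naming are all assumptions made for illustration.

```python
import heapq
import os
import tempfile
from typing import Iterable, Iterator, List


def external_sort(lines: Iterable[str], max_lines_in_memory: int) -> Iterator[str]:
    """Yield newline-terminated lines in sorted order, holding at most
    max_lines_in_memory lines in memory while reading the input."""
    chunk_paths: List[str] = []
    chunk: List[str] = []

    def spill(sorted_chunk: List[str]) -> None:
        # Write one sorted run to its own temporary file.
        fd, path = tempfile.mkstemp(prefix="sketch_", suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
            f.writelines(sorted_chunk)
        chunk_paths.append(path)

    for line in lines:
        chunk.append(line)
        if len(chunk) >= max_lines_in_memory:
            spill(sorted(chunk))
            chunk = []

    if not chunk_paths:
        # Everything fit in the budget: in-memory sort, no temp files.
        yield from sorted(chunk)
        return

    if chunk:
        spill(sorted(chunk))

    files = [open(p, "r", encoding="utf-8", newline="") for p in chunk_paths]
    try:
        # k-way merge of the sorted runs.
        yield from heapq.merge(*files)
    finally:
        for f in files:
            f.close()
        for p in chunk_paths:
            os.remove(p)
```

With a large budget the function never touches the disk; with a small one it produces the same order through temporary runs and a merge.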
- Configurable memory budget (`-W`) with automatic sizing when omitted
- Fixed-length, variable-length (text/binary), and CSV
- Stdin/stdout; logs on stderr
- Multi-key sort; stable sort (`-S`); unique output (`-U`)
- Invalid records can be skipped or written to an error file (`-E`)
- ASCII vs numeric key modes; large files supported
- Cross-platform
```bash
pip install hsort
```
Test PyPI (when applicable):
```bash
pip install --index-url https://test.pypi.org/simple/ hsort
```
Check install:
```bash
hsort --version
hsort --help
```
CSV (header row, sort column 1)
```bash
hsort -C -H -K1 -O output.csv input.csv
```
Fixed-length (64-byte records, key bytes 0–19)
```bash
hsort -L64 -K0,20 -O output.dat input.dat
```
Variable-length
```bash
hsort -K0,10 -O output.dat input.dat
```
Usage:
```
hsort [options] [input files...]
```
Options are case-sensitive (-C ≠ -c). Short and long forms are equivalent (-C / --csv).
For the authoritative option list, run:
```bash
hsort --help
```

| Short | Long | Description |
|---|---|---|
| `-h` | `--help` | Show help |
| `-V` | `--version` | Show version and license info |
| `-C` | `--csv` | Input is CSV |
| `-H` | `--header` | Treat first CSV row as header |
| `-S` | `--stable` | Stable sort |
| `-U` | `--unique` | Unique output (first record per key) |
| `-L BYTES` | `--record-length BYTES` | Fixed-length records; length 1–640KB per file layout |
| `-K KEY_SPEC` | `--key` | Sort key (repeatable). Fixed / variable-length: `start[,len][n][a\|d]`, where `start` is the 0-based byte offset, `len` is the optional key length, `n` selects numeric sort (default ASCII), `a` is ascending (default), and `d` is descending. CSV: `col[n][a\|d]`, with columns numbered from 1 |
| `-A` | `--all-asc` | Sort whole record ascending. With `-K`, `-K` wins; with `-R`, the later flag wins |
| `-R` | `--all-desc` | Sort whole record descending. With `-K`, `-K` wins; with `-A`, the later flag wins |
| `-P CODE` | `--newline CODE` | Newline for variable-length / CSV: `\n`, `\r`, `\r\n` (default `\n`) |
| `-D CHAR` | `--delimiter CHAR` | CSV delimiter; use `\t` for tab (default comma) |
| `-W SIZE` | `--memory SIZE` | Max memory, e.g. `64MB`, `1024KB`; if omitted, engine sizes from input; minimum 16MB enforced. Not the free-tier total input file size cap (see Free tier below) |
| `-T DIR` | `--temp-dir DIR` | Temp directory for external sort (default: system temp); files named like `hsort_<pid>.tmp`, removed after sort |
| `-O FILE` | `--output FILE` | Output path (default stdout); may match an input path |
| `-E FILE` | `--error-file FILE` | Invalid records → this file; if omitted, bad records are skipped silently |
| `-M` | `--merge` | Merge already-sorted files |
| (paths) | - | Input files: multiple allowed, must be last; if omitted, read stdin |
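The `-K` key grammar in the table can be read as a small regular language. The sketch below parses it in Python as one possible reading. The names `KeySpec` and `parse_key_spec` are hypothetical and are not part of the hsort API, and the real engine may accept or reject forms this sketch does not.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Grammar as read from the option table: start[,len][n][a|d] or col[n][a|d].
_KEY_RE = re.compile(r"(\d+)(?:,(\d+))?(n)?([ad])?")


@dataclass(frozen=True)
class KeySpec:
    start: int             # 0-based byte offset, or 1-based CSV column
    length: Optional[int]  # None when no length was given
    numeric: bool          # True when the "n" suffix is present
    descending: bool       # True only for the "d" suffix


def parse_key_spec(spec: str) -> KeySpec:
    """Parse the text that follows -K, e.g. "0,20", "3,8nd", "1a"."""
    m = _KEY_RE.fullmatch(spec)
    if m is None:
        raise ValueError(f"unrecognized key spec: {spec!r}")
    start, length, numeric, order = m.groups()
    return KeySpec(
        start=int(start),
        length=int(length) if length is not None else None,
        numeric=numeric is not None,
        descending=order == "d",
    )
```

For instance, `parse_key_spec("3,8nd")` describes a numeric, descending key of 8 bytes starting at offset 3.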
Notes
- Short and long options are equivalent (e.g. `-C` / `--csv`).
- Options are case-sensitive (`-C` ≠ `-c`).
License-related CLI
```bash
hsort --license YOUR_KEY   # activate
hsort --check-license      # status
```

CSV

① Comma delimiter, max memory 100MB, whole-record ascending, stdin → stdout
```bash
hsort -C -W100M < in.csv
```
② LF newline, tab delimiter, multi-key (col1 asc, col3 desc), output + inputs
```bash
hsort -C -P'\n' -D'\t' -K1a -K3d -O out.csv in1.csv in2.csv
```
No `-E`: rows with fewer than 3 columns are skipped and are not written to an error file.

③ Unique, whole-record descending, temp dir, output, error file, input
```bash
hsort -C -U -R -T /tmp -O out.csv -E err.csv in.csv
```
Bad records go to `err.csv`.

④ Stable sort; column 1 ASCII ascending, column 3 numeric descending
```bash
hsort -C -S -K1 -K3nd -O out.csv in.csv
```

Fixed-length

① Record length 64, max memory 100MB, whole-record ascending, stdin → stdout
```bash
hsort -L64 -W100M < in.dat
```
② Record length 64, multi-key (asc + numeric desc), output + inputs
```bash
hsort -L64 -K0,5a -K3,8nd -O out.dat in1.dat in2.dat
```
No `-E`: rows shorter than the required keys are skipped and are not written to an error file.

③ Record length 64, unique, whole-record descending, temp dir, output, error file, input
```bash
hsort -L64 -U -R -T /tmp -O out.dat -E err.dat in.dat
```
④ Record length 64, stable, multi-key (desc + asc), output + input
```bash
hsort -L64 -S -K1,3d -K6,9a -O out.dat in.dat
```

Variable-length

① CRLF newline, max memory 100MB, whole-record ascending, stdin → stdout
```bash
hsort -P'\r\n' -W100M < in.dat
```
② Multi-key (asc + numeric desc), output + inputs
```bash
hsort -K5,8a -K0,3nd -O out.dat in1.dat in2.dat
```
③ Unique, whole-record descending, temp dir, output, error file, input
```bash
hsort -U -R -T /tmp -O out.dat -E err.dat in.dat
```
④ Stable, multi-key (asc + desc), output + input
```bash
hsort -S -K1,3 -K5,9d -O out.dat in.dat
```
- If format is unspecified, input is treated as variable-length.
- Options are case-sensitive (e.g. `-C` ≠ `-c`).
- Short and long options mean the same (e.g. `-C` / `--csv`).
- Default newline for variable-length / CSV is `\n` (LF).
- Input file paths must appear last; other options can be in any order.
- After `pip install hsort`, the `hsort` command is available.
- In code: `import hsort` and call `hsort.hsort(args)`.
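To make the ordering requested by the multi-key CSV examples above concrete, here is a plain-Python sketch of what `-C -S -K1 -K3nd` followed by `-U` would be expected to produce. It does not call hsort. The helper names are hypothetical, and it assumes that numeric keys are parsed as decimal numbers and that uniqueness is judged on the first key only.

```python
import csv
import io
from typing import List


def sort_rows(rows: List[List[str]]) -> List[List[str]]:
    # Column 1 as text ascending, column 3 as a number descending.
    # sorted() is stable, so rows with equal keys keep their input order.
    return sorted(rows, key=lambda r: (r[0], -float(r[2])))


def unique_by_first_column(rows: List[List[str]]) -> List[List[str]]:
    # Keep only the first record seen for each key value.
    seen = set()
    out = []
    for r in rows:
        if r[0] not in seen:
            seen.add(r[0])
            out.append(r)
    return out


text = "b,x,2\na,y,10\na,z,9\nb,w,30\n"
rows = list(csv.reader(io.StringIO(text)))
ordered = sort_rows(rows)
deduped = unique_by_first_column(ordered)
```

Here `ordered` lists the `a` rows before the `b` rows, and within each group the larger third column comes first.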
CLI messages default to English. Japanese is used when detected from, in order:
- `HSORT_LANG` (explicit)
- Windows: system UI language
- Unix/Linux: `locale.getdefaultlocale()`, `LANG`, `LANGUAGE`, `LC_ALL`, `LC_MESSAGES`
Windows PowerShell
```powershell
$env:HSORT_LANG="en"
hsort --help
$env:HSORT_LANG="ja"
hsort --help
# Persist for user account:
[System.Environment]::SetEnvironmentVariable("HSORT_LANG", "en", "User")
```
Linux / macOS (Bash)
```bash
export HSORT_LANG=en
hsort --help
export HSORT_LANG=ja
hsort --help
```
Windows CMD
```bat
set HSORT_LANG=en
hsort --help
set HSORT_LANG=ja
hsort --help
```
Accepted values: `ja` / `japanese` / `jp` (Japanese), `en` / `english` (English). On PowerShell, use `$env:HSORT_LANG`, not `set`.
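The lookup order described above can be sketched as follows. This is an illustration only, not hsort's actual detection code. The function name `detect_language` is hypothetical, matching is assumed to be case-insensitive, a locale value is assumed to mean Japanese when it starts with `ja`, and the Windows UI language and `locale.getdefaultlocale()` steps are left out.

```python
import os
from typing import Mapping, Optional

_JA = {"ja", "japanese", "jp"}
_EN = {"en", "english"}


def detect_language(env: Optional[Mapping[str, str]] = None) -> str:
    """Return "ja" or "en" from an environment mapping (default: os.environ)."""
    env = os.environ if env is None else env

    # 1. An explicit HSORT_LANG wins over everything else.
    explicit = env.get("HSORT_LANG", "").strip().lower()
    if explicit in _JA:
        return "ja"
    if explicit in _EN:
        return "en"

    # 2. Assumption: a locale such as ja_JP.UTF-8 indicates Japanese.
    for name in ("LANG", "LANGUAGE", "LC_ALL", "LC_MESSAGES"):
        if env.get(name, "").lower().startswith("ja"):
            return "ja"

    # 3. Default to English.
    return "en"
```

Passing a plain dictionary instead of `os.environ` makes the order easy to check without changing the real environment.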
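To debug language detection, set `HSORT_DEBUG_LANG`.

Windows PowerShell
```powershell
$env:HSORT_DEBUG_LANG="1"
hsort --help
```
Linux / macOS (Bash)
```bash
export HSORT_DEBUG_LANG=1
hsort --help
```
Details print to stderr.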
HSORT can be used as a library.
```python
import hsort
```
`hsort.hsort(args)` takes the same argv-style list as the CLI.
```python
hsort.hsort(args: List[str]) -> int
```
- args: argument list (same as CLI)
- Return: exit code (`0` = success)
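Because the API takes an argv-style list, it can help to build that list in one place. The helper below is a hypothetical convenience, not part of hsort. It only assembles options documented in the option table and keeps the input path last, as the CLI requires.

```python
from typing import List, Optional, Sequence


def build_csv_sort_args(
    input_path: str,
    output_path: str,
    keys: Sequence[str],
    header: bool = True,
    memory: Optional[str] = None,
) -> List[str]:
    """Assemble an argv-style list for a CSV sort."""
    args = ["-C"]
    if header:
        args.append("-H")
    # Each key becomes its own -K option, e.g. "1" -> "-K1", "3d" -> "-K3d".
    args.extend(f"-K{k}" for k in keys)
    if memory is not None:
        args.append(f"-W{memory}")
    args.extend(["-O", output_path])
    args.append(input_path)  # input paths must come last
    return args
```

The result can then be passed straight to `hsort.hsort(...)`, for example `hsort.hsort(build_csv_sort_args("input.csv", "output.csv", ["1", "3d"]))`.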
Example 1: Sort CSV by column 1
```python
import hsort

# Sort input.csv by column 1, treating the first row as a header.
ret = hsort.hsort([
    "-C",
    "-H",
    "-K1",
    "-O", "output.csv",
    "input.csv",
])
if ret == 0:
    print("Sort succeeded")
else:
    print(f"Error, exit code: {ret}")
```
Example 2: CSV multi-key (col1 asc, col3 desc)
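```python
import hsort

# Column 1 ascending, then column 3 descending.
ret = hsort.hsort([
    "-C",
    "-H",
    "-K1",
    "-K3d",
    "-O", "output.csv",
    "input.csv",
])
```
Example 3: CSV numeric sort on column 1
```python
import hsort

# The "n" suffix selects numeric comparison for column 1.
ret = hsort.hsort([
    "-C",
    "-H",
    "-K1n",
    "-O", "output.csv",
    "input.csv",
])
```
Example 4: Fixed-length (64-byte records)
```python
import hsort

# 64-byte records, key is 20 bytes starting at offset 0.
ret = hsort.hsort([
    "-L64",
    "-K0,20",
    "-O", "output.dat",
    "input.dat",
])
```
Example 5: Variable-length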
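```python
import hsort

# No format option: input is treated as variable-length records.
ret = hsort.hsort([
    "-K0,10",
    "-O", "output.dat",
    "input.dat",
])
```
Example 6: Unique output
```python
import hsort

# -U keeps the first record for each key.
ret = hsort.hsort([
    "-C",
    "-H",
    "-K1",
    "-U",
    "-O", "output.csv",
    "input.csv",
])
```
Example 7: Stable sort
```python
import hsort

# -S requests a stable sort.
ret = hsort.hsort([
    "-C",
    "-H",
    "-K1",
    "-S",
    "-O", "output.csv",
    "input.csv",
])
```
Example 8: Error file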
```python
import hsort
import os

input_file = "input.csv"
output_file = "output.csv"
error_file = "errors.csv"

# Sort by columns 1 and 2; invalid records go to error_file.
ret = hsort.hsort([
    "-C",
    "-H",
    "-K1",
    "-K2",
    "-E", error_file,
    "-O", output_file,
    input_file,
])
if ret == 0:
    if os.path.exists(output_file):
        print(f"OK: {output_file}")
    # A non-empty error file means some rows were rejected.
    if os.path.exists(error_file) and os.path.getsize(error_file) > 0:
        print(f"Some rows in: {error_file}")
else:
    print(f"Failed, exit code: {ret}")
```
Example 9: Memory limit
```python
import hsort

# Cap the memory budget at 64MB with -W.
ret = hsort.hsort([
    "-C",
    "-H",
    "-K1",
    "-W64MB",
    "-O", "output.csv",
    "input.csv",
])
```
Example 10: Batch CSV files
```python
import hsort
from pathlib import Path


def sort_csv_files(input_dir, output_dir):
    """Sort every *.csv in input_dir by column 1 into output_dir."""
    input_path = Path(input_dir)
    output_path = Path(output_dir)
    # parents=True also creates missing intermediate directories.
    output_path.mkdir(parents=True, exist_ok=True)
    for csv_file in input_path.glob("*.csv"):
        output_file = output_path / f"sorted_{csv_file.name}"
        ret = hsort.hsort([
            "-C",
            "-H",
            "-K1",
            "-O", str(output_file),
            str(csv_file),
        ])
        if ret == 0:
            print(f"OK: {csv_file.name}")
        else:
            print(f"Fail: {csv_file.name} (exit {ret})")


sort_csv_files("data/input", "data/output")
```
`hsort.hsort(args)`: run a sort/merge; same rules as the CLI.
- args (`List[str]`): argv-style list.
- Returns (`int`): exit code.
- Raises
  - `ImportError`: extension not installed
  - `RuntimeError`: free-tier total input size exceeded (same condition as CLI)
args must follow the same rules as the command line.
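A caller may want to fold the documented outcomes (exit code and `RuntimeError`) into one result. The wrapper below is a sketch under that assumption. `run_sort` is a hypothetical name, and the sort function is passed in as a parameter so the wrapper does not depend on how the extension is imported.

```python
from typing import Callable, List, Tuple


def run_sort(sort_fn: Callable[[List[str]], int], args: List[str]) -> Tuple[bool, str]:
    """Call sort_fn(args) and return (ok, message)."""
    try:
        code = sort_fn(args)
    except RuntimeError as exc:
        # Documented case: free-tier total input size exceeded.
        return False, f"runtime error: {exc}"
    if code == 0:
        return True, "ok"
    return False, f"exit code {code}"
```

In real use this would be called as `run_sort(hsort.hsort, ["-C", "-H", "-K1", "-O", "output.csv", "input.csv"])`. An `ImportError` would surface earlier, at `import hsort`, so it is not handled here.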
- Python 3.9+
- Windows, Linux, or macOS
- Suggest ≥ 16 MB available RAM (tune with
-W)
HSORT offers a free tier and paid licenses:
Free tier
- Up to 100 MiB total input size
- No feature restrictions (size limit only)

Paid license
- Unlimited input size
- Full performance
- Commercial use
👉 Activate license:
```bash
hsort --license YOUR_KEY
```
When no valid license is activated, the combined size of regular input files passed on the command line or in `hsort.hsort([...])` must not exceed 100 MiB (100 × 1024 × 1024 bytes). The check is shared by CLI and API.
- Activating a paid license removes this total input-size limit (subject to your license agreement).
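The free-tier rule can be checked before calling hsort. The sketch below mirrors the stated condition (combined size of regular input files, at most 100 × 1024 × 1024 bytes). The helper names are hypothetical, and the engine's own check remains the authority.

```python
import os
from typing import Iterable

FREE_TIER_LIMIT_BYTES = 100 * 1024 * 1024  # 100 MiB, as stated above


def total_input_size(paths: Iterable[str]) -> int:
    # Only regular files are counted; missing paths contribute nothing.
    return sum(os.path.getsize(p) for p in paths if os.path.isfile(p))


def within_free_tier(paths: Iterable[str]) -> bool:
    # "Must not exceed" means the limit itself is still allowed.
    return total_input_size(paths) <= FREE_TIER_LIMIT_BYTES
```

A batch script could call `within_free_tier([...])` first and split or skip inputs that would exceed the limit.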
Copyright (c) 2015–2026 株式会社GPO
This project is not open source. The software is proprietary; see the LICENSE file for full terms.
- Not open source — No general right to source, redistribution, or modification except as allowed by law or a written agreement.
- Free tier — May include limits (e.g. total input size as above). Does not grant full commercial rights.
- Paid license — Unlocks full features per your agreement with the publisher.
👉 Buy License (Instant Key Delivery):
https://github.com/xuhui-hou/hsort/blob/main/Payment.md
Licensing contact: soft@gpo-i.com
- Homepage: https://github.com/xuhui-hou/hsort
- Repository: https://github.com/xuhui-hou/hsort
- Issues: https://github.com/xuhui-hou/hsort/issues