FuzzyBunny

A high-performance, lightweight Python library for fuzzy string matching and ranking, implemented in C++ with pybind11.

Features

Blazing Fast: A C++ core delivers a 2-5x speedup over pure-Python alternatives.
Multiple Scorers: Support for Levenshtein, Jaccard, Token Sort, Token Set, QRatio, WRatio, and Partial Ratio.
Partial Matching: Find the best substring matches using mode="partial".
Hybrid Scoring: Combine multiple scorers with custom weights for complex matching tasks.
Pandas & NumPy Integration: Native support for Series and Arrays via a dedicated accessor.
Batch Processing: Parallelized matching for large datasets using OpenMP.
Unicode Support: Handles international characters and basic normalization.
Benchmarking Tools: Built-in utilities to measure and compare performance.
Thread Safe: Releases the GIL while the C++ core runs, so other Python threads can make progress concurrently.
Type Safe: Includes PEP 561 type stubs for full IDE and MyPy support.
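Several of the scorers listed above are token-based. As a rough illustration only (FuzzyBunny's exact definitions may differ), Jaccard, Token Sort, and Token Set can be sketched in pure Python as follows. The character-level comparison uses the standard library's `difflib.SequenceMatcher.ratio` as a stand-in, and all function names here are hypothetical.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib stand-in, not FuzzyBunny's scorer)."""
    return SequenceMatcher(None, a, b).ratio()


def jaccard(a: str, b: str) -> float:
    """Size of the shared token set divided by the size of the combined token set."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def token_sort(a: str, b: str) -> float:
    """Sort whitespace tokens alphabetically before comparing, so word order is ignored."""
    def norm(s: str) -> str:
        return " ".join(sorted(s.lower().split()))
    return ratio(norm(a), norm(b))


def token_set(a: str, b: str) -> float:
    """Compare the shared tokens against each side's full token set; keep the best score."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    common = " ".join(sorted(sa & sb))
    with_a = f"{common} {' '.join(sorted(sa - sb))}".strip()
    with_b = f"{common} {' '.join(sorted(sb - sa))}".strip()
    return max(ratio(common, with_a), ratio(common, with_b), ratio(with_a, with_b))


print(jaccard("apple banana", "banana apple"))     # 1.0
print(token_sort("apple banana", "banana apple"))  # 1.0
print(token_set("apple pie", "apple pie with cream"))  # 1.0
```

Token Set scores 1.0 in the last call because every token of the shorter string also appears in the longer one, which is the behavior that makes it tolerant of extra words.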

Installation

pip install fuzzybunny

Quick Start

import fuzzybunny

# Basic matching
score = fuzzybunny.levenshtein("kitten", "sitting")
print(f"Similarity: {score:.2f}")

# Ranking candidates
candidates = ["apple", "apricot", "banana", "cherry"]
results = fuzzybunny.rank("app", candidates, top_n=2)
# [('apple', 0.6), ('apricot', 0.42)]
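To see roughly what these calls compute, here is a pure-Python sketch. Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions that turn one string into the other; ranking scores every candidate and keeps the `top_n` best. The sketch assumes similarity is normalized as `1 - distance / max(len(a), len(b))`. That common convention gives 0.6 for "app" vs "apple", but it is not confirmed to be FuzzyBunny's exact formula, so other scores may differ from the output shown above. `rank_sketch` is a hypothetical name, not part of the library.

```python
import heapq
from typing import List, Optional, Tuple


def levenshtein_distance(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, keeping only two rows in memory."""
    if len(a) < len(b):
        a, b = b, a  # make b the shorter string so each row stays small
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # deletion
                current[j - 1] + 1,            # insertion
                previous[j - 1] + (ca != cb),  # substitution (free when equal)
            ))
        previous = current
    return previous[-1]


def levenshtein_similarity(a: str, b: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein_distance(a, b) / longest


def rank_sketch(query: str, candidates: List[str],
                top_n: Optional[int] = None) -> List[Tuple[str, float]]:
    """Hypothetical ranking helper: score every candidate, return best-first."""
    scored = [(c, levenshtein_similarity(query, c)) for c in candidates]
    if top_n is None:
        return sorted(scored, key=lambda pair: pair[1], reverse=True)
    # heapq.nlargest avoids sorting the whole list when only a few results are needed
    return heapq.nlargest(top_n, scored, key=lambda pair: pair[1])


print(levenshtein_distance("kitten", "sitting"))             # 3
print(f"{levenshtein_similarity('kitten', 'sitting'):.2f}")  # 0.57
print(rank_sketch("app", ["apple", "apricot", "banana", "cherry"], top_n=2))
```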

Advanced Usage

Hybrid Scorer

Combine different algorithms using custom weights:

results = fuzzybunny.rank(
    "apple banana", 
    ["banana apple"], 
    scorer="hybrid", 
    weights={"levenshtein": 0.3, "token_sort": 0.7}
)
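A hybrid score is typically a weighted average of the individual scorer outputs. The sketch below assumes the weights are normalized by their sum. It uses simplified stand-in scorers: difflib's ratio in place of Levenshtein, and a token-sort variant built on it. FuzzyBunny's actual combination rule and scorer definitions may differ, and `hybrid_score` and `SCORERS` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict


def char_ratio(a: str, b: str) -> float:
    """Stand-in character-level scorer (difflib), not FuzzyBunny's Levenshtein."""
    return SequenceMatcher(None, a, b).ratio()


def token_sort(a: str, b: str) -> float:
    """Compare after sorting whitespace tokens, so word order does not matter."""
    return char_ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


# Hypothetical registry mapping weight keys to scorer functions.
SCORERS: Dict[str, Callable[[str, str], float]] = {
    "char_ratio": char_ratio,
    "token_sort": token_sort,
}


def hybrid_score(a: str, b: str, weights: Dict[str, float]) -> float:
    """Weighted average of the named scorers; weights are divided by their sum."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    return sum(w * SCORERS[name](a, b) for name, w in weights.items()) / total


print(hybrid_score("apple banana", "banana apple", {"char_ratio": 0.3, "token_sort": 0.7}))
```

Because token sort scores these reordered strings as identical, the 0.7 weight alone guarantees a hybrid score of at least 0.7. Order-sensitive scorers such as Levenshtein only pull the result down in proportion to their weight.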

Partial Matching

Find the best substring match:

score = fuzzybunny.partial_ratio("apple", "apple pie") # 1.0

# Using rank with partial mode
results = fuzzybunny.rank("apple", ["apple pie", "banana"], mode="partial")
# [('apple pie', 1.0), ('banana', 0.18)]
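Partial matching compares the shorter string against every equally long window of the longer one and keeps the best window score. Here is a minimal sketch of that idea. It uses difflib's ratio as the per-window scorer, which is an illustrative substitute because FuzzyBunny's own window scorer and alignment strategy are not documented here. `partial_ratio_sketch` is a hypothetical name.

```python
from difflib import SequenceMatcher


def partial_ratio_sketch(a: str, b: str) -> float:
    """Best similarity between the shorter string and any same-length slice of the longer."""
    short, long_ = (a, b) if len(a) <= len(b) else (b, a)
    if not short:
        # Convention chosen for this sketch: two empty strings match, empty vs non-empty does not.
        return 1.0 if not long_ else 0.0
    width = len(short)
    best = 0.0
    for start in range(len(long_) - width + 1):
        window = long_[start:start + width]
        best = max(best, SequenceMatcher(None, short, window).ratio())
        if best == 1.0:
            break  # an exact substring match cannot be beaten
    return best


print(partial_ratio_sketch("apple", "apple pie"))  # 1.0
print(partial_ratio_sketch("apple", "banana"))
```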

Pandas Integration

Use the specialized fuzzy accessor:

import pandas as pd
import fuzzybunny

df = pd.DataFrame({"names": ["apple pie", "banana bread", "cherry tart"]})
results = df["names"].fuzzy.match("apple", mode="partial")

Benchmarking

Compare performance on your specific data:

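import fuzzybunny

candidates = ["apple", "apricot", "banana", "cherry"]  # replace with your own list of strings
perf = fuzzybunny.benchmark("query", candidates)
print(f"Levenshtein mean time: {perf['levenshtein']['mean']:.6f}s")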
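To reproduce this kind of measurement independently, or to time a scorer of your own, the standard library's `timeit` is enough. The sketch below times arbitrary scorer callables over a candidate list and reports the mean and standard deviation, in seconds, of one full pass over the candidates. The result layout mirrors the `perf['levenshtein']['mean']` access shown above, but `benchmark_sketch` is a hypothetical helper, not FuzzyBunny's implementation.

```python
import statistics
import timeit
from difflib import SequenceMatcher
from typing import Callable, Dict, List


def benchmark_sketch(
    query: str,
    candidates: List[str],
    scorers: Dict[str, Callable[[str, str], float]],
    repeat: int = 5,
) -> Dict[str, Dict[str, float]]:
    """Time one full pass over the candidates per scorer, repeated `repeat` times."""
    results: Dict[str, Dict[str, float]] = {}
    for name, scorer in scorers.items():
        # number=1: each timing sample is exactly one pass over all candidates
        runs = timeit.repeat(
            lambda: [scorer(query, c) for c in candidates],
            repeat=repeat,
            number=1,
        )
        results[name] = {
            "mean": statistics.mean(runs),
            "stdev": statistics.stdev(runs) if len(runs) > 1 else 0.0,
        }
    return results


perf = benchmark_sketch(
    "query",
    ["queue", "quarry", "theory"],
    {"difflib": lambda a, b: SequenceMatcher(None, a, b).ratio()},
)
print(f"difflib mean time: {perf['difflib']['mean']:.6f}s")
```

Timing a pure-Python scorer such as difflib this way gives a baseline for judging whether the C++ scorers deliver the advertised speedup on your own data.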

License

MIT
