markdownbridge

Python SDK for the MarkdownBridge OCR API — convert documents and images to Markdown.

Installation

pip install markdownbridge

Quick Start

from markdownbridge import MarkdownBridge

client = MarkdownBridge(api_key="ocrb_prd_xxx")

# One-liner: URL → Markdown
result = client.ocr("https://example.com/invoice.pdf")
print(result.markdown)

# One-liner: local file → Markdown
result = client.ocr("./receipt.png")
print(result.markdown)

Authentication

Pass your API key directly or set the MARKDOWNBRIDGE_API_KEY environment variable:

export MARKDOWNBRIDGE_API_KEY="ocrb_prd_xxx"

client = MarkdownBridge()  # reads from env
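
The lookup described above can be pictured with a small standalone sketch. It is not the SDK's code: `resolve_api_key` is a hypothetical helper, and the precedence it uses (an explicit argument wins over the environment variable) is an assumption rather than documented behavior.

```python
import os
from typing import Mapping, Optional

ENV_VAR = "MARKDOWNBRIDGE_API_KEY"


def resolve_api_key(api_key: Optional[str] = None,
                    env: Mapping[str, str] = os.environ) -> str:
    """Hypothetical helper: pick the explicit key, else fall back to the env var."""
    key = api_key or env.get(ENV_VAR)
    if not key:
        # Assumed behavior: fail early instead of sending an unauthenticated request.
        raise ValueError(f"No API key: pass api_key= or set {ENV_VAR}")
    return key


explicit = resolve_api_key("ocrb_prd_xxx", env={})
from_env = resolve_api_key(env={ENV_VAR: "ocrb_prd_yyy"})
```

Passing `env` as a parameter keeps the helper easy to test without touching the real process environment.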

Client Options

client = MarkdownBridge(
    api_key="ocrb_prd_xxx",                        # or env MARKDOWNBRIDGE_API_KEY
    base_url="https://api.markdownbridge.com",      # default
    timeout=30.0,                                    # request timeout in seconds
    max_retries=3,                                   # retry 5xx errors with backoff
)
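
The README does not say which backoff schedule `max_retries` uses. As an illustration only, here is what a capped exponential schedule would look like; `backoff_delays`, `base`, and `cap` are hypothetical names and values, not part of the SDK.

```python
from typing import List


def backoff_delays(max_retries: int, base: float = 0.5, cap: float = 8.0) -> List[float]:
    """Return the wait in seconds before each retry: base, 2*base, 4*base, ... up to cap."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(max_retries)]


delays = backoff_delays(3)
print(delays)  # [0.5, 1.0, 2.0]
```

With `max_retries=3` and these assumed values, a request that keeps returning 5xx would wait 3.5 seconds in total before giving up, which is worth weighing against the `timeout` setting.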

API Reference

`client.ocr(source, **opts)`

The convenience method: pass a URL or a local file path, and it submits the job, polls until it finishes, and returns a ProcessingResult.

result = client.ocr(
    "https://example.com/doc.pdf",
    language="en",
    output_format="markdown",
    enhance_quality=True,
    poll_interval=2.0,     # seconds between status checks
    poll_timeout=300.0,    # max wait time
)
print(result.markdown)
print(result.page_count)

`client.process_url(file_url, **opts)`

Submit a URL for processing without waiting for completion.

proc = client.process_url("https://example.com/doc.pdf")
print(proc.process_id)  # use with get_status() / wait_for_completion()

`client.process_file(file_path, **opts)`

Upload a local file and submit it for processing.

proc = client.process_file("./invoice.pdf")
print(proc.process_id)

`client.upload_file(file_path)`

Upload a file without processing it.

upload = client.upload_file("./photo.png")
print(upload.document_id)

`client.get_status(process_id)`

Check the current status of a processing job.

status = client.get_status("uuid-here")
print(status.status)   # queued | processing | completed | failed
print(status.progress)  # 0–100
print(status.stage)     # queued | download | ocr | llm_improvement | completed | failed

`client.wait_for_completion(process_id, **opts)`

Poll until the job completes or fails.

result = client.wait_for_completion(
    "uuid-here",
    poll_interval=2.0,
    poll_timeout=300.0,
    on_status_change=lambda s: print(f"Status: {s.status} ({s.stage})"),
)
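
To make the roles of `poll_interval`, `poll_timeout`, and `on_status_change` concrete, here is a minimal standalone polling loop. It is a sketch under assumptions, not the SDK's implementation: `wait_until_done` and `Status` are stand-ins, it raises Python's built-in `TimeoutError` rather than the SDK's own, and it returns a failed status instead of raising.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class Status:
    """Stand-in for a status object with the two fields used below."""
    status: str
    stage: str


def wait_until_done(get_status: Callable[[str], Status],
                    process_id: str,
                    poll_interval: float = 2.0,
                    poll_timeout: float = 300.0,
                    on_status_change: Optional[Callable[[Status], None]] = None,
                    sleep: Callable[[float], None] = time.sleep) -> Status:
    deadline = time.monotonic() + poll_timeout
    last = None
    while True:
        current = get_status(process_id)
        # Fire the callback only when the status or stage actually changed.
        if on_status_change is not None and current != last:
            on_status_change(current)
        last = current
        if current.status in ("completed", "failed"):
            return current
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{process_id} not finished after {poll_timeout}s")
        sleep(poll_interval)


# Drive the loop with a scripted sequence instead of a real API.
script = iter([
    Status("queued", "queued"),
    Status("processing", "ocr"),
    Status("processing", "ocr"),
    Status("completed", "completed"),
])
seen = []
final = wait_until_done(lambda _id: next(script), "uuid-here",
                        on_status_change=seen.append, sleep=lambda _s: None)
```

The scripted run calls the status function four times but invokes the callback only three times, because the repeated `processing`/`ocr` status is not a change.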

`client.list_results(**filters)`

Fetch paginated results.

page = client.list_results(limit=20, offset=0, status="completed")
for item in page.data:
    print(item.file_name, item.status)
print(f"Total: {page.pagination.total}")

`client.iter_results(**filters)`

Auto-paginating iterator over all results.

for item in client.iter_results(status="completed"):
    print(item.file_name)
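
Auto-pagination of this kind can be built on top of a `list_results`-style call. The sketch below shows the idea with stand-in types and a fake backend. The field names follow the Data Types section, but `iter_all`, the fake data, and the assumption that `next_offset` is `None` on the last page are illustrative only.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Optional


@dataclass(frozen=True)
class Pagination:
    total: int
    limit: int
    offset: int
    has_more: bool
    next_offset: Optional[int]


@dataclass(frozen=True)
class ResultsPage:
    data: List[str]
    pagination: Pagination


def iter_all(list_results: Callable[..., ResultsPage],
             limit: int = 20, **filters) -> Iterator[str]:
    """Yield every item, fetching the next page only when the caller needs it."""
    offset = 0
    while True:
        page = list_results(limit=limit, offset=offset, **filters)
        yield from page.data
        if not page.pagination.has_more or page.pagination.next_offset is None:
            return
        offset = page.pagination.next_offset


# Fake backend holding five items, served in pages.
ITEMS = [f"file-{i}.pdf" for i in range(5)]


def fake_list_results(limit: int, offset: int, **filters) -> ResultsPage:
    chunk = ITEMS[offset:offset + limit]
    end = offset + len(chunk)
    more = end < len(ITEMS)
    return ResultsPage(chunk, Pagination(len(ITEMS), limit, offset, more,
                                         end if more else None))


names = list(iter_all(fake_list_results, limit=2))
```

Because the generator is lazy, breaking out of the loop early avoids requesting the remaining pages.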

`client.get_result(result_id)`

Fetch a specific result by ID.

item = client.get_result("uuid-here")  # a ResultItem
print(item.result.markdown)            # the nested ProcessingResult

`client.info()`

Get API version and status.

info = client.info()
print(info.version, info.status)

Async Usage

Every method has an async equivalent via AsyncMarkdownBridge:

import asyncio
from markdownbridge import AsyncMarkdownBridge

async def main():
    async with AsyncMarkdownBridge(api_key="ocrb_prd_xxx") as client:
        result = await client.ocr("https://example.com/invoice.pdf")
        print(result.markdown)

        # Auto-paginating async iteration
        async for item in client.iter_results():
            print(item.file_name)

asyncio.run(main())

Error Handling

All exceptions inherit from MarkdownBridgeError and include status_code, error_code, and correlation_id:

from markdownbridge import (
    AuthenticationError,
    MarkdownBridge,
    MarkdownBridgeError,  # base class, needed by the final except clause
    RateLimitError,
)

client = MarkdownBridge(api_key="ocrb_prd_xxx")

try:
    result = client.ocr("https://example.com/doc.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited — retry after {e.retry_after}s")
except MarkdownBridgeError as e:
    print(f"API error {e.status_code}: {e}")

Exception Hierarchy

| Exception | HTTP Status | When |
| --- | --- | --- |
| `AuthenticationError` | 401 | Invalid or missing API key |
| `ValidationError` | 400/422 | Invalid request parameters |
| `NotFoundError` | 404 | Resource not found |
| `RateLimitError` | 429 | Too many requests |
| `InsufficientCreditsError` | 402 | Account has no credits |
| `ServerError` | 5xx | Server-side failure |
| `ProcessingError` | n/a | OCR job failed |
| `FileUploadError` | n/a | Upload failed |
| `TimeoutError` | n/a | Polling exceeded timeout |
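
The status-code column above amounts to a lookup from HTTP status to exception class. The following sketch reproduces that mapping with locally defined stand-in classes so it runs without the SDK. `error_for_status` is a hypothetical helper, and how the real client builds its exceptions is not shown in this README.

```python
class MarkdownBridgeError(Exception):
    """Stand-in base class carrying the HTTP status code."""

    def __init__(self, message: str, status_code: int = 0):
        super().__init__(message)
        self.status_code = status_code


class AuthenticationError(MarkdownBridgeError): pass
class ValidationError(MarkdownBridgeError): pass
class NotFoundError(MarkdownBridgeError): pass
class RateLimitError(MarkdownBridgeError): pass
class InsufficientCreditsError(MarkdownBridgeError): pass
class ServerError(MarkdownBridgeError): pass


STATUS_TO_ERROR = {
    400: ValidationError,
    401: AuthenticationError,
    402: InsufficientCreditsError,
    404: NotFoundError,
    422: ValidationError,
    429: RateLimitError,
}


def error_for_status(status_code: int, message: str = "") -> MarkdownBridgeError:
    """Pick the most specific class; any 5xx is a ServerError, unknown codes use the base."""
    if status_code >= 500:
        cls = ServerError
    else:
        cls = STATUS_TO_ERROR.get(status_code, MarkdownBridgeError)
    return cls(message or f"HTTP {status_code}", status_code=status_code)


err = error_for_status(429)
```

Since every class derives from the base, a single `except MarkdownBridgeError` placed last catches anything the more specific clauses did not.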

Data Types

All response types are frozen dataclasses:

ProcessResponse — process_id, status, file_id, stage
ProcessingStatus — process_id, status, progress, stage, result, error
ProcessingResult — text, markdown, json, page_count, processing_time
UploadResponse — file_key, public_url, document_id
ResultItem — id, process_id, file_name, status, result
ResultsPage — data, pagination
Pagination — total, limit, offset, has_more, next_offset
ApiInfo — version, status, endpoints
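
In practice, "frozen" means that assigning to a field raises an error, and a changed copy is made with `dataclasses.replace`. The sketch below uses a stand-in `UploadResponse` with the three fields listed above. The `str` field types and the sample values are assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass, replace


@dataclass(frozen=True)
class UploadResponse:
    """Stand-in with the documented field names; types are assumed."""
    file_key: str
    public_url: str
    document_id: str


upload = UploadResponse("uploads/photo.png", "https://example.com/photo.png", "doc-1")

try:
    upload.document_id = "doc-2"  # not allowed on a frozen dataclass
    mutated = True
except FrozenInstanceError:
    mutated = False

# Build a modified copy instead of mutating in place.
updated = replace(upload, document_id="doc-2")
```

Frozen dataclasses whose fields are all hashable are hashable too, so response objects like this one can be used as dictionary keys or set members.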

License

MIT
