MarkdownBridge/markdownbridge-python

markdownbridge

Python SDK for the MarkdownBridge OCR API — convert documents and images to Markdown.

Installation

pip install markdownbridge

Quick Start

from markdownbridge import MarkdownBridge

client = MarkdownBridge(api_key="ocrb_prd_xxx")

# One-liner: URL → Markdown
result = client.ocr("https://example.com/invoice.pdf")
print(result.markdown)

# One-liner: local file → Markdown
result = client.ocr("./receipt.png")
print(result.markdown)

Authentication

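Pass your API key directly or set the MARKDOWNBRIDGE_API_KEY environment variable in your shell:

export MARKDOWNBRIDGE_API_KEY="ocrb_prd_xxx"

Then construct the client without an api_key argument:

from markdownbridge import MarkdownBridge

client = MarkdownBridge()  # reads the key from MARKDOWNBRIDGE_API_KEY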
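The lookup order described above can be sketched as follows. This is a minimal illustration, assuming an explicit api_key argument takes precedence over the environment variable and that a missing key is an error; `resolve_api_key` is a hypothetical helper, not part of the SDK.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    api_key: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Hypothetical helper: explicit key first, then MARKDOWNBRIDGE_API_KEY."""
    if api_key:
        return api_key
    key = environ.get("MARKDOWNBRIDGE_API_KEY")
    if not key:
        raise ValueError(
            "No API key: pass api_key=... or set MARKDOWNBRIDGE_API_KEY"
        )
    return key


print(resolve_api_key("ocrb_prd_explicit"))  # ocrb_prd_explicit
print(resolve_api_key(environ={"MARKDOWNBRIDGE_API_KEY": "ocrb_prd_env"}))  # ocrb_prd_env
```

Passing `environ` explicitly keeps the sketch testable without touching the real process environment.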

Client Options

client = MarkdownBridge(
    api_key="ocrb_prd_xxx",                        # or env MARKDOWNBRIDGE_API_KEY
    base_url="https://api.markdownbridge.com",      # default
    timeout=30.0,                                    # request timeout in seconds
    max_retries=3,                                   # retry 5xx errors with backoff
)
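To make `max_retries` concrete, here is a sketch of retrying 5xx responses with exponential backoff. The delay schedule (a base delay that doubles per attempt, capped at a maximum) and the function name `call_with_retries` are assumptions for illustration; the SDK's actual backoff policy is not documented here.

```python
import time
from typing import Callable, List


def call_with_retries(
    send: Callable[[], int],
    max_retries: int = 3,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical sketch: call `send` (which returns an HTTP status code)
    and retry 5xx responses up to `max_retries` extra times."""
    attempt = 0
    while True:
        status = send()
        # Success, client error, or out of retries: hand the status back.
        if status < 500 or attempt >= max_retries:
            return status
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
        sleep(min(base_delay * (2 ** attempt), max_delay))
        attempt += 1


responses = iter([503, 502, 200])
delays: List[float] = []
print(call_with_retries(lambda: next(responses), sleep=delays.append))  # 200
print(delays)  # [0.5, 1.0]
```

Injecting `sleep` lets the backoff schedule be inspected without actually waiting.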

API Reference

client.ocr(source, **opts)

The convenience method: pass a URL or a local file path, and it submits the job, polls until it finishes, and returns a ProcessingResult.

result = client.ocr(
    "https://example.com/doc.pdf",
    language="en",
    output_format="markdown",
    enhance_quality=True,
    poll_interval=2.0,     # seconds between status checks
    poll_timeout=300.0,    # max wait time
)
print(result.markdown)
print(result.page_count)
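Because `ocr()` accepts either a URL or a local path, it has to route the source to `process_url` or `process_file`. One plausible way to make that choice is to inspect the URL scheme, as sketched below; the `http`/`https` check and the `pick_submit_method` name are assumptions, not the SDK's documented behavior.

```python
from pathlib import Path
from urllib.parse import urlparse


def pick_submit_method(source: str) -> str:
    """Hypothetical dispatch: http(s) URLs go to process_url, anything
    else must be an existing local file and goes to process_file."""
    scheme = urlparse(source).scheme.lower()
    if scheme in ("http", "https"):
        return "process_url"
    if not Path(source).is_file():
        raise FileNotFoundError(f"Not a URL and no such file: {source}")
    return "process_file"


print(pick_submit_method("https://example.com/invoice.pdf"))  # process_url
```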

client.process_url(file_url, **opts)

Submit a URL for processing without waiting for completion.

proc = client.process_url("https://example.com/doc.pdf")
print(proc.process_id)  # use with get_status() / wait_for_completion()

client.process_file(file_path, **opts)

Upload a local file and submit it for processing.

proc = client.process_file("./invoice.pdf")
print(proc.process_id)

client.upload_file(file_path)

Upload a file without processing it.

upload = client.upload_file("./photo.png")
print(upload.document_id)

client.get_status(process_id)

Check the current status of a processing job.

status = client.get_status("uuid-here")
print(status.status)   # queued | processing | completed | failed
print(status.progress)  # 0–100
print(status.stage)     # queued | download | ocr | llm_improvement | completed | failed

client.wait_for_completion(process_id, **opts)

Poll until the job completes or fails.

result = client.wait_for_completion(
    "uuid-here",
    poll_interval=2.0,
    poll_timeout=300.0,
    on_status_change=lambda s: print(f"Status: {s.status} ({s.stage})"),
)
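The polling behavior of `wait_for_completion` (and of `ocr`, which takes the same `poll_interval`/`poll_timeout` options) can be sketched as a loop that fetches the status, reports changes, and stops on a terminal state or a timeout. This is an illustrative stand-in: `poll_until_done`, the `Status` tuple, and the injectable clock are hypothetical, and the built-in RuntimeError/TimeoutError used here take the place of the SDK's own ProcessingError and TimeoutError.

```python
import time
from typing import Callable, NamedTuple, Optional


class Status(NamedTuple):
    """Hypothetical stand-in for the SDK's ProcessingStatus."""
    status: str  # queued | processing | completed | failed
    stage: str


def poll_until_done(
    get_status: Callable[[], Status],
    poll_interval: float = 2.0,
    poll_timeout: float = 300.0,
    on_status_change: Optional[Callable[[Status], None]] = None,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Status:
    """Poll get_status() until the job completes, fails, or times out."""
    deadline = clock() + poll_timeout
    last: Optional[Status] = None
    while True:
        current = get_status()
        # Report only transitions, not every identical poll.
        if on_status_change is not None and current != last:
            on_status_change(current)
        last = current
        if current.status == "completed":
            return current
        if current.status == "failed":
            raise RuntimeError(f"job failed at stage {current.stage!r}")
        remaining = deadline - clock()
        if remaining <= 0:
            raise TimeoutError(f"job not finished after {poll_timeout}s")
        # Never sleep past the deadline.
        sleep(min(poll_interval, remaining))


states = iter([
    Status("queued", "queued"),
    Status("processing", "ocr"),
    Status("processing", "ocr"),
    Status("completed", "completed"),
])
seen = []
final = poll_until_done(
    lambda: next(states),
    sleep=lambda s: None,  # skip real waiting in this demo
    on_status_change=lambda s: seen.append(s.stage),
)
print(final.status)  # completed
print(seen)          # ['queued', 'ocr', 'completed']
```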

client.list_results(**filters)

Fetch a single page of results using limit/offset pagination, optionally filtered by status.

page = client.list_results(limit=20, offset=0, status="completed")
for item in page.data:
    print(item.file_name, item.status)
print(f"Total: {page.pagination.total}")

client.iter_results(**filters)

Auto-paginating iterator over all results.

for item in client.iter_results(status="completed"):
    print(item.file_name)
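`iter_results()` hides the offset bookkeeping that `list_results()` exposes. Assuming each page reports `has_more` and `next_offset` (fields listed for `Pagination` under Data Types), the loop looks roughly like this; `iterate_pages`, the `fetch_page` callback, and the local `Pagination`/`Page` classes are hypothetical stand-ins used only for the sketch.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterator, List, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class Pagination:
    # Local stand-in mirroring the field names under Data Types.
    total: int
    limit: int
    offset: int
    has_more: bool
    next_offset: Optional[int]


@dataclass(frozen=True)
class Page(Generic[T]):
    data: List[T]
    pagination: Pagination


def iterate_pages(
    fetch_page: Callable[[int, int], Page[T]], limit: int = 20
) -> Iterator[T]:
    """Hypothetical auto-paginator: fetch pages until has_more is False."""
    offset = 0
    while True:
        page = fetch_page(limit, offset)
        yield from page.data
        if not page.pagination.has_more or page.pagination.next_offset is None:
            return
        offset = page.pagination.next_offset


items = [f"doc-{i}.pdf" for i in range(45)]


def fake_fetch(limit: int, offset: int) -> Page[str]:
    # Simulates a server holding 45 results.
    nxt = offset + limit
    more = nxt < len(items)
    return Page(items[offset:nxt],
                Pagination(len(items), limit, offset, more, nxt if more else None))


print(len(list(iterate_pages(fake_fetch))))  # 45
```

A generator fetches each page lazily, so callers that stop early never request the remaining pages.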

client.get_result(result_id)

Fetch a specific result by ID.

item = client.get_result("uuid-here")
print(item.result.markdown)

client.info()

Get API version and status.

info = client.info()
print(info.version, info.status)

Async Usage

Every method has an async equivalent via AsyncMarkdownBridge:

import asyncio
from markdownbridge import AsyncMarkdownBridge

async def main():
    async with AsyncMarkdownBridge(api_key="ocrb_prd_xxx") as client:
        result = await client.ocr("https://example.com/invoice.pdf")
        print(result.markdown)

        # Auto-paginating async iteration
        async for item in client.iter_results():
            print(item.file_name)

asyncio.run(main())
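The `async for` loop above relies on an async generator. Here is a sketch of how such auto-pagination can work with `asyncio`, assuming an awaitable page fetcher; the `aiter_results` and `fake_fetch` names and the dict-shaped pages are hypothetical stand-ins, not the SDK's types.

```python
import asyncio
from typing import Any, AsyncIterator, Awaitable, Callable, Dict, List


async def aiter_results(
    fetch_page: Callable[[int, int], Awaitable[Dict[str, Any]]],
    limit: int = 20,
) -> AsyncIterator[Any]:
    """Hypothetical async auto-paginator: awaits one page at a time."""
    offset = 0
    while True:
        page = await fetch_page(limit, offset)
        for item in page["data"]:
            yield item
        pagination = page["pagination"]
        if not pagination["has_more"]:
            return
        offset = pagination["next_offset"]


async def demo() -> List[int]:
    items = list(range(7))

    async def fake_fetch(limit: int, offset: int) -> Dict[str, Any]:
        await asyncio.sleep(0)  # stand-in for an HTTP round trip
        nxt = offset + limit
        return {
            "data": items[offset:nxt],
            "pagination": {"has_more": nxt < len(items), "next_offset": nxt},
        }

    return [item async for item in aiter_results(fake_fetch, limit=3)]


print(asyncio.run(demo()))  # [0, 1, 2, 3, 4, 5, 6]
```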

Error Handling

All exceptions inherit from MarkdownBridgeError and include status_code, error_code, and correlation_id:

from markdownbridge import (
    AuthenticationError,
    MarkdownBridge,
    MarkdownBridgeError,
    RateLimitError,
)

client = MarkdownBridge(api_key="ocrb_prd_xxx")

try:
    result = client.ocr("https://example.com/doc.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited — retry after {e.retry_after}s")
except MarkdownBridgeError as e:
    print(f"API error {e.status_code}: {e}")

Exception Hierarchy

| Exception | HTTP status | When |
|---|---|---|
| AuthenticationError | 401 | Invalid or missing API key |
| ValidationError | 400 / 422 | Invalid request parameters |
| NotFoundError | 404 | Resource not found |
| RateLimitError | 429 | Too many requests |
| InsufficientCreditsError | 402 | Account has no credits |
| ServerError | 5xx | Server-side failure |
| ProcessingError | n/a | OCR job failed |
| FileUploadError | n/a | Upload failed |
| TimeoutError | n/a | Polling exceeded timeout |
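The HTTP rows of the table above amount to a mapping from status codes to exception classes. The sketch below mirrors that hierarchy with local stand-in classes so it runs without the SDK; the `error_for_status` helper, the constructor signatures, and how `retry_after` is populated are assumptions, not the package's actual code. ProcessingError, FileUploadError, and TimeoutError are omitted because they do not correspond to an HTTP status.

```python
from typing import Dict, Optional, Type


class MarkdownBridgeError(Exception):
    """Stand-in base class carrying the documented attributes."""

    def __init__(self, message: str, status_code: Optional[int] = None,
                 error_code: Optional[str] = None,
                 correlation_id: Optional[str] = None) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.error_code = error_code
        self.correlation_id = correlation_id


class AuthenticationError(MarkdownBridgeError): ...
class ValidationError(MarkdownBridgeError): ...
class NotFoundError(MarkdownBridgeError): ...
class InsufficientCreditsError(MarkdownBridgeError): ...
class ServerError(MarkdownBridgeError): ...


class RateLimitError(MarkdownBridgeError):
    def __init__(self, message: str, retry_after: Optional[float] = None,
                 **kwargs) -> None:
        super().__init__(message, **kwargs)
        self.retry_after = retry_after  # seconds to wait, if the server says


_BY_STATUS: Dict[int, Type[MarkdownBridgeError]] = {
    400: ValidationError,
    401: AuthenticationError,
    402: InsufficientCreditsError,
    404: NotFoundError,
    422: ValidationError,
    429: RateLimitError,
}


def error_for_status(status_code: int, message: str,
                     **kwargs) -> MarkdownBridgeError:
    """Hypothetical: choose the exception class for an HTTP error status."""
    if status_code >= 500:
        cls: Type[MarkdownBridgeError] = ServerError
    else:
        cls = _BY_STATUS.get(status_code, MarkdownBridgeError)
    return cls(message, status_code=status_code, **kwargs)


err = error_for_status(429, "Too many requests", retry_after=12.0,
                       correlation_id="req-123")
print(type(err).__name__, err.status_code, err.retry_after)  # RateLimitError 429 12.0
```

Since every class derives from the base, a final `except MarkdownBridgeError` catches anything the more specific handlers miss.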

Data Types

All response types are frozen dataclasses:

  • ProcessResponse — process_id, status, file_id, stage
  • ProcessingStatus — process_id, status, progress, stage, result, error
  • ProcessingResult — text, markdown, json, page_count, processing_time
  • UploadResponse — file_key, public_url, document_id
  • ResultItem — id, process_id, file_name, status, result
  • ResultsPage — data, pagination
  • Pagination — total, limit, offset, has_more, next_offset
  • ApiInfo — version, status, endpoints
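Because the response types are frozen dataclasses, their fields cannot be reassigned after construction; `dataclasses.replace` returns a modified copy instead. A minimal illustration with a local stand-in for `ProcessingResult` (only the field names come from the list above; the field types and sample values are assumptions):

```python
import dataclasses
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class ProcessingResult:
    # Local stand-in: field names from the list above, types assumed.
    text: str
    markdown: str
    json: Optional[Any]
    page_count: int
    processing_time: float


r = ProcessingResult("Total: 42", "**Total:** 42", None, 1, 1.8)
try:
    r.page_count = 2  # type: ignore[misc]
except dataclasses.FrozenInstanceError:
    print("frozen: cannot assign to page_count")

r2 = dataclasses.replace(r, page_count=2)  # new object, r is unchanged
print(r.page_count, r2.page_count)  # 1 2
```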

License

MIT
