Python SDK for the MarkdownBridge OCR API — convert documents and images to Markdown.
```bash
pip install markdownbridge
```

```python
from markdownbridge import MarkdownBridge

client = MarkdownBridge(api_key="ocrb_prd_xxx")

# One-liner: URL → Markdown
result = client.ocr("https://example.com/invoice.pdf")
print(result.markdown)

# One-liner: local file → Markdown
result = client.ocr("./receipt.png")
print(result.markdown)
```

Pass your API key directly or set the `MARKDOWNBRIDGE_API_KEY` environment variable:
```bash
export MARKDOWNBRIDGE_API_KEY="ocrb_prd_xxx"
```

```python
client = MarkdownBridge()  # reads from env
```

The constructor accepts the following options:

```python
client = MarkdownBridge(
    api_key="ocrb_prd_xxx",  # or env MARKDOWNBRIDGE_API_KEY
    base_url="https://api.markdownbridge.com",  # default
    timeout=30.0,  # request timeout in seconds
    max_retries=3,  # retry 5xx errors with backoff
)
```

`ocr()` is the convenience method: give it a URL or file path and get back a `ProcessingResult`.
```python
result = client.ocr(
    "https://example.com/doc.pdf",
    language="en",
    output_format="markdown",
    enhance_quality=True,
    poll_interval=2.0,  # seconds between status checks
    poll_timeout=300.0,  # max wait time in seconds
)
print(result.markdown)
print(result.page_count)
```

Use `process_url()` to submit a URL for processing without waiting for completion.
```python
proc = client.process_url("https://example.com/doc.pdf")
print(proc.process_id)  # use with get_status() / wait_for_completion()
```

Use `process_file()` to upload a local file and submit it for processing.
```python
proc = client.process_file("./invoice.pdf")
print(proc.process_id)
```

Use `upload_file()` to upload a file without processing it.
```python
upload = client.upload_file("./photo.png")
print(upload.document_id)
```

Use `get_status()` to check the current status of a processing job.
```python
status = client.get_status("uuid-here")
print(status.status)    # queued | processing | completed | failed
print(status.progress)  # 0–100
print(status.stage)     # queued | download | ocr | llm_improvement | completed | failed
```

Use `wait_for_completion()` to poll until the job completes or fails.
```python
result = client.wait_for_completion(
    "uuid-here",
    poll_interval=2.0,
    poll_timeout=300.0,
    on_status_change=lambda s: print(f"Status: {s.status} ({s.stage})"),
)
```

Use `list_results()` to fetch paginated results.
```python
page = client.list_results(limit=20, offset=0, status="completed")
for item in page.data:
    print(item.file_name, item.status)
print(f"Total: {page.pagination.total}")
```

`iter_results()` is an auto-paginating iterator over all results.
```python
for item in client.iter_results(status="completed"):
    print(item.file_name)
```

Use `get_result()` to fetch a specific result by ID.
```python
result = client.get_result("uuid-here")
print(result.result.markdown)
```

Use `info()` to get the API version and status.
```python
info = client.info()
print(info.version, info.status)
```

Every method has an async equivalent via `AsyncMarkdownBridge`:
```python
import asyncio

from markdownbridge import AsyncMarkdownBridge


async def main():
    async with AsyncMarkdownBridge(api_key="ocrb_prd_xxx") as client:
        result = await client.ocr("https://example.com/invoice.pdf")
        print(result.markdown)

        # Auto-paginating async iteration
        async for item in client.iter_results():
            print(item.file_name)


asyncio.run(main())
```

All exceptions inherit from `MarkdownBridgeError` and include `status_code`, `error_code`, and `correlation_id`:
```python
from markdownbridge import (
    AuthenticationError,
    MarkdownBridge,
    MarkdownBridgeError,  # base class, needed for the final except clause
    RateLimitError,
)

client = MarkdownBridge(api_key="ocrb_prd_xxx")

try:
    result = client.ocr("https://example.com/doc.pdf")
except AuthenticationError:
    print("Invalid API key")
except RateLimitError as e:
    print(f"Rate limited, retry after {e.retry_after}s")
except MarkdownBridgeError as e:
    print(f"API error {e.status_code}: {e}")
```

| Exception | HTTP Status | When |
|---|---|---|
| `AuthenticationError` | 401 | Invalid or missing API key |
| `ValidationError` | 400/422 | Invalid request parameters |
| `NotFoundError` | 404 | Resource not found |
| `RateLimitError` | 429 | Too many requests |
| `InsufficientCreditsError` | 402 | Account has no credits |
| `ServerError` | 5xx | Server-side failure |
| `ProcessingError` | n/a | OCR job failed |
| `FileUploadError` | n/a | Upload failed |
| `TimeoutError` | n/a | Polling exceeded timeout |
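
The `retry_after` attribute used in the error-handling example above lends itself to a small retry wrapper. The following is a minimal sketch, not part of the SDK: `call_with_rate_limit_retry` is a hypothetical helper, and `RateLimitError` is redefined locally as a stand-in so the snippet runs without the package or network access.

```python
import time


class RateLimitError(Exception):
    """Local stand-in for markdownbridge.RateLimitError (assumption: only
    the retry_after attribute shown in the README is relied upon)."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_rate_limit_retry(fn, max_attempts=3, sleep=time.sleep):
    """Call fn(); on RateLimitError wait retry_after seconds and try again.

    Hypothetical helper, not an SDK function. The sleep argument is
    injectable so the wait can be replaced in tests.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except RateLimitError as exc:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            sleep(exc.retry_after)
```

With the real package you would presumably import `RateLimitError` from `markdownbridge` instead of defining it, and pass something like `lambda: client.ocr(url)` as `fn`.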
All response types are frozen dataclasses:
- `ProcessResponse`: `process_id`, `status`, `file_id`, `stage`
- `ProcessingStatus`: `process_id`, `status`, `progress`, `stage`, `result`, `error`
- `ProcessingResult`: `text`, `markdown`, `json`, `page_count`, `processing_time`
- `UploadResponse`: `file_key`, `public_url`, `document_id`
- `ResultItem`: `id`, `process_id`, `file_name`, `status`, `result`
- `ResultsPage`: `data`, `pagination`
- `Pagination`: `total`, `limit`, `offset`, `has_more`, `next_offset`
- `ApiInfo`: `version`, `status`, `endpoints`
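
In practice, "frozen" means response objects cannot be modified after they are created. The sketch below illustrates that behaviour with a local stand-in for `Pagination`, assuming the SDK uses the standard-library `dataclasses` module; the field names come from the list above, while the field types and the sample values are assumptions.

```python
from dataclasses import FrozenInstanceError, dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Pagination:
    # Stand-in with the field names listed above; the types are assumptions.
    total: int
    limit: int
    offset: int
    has_more: bool
    next_offset: Optional[int] = None


page = Pagination(total=45, limit=20, offset=0, has_more=True, next_offset=20)

try:
    page.offset = 20  # frozen instances reject attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False

# replace() builds a new instance and leaves the original untouched
second = replace(page, offset=20, next_offset=40)
```

If you need an adjusted copy of a response, `dataclasses.replace()` is the usual way to get one without touching the original object.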
License: MIT