This service provides an API endpoint to upload PDF files, extract and chunk their text, and generate ONNX-accelerated embeddings for each chunk. It is built with FastAPI and leverages PyMuPDF, LangChain, and SentenceTransformers.
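
The service hands chunking to LangChain, and this README does not state which splitter or chunk size it uses. The stdlib sketch below illustrates only the general idea. It cuts extracted text into fixed-size character windows that overlap, so text near a chunk boundary also appears at the start of the next chunk. `chunk_text` and its defaults (500 characters, 50 of overlap) are hypothetical. LangChain's `RecursiveCharacterTextSplitter`, for example, also tries to break on paragraph, line, and word separators before it falls back to raw characters.

```python
def chunk_text(text: str, chunk_size: int = 500, chunk_overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size character windows.

    Hypothetical defaults for illustration; not the service's actual settings.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    step = chunk_size - chunk_overlap  # distance between consecutive window starts
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For example, `chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1)` returns `['abcd', 'defg', 'ghij']`. Each window starts 3 characters (4 - 1) after the previous one.
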
```bash
git clone <your-repo-url>
cd fastapi_qdrant
```

We recommend using Miniconda or `venv`:
```bash
conda create -n fastapi_qdrant python=3.10 -y
conda activate fastapi_qdrant
# OR
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\Activate.ps1
```

Then install the dependencies:

```bash
cd backend
pip install -r requirements.txt
```

Note: PDF processing requires `PyMuPDF` (it shows as `pymupdf` in `pip freeze`). Do not install the unrelated `fitz` package.

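
A short stdlib check, run inside the activated environment, can confirm that the install matches the note on `PyMuPDF` and `fitz`. It reads installed package metadata through `importlib.metadata`. The helper names `installed_version` and `check_pdf_dependency` are hypothetical and not part of this project.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def check_pdf_dependency() -> list[str]:
    """List problems with the PDF-processing install (an empty list means OK)."""
    problems = []
    if installed_version("PyMuPDF") is None:
        problems.append("PyMuPDF is not installed; run: pip install PyMuPDF")
    # The unrelated 'fitz' distribution on PyPI shadows PyMuPDF's 'fitz' module name.
    if installed_version("fitz") is not None:
        problems.append("Unrelated 'fitz' package is installed; run: pip uninstall fitz")
    return problems


if __name__ == "__main__":
    issues = check_pdf_dependency()
    print("\n".join(issues) if issues else "PDF dependencies look OK.")
```
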
From the backend directory, start the FastAPI server:
```bash
uvicorn app.main:app --reload
```

- The API will be available at http://127.0.0.1:8000
- Interactive docs: http://127.0.0.1:8000/docs
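
When you script against the server, for example in CI, the server may need a moment to start. A small stdlib helper can poll the docs page until it responds. `wait_for_server` and its default timeout and interval are hypothetical. Only the `/docs` URL comes from this README.

```python
import time
import urllib.request


def wait_for_server(url: str = "http://127.0.0.1:8000/docs",
                    timeout: float = 10.0, interval: float = 0.5) -> bool:
    """Poll `url` until it returns HTTP 200 or `timeout` seconds have elapsed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            pass  # covers urllib.error.URLError: connection refused, timeouts, HTTP errors
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```
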

To test the upload endpoint from the interactive docs:

- Go to http://127.0.0.1:8000/docs
- Find the `/api/upload-pdf` endpoint.
- Click "Try it out".
- Upload a PDF file.
- Click "Execute".
- View the JSON response with chunk/embedding stats and timings.
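
You can also script the upload that Swagger UI performs without any third-party packages. The sketch below builds the `multipart/form-data` body by hand with the standard library. It uses the `file` form field and the `/api/upload-pdf` URL this endpoint expects. The helper names `build_multipart` and `upload_pdf` are hypothetical, and the returned dictionary is whatever JSON the service sends back.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field: str, filename: str, data: bytes, content_type: str) -> tuple[bytes, str]:
    """Encode one file as a multipart/form-data body; return (body, Content-Type header)."""
    boundary = uuid.uuid4().hex  # random boundary, vanishingly unlikely to occur in the file
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def upload_pdf(path: str, url: str = "http://127.0.0.1:8000/api/upload-pdf") -> dict:
    """POST a PDF to the upload endpoint and return the decoded JSON response."""
    pdf = Path(path)
    body, ctype = build_multipart("file", pdf.name, pdf.read_bytes(), "application/pdf")
    req = urllib.request.Request(
        url, data=body, method="POST",
        headers={"Content-Type": ctype, "Accept": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:  # raises urllib.error.HTTPError on 4xx/5xx
        return json.loads(resp.read().decode("utf-8"))
```

With the server running, `print(upload_pdf("sample.pdf"))` prints the same JSON that Swagger UI shows.
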
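
Using `curl`:

```bash
curl -X POST "http://127.0.0.1:8000/api/upload-pdf" \
  -H "accept: application/json" \
  -H "Content-Type: multipart/form-data" \
  -F "file=@sample.pdf"
```

Using Python (`requests`):

```python
import requests

url = "http://127.0.0.1:8000/api/upload-pdf"

# Send the PDF as the "file" field of a multipart/form-data request
with open("sample.pdf", "rb") as f:
    files = {"file": ("sample.pdf", f, "application/pdf")}
    response = requests.post(url, files=files)

response.raise_for_status()  # fail loudly on errors such as 500 instead of parsing an error body
print(response.json())
```

Troubleshooting:

- `ModuleNotFoundError: No module named 'fitz'`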
  - Ensure `PyMuPDF` is installed (`pip install PyMuPDF`).
  - Do not install the unrelated `fitz` package from PyPI.
- 500 Internal Server Error
  - Check the server logs for details. Common causes:
    - Invalid or corrupted PDF file.
    - Missing ONNX model file (`onnx/model_qint8_avx512.onnx`).
    - Dependency issues (see `requirements.txt`).
- ONNX Model Not Found
  - Ensure the required ONNX model file is present at the path the code expects, or update that path in the code.
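
You can check two of the causes above, a missing ONNX model file and an input that is not a PDF, before calling the API. This sketch assumes `onnx/model_qint8_avx512.onnx` resolves relative to the `backend` directory; the README does not say which working directory the code uses. `preflight` is a hypothetical helper. The header test catches only files that are clearly not PDFs. A file with a valid header can still be truncated or corrupted.

```python
from pathlib import Path

# Model path from the troubleshooting notes; ASSUMED to be relative to the backend directory.
DEFAULT_MODEL_PATH = Path("onnx/model_qint8_avx512.onnx")


def preflight(pdf_path: str, model_path: Path = DEFAULT_MODEL_PATH) -> list[str]:
    """Return problems likely to cause a 500 error (an empty list if none are found)."""
    problems = []
    if not model_path.is_file():
        problems.append(f"ONNX model not found at {model_path.resolve()}")

    pdf = Path(pdf_path)
    if not pdf.is_file():
        problems.append(f"PDF not found: {pdf}")
    else:
        with pdf.open("rb") as f:
            header = f.read(1024)
        # PDFs start with a "%PDF-" header; search the first 1 KiB since some readers tolerate leading bytes.
        if b"%PDF-" not in header:
            problems.append(f"{pdf} does not look like a PDF (no %PDF- header)")
    return problems
```
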