deeppavlov · voorhs · May 19, 2026
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
@@ -72,3 +72,65 @@ Build the HTML version and host it locally:
 ```bash
 make serve-docs
 ```
+
+## Preparing documentation for a release
+
+Use this checklist when cutting a new package release (for example `0.3.0`). Documentation is published to [deeppavlov.github.io/AutoIntent](https://deeppavlov.github.io/AutoIntent/) via GitHub Pages; a **published GitHub Release** triggers the multi-version build and deploy.
+
+### Before opening a PR
+
+1. **Align versions** across `pyproject.toml`, `docs/source/conf.py` (`release`), and `CHANGELOG.md`.
+2. **Update prose** under `docs/source/` (`.rst` files).
+3. **Update tutorials** in the repo-root `user_guides/` directory — not under `docs/source/user_guides/`, which is generated at build time and gitignored.
+4. **Run doctests** (same as CI on PRs and pushes to `dev`):
+   ```bash
+   make test-docs
+   ```
+5. **Build HTML locally** and fix any errors:
+   ```bash
+   make docs
+   ```
+   Optional preview:
+   ```bash
+   make serve-docs
+   ```
+   If the build is stale or autoapi / tutorial links look wrong:
+   ```bash
+   make clean-docs
+   make docs
+   ```
+6. **Match CI dependencies** when tutorials or API pages fail on missing imports:
+   ```bash
+   uv sync --group docs --extra catboost --extra peft --extra transformers --extra sentence-transformers --extra openai
+   ```
+   **Pandoc** is required for nbsphinx (CI installs it via `apt`).
+7. **Regenerate the optimizer JSON schema** if `OptimizerConfig` or related Pydantic models changed:
+   ```bash
+   make schema
+   ```
+
+### Do not commit
+
+- `docs/build/`
+- `docs/source/autoapi/`
+- `docs/source/user_guides/` (symlinks and notebook run artifacts)
+- `**/__pycache__/`
+
+### Version switcher (`versions.json`)
+
+`docs/_static/versions.json` is **auto-generated** on every Sphinx build from git tags matching `vX.Y.Z` (see `docs/source/docs_utils/versions_generator.py`). Do not hand-edit it for a release. Until the `vX.Y.Z` tag exists, local builds will still list the previous tag as stable — that is expected.
+
+### Release day
+
+1. Merge documentation and code changes into **`dev`** (CI runs `make test-docs` and `make docs`).
+2. Create a git tag **`vX.Y.Z`** on the release commit (must match `v` + semver, for example `v0.3.0`).
+3. **Publish a GitHub Release** for that tag. This triggers:
+   - PyPI publish (`.github/workflows/release.yaml`)
+   - Multi-version docs build and deploy (`.github/workflows/build-docs.yaml` → `make multi-version-docs` → GitHub Pages under `/versions/`).
+4. Verify the live site: the version switcher shows the new release as **stable**, and `https://deeppavlov.github.io/AutoIntent/versions/vX.Y.Z/` loads.
+
+To dry-run the multi-version build locally (requires full git history and tags):
+
+```bash
+make multi-version-docs
+```
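
Review note (not part of the patch): the checklist above relies on tags of the form `vX.Y.Z`. The sketch below shows one way to check a tag and pick the newest release before publishing. It assumes plain `MAJOR.MINOR.PATCH` with no pre-release suffix. The helper names `is_release_tag` and `latest_release_tag` are hypothetical, and the actual matching logic in `docs/source/docs_utils/versions_generator.py` may differ.

```python
from __future__ import annotations

import re

# Assumption: a release tag is exactly "v" + MAJOR.MINOR.PATCH, with no suffix.
_RELEASE_TAG = re.compile(r"v(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def is_release_tag(tag: str) -> bool:
    """Return True if the whole tag looks like vX.Y.Z (hypothetical helper)."""
    return _RELEASE_TAG.fullmatch(tag) is not None


def latest_release_tag(tags: list[str]) -> str | None:
    """Return the highest vX.Y.Z tag by numeric comparison, or None if absent."""
    releases = [t for t in tags if is_release_tag(t)]
    if not releases:
        return None
    # Compare as integer triples so that v0.2.10 sorts above v0.2.9.
    return max(releases, key=lambda t: tuple(int(p) for p in t[1:].split(".")))
```

Comparing integer triples rather than strings matters once a component reaches two digits: a plain string sort would rank `v0.2.9` above `v0.2.10`.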
diff --git a/docs/source/index.rst b/docs/source/index.rst
@@ -65,6 +65,9 @@ In-Depth Learning
 Reference
 .........
 
+:doc:`🌐 Inference servers <server>`
+   Deploy a trained pipeline behind HTTP (FastAPI) or MCP (FastMCP): installation extras, environment variables, and how to run each server.
+
 :doc:`🔧 API Reference <autoapi/autointent/index>`
    Complete technical documentation for all classes, methods, and functions. Essential reference for developers integrating AutoIntent into their applications.
 
@@ -80,4 +83,5 @@ Reference
    concepts
    user_guides
    learn/index
+   server
    autoapi/autointent/index
diff --git a/docs/source/server.rst b/docs/source/server.rst
@@ -0,0 +1,123 @@
+Inference servers
+=================
+
+AutoIntent can serve a **trained** pipeline behind two optional interfaces:
+
+* **HTTP (FastAPI)** — a small REST API for ``predict`` and health checks. Use this when you integrate with services, gateways, or clients that speak HTTP/JSON.
+* **MCP (FastMCP)** — a Model Context Protocol server with tools (``predict``, ``classes``, ``train_data``). Use this when an LLM host or IDE connects over MCP (stdio for local tools, or HTTP transport for remote access).
+
+Both servers load assets from a directory on disk (the same folder produced when you optimize and dump a pipeline). They are **not** a replacement for training: you must fit or load a pipeline and write it to that directory first.
+
+Installation
+------------
+
+Install AutoIntent with the extra for the server you need (``fastapi`` for HTTP, ``fastmcp`` for MCP):
+
+.. code-block:: bash
+
+   pip install "autointent[fastapi]"
+
+.. code-block:: bash
+
+   pip install "autointent[fastmcp]"
+
+The ``fastapi`` extra pulls in FastAPI, Uvicorn, and ``pydantic-settings``. The ``fastmcp`` extra pulls in FastMCP and ``pydantic-settings``.
+
+.. note::
+
+   If you use ``uv``, the project declares **incompatible optional extras** for ``codecarbon`` and ``fastmcp`` (see ``tool.uv.conflicts`` in ``pyproject.toml``). You cannot enable both in the same resolved environment; pick one or use separate virtual environments.
+
+Prerequisites
+-------------
+
+* A directory containing a **saved optimized pipeline** (for example the project directory after ``context.dump()`` from optimization, or another path where ``Pipeline.load`` succeeds).
+* For the **MCP** server only: a ``dataset.json`` file inside that same directory (the server loads training metadata and samples for the ``classes`` and ``train_data`` tools).
+
+Configuration (both servers)
+----------------------------
+
+Settings are defined with ``pydantic-settings`` and the prefix ``AUTOINTENT_``. Values can be set in the process environment or in a ``.env`` file in the current working directory.
+
+**Shared**
+
+``AUTOINTENT_PATH`` *(required)* — filesystem path to the pipeline directory (same meaning as the ``path`` field in code).
+
+**HTTP server**
+
+``AUTOINTENT_HOST`` — bind address (default ``127.0.0.1``).
+
+``AUTOINTENT_PORT`` — listen port (default ``8013``).
+
+**MCP server**
+
+``AUTOINTENT_TRANSPORT`` — ``stdio`` (default) or ``http``.
+
+``AUTOINTENT_HOST`` / ``AUTOINTENT_PORT`` — used when ``AUTOINTENT_TRANSPORT=http`` (defaults ``127.0.0.1`` and ``8012``).
+
+Example ``.env``:
+
+.. code-block:: text
+
+   AUTOINTENT_PATH=/path/to/my_autointent_project
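+   # Optional HTTP server overrides (defaults: 127.0.0.1 and 8013):
+   # AUTOINTENT_HOST=0.0.0.0
+   # AUTOINTENT_PORT=8013
+   # Optional MCP over HTTP (reads the same AUTOINTENT_HOST / AUTOINTENT_PORT; default port 8012):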
+   # AUTOINTENT_TRANSPORT=http
+   # AUTOINTENT_PORT=8012
+
+Set these variables **before** starting the process (the HTTP app reads settings at import time).
+
+HTTP server (FastAPI)
+---------------------
+
+**Run with Uvicorn** (recommended; ``autointent.server.http:app`` is the import path of the FastAPI instance):
+
+.. code-block:: bash
+
+   uvicorn autointent.server.http:app --host 127.0.0.1 --port 8013
+
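+When started this way, the bind address and port come from the ``--host`` and ``--port`` flags rather than from ``AUTOINTENT_HOST`` / ``AUTOINTENT_PORT``. ``AUTOINTENT_PATH`` must still point at the pipeline directory.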
+
+**Run via the module entrypoint** (uses ``AUTOINTENT_HOST`` and ``AUTOINTENT_PORT`` from settings):
+
+.. code-block:: bash
+
+   python -c "from autointent.server.http import main; main()"
+
+Endpoints
+.........
+
+* ``GET /health`` — returns ``{"status": "healthy"}``.
+* ``POST /predict`` — JSON body and response shaped like the Pydantic models below.
+
+**Request** (``PredictRequest``): ``{"utterances": ["text one", "text two"]}``
+
+**Response** (``PredictResponse``): ``{"predictions": [...]}`` — one prediction per input utterance.
+
+Predictions follow the same convention as ``Pipeline.predict``:
+
+* Single-label: each item is an integer class id, or ``null`` for out-of-scope.
+* Multi-label: each item is a list of integer class ids, or ``null`` for out-of-scope.
+
+MCP server (FastMCP)
+--------------------
+
+**Stdio (default)** — typical for MCP clients that spawn a subprocess:
+
+.. code-block:: bash
+
+   python -c "from autointent.server.mcp import main; main()"
+
+With ``AUTOINTENT_TRANSPORT`` unset or ``stdio``, ``main()`` calls ``mcp.run()`` with stdio transport.
+
+**HTTP transport** — set ``AUTOINTENT_TRANSPORT=http`` (and optionally host/port). ``main()`` then runs with ``transport="http"`` so clients can connect to the configured TCP port (default ``8012``).
+
+Tools
+.....
+
+* ``predict``: takes ``utterances: list[str]``. Returns ``predictions`` in the same format as the HTTP API.
+* ``classes``: takes the pagination arguments ``page`` and ``page_size``. Returns ``classes`` (list of ``Intent`` objects: ``id``, ``name``, ``tags``, regex fields, ``description``) and ``pagination_info``.
+* ``train_data``: takes pagination arguments and an optional ``class_filter`` (list of class ids). Returns ``samples`` (``id``, ``text``, ``label``) and ``pagination_info``.
+
+See the :doc:`API reference <autoapi/autointent/index>` for full type details.
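
Review note (not part of the patch): the `POST /predict` contract documented in `server.rst` above can be exercised with a small standard-library client. This is a minimal sketch that assumes only the request and response shapes stated in the docs (`{"utterances": [...]}` in, `{"predictions": [...]}` out). The function names are hypothetical and nothing here is taken from the AutoIntent code base.

```python
from __future__ import annotations

import json
import urllib.request


def build_predict_request(base_url: str, utterances: list[str]) -> urllib.request.Request:
    """Build a POST /predict request carrying the documented JSON body."""
    body = json.dumps({"utterances": utterances}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_predictions(raw: bytes) -> list:
    """Extract the "predictions" list; JSON null (out-of-scope) becomes None."""
    return json.loads(raw.decode("utf-8"))["predictions"]


def predict(base_url: str, utterances: list[str], timeout: float = 10.0) -> list:
    """Send the request to an already running server and return its predictions."""
    request = build_predict_request(base_url, utterances)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_predictions(response.read())
```

With the HTTP server running on the documented default port, `predict("http://127.0.0.1:8013", ["text one", "text two"])` should return one entry per utterance: an integer class id, a list of ids for multi-label pipelines, or `None` for out-of-scope.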