47 changes: 47 additions & 0 deletions docs/source/augmentation_tutorials/adversarial.rst
@@ -0,0 +1,47 @@
.. _adversarial_human_like_augmentation:

Adversarial human-like augmentation
====================================

This tutorial covers :py:class:`autointent.generation.utterances.HumanUtteranceGenerator` together with :py:class:`autointent.generation.utterances.CriticHumanLike`. The generator proposes paraphrases of training utterances; the critic asks an LLM to label each candidate as ``human`` or ``generated``. Candidates classified as ``generated`` are rejected and refined in a loop until the critic accepts them (or retries are exhausted).

.. warning::

   This path is **experimental** and may hurt data quality if the critic or base model misjudges natural text. Start with small ``n_final_per_class`` values and inspect the outputs.

How it fits together
--------------------

- **Generator** — :py:class:`autointent.generation.Generator` wraps your chat/structured-output API (OpenAI-compatible).
- **CriticHumanLike** — builds a JSON-schema prompt so the LLM returns ``reasoning`` and ``label`` (``human`` \| ``generated``); :py:meth:`~autointent.generation.utterances.CriticHumanLike.is_human` returns whether the utterance passed.
- **HumanUtteranceGenerator** — orchestrates rewrite attempts per intent; :py:meth:`~autointent.generation.utterances.HumanUtteranceGenerator.augment` can append accepted samples back into a chosen split (default: train).
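
The reject-and-refine loop described above can be sketched in plain Python. This is an illustrative sketch only, not the library's implementation: ``refine_until_human``, ``toy_rewrite``, and ``toy_is_human`` are hypothetical names, and the toy functions stand in for the LLM-backed rewrite step and critic.

```python
from typing import Callable, Optional


def refine_until_human(
    seed: str,
    rewrite: Callable[[str, int], str],
    is_human: Callable[[str], bool],
    max_attempts: int = 3,
) -> Optional[str]:
    """Rewrite the candidate until the critic accepts it or attempts run out."""
    candidate = seed
    for attempt in range(max_attempts):
        candidate = rewrite(candidate, attempt)
        if is_human(candidate):
            return candidate  # accepted by the critic
    return None  # retries exhausted, candidate is dropped


# Hypothetical stand-ins for the LLM calls (assumptions, not autointent APIs).
def toy_rewrite(text: str, attempt: int) -> str:
    """Remove one stiff phrase per attempt."""
    replacements = [("I would like to", "I want to"), ("kindly ", "")]
    old, new = replacements[attempt % len(replacements)]
    return text.replace(old, new)


def toy_is_human(text: str) -> bool:
    """Reject text that still contains an overly formal marker."""
    lowered = text.lower()
    return not any(marker in lowered for marker in ("i would like to", "kindly"))


seed = "I would like to kindly check my balance"
accepted = refine_until_human(seed, toy_rewrite, toy_is_human)
rejected = refine_until_human(seed, toy_rewrite, toy_is_human, max_attempts=1)
print(accepted)
print(rejected)
```

With the default three attempts the second rewrite passes the toy critic; with a single attempt the candidate is rejected and nothing is returned, which mirrors the "retries are exhausted" case.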

Installation
------------

Install the OpenAI-backed generator extra (the ``Generator`` wrapper loads the OpenAI client):

.. code-block:: bash

   pip install "autointent[openai]"

Set ``OPENAI_API_KEY`` (and, if your deployment needs one, a custom base URL). This augmentation path does not require a separate DSPy extra.

Minimal sketch
--------------

.. code-block:: python

   from autointent import Dataset
   from autointent.generation import Generator
   from autointent.generation.utterances import CriticHumanLike, HumanUtteranceGenerator

   dataset = Dataset.from_dict({...})  # your train split, with intent names if you use them in prompts

   # One LLM wrapper is shared by the critic and the utterance generator.
   llm = Generator(model_name="gpt-4o-mini")
   critic = CriticHumanLike(generator=llm)
   augmenter = HumanUtteranceGenerator(generator=llm, critic=critic, async_mode=False)

   # Run the rewrite/critic loop on the train split; start with a small n_final_per_class.
   new_samples = augmenter.augment(dataset, split_name="train", n_final_per_class=3)

See the API reference for full argument lists (:py:class:`~autointent.generation.utterances.HumanUtteranceGenerator`, :py:class:`~autointent.generation.utterances.CriticHumanLike`).
1 change: 1 addition & 0 deletions docs/source/augmentation_tutorials/index.rst
@@ -8,4 +8,5 @@ Data augmentation tutorials

   balancer
   dspy_augmentation
   adversarial
   intent_description
3 changes: 3 additions & 0 deletions docs/source/concepts.rst
@@ -85,6 +85,9 @@ A critical capability for production text classification systems, especially in
**🔗 Integration with Multi-Label**
OOS detection works seamlessly with multi-label scenarios, enabling detection of completely unknown inputs vs. partial matches to known classes.

**🧭 Split handling**
When a split contains OOS samples (``label is None``), the data handler runs scoring stages on in-domain rows only. In hold-out mode, if ``separation_ratio`` is not configured, it can duplicate each affected split into ``{split}_0`` (OOS removed, used for scoring) and ``{split}_1`` (full data, used for decision). In cross-validation, OOS samples are likewise dropped from the training folds used for scoring. Before fitting, you can check whether your data supports splitting with :py:func:`autointent.context.data_handler.check_split_readiness`.
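
The hold-out naming scheme above can be illustrated with a small standalone sketch. It assumes each row is a plain dict with a ``label`` key and uses a hypothetical helper, ``separate_oos``; it is not the data handler's actual code and only shows the ``{split}_0`` / ``{split}_1`` idea.

```python
from __future__ import annotations


def separate_oos(splits: dict[str, list[dict]], split_name: str) -> dict[str, list[dict]]:
    """Duplicate a split into an in-domain copy and a full copy if it holds OOS rows."""
    rows = splits[split_name]
    if all(row["label"] is not None for row in rows):
        return dict(splits)  # no OOS rows, nothing to separate

    result = {name: data for name, data in splits.items() if name != split_name}
    # {split}_0: OOS rows removed, intended for scoring stages.
    result[f"{split_name}_0"] = [row for row in rows if row["label"] is not None]
    # {split}_1: full data, intended for the decision stage.
    result[f"{split_name}_1"] = list(rows)
    return result


splits = {
    "train": [
        {"utterance": "what is my balance", "label": 0},
        {"utterance": "asdf qwerty", "label": None},  # OOS sample
    ],
    "test": [{"utterance": "close my account", "label": 1}],
}
separated = separate_oos(splits, "train")
print(sorted(separated))
```

A split without OOS rows is returned unchanged, so calling the helper on ``"test"`` here leaves the split names as they were.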

.. _concepts-presets:

Optimization Presets
4 changes: 3 additions & 1 deletion user_guides/basic_usage/03_automl.py
@@ -52,6 +52,8 @@

# %% [markdown]
"""
The same preset can also be loaded as a typed %mddoclink(class,,OptimizationConfig) via ``OptimizationConfig.from_preset("classic-light")`` and passed to %mddoclink(method,Pipeline,from_optimization_config) when you want a validated configuration object instead of editing the raw dict from ``load_preset``.

You can inspect the structure and default values of any preset:
"""

@@ -77,7 +79,7 @@

# %% [markdown]
"""
See tutorial %mddoclink(notebook,advanced.03_search_space_configuration) on how the search space is structured.
See tutorial %mddoclink(notebook,advanced.03_automl) on how the search space is structured.
"""

# %% [markdown]