From e3f314df5f1a561017726701d8a3ce6b94126044 Mon Sep 17 00:00:00 2001 From: voorhs Date: Tue, 19 May 2026 20:33:39 +0300 Subject: [PATCH 1/2] docs: extend quickstart, fix links, add adversarial augmentation page Co-authored-by: Cursor --- .../augmentation_tutorials/adversarial.rst | 47 +++++++++++++++++++ docs/source/augmentation_tutorials/index.rst | 1 + docs/source/concepts.rst | 3 ++ docs/source/quickstart.rst | 9 ++-- user_guides/basic_usage/03_automl.py | 4 +- 5 files changed, 60 insertions(+), 4 deletions(-) create mode 100644 docs/source/augmentation_tutorials/adversarial.rst diff --git a/docs/source/augmentation_tutorials/adversarial.rst b/docs/source/augmentation_tutorials/adversarial.rst new file mode 100644 index 000000000..006d76771 --- /dev/null +++ b/docs/source/augmentation_tutorials/adversarial.rst @@ -0,0 +1,47 @@ +.. _adversarial_human_like_augmentation: + +Adversarial human-like augmentation +==================================== + +This tutorial covers :py:class:`autointent.generation.utterances.HumanUtteranceGenerator` together with :py:class:`autointent.generation.utterances.CriticHumanLike`. The generator proposes paraphrases of training utterances; the critic asks an LLM to label each candidate as ``human`` or ``generated``. Candidates classified as ``generated`` are rejected and refined in a loop until the critic accepts them (or retries are exhausted). + +.. warning:: + + This path is **experimental** and may hurt data quality if the critic or base model mis-judges natural text. Use small ``n_final_per_class`` values first and inspect outputs. + +How it fits together +-------------------- + +- **Generator** — :py:class:`autointent.generation.Generator` wraps your chat/structured-output API (OpenAI-compatible). 
+- **CriticHumanLike** — builds a JSON-schema prompt so the LLM returns ``reasoning`` and ``label`` (``human`` \| ``generated``); :py:meth:`~autointent.generation.utterances.CriticHumanLike.is_human` returns whether the utterance passed, i.e. was labeled ``human``. +- **HumanUtteranceGenerator** — orchestrates rewrite attempts per intent; :py:meth:`~autointent.generation.utterances.HumanUtteranceGenerator.augment` can append accepted samples back into a chosen split (default: train). + +Installation +------------ + +Install the OpenAI-backed generator extra (the ``Generator`` wrapper loads the OpenAI client): + +.. code-block:: bash + + pip install "autointent[openai]" + +Set ``OPENAI_API_KEY`` (and, optionally, a base URL) as required by your deployment. No separate DSPy extra is needed for this augmentation path. + +Minimal sketch +-------------- + +.. code-block:: python + + from autointent import Dataset + from autointent.generation import Generator + from autointent.generation.utterances import CriticHumanLike, HumanUtteranceGenerator + + dataset = Dataset.from_dict({...}) # your train split, with intent names if you use them in prompts + + llm = Generator(model_name="gpt-4o-mini") + critic = CriticHumanLike(generator=llm) + augmenter = HumanUtteranceGenerator(generator=llm, critic=critic, async_mode=False) + + new_samples = augmenter.augment(dataset, split_name="train", n_final_per_class=3) + +See the API reference for full argument lists (:py:class:`~autointent.generation.utterances.HumanUtteranceGenerator`, :py:class:`~autointent.generation.utterances.CriticHumanLike`). 
diff --git a/docs/source/augmentation_tutorials/index.rst b/docs/source/augmentation_tutorials/index.rst index 38280bda6..dcc3fea7c 100644 --- a/docs/source/augmentation_tutorials/index.rst +++ b/docs/source/augmentation_tutorials/index.rst @@ -8,4 +8,5 @@ Data augmentation tutorials balancer dspy_augmentation + adversarial intent_description diff --git a/docs/source/concepts.rst b/docs/source/concepts.rst index 2e4816a32..caa997343 100644 --- a/docs/source/concepts.rst +++ b/docs/source/concepts.rst @@ -85,6 +85,9 @@ A critical capability for production text classification systems, especially in **🔗 Integration with Multi-Label** OOS detection works seamlessly with multi-label scenarios, enabling detection of completely unknown inputs vs. partial matches to known classes. +**🧭 Split handling** + When splits contain OOS samples (``label is None``), the data handler restricts the scoring stage to in-domain rows. In hold-out mode, when ``separation_ratio`` is not configured, it can duplicate each affected split into ``{split}_0`` (OOS rows removed, used for scoring) and ``{split}_1`` (full data, used for the decision stage); in cross-validation mode, OOS rows are likewise dropped from the training folds used for scoring. Before fitting, you can check whether your data supports splitting with :py:func:`autointent.context.data_handler.check_split_readiness`. + .. _concepts-presets: Optimization Presets diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst index 79cfc0227..9247a8b1b 100644 --- a/docs/source/quickstart.rst +++ b/docs/source/quickstart.rst @@ -22,7 +22,7 @@ Installation Basic Installation .................. -AutoIntent is compatible with Python 3.10+. For core functionality: +AutoIntent supports Python ``>=3.10,<3.15``. For core functionality: .. code-block:: bash @@ -173,10 +173,12 @@ Available Presets .................
- ``classic-light``: Fast training with traditional ML methods +- ``classic-medium``: Medium-budget traditional ML search - ``classic-heavy``: Comprehensive search with traditional methods - ``nn-medium``: Classic neural network-based approaches (RNN, CNN) - ``nn-heavy``: Comprehensive neural network optimization - ``transformers-light``: Transformer models with limited search +- ``transformers-heavy``: Deeper transformer search (more compute) - ``transformers-no-hpo``: Transformer models without hyperparameter optimization - ``zero-shot-llm``: Zero-shot classification using OpenAI models - ``zero-shot-encoders``: Zero-shot classification using transformer models @@ -230,8 +232,9 @@ For more control, use individual components without AutoML: Available Modules ................. -- **Scoring**: :class:`autointent.modules.KNNScorer`, :class:`autointent.modules.BertScorer`, :class:`autointent.modules.SklearnScorer`, :class:`autointent.modules.CatBoostScorer` -- **Decision**: :class:`autointent.modules.ArgmaxDecision`, :class:`autointent.modules.TunableDecision`, :class:`autointent.modules.AdaptiveDecision` +- **Embedding**: :class:`autointent.modules.RetrievalAimedEmbedding`, :class:`autointent.modules.LogregAimedEmbedding` +- **Scoring**: :class:`autointent.modules.KNNScorer`, :class:`autointent.modules.RerankScorer`, :class:`autointent.modules.GCNScorer`, :class:`autointent.modules.MLKnnScorer`, :class:`autointent.modules.BertScorer`, :class:`autointent.modules.SklearnScorer`, :class:`autointent.modules.CatBoostScorer`, and description-based scorers such as :class:`autointent.modules.BiEncoderDescriptionScorer`, :class:`autointent.modules.CrossEncoderDescriptionScorer`, :class:`autointent.modules.LLMDescriptionScorer` +- **Decision**: :class:`autointent.modules.ArgmaxDecision`, :class:`autointent.modules.ThresholdDecision`, :class:`autointent.modules.JinoosDecision`, :class:`autointent.modules.TunableDecision`, :class:`autointent.modules.AdaptiveDecision` See more at 
API reference :doc:`Modules `. diff --git a/user_guides/basic_usage/03_automl.py b/user_guides/basic_usage/03_automl.py index 909a5d4d2..29a3f5acc 100644 --- a/user_guides/basic_usage/03_automl.py +++ b/user_guides/basic_usage/03_automl.py @@ -52,6 +52,8 @@ # %% [markdown] """ +The same preset can also be loaded as a typed %mddoclink(class,,OptimizationConfig) via ``OptimizationConfig.from_preset("classic-light")`` and passed to %mddoclink(method,Pipeline,from_optimization_config) when you want a validated configuration object instead of editing the raw dict from ``load_preset``. + You can inspect the structure and default values of any preset: """ @@ -77,7 +79,7 @@ # %% [markdown] """ -See tutorial %mddoclink(notebook,advanced.03_search_space_configuration) on how the search space is structured. +See tutorial %mddoclink(notebook,advanced.03_automl) on how the search space is structured. """ # %% [markdown] From 6dbfe05e819b8ddcbd8656d3bf38e48fcfaf55ae Mon Sep 17 00:00:00 2001 From: voorhs Date: Tue, 19 May 2026 23:33:20 +0300 Subject: [PATCH 2/2] revert quickstart page --- docs/source/quickstart.rst | 9 +++------ 1 file changed, 3 insertions(+), 6 deletions(-) diff --git a/docs/source/quickstart.rst b/docs/source/quickstart.rst index 9247a8b1b..79cfc0227 100644 --- a/docs/source/quickstart.rst +++ b/docs/source/quickstart.rst @@ -22,7 +22,7 @@ Installation Basic Installation .................. -AutoIntent supports Python ``>=3.10,<3.15``. For core functionality: +AutoIntent is compatible with Python 3.10+. For core functionality: .. code-block:: bash @@ -173,12 +173,10 @@ Available Presets ................. 
- ``classic-light``: Fast training with traditional ML methods -- ``classic-medium``: Medium-budget traditional ML search - ``classic-heavy``: Comprehensive search with traditional methods - ``nn-medium``: Classic neural network-based approaches (RNN, CNN) - ``nn-heavy``: Comprehensive neural network optimization - ``transformers-light``: Transformer models with limited search -- ``transformers-heavy``: Deeper transformer search (more compute) - ``transformers-no-hpo``: Transformer models without hyperparameter optimization - ``zero-shot-llm``: Zero-shot classification using OpenAI models - ``zero-shot-encoders``: Zero-shot classification using transformer models @@ -232,9 +230,8 @@ For more control, use individual components without AutoML: Available Modules ................. -- **Embedding**: :class:`autointent.modules.RetrievalAimedEmbedding`, :class:`autointent.modules.LogregAimedEmbedding` -- **Scoring**: :class:`autointent.modules.KNNScorer`, :class:`autointent.modules.RerankScorer`, :class:`autointent.modules.GCNScorer`, :class:`autointent.modules.MLKnnScorer`, :class:`autointent.modules.BertScorer`, :class:`autointent.modules.SklearnScorer`, :class:`autointent.modules.CatBoostScorer`, and description-based scorers such as :class:`autointent.modules.BiEncoderDescriptionScorer`, :class:`autointent.modules.CrossEncoderDescriptionScorer`, :class:`autointent.modules.LLMDescriptionScorer` -- **Decision**: :class:`autointent.modules.ArgmaxDecision`, :class:`autointent.modules.ThresholdDecision`, :class:`autointent.modules.JinoosDecision`, :class:`autointent.modules.TunableDecision`, :class:`autointent.modules.AdaptiveDecision` +- **Scoring**: :class:`autointent.modules.KNNScorer`, :class:`autointent.modules.BertScorer`, :class:`autointent.modules.SklearnScorer`, :class:`autointent.modules.CatBoostScorer` +- **Decision**: :class:`autointent.modules.ArgmaxDecision`, :class:`autointent.modules.TunableDecision`, :class:`autointent.modules.AdaptiveDecision` See more at 
API reference :doc:`Modules `.
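The hold-out behavior described in the new ``concepts.rst`` paragraph can be illustrated with a small Python sketch. It does not call AutoIntent's data handler. ``split_for_stages`` and the ``Sample`` layout are assumptions: each sample is a dict with ``utterance`` and ``label`` keys, and ``None`` marks OOS. The sketch models only the case where ``separation_ratio`` is not configured.

```python
from typing import Any

Sample = dict[str, Any]  # assumed layout: {"utterance": str, "label": int | None}


def split_for_stages(splits: dict[str, list[Sample]]) -> dict[str, list[Sample]]:
    """Hypothetical sketch of hold-out OOS handling (no separation_ratio).

    A split that contains OOS rows (label is None) is duplicated into
    "<name>_0" (in-domain rows only, for scoring) and "<name>_1" (all rows,
    for the decision stage). Splits without OOS rows pass through unchanged.
    """
    out: dict[str, list[Sample]] = {}
    for name, rows in splits.items():
        if any(row["label"] is None for row in rows):
            out[f"{name}_0"] = [row for row in rows if row["label"] is not None]
            out[f"{name}_1"] = list(rows)
        else:
            out[name] = rows
    return out


splits = {
    "train": [
        {"utterance": "book a table", "label": 0},
        {"utterance": "what's the weather", "label": None},  # OOS sample
    ],
    "test": [{"utterance": "cancel my order", "label": 1}],
}
for name, rows in split_for_stages(splits).items():
    print(name, [row["utterance"] for row in rows])
```

To validate real data, use ``check_split_readiness`` as the paragraph suggests rather than relying on this toy function.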