test: agent skills infrastructure and marker taxonomy audit (#727, #728) #742
planetf1 wants to merge 42 commits into generative-computing:main from
Conversation
Merge Protections: Your pull request matches the following merge protections and will not be merged until they are valid.
🟢 Enforce conventional commit — Wonderful, this rule succeeded. Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
The PR description has been updated. Please fill out the template for your PR to be reviewed.
So this now tries to assess the VRAM needs by looking at models/HF. I'll experiment with running the tests and see how accurate the agent is in general, plus manually review the assessments.
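For context, the kind of assessment involved can be sketched as a back-of-the-envelope estimate from a model's parameter count and dtype width. This is illustrative only, not the agent's actual logic; the function name and the overhead factor are assumptions:

```python
def estimate_vram_gib(n_params: float, bytes_per_param: int = 2, overhead: float = 1.2) -> float:
    """Rough GiB of VRAM for model weights alone (fp16/bf16 by default),
    with a flat multiplier for runtime overhead (KV cache, activations)."""
    return n_params * bytes_per_param * overhead / 2**30

# e.g. an 8B-parameter model in bf16 needs roughly 17.9 GiB by this estimate
print(estimate_vram_gib(8e9))
```

Real requirements vary with context length and quantization, which is why manually reviewing the agent's assessments is worthwhile.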
The mypy failure (
Updated top post with current summary. Ready for review.
@ajbozarth you asked about being able to run a test bypassing the GPU check. Without any code changes this is possible by using pytest to run the test directly. I'm thinking this is sufficient?
That's fair, but I think it'd still be worth having a flag to disable the part of conftest that limits based on detected hardware. I wouldn't call it a blocker for this PR though; it could be a follow-up issue. As for review, I'll do a deep dive into this this afternoon and will re-run all the tests myself for a "second opinion".
Ok - my thought is to just have a generic flag. Can you raise the followup?
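One way such a flag could look is a conftest option that short-circuits the hardware gating during collection. This is a hypothetical sketch, not mellea's actual conftest; the option name, the `vram` marker, and the helper names are all placeholders:

```python
import pytest

def pytest_addoption(parser):
    parser.addoption(
        "--ignore-hw-checks",
        action="store_true",
        default=False,
        help="Run hardware-gated tests even if detected VRAM looks insufficient.",
    )

def should_skip_for_vram(required_gib: float, available_gib: float, ignore_hw: bool) -> bool:
    """Gate a test on VRAM unless the escape-hatch flag is set."""
    return (not ignore_hw) and available_gib < required_gib

def pytest_collection_modifyitems(config, items):
    ignore_hw = config.getoption("--ignore-hw-checks")
    for item in items:
        # Hypothetical marker carrying a GiB requirement, e.g. @pytest.mark.vram(24)
        marker = item.get_closest_marker("vram")
        if marker and should_skip_for_vram(marker.args[0], _detect_vram_gib(), ignore_hw):
            item.add_marker(pytest.mark.skip(reason="Insufficient VRAM"))

def _detect_vram_gib() -> float:
    # Placeholder for the real detection logic in the predicates helpers.
    return 32.0
```

Keeping the predicate (`should_skip_for_vram`) separate from the pytest hooks would let the existing predicates helpers stay testable on their own.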
I've done an in-depth review including:
- an actual read-through of the skill markdown -> LGTM
- double check of example mark updates -> LGTM
- review of updates to tests:
  - mark updates and other fixes -> LGTM
  - a few minor typos in import skip reasons -> inline suggested changes
  - conftest updates -> LGTM
  - helper functions in predicates -> LGTM

I'll apply the typo fixes myself; otherwise my other comments are non-blocking.
I've also run all the tests and included the results below:
Test run summary
Local run (uv run pytest, Mac M1 Max 32GB, Python 3.12.8): 800 passed, 2 failed, 61 skipped, 19 deselected, 2 xfailed, 1 xpassed in 32m05s.
The 2 failures are @pytest.mark.qualitative tests (test_find_context_attributions, test_hallucination_detection) with non-deterministic content assertions.
The 19 deselected are slow tests excluded by default.
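The hallucination-detection failure reduces to a tolerance check that the score drifted just past. A minimal reproduction, using the values copied from the failure output:

```python
import pytest

# faithfulness_likelihood values from the test_hallucination_detection failure
obtained = 0.7280598165124975
expected = 0.7508619477505966

# The drift (~0.0228) exceeds the test's absolute tolerance of 0.02,
# so pytest.approx correctly reports a mismatch.
assert abs(obtained - expected) > 2e-2
assert obtained != pytest.approx(expected, abs=2e-2)
```

This is consistent with reading both failures as model non-determinism rather than regressions introduced by this PR.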
Skips breakdown (61 total — all expected):

| Reason | Count |
|---|---|
| Insufficient VRAM | ~23 |
| Missing API credentials | ~16 |
| vLLM process not running | 7 |
| test_tracing_backend.py — telemetry not initialised (see #754) | 6 |
| test_manager.py — requires --disable-default-mellea-plugins | 2 |
| test_reqlib_python.py sandbox tests | 3 |
| Other | ~4 |
Terminal output
$ uv run pytest
Built mellea @ file:///Users/ajbozarth/workspace/ai/mellea
Uninstalled 1 package in 1ms
Installed 3 packages in 3ms
=========================================================================================================== test session starts ============================================================================================================
platform darwin -- Python 3.12.8, pytest-9.0.0, pluggy-1.6.0
rootdir: /Users/ajbozarth/workspace/ai/mellea
configfile: pyproject.toml
testpaths: test, docs
plugins: nbmake-1.5.5, recording-0.13.4, anyio-4.11.0, xdist-3.8.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, asyncio-1.3.0, langsmith-0.6.6, Faker-37.12.0, cov-7.0.0
timeout: 900.0s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 883 items / 19 deselected / 2 skipped / 864 selected
test/backends/test_adapters/test_adapter.py . [ 0%]
test/backends/test_bedrock.py s [ 0%]
test/backends/test_huggingface.py sssssssssssssssssss [ 2%]
test/backends/test_huggingface_tools.py s [ 2%]
test/backends/test_litellm_ollama.py ........ [ 3%]
test/backends/test_litellm_watsonx.py ssss [ 3%]
test/backends/test_mellea_tool.py ....... [ 4%]
test/backends/test_model_options.py ..... [ 5%]
test/backends/test_ollama.py .....X.... [ 6%]
test/backends/test_openai_ollama.py ............. [ 7%]
test/backends/test_openai_vllm.py sssssss [ 8%]
test/backends/test_tool_calls.py ... [ 9%]
test/backends/test_tool_decorator.py ................... [ 11%]
test/backends/test_tool_helpers.py ... [ 11%]
test/backends/test_tool_validation_integration.py ................................. [ 15%]
test/backends/test_vision_ollama.py .... [ 15%]
test/backends/test_vision_openai.py .... [ 16%]
test/backends/test_watsonx.py sssssssssss [ 17%]
test/cli/test_alora_train.py .... [ 18%]
test/cli/test_alora_train_integration.py ss [ 18%]
test/core/test_astream_exception_propagation.py ..... [ 18%]
test/core/test_astream_incremental.py ...... [ 19%]
test/core/test_astream_mock.py ...... [ 20%]
test/core/test_base.py .... [ 20%]
test/core/test_component_typing.py ........ [ 21%]
test/core/test_model_output_thunk.py .. [ 21%]
test/decompose/test_decompose.py .......... [ 23%]
test/formatters/granite/test_intrinsics_formatters.py ........................................................x.................. [ 31%]
test/formatters/test_template_formatter.py ................ [ 33%]
test/helpers/test_event_loop_helper.py .... [ 34%]
test/helpers/test_server_type.py ................ [ 35%]
test/plugins/test_all_payloads.py ................................................................................................... [ 47%]
test/plugins/test_blocking.py ................ [ 49%]
test/plugins/test_build_global_context.py ....... [ 50%]
test/plugins/test_decorators.py ......... [ 51%]
test/plugins/test_execution_modes.py ........................... [ 54%]
test/plugins/test_hook_call_sites.py .............................. [ 57%]
test/plugins/test_manager.py ss...... [ 58%]
test/plugins/test_mellea_plugin.py ....... [ 59%]
test/plugins/test_payloads.py .......... [ 60%]
test/plugins/test_pluginset.py ......... [ 61%]
test/plugins/test_policies.py ...... [ 62%]
test/plugins/test_policy_enforcement.py .......... [ 63%]
test/plugins/test_priority_ordering.py .............. [ 65%]
test/plugins/test_scoping.py ................................... [ 69%]
test/plugins/test_tool_hooks_redaction.py ....... [ 70%]
test/plugins/test_unregister.py ......... [ 71%]
test/stdlib/components/docs/test_document.py ... [ 71%]
test/stdlib/components/docs/test_richdocument.py .....s [ 72%]
test/stdlib/components/intrinsic/test_core.py ..F [ 72%]
test/stdlib/components/intrinsic/test_guardian.py ...... [ 73%]
test/stdlib/components/intrinsic/test_rag.py ....F.. [ 73%]
test/stdlib/components/test_chat.py . [ 74%]
test/stdlib/components/test_genslot.py ................... [ 76%]
test/stdlib/components/test_hello_world.py .. [ 76%]
test/stdlib/components/test_mify.py ........... [ 77%]
test/stdlib/components/test_transform.py .. [ 78%]
test/stdlib/requirements/test_reqlib_markdown.py ...... [ 78%]
test/stdlib/requirements/test_reqlib_python.py .............sss..... [ 81%]
test/stdlib/requirements/test_reqlib_tools.py . [ 81%]
test/stdlib/requirements/test_requirement.py ..... [ 81%]
test/stdlib/sampling/test_majority_voting.py .. [ 82%]
test/stdlib/sampling/test_sampling_ctx.py .. [ 82%]
test/stdlib/sampling/test_sofai_graph_coloring.py ......................... [ 85%]
test/stdlib/sampling/test_sofai_sampling.py ..................... [ 87%]
test/stdlib/sampling/test_think_budget_forcing.py .. [ 87%]
test/stdlib/test_base_context.py ..... [ 88%]
test/stdlib/test_chat_view.py .. [ 88%]
test/stdlib/test_functional.py .... [ 89%]
test/stdlib/test_session.py s....... [ 90%]
test/stdlib/test_spans.py .x [ 90%]
test/telemetry/test_logging.py ........ [ 91%]
test/telemetry/test_metrics.py ....................................... [ 95%]
test/telemetry/test_metrics_backend.py ....s.... [ 96%]
test/telemetry/test_metrics_plugins.py .... [ 97%]
test/telemetry/test_metrics_token.py .... [ 97%]
test/telemetry/test_tracing.py .............. [ 99%]
test/telemetry/test_tracing_backend.py ssssss [100%]
================================================================================================================= FAILURES =================================================================================================================
______________________________________________________________________________________________________ test_find_context_attributions ______________________________________________________________________________________________________
backend = <mellea.backends.huggingface.LocalHFBackend object at 0x14df00380>
@pytest.mark.qualitative
def test_find_context_attributions(backend):
"""Verify that the context-attribution intrinsic functions properly."""
context, assistant_response, documents = _read_rag_input_json(
"context-attribution.json"
)
expected = _read_rag_output_json("context-attribution.json")
> result = core.find_context_attributions(
assistant_response, documents, context, backend
)
test/stdlib/components/intrinsic/test_core.py:102:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mellea/stdlib/components/intrinsic/core.py:90: in find_context_attributions
result_json = call_intrinsic(
mellea/stdlib/components/intrinsic/_util.py:39: in call_intrinsic
model_output_thunk, _ = mfuncs.act(
mellea/stdlib/functional.py:98: in act
out = _run_async_in_thread(
mellea/helpers/event_loop_helper.py:105: in _run_async_in_thread
return __event_loop_handler(co)
^^^^^^^^^^^^^^^^^^^^^^^^
mellea/helpers/event_loop_helper.py:77: in __call__
return asyncio.run_coroutine_threadsafe(co, self._event_loop).result()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
../../../.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/concurrent/futures/_base.py:456: in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
../../../.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/concurrent/futures/_base.py:401: in __get_result
raise self._exception
mellea/stdlib/functional.py:584: in aact
await result.avalue()
mellea/core/base.py:394: in avalue
await self.astream()
mellea/core/base.py:485: in astream
await self._process(self, chunk)
mellea/backends/huggingface.py:581: in granite_formatters_processing
res = result_processor.transform(chunk, rewritten) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
mellea/formatters/granite/base/io.py:182: in transform
return self._transform_impl(chat_completion_response, chat_completion)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
mellea/formatters/granite/intrinsics/output.py:1267: in _transform_impl
self._transform_choice(c, chat_completion)
mellea/formatters/granite/intrinsics/output.py:1308: in _transform_choice
parsed_json = rule.apply(
mellea/formatters/granite/intrinsics/output.py:166: in apply
result = self._apply_at_path(result, path, prepare_output)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
mellea/formatters/granite/intrinsics/output.py:251: in _apply_at_path
new_values = self._transform(original_value, path, prepare_output)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <mellea.formatters.granite.intrinsics.output.DecodeSentences object at 0x14dbcf050>, value = 765211, path = (0, 'r')
prepare_output = {'begins': [0, 137], 'document_ids': [None, None], 'ends': [137, 257], 'message_indices': [None, None], ...}
def _transform(self, value: Any, path: tuple, prepare_output: dict) -> dict:
# Unpack global values we set aside during the prepare phase
begins = prepare_output["begins"]
ends = prepare_output["ends"]
texts = prepare_output["texts"]
document_ids = prepare_output.get("document_ids")
message_indices = prepare_output.get("message_indices")
if not isinstance(value, int):
raise TypeError(
f"Expected integer sentence number at path {path}, but "
f"found non-integer value {value} of type {type(value)}"
)
sentence_num = value
result = {}
if self.begin_name is not None:
> result[self.begin_name] = begins[sentence_num]
^^^^^^^^^^^^^^^^^^^^
E IndexError: list index out of range
mellea/formatters/granite/intrinsics/output.py:714: IndexError
----------------------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------------------
=== 15:39:12-INFO ======
passing in model options when generating with an adapter; some model options may be overwritten / ignored
------------------------------------------------------------------------------------------------------------ Captured log call -------------------------------------------------------------------------------------------------------------
INFO fancy_logger:huggingface.py:475 passing in model options when generating with an adapter; some model options may be overwritten / ignored
--------------------------------------------------------------------------------------------------------- Captured stdout teardown ---------------------------------------------------------------------------------------------------------
=== 15:41:30-INFO ======
Cleaning up test_core backend GPU memory...
=== 15:41:30-INFO ======
Cleared LRU cache
=== 15:41:30-INFO ======
Removed accelerate dispatch hooks
---------------------------------------------------------------------------------------------------------- Captured log teardown -----------------------------------------------------------------------------------------------------------
INFO fancy_logger:conftest.py:342 Cleaning up test_core backend GPU memory...
INFO fancy_logger:conftest.py:365 Cleared LRU cache
INFO fancy_logger:conftest.py:402 Removed accelerate dispatch hooks
_______________________________________________________________________________________________________ test_hallucination_detection _______________________________________________________________________________________________________
backend = <mellea.backends.huggingface.LocalHFBackend object at 0x13eff8d10>
@pytest.mark.qualitative
def test_hallucination_detection(backend):
"""Verify that the hallucination detection intrinsic functions properly."""
context, assistant_response, docs = _read_input_json("hallucination_detection.json")
expected = _read_output_json("hallucination_detection.json")
# First call triggers adapter loading
result = rag.flag_hallucinated_content(assistant_response, docs, context, backend)
# pytest.approx() chokes on lists of records, so we do this complicated dance.
for r, e in zip(result, expected, strict=True): # type: ignore
> assert pytest.approx(r, abs=2e-2) == e
E AssertionError: assert approx({'resp...he sentence.}) == {'explanation...end': 31, ...}
E
E comparison failed. Mismatched elements: 1 / 5:
E Max absolute difference: 0.022802131238099044
E Max relative difference: 0.03036794087969006
E Index | Obtained | Expected
E faithfulness_likelihood | 0.7280598165124975 | 0.7508619477505966 ± 0.02
test/stdlib/components/intrinsic/test_rag.py:159: AssertionError
----------------------------------------------------------------------------------------------------------- Captured stdout call -----------------------------------------------------------------------------------------------------------
=== 15:42:34-INFO ======
passing in model options when generating with an adapter; some model options may be overwritten / ignored
------------------------------------------------------------------------------------------------------------ Captured log call -------------------------------------------------------------------------------------------------------------
INFO fancy_logger:huggingface.py:475 passing in model options when generating with an adapter; some model options may be overwritten / ignored
============================================================================================================= warnings summary =============================================================================================================
test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
test/backends/test_litellm_ollama.py::test_generate_from_raw
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_litellm_ollama.py::test_async_avalue
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[non-streaming]
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/aiohttp/connector.py:993: DeprecationWarning: enable_cleanup_closed ignored because https://github.com/python/cpython/pull/118960 is fixed in Python version sys.version_info(major=3, minor=12, micro=8, releaselevel='final', serial=0)
super().__init__(
test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='The answ...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="Subject:...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='yes', ro...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Subject:...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_gen_slot
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='{\n"resu...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/streaming_handler.py:1855: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.12/migration/
obj_dict = processed_chunk.dict()
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_litellm_ollama.py::test_async_avalue
test/backends/test_litellm_ollama.py::test_async_avalue
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Hello! H...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_async_parallel_requests
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Goodbye!...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_tool_calls.py::test_tool_called_from_context_action
<frozen abc>:106: DeprecationWarning: Use BaseMetaSerializer() instead.
test/backends/test_vision_ollama.py::test_image_block_construction
/Users/ajbozarth/workspace/ai/mellea/test/backends/test_vision_ollama.py:38: DeprecationWarning: 'mode' parameter is deprecated and will be removed in Pillow 13 (2026-10-15)
random_image = Image.fromarray(random_pixel_data, "RGB")
test/backends/test_vision_openai.py::test_image_block_construction
/Users/ajbozarth/workspace/ai/mellea/test/backends/test_vision_openai.py:48: DeprecationWarning: 'mode' parameter is deprecated and will be removed in Pillow 13 (2026-10-15)
random_image = Image.fromarray(random_pixel_data, "RGB")
test/cli/test_alora_train.py::test_alora_config_creation
test/cli/test_alora_train.py::test_lora_config_creation
test/cli/test_alora_train.py::test_invocation_prompt_tokenization
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/trl/trainer/sft_config.py:257: DeprecationWarning: `max_seq_length` is deprecated and will be removed in version 0.20.0. Use `max_length` instead.
warnings.warn(
test/helpers/test_event_loop_helper.py::test_event_loop_handler_with_forking
/Users/ajbozarth/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=49161) is multi-threaded, use of fork() may lead to deadlocks in the child.
self.pid = os.fork()
test/stdlib/components/docs/test_richdocument.py::test_richdocument_basics
test/stdlib/components/docs/test_richdocument.py::test_richdocument_markdown
test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/docling_core/transforms/serializer/markdown.py:490: DeprecationWarning: Field `annotations` is deprecated; use `meta` instead.
for ann in item.annotations
test/stdlib/components/intrinsic/test_core.py: 2 warnings
test/stdlib/components/intrinsic/test_guardian.py: 3 warnings
test/stdlib/components/intrinsic/test_rag.py: 5 warnings
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
warnings.warn(
test/stdlib/test_spans.py::test_lazy_spans
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/torch/nn/functional.py:5294: UserWarning: MPS: The constant padding of more than 3 dimensions is not currently supported natively. It uses View Ops default implementation to run. This may have performance implications. (Triggered internally at /Users/runner/work/pytorch/pytorch/pytorch/aten/src/ATen/native/mps/operations/Pad.mm:468.)
return torch._C._nn.pad(input, pad, mode, value)
test/telemetry/test_logging.py::test_otlp_logging_enabled_without_endpoint_warns
/Users/ajbozarth/workspace/ai/mellea/mellea/telemetry/logging.py:97: UserWarning: OTLP logs exporter is enabled (MELLEA_LOGS_OTLP=true) but no endpoint is configured. Set OTEL_EXPORTER_OTLP_LOGS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT to export logs.
_logger_provider = _setup_logger_provider()
test/telemetry/test_metrics.py: 24 warnings
test/telemetry/test_metrics_backend.py: 8 warnings
test/telemetry/test_metrics_token.py: 4 warnings
/Users/ajbozarth/workspace/ai/mellea/mellea/telemetry/metrics.py:245: UserWarning: Metrics are enabled (MELLEA_METRICS_ENABLED=true) but no exporters are configured. Metrics will be collected but not exported. Set MELLEA_METRICS_PROMETHEUS=true, set MELLEA_METRICS_OTLP=true with an endpoint (OTEL_EXPORTER_OTLP_METRICS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT), or set MELLEA_METRICS_CONSOLE=true to export metrics.
_meter_provider = _setup_meter_provider()
test/telemetry/test_metrics.py: 28 warnings
test/telemetry/test_metrics_backend.py: 8 warnings
test/telemetry/test_metrics_token.py: 4 warnings
/Users/ajbozarth/.local/share/uv/python/cpython-3.12.8-macos-aarch64-none/lib/python3.12/importlib/__init__.py:131: UserWarning: TokenMetricsPlugin already registered: Plugin token_metrics.generation_post_call already registered
_bootstrap._exec(spec, module)
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[non-streaming]
test/telemetry/test_tracing.py::test_session_with_tracing_disabled
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="Of cours...ields={'refusal': None}), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/model_response_utils.py:206: PydanticDeprecatedSince211: Accessing the 'model_computed_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
or callable(getattr(delta, attr_name))
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/model_response_utils.py:206: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
or callable(getattr(delta, attr_name))
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/Users/ajbozarth/workspace/ai/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="I'm an A...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================================= Skipped Examples =============================================================================================================
The following examples were skipped during collection:
• 102_example.py: Example marked with skip marker
• example_readme_generator.py: Example marked with skip marker
• make_training_data.py: Example marked with skip marker
• stembolts_intrinsic.py: Example marked with skip marker
• bedrock_litellm_example.py: Example marked with skip marker
• bedrock_openai_example.py: Example marked with skip marker
• qiskit_code_validation.py: Example marked with skip marker
• validation_helpers.py: Example marked with skip marker
• python_decompose_result.py: Example marked to always skip (skip_always marker)
• m_decomp_result.py: Example marked to always skip (skip_always marker)
• client.py: Example marked to always skip (skip_always marker)
• pii_serve.py: Example marked to always skip (skip_always marker)
• mcp_example.py: Example marked to always skip (skip_always marker)
• rich_document_advanced.py: Example marked with skip marker
• mellea_pdf.py: Example marked to always skip (skip_always marker)
• simple_rag_with_filter.py: Example marked to always skip (skip_always marker)
============================================================================================================== tests coverage ==============================================================================================================
_____________________________________________________________________________________________ coverage: platform darwin, python 3.12.8-final-0 _____________________________________________________________________________________________
Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
========================================================================================================= short test summary info ==========================================================================================================
FAILED test/stdlib/components/intrinsic/test_core.py::test_find_context_attributions - IndexError: list index out of range
FAILED test/stdlib/components/intrinsic/test_rag.py::test_hallucination_detection - AssertionError: assert approx({'resp...he sentence.}) == {'explanation...end': 31, ...}
================================================================ 2 failed, 800 passed, 61 skipped, 19 deselected, 2 xfailed, 1 xpassed, 122 warnings in 1925.97s (0:32:05) =================================================================

Local slow run (uv run pytest -m slow, Mac M1 Max 32GB): 18 passed, 3 skipped, 864 deselected in 3m32s. All expected.
Terminal output
$ uv run pytest -m slow
=========================================================================================================== test session starts ============================================================================================================
platform darwin -- Python 3.12.8, pytest-9.0.0, pluggy-1.6.0
rootdir: /Users/ajbozarth/workspace/ai/mellea
configfile: pyproject.toml
testpaths: test, docs
plugins: nbmake-1.5.5, recording-0.13.4, anyio-4.11.0, xdist-3.8.0, json-report-1.5.0, timeout-2.4.0, metadata-3.1.1, asyncio-1.3.0, langsmith-0.6.6, Faker-37.12.0, cov-7.0.0
timeout: 900.0s
timeout method: signal
timeout func_only: False
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
collected 883 items / 864 deselected / 2 skipped / 19 selected
test/package/test_dependency_isolation.py ..s................ [100%]
============================================================================================================= Skipped Examples =============================================================================================================
The following examples were skipped during collection:
• 102_example.py: Example marked with skip marker
• example_readme_generator.py: Example marked with skip marker
• make_training_data.py: Example marked with skip marker
• stembolts_intrinsic.py: Example marked with skip marker
• bedrock_litellm_example.py: Example marked with skip marker
• bedrock_openai_example.py: Example marked with skip marker
• qiskit_code_validation.py: Example marked with skip marker
• validation_helpers.py: Example marked with skip marker
• python_decompose_result.py: Example marked to always skip (skip_always marker)
• m_decomp_result.py: Example marked to always skip (skip_always marker)
• client.py: Example marked to always skip (skip_always marker)
• pii_serve.py: Example marked to always skip (skip_always marker)
• mcp_example.py: Example marked to always skip (skip_always marker)
• rich_document_advanced.py: Example marked with skip marker
• mellea_pdf.py: Example marked to always skip (skip_always marker)
• simple_rag_with_filter.py: Example marked to always skip (skip_always marker)
============================================================================================================== tests coverage ==============================================================================================================
_____________________________________________________________________________________________ coverage: platform darwin, python 3.12.8-final-0 _____________________________________________________________________________________________
Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
======================================================================================== 18 passed, 3 skipped, 864 deselected in 212.41s (0:03:32) =========================================================================================

Cluster run (./test/scripts/run_tests_with_ollama.sh, IBM LSF, NVIDIA GPU node, Python 3.12.12): 735 passed, 47 failed, 30 skipped, 58 errors, 19 deselected, 3 xfailed in 1:20:16.
The 58 errors and the majority of the 47 failures are Ollama connectivity issues: the script detected an existing Ollama server, but all three model warmups timed out, and tests then errored with "could not create OllamaModelBackend: ollama server not running at None" (base_url resolving to None). This is an environment issue, not related to this PR. Planning to re-run with a clean environment.
The test_find_context_attributions qualitative flake is also present, same as in the local run.
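The `base_url` resolving to `None` suggests the host setting was empty when the backend was constructed. A minimal sketch of the kind of fallback a test script could apply before starting pytest (the `OLLAMA_HOST` variable name and default port follow Ollama's conventions; this is an illustration, not this repo's actual code):

```shell
#!/bin/sh
# Sketch: resolve the Ollama host with a safe default so a backend
# never receives an empty base URL. Assumes Ollama's default port.
resolve_ollama_host() {
  printf '%s' "${OLLAMA_HOST:-127.0.0.1:11434}"
}

unset OLLAMA_HOST
resolve_ollama_host
```

With the variable unset, the function falls back to `127.0.0.1:11434` instead of propagating an empty value.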
Terminal output
$ bsub -Is -n 1 -G grp_preemptable -q preemptable -gpu "num=1/task:mode=shared:mps=no:j_exclusive=yes:gvendor=nvidia" /bin/bash
num=1/task:mode=shared:mps=no:j_exclusive=yes:gvendor=nvidia
GPU mode=shared. This is allowed but deprecated
Job <741102> is submitted to queue <preemptable>.
<<Waiting for dispatch ...>>
<<Starting on p5-r06-n1>>
[ajbozarth@p5-r06-n1 mellea]$ bash ./test/scripts/run_tests_with_ollama.sh
[20:27:25] WARNING: CACHE_DIR not set. Ollama models will download to ~/.ollama (default)
[20:27:25] Using standalone log directory: logs/2026-03-27-20:27:25
[20:27:25] Ollama already running on 127.0.0.1:11434 — using existing server
[20:27:26] Model granite4:micro already pulled
[20:27:26] Model granite4:micro-h already pulled
[20:27:26] Pulling granite3.2-vision ...
success
[20:27:40] All models ready.
[20:27:40] Warming up models...
[20:27:40] Warming granite4:micro ...
[20:29:40] Warning: warmup for granite4:micro timed out (will load on first test)
[20:29:40] Warming granite4:micro-h ...
[20:31:40] Warning: warmup for granite4:micro-h timed out (will load on first test)
[20:31:40] Warming granite3.2-vision ...
[20:33:40] Warning: warmup for granite3.2-vision timed out (will load on first test)
[20:33:40] Warmup complete.
[20:33:40] Starting pytest...
[20:33:40] Log directory: logs/2026-03-27-20:27:25
[20:33:40] Pytest args: --group-by-backend
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.0, pluggy-1.6.0
rootdir: /proj/dmfexp/eiger/users/ajbozarth/mellea
configfile: pyproject.toml
plugins: nbmake-1.5.5, anyio-4.11.0, json-report-1.5.0, recording-0.13.4, asyncio-1.3.0, timeout-2.4.0, metadata-3.1.1, Faker-37.12.0, xdist-3.8.0, langsmith-0.6.6, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collected 892 items / 19 deselected / 873 selected
test/backends/test_huggingface.py ................... [ 2%]
test/backends/test_huggingface_tools.py . [ 2%]
test/cli/test_alora_train_integration.py .. [ 2%]
test/formatters/granite/test_intrinsics_formatters.py ....x.......... [ 4%]
test/stdlib/components/docs/test_richdocument.py s [ 4%]
test/stdlib/components/intrinsic/test_core.py ..F [ 4%]
test/stdlib/components/intrinsic/test_guardian.py ...... [ 5%]
test/stdlib/components/intrinsic/test_rag.py ....... [ 6%]
test/stdlib/test_spans.py .x [ 6%]
test/telemetry/test_metrics_backend.py .. [ 6%]
test/backends/test_openai_ollama.py FFFFFFFF..... [ 8%]
test/backends/test_openai_vllm.py ....... [ 8%]
test/backends/test_vision_openai.py ..FF [ 9%]
test/telemetry/test_metrics_backend.py FF [ 9%]
test/backends/test_vllm.py ........ [ 10%]
test/backends/test_vllm_tools.py . [ 10%]
test/backends/test_litellm_ollama.py .FFFFFFF [ 11%]
test/backends/test_mellea_tool.py EE [ 11%]
test/backends/test_ollama.py EEEEExEEEE [ 12%]
test/backends/test_tool_calls.py EEE [ 13%]
test/backends/test_vision_ollama.py ..EE [ 13%]
test/core/test_astream_incremental.py FFFF.F [ 14%]
test/core/test_component_typing.py EEE [ 14%]
test/core/test_model_output_thunk.py EE [ 15%]
test/stdlib/components/test_genslot.py EEEEEEEEEEEEEEE.EEE [ 17%]
test/stdlib/requirements/test_requirement.py FF... [ 17%]
test/stdlib/sampling/test_majority_voting.py EE [ 17%]
test/stdlib/sampling/test_sampling_ctx.py EE [ 18%]
test/stdlib/sampling/test_sofai_graph_coloring.py FFF [ 18%]
test/stdlib/sampling/test_sofai_sampling.py F [ 18%]
test/stdlib/sampling/test_think_budget_forcing.py EE [ 18%]
test/stdlib/test_chat_view.py EE [ 19%]
test/stdlib/test_functional.py EEEE [ 19%]
test/stdlib/test_session.py sEEEEEEE [ 20%]
test/telemetry/test_metrics_backend.py FFFF [ 20%]
test/telemetry/test_tracing.py FFFF [ 21%]
test/telemetry/test_tracing_backend.py ssssss [ 22%]
test/backends/test_bedrock.py s [ 22%]
test/backends/test_litellm_watsonx.py ssss [ 22%]
test/backends/test_watsonx.py sssssssssss [ 23%]
test/telemetry/test_metrics_backend.py s [ 24%]
test/backends/test_adapters/test_adapter.py . [ 24%]
test/backends/test_mellea_tool.py ..... [ 24%]
test/backends/test_model_options.py ..... [ 25%]
test/backends/test_tool_decorator.py ................... [ 27%]
test/backends/test_tool_helpers.py ... [ 27%]
test/backends/test_tool_validation_integration.py ...................... [ 30%]
........... [ 31%]
test/cli/test_alora_train.py .... [ 32%]
test/core/test_astream_exception_propagation.py ..... [ 32%]
test/core/test_astream_mock.py ...... [ 33%]
test/core/test_base.py .... [ 33%]
test/core/test_component_typing.py ..... [ 34%]
test/decompose/test_decompose.py .......... [ 35%]
test/formatters/granite/test_intrinsics_formatters.py .................. [ 37%]
..................................FFFFFFFF [ 42%]
test/formatters/test_template_formatter.py ................ [ 44%]
test/helpers/test_event_loop_helper.py .... [ 44%]
test/helpers/test_server_type.py ................ [ 46%]
test/plugins/test_all_payloads.py ...................................... [ 50%]
............................................................. [ 57%]
test/plugins/test_blocking.py ................ [ 59%]
test/plugins/test_build_global_context.py ....... [ 60%]
test/plugins/test_decorators.py ......... [ 61%]
test/plugins/test_execution_modes.py ........................... [ 64%]
test/plugins/test_hook_call_sites.py .............................. [ 68%]
test/plugins/test_manager.py ss...... [ 68%]
test/plugins/test_mellea_plugin.py ....... [ 69%]
test/plugins/test_payloads.py .......... [ 70%]
test/plugins/test_pluginset.py ......... [ 71%]
test/plugins/test_policies.py ...... [ 72%]
test/plugins/test_policy_enforcement.py .......... [ 73%]
test/plugins/test_priority_ordering.py .............. [ 75%]
test/plugins/test_scoping.py ................................... [ 79%]
test/plugins/test_tool_hooks_redaction.py ....... [ 80%]
test/plugins/test_unregister.py ......... [ 81%]
test/stdlib/components/docs/test_document.py ... [ 81%]
test/stdlib/components/docs/test_richdocument.py ..... [ 82%]
test/stdlib/components/test_chat.py . [ 82%]
test/stdlib/components/test_hello_world.py .. [ 82%]
test/stdlib/components/test_mify.py ........... [ 83%]
test/stdlib/components/test_transform.py .. [ 83%]
test/stdlib/requirements/test_reqlib_markdown.py ...... [ 84%]
test/stdlib/requirements/test_reqlib_python.py .............sss..... [ 87%]
test/stdlib/requirements/test_reqlib_tools.py . [ 87%]
test/stdlib/sampling/test_sofai_graph_coloring.py ...................... [ 89%]
[ 89%]
test/stdlib/sampling/test_sofai_sampling.py .................... [ 91%]
test/stdlib/test_base_context.py ..... [ 92%]
test/telemetry/test_logging.py ........ [ 93%]
test/telemetry/test_metrics.py ....................................... [ 97%]
test/telemetry/test_metrics_plugins.py .... [ 98%]
test/telemetry/test_metrics_token.py .... [ 98%]
test/telemetry/test_tracing.py .......... [100%]
==================================== ERRORS ====================================
... (removed 32,000 lines of error output)
=============================== warnings summary ===============================
test/backends/test_huggingface.py: 1 warning
test/stdlib/components/intrinsic/test_core.py: 2 warnings
test/stdlib/components/intrinsic/test_guardian.py: 3 warnings
test/stdlib/components/intrinsic/test_rag.py: 5 warnings
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
warnings.warn(
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/trl/trainer/utils.py:103: DeprecationWarning: This class is deprecated and will be removed in version 0.20.0. To train on completion only, please use the parameter `completion_only_loss` of `SFTConfig` instead.
warnings.warn(
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
test/cli/test_alora_train.py::test_alora_config_creation
test/cli/test_alora_train.py::test_lora_config_creation
test/cli/test_alora_train.py::test_invocation_prompt_tokenization
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/trl/trainer/sft_config.py:257: DeprecationWarning: `max_seq_length` is deprecated and will be removed in version 0.20.0. Use `max_length` instead.
warnings.warn(
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:678: DeprecationWarning: Failed to apply the formatting function due to the following error: string index out of range. This may be because the function is designed for batched input. Please update it to process one example at a time (i.e., accept and return a single example). For now, we will attempt to apply the function in batched mode, but note that batched formatting is deprecated and will be removed in version 0.21.
warnings.warn(
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/pin_memory.py:57: DeprecationWarning: The argument 'device' of Tensor.pin_memory() is deprecated. Please do not pass this argument. (Triggered internally at /pytorch/aten/src/ATen/native/Memory.cpp:46.)
return data.pin_memory(device)
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/pin_memory.py:57: DeprecationWarning: The argument 'device' of Tensor.is_pinned() is deprecated. Please do not pass this argument. (Triggered internally at /pytorch/aten/src/ATen/native/Memory.cpp:31.)
return data.pin_memory(device)
test/telemetry/test_metrics_backend.py: 8 warnings
test/telemetry/test_metrics.py: 24 warnings
test/telemetry/test_metrics_token.py: 4 warnings
/proj/dmfexp/eiger/users/ajbozarth/mellea/mellea/telemetry/metrics.py:245: UserWarning: Metrics are enabled (MELLEA_METRICS_ENABLED=true) but no exporters are configured. Metrics will be collected but not exported. Set MELLEA_METRICS_PROMETHEUS=true, set MELLEA_METRICS_OTLP=true with an endpoint (OTEL_EXPORTER_OTLP_METRICS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT), or set MELLEA_METRICS_CONSOLE=true to export metrics.
_meter_provider = _setup_meter_provider()
test/telemetry/test_metrics_backend.py: 7 warnings
test/telemetry/test_metrics.py: 28 warnings
test/telemetry/test_metrics_token.py: 4 warnings
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/importlib/__init__.py:131: UserWarning: TokenMetricsPlugin already registered: Plugin token_metrics.generation_post_call already registered
_bootstrap._exec(spec, module)
test/backends/test_vision_openai.py::test_image_block_construction
/proj/dmfexp/eiger/users/ajbozarth/mellea/test/backends/test_vision_openai.py:48: DeprecationWarning: 'mode' parameter is deprecated and will be removed in Pillow 13 (2026-10-15)
random_image = Image.fromarray(random_pixel_data, "RGB")
test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
test/backends/test_litellm_ollama.py::test_gen_slot
test/backends/test_litellm_ollama.py::test_generate_from_raw
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_litellm_ollama.py::test_async_avalue
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[non-streaming]
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/aiohttp/connector.py:993: DeprecationWarning: enable_cleanup_closed ignored because https://github.com/python/cpython/pull/118960 is fixed in Python version sys.version_info(major=3, minor=12, micro=12, releaselevel='final', serial=0)
super().__init__(
test/backends/test_vision_ollama.py::test_image_block_construction
/proj/dmfexp/eiger/users/ajbozarth/mellea/test/backends/test_vision_ollama.py:38: DeprecationWarning: 'mode' parameter is deprecated and will be removed in Pillow 13 (2026-10-15)
random_image = Image.fromarray(random_pixel_data, "RGB")
test/helpers/test_event_loop_helper.py::test_event_loop_handler_with_forking
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=4140229) is multi-threaded, use of fork() may lead to deadlocks in the child.
self.pid = os.fork()
test/stdlib/components/docs/test_richdocument.py::test_richdocument_basics
<frozen abc>:106: DeprecationWarning: Use BaseMetaSerializer() instead.
test/stdlib/components/docs/test_richdocument.py::test_richdocument_basics
test/stdlib/components/docs/test_richdocument.py::test_richdocument_markdown
test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/docling_core/transforms/serializer/markdown.py:490: DeprecationWarning: Field `annotations` is deprecated; use `meta` instead.
for ann in item.annotations
test/telemetry/test_logging.py::test_otlp_logging_enabled_without_endpoint_warns
/proj/dmfexp/eiger/users/ajbozarth/mellea/mellea/telemetry/logging.py:97: UserWarning: OTLP logs exporter is enabled (MELLEA_LOGS_OTLP=true) but no endpoint is configured. Set OTEL_EXPORTER_OTLP_LOGS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT to export logs.
_logger_provider = _setup_logger_provider()
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================ tests coverage ================================
_______________ coverage: platform linux, python 3.12.12-final-0 _______________
Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
=========================== short test summary info ============================
FAILED test/stdlib/components/intrinsic/test_core.py::test_find_context_attributions
FAILED test/backends/test_openai_ollama.py::test_instruct - openai.APIConnect...
FAILED test/backends/test_openai_ollama.py::test_multiturn - openai.APIConnec...
FAILED test/backends/test_openai_ollama.py::test_chat - openai.APIConnectionE...
FAILED test/backends/test_openai_ollama.py::test_chat_stream - openai.APIConn...
FAILED test/backends/test_openai_ollama.py::test_format - openai.APIConnectio...
FAILED test/backends/test_openai_ollama.py::test_generate_from_raw - openai.A...
FAILED test/backends/test_openai_ollama.py::test_async_parallel_requests - op...
FAILED test/backends/test_openai_ollama.py::test_async_avalue - openai.APICon...
FAILED test/backends/test_vision_openai.py::test_image_block_in_instruction
FAILED test/backends/test_vision_openai.py::test_image_block_in_chat - openai...
FAILED test/telemetry/test_metrics_backend.py::test_openai_token_metrics_integration[non-streaming]
FAILED test/telemetry/test_metrics_backend.py::test_openai_token_metrics_integration[streaming]
FAILED test/backends/test_litellm_ollama.py::test_litellm_ollama_chat - litel...
FAILED test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct - l...
FAILED test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
FAILED test/backends/test_litellm_ollama.py::test_gen_slot - litellm.exceptio...
FAILED test/backends/test_litellm_ollama.py::test_generate_from_raw - litellm...
FAILED test/backends/test_litellm_ollama.py::test_async_parallel_requests - l...
FAILED test/backends/test_litellm_ollama.py::test_async_avalue - litellm.exce...
FAILED test/core/test_astream_incremental.py::test_astream_returns_incremental_chunks
FAILED test/core/test_astream_incremental.py::test_astream_multiple_calls_accumulate_correctly
FAILED test/core/test_astream_incremental.py::test_astream_beginning_length_tracking
FAILED test/core/test_astream_incremental.py::test_astream_empty_beginning - ...
FAILED test/core/test_astream_incremental.py::test_non_streaming_astream - Ex...
FAILED test/stdlib/requirements/test_requirement.py::test_llmaj_validation_req_output_field
FAILED test/stdlib/requirements/test_requirement.py::test_llmaj_requirement_uses_requirement_template
FAILED test/stdlib/sampling/test_sofai_graph_coloring.py::TestSOFAIGraphColoringIntegration::test_graph_coloring_fresh_start
FAILED test/stdlib/sampling/test_sofai_graph_coloring.py::TestSOFAIGraphColoringIntegration::test_graph_coloring_continue_chat
FAILED test/stdlib/sampling/test_sofai_graph_coloring.py::TestSOFAIGraphColoringIntegration::test_graph_coloring_best_attempt
FAILED test/stdlib/sampling/test_sofai_sampling.py::TestSOFAIIntegration::test_sofai_with_ollama
FAILED test/telemetry/test_metrics_backend.py::test_ollama_token_metrics_integration[non-streaming]
FAILED test/telemetry/test_metrics_backend.py::test_ollama_token_metrics_integration[streaming]
FAILED test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[non-streaming]
FAILED test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
FAILED test/telemetry/test_tracing.py::test_session_with_tracing_disabled - E...
FAILED test/telemetry/test_tracing.py::test_session_with_application_tracing
FAILED test/telemetry/test_tracing.py::test_session_with_backend_tracing - Ex...
FAILED test/telemetry/test_tracing.py::test_generative_function_with_tracing
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[answerability_simple]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[answerability_answerable]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[answerability_unanswerable]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[hallucination_detection]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[query_clarification]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[query_rewrite]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[context_relevance]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[citations]
ERROR test/backends/test_mellea_tool.py::test_from_callable_generation - Exce...
ERROR test/backends/test_mellea_tool.py::test_from_langchain_generation - Exc...
ERROR test/backends/test_ollama.py::test_simple_instruct - Exception: could n...
ERROR test/backends/test_ollama.py::test_instruct_with_requirement - Exceptio...
ERROR test/backends/test_ollama.py::test_chat - Exception: could not create O...
ERROR test/backends/test_ollama.py::test_format - Exception: could not create...
ERROR test/backends/test_ollama.py::test_generate_from_raw - Exception: could...
ERROR test/backends/test_ollama.py::test_async_parallel_requests - Exception:...
ERROR test/backends/test_ollama.py::test_async_avalue - Exception: could not ...
ERROR test/backends/test_ollama.py::test_multiple_asyncio_runs - Exception: c...
ERROR test/backends/test_ollama.py::test_client_cache - Exception: could not ...
ERROR test/backends/test_tool_calls.py::test_tool_called_from_context_action
ERROR test/backends/test_tool_calls.py::test_tool_called - Exception: could n...
ERROR test/backends/test_tool_calls.py::test_tool_not_called - Exception: cou...
ERROR test/backends/test_vision_ollama.py::test_image_block_in_instruction - ...
ERROR test/backends/test_vision_ollama.py::test_image_block_in_chat - Excepti...
ERROR test/core/test_component_typing.py::test_generating - Exception: could ...
ERROR test/core/test_component_typing.py::test_message_typing - Exception: co...
ERROR test/core/test_component_typing.py::test_generating_with_sampling - Exc...
ERROR test/core/test_model_output_thunk.py::test_model_output_thunk_copy - Ex...
ERROR test/core/test_model_output_thunk.py::test_model_output_thunk_deepcopy
ERROR test/stdlib/components/test_genslot.py::test_gen_slot_output - Exceptio...
ERROR test/stdlib/components/test_genslot.py::test_func - Exception: could no...
ERROR test/stdlib/components/test_genslot.py::test_sentiment_output - Excepti...
ERROR test/stdlib/components/test_genslot.py::test_gen_slot_logs - Exception:...
ERROR test/stdlib/components/test_genslot.py::test_gen_slot_with_context_and_backend
ERROR test/stdlib/components/test_genslot.py::test_async_gen_slot - Exception...
ERROR test/stdlib/components/test_genslot.py::test_arg_extraction[session] - ...
ERROR test/stdlib/components/test_genslot.py::test_arg_extraction[context and backend]
ERROR test/stdlib/components/test_genslot.py::test_arg_extraction[backend without context]
ERROR test/stdlib/components/test_genslot.py::test_arg_extraction[duplicate arg and kwarg]
ERROR test/stdlib/components/test_genslot.py::test_arg_extraction[original func args as positional args]
ERROR test/stdlib/components/test_genslot.py::test_arg_extraction[session and func as kwargs]
ERROR test/stdlib/components/test_genslot.py::test_arg_extraction[all kwargs]
ERROR test/stdlib/components/test_genslot.py::test_arg_extraction[interspersed kwargs]
ERROR test/stdlib/components/test_genslot.py::test_arg_extraction[missing required args]
ERROR test/stdlib/components/test_genslot.py::test_precondition_failure - Exc...
ERROR test/stdlib/components/test_genslot.py::test_requirement - Exception: c...
ERROR test/stdlib/components/test_genslot.py::test_with_no_args - Exception: ...
ERROR test/stdlib/sampling/test_majority_voting.py::test_majority_voting_for_math
ERROR test/stdlib/sampling/test_majority_voting.py::test_MBRDRougeL - Excepti...
ERROR test/stdlib/sampling/test_sampling_ctx.py::TestSamplingCtxCase::test_ctx_for_rejection_sampling
ERROR test/stdlib/sampling/test_sampling_ctx.py::TestSamplingCtxCase::test_ctx_for_multiturn
ERROR test/stdlib/sampling/test_think_budget_forcing.py::test_think_big - Exc...
ERROR test/stdlib/sampling/test_think_budget_forcing.py::test_think_little - ...
ERROR test/stdlib/test_chat_view.py::test_chat_view_linear_ctx - Exception: c...
ERROR test/stdlib/test_chat_view.py::test_chat_view_simple_ctx - Exception: c...
ERROR test/stdlib/test_functional.py::test_func_context - Exception: could no...
ERROR test/stdlib/test_functional.py::test_aact - Exception: could not create...
ERROR test/stdlib/test_functional.py::test_ainstruct - Exception: could not c...
ERROR test/stdlib/test_functional.py::test_avalidate - Exception: could not c...
ERROR test/stdlib/test_session.py::test_start_session_openai_with_kwargs - Ex...
ERROR test/stdlib/test_session.py::test_aact - Exception: could not create Ol...
ERROR test/stdlib/test_session.py::test_ainstruct - Exception: could not crea...
ERROR test/stdlib/test_session.py::test_async_await_with_chat_context - Excep...
ERROR test/stdlib/test_session.py::test_async_without_waiting_with_chat_context
ERROR test/stdlib/test_session.py::test_session_copy_with_context_ops - Excep...
ERROR test/stdlib/test_session.py::test_powerup - Exception: could not create...
= 47 failed, 735 passed, 30 skipped, 19 deselected, 3 xfailed, 117 warnings, 58 errors in 4816.40s (1:20:16) =
[21:56:14] Shutting down ollama server...
[21:56:14] Ollama stopped.

| Run | Passed | Failed | Skipped | Deselected | Notes |
|---|---|---|---|---|---|
| Local, Mac M1 Max 32GB, Python 3.12.8 | 800 | 2 | 61 | 19 | 2 qualitative flakes |
| Local slow, Mac M1 Max 32GB, Python 3.12.8 | 18 | 0 | 3 | 864 | All expected |
| Cluster, LSF GPU node, Python 3.12.12 | 735 | 47 | 30 | 19 | Ollama connectivity issue — re-run planned |
I'm on the fence about whether this skill belongs in mellea proper or should go in a "useful skills" repo instead. I'm OK with adding it here for now, though.
Per a conversation with @planetf1, I've fixed my own review nits to unblock this (he's now on vacation). I'll hold off on merging until my cluster run passes (in case it failed due to something I need to fix) and until there are more reviews.
Opened #759 for the Ollama connectivity issue that blew up my bluevela run.
Re-ran the cluster tests after resolving the Ollama connectivity issue from my first run (stale server from a previous session). Results below:

Cluster run (

Remaining failures (none related to this PR):
Terminal output

$ test/scripts/run_tests_with_ollama.sh
[22:20:32] WARNING: CACHE_DIR not set. Ollama models will download to ~/.ollama (default)
[22:20:32] Using standalone log directory: logs/2026-03-27-22:20:32
[22:20:32] Starting ollama server on 127.0.0.1:11434...
[22:20:32] Added system CUDA to LD_LIBRARY_PATH
[22:20:32] Ollama server PID: 1135706
[22:20:32] Waiting for ollama to be ready...
[22:20:34] Ollama ready after 2s
[22:20:34] Model granite4:micro already pulled
[22:20:34] Model granite4:micro-h already pulled
[22:20:34] Model granite3.2-vision already pulled
[22:20:34] All models ready.
[22:20:34] Warming up models...
[22:20:34] Warming granite4:micro ...
[22:21:38] Warming granite4:micro-h ...
[22:21:48] Warming granite3.2-vision ...
[22:21:55] Warmup complete.
[22:21:55] Starting pytest...
[22:21:55] Log directory: logs/2026-03-27-22:20:32
[22:21:55] Pytest args: --group-by-backend
============================= test session starts ==============================
platform linux -- Python 3.12.12, pytest-9.0.0, pluggy-1.6.0
rootdir: /proj/dmfexp/eiger/users/ajbozarth/mellea
configfile: pyproject.toml
plugins: nbmake-1.5.5, anyio-4.11.0, json-report-1.5.0, recording-0.13.4, asyncio-1.3.0, timeout-2.4.0, metadata-3.1.1, Faker-37.12.0, xdist-3.8.0, langsmith-0.6.6, cov-7.0.0
asyncio: mode=Mode.AUTO, debug=False, asyncio_default_fixture_loop_scope=None, asyncio_default_test_loop_scope=function
timeout: 900.0s
timeout method: signal
timeout func_only: False
collected 892 items / 19 deselected / 873 selected
test/backends/test_huggingface.py ................... [ 2%]
test/backends/test_huggingface_tools.py . [ 2%]
test/cli/test_alora_train_integration.py .. [ 2%]
test/formatters/granite/test_intrinsics_formatters.py ....x.......... [ 4%]
test/stdlib/components/docs/test_richdocument.py s [ 4%]
test/stdlib/components/intrinsic/test_core.py ..F [ 4%]
test/stdlib/components/intrinsic/test_guardian.py ...... [ 5%]
test/stdlib/components/intrinsic/test_rag.py ....... [ 6%]
test/stdlib/test_spans.py .x [ 6%]
test/telemetry/test_metrics_backend.py .. [ 6%]
test/backends/test_openai_ollama.py ............. [ 8%]
test/backends/test_openai_vllm.py sssssss [ 8%]
test/backends/test_vision_openai.py ..F. [ 9%]
test/telemetry/test_metrics_backend.py .. [ 9%]
test/backends/test_vllm.py ........ [ 10%]
test/backends/test_vllm_tools.py . [ 10%]
test/backends/test_litellm_ollama.py ........ [ 11%]
test/backends/test_mellea_tool.py .. [ 11%]
test/backends/test_ollama.py .....X.... [ 12%]
test/backends/test_tool_calls.py ... [ 13%]
test/backends/test_vision_ollama.py .... [ 13%]
test/core/test_astream_incremental.py ...... [ 14%]
test/core/test_component_typing.py ... [ 14%]
test/core/test_model_output_thunk.py .. [ 15%]
test/stdlib/components/test_genslot.py ................... [ 17%]
test/stdlib/requirements/test_requirement.py ..... [ 17%]
test/stdlib/sampling/test_majority_voting.py .. [ 17%]
test/stdlib/sampling/test_sampling_ctx.py .. [ 18%]
test/stdlib/sampling/test_sofai_graph_coloring.py ... [ 18%]
test/stdlib/sampling/test_sofai_sampling.py . [ 18%]
test/stdlib/sampling/test_think_budget_forcing.py EE [ 18%]
test/stdlib/test_chat_view.py .. [ 19%]
test/stdlib/test_functional.py .... [ 19%]
test/stdlib/test_session.py s....... [ 20%]
test/telemetry/test_metrics_backend.py .... [ 20%]
test/telemetry/test_tracing.py .... [ 21%]
test/telemetry/test_tracing_backend.py ssssss [ 22%]
test/backends/test_bedrock.py s [ 22%]
test/backends/test_litellm_watsonx.py ssss [ 22%]
test/backends/test_watsonx.py sssssssssss [ 23%]
test/telemetry/test_metrics_backend.py s [ 24%]
test/backends/test_adapters/test_adapter.py . [ 24%]
test/backends/test_mellea_tool.py ..... [ 24%]
test/backends/test_model_options.py ..... [ 25%]
test/backends/test_tool_decorator.py ................... [ 27%]
test/backends/test_tool_helpers.py ... [ 27%]
test/backends/test_tool_validation_integration.py ...................... [ 30%]
........... [ 31%]
test/cli/test_alora_train.py .... [ 32%]
test/core/test_astream_exception_propagation.py ..... [ 32%]
test/core/test_astream_mock.py ...... [ 33%]
test/core/test_base.py .... [ 33%]
test/core/test_component_typing.py ..... [ 34%]
test/decompose/test_decompose.py .......... [ 35%]
test/formatters/granite/test_intrinsics_formatters.py .................. [ 37%]
..................................FFFFFFFF [ 42%]
test/formatters/test_template_formatter.py ................ [ 44%]
test/helpers/test_event_loop_helper.py .... [ 44%]
test/helpers/test_server_type.py ................ [ 46%]
test/plugins/test_all_payloads.py ...................................... [ 50%]
............................................................. [ 57%]
test/plugins/test_blocking.py ................ [ 59%]
test/plugins/test_build_global_context.py ....... [ 60%]
test/plugins/test_decorators.py ......... [ 61%]
test/plugins/test_execution_modes.py ........................... [ 64%]
test/plugins/test_hook_call_sites.py .............................. [ 68%]
test/plugins/test_manager.py ss...... [ 68%]
test/plugins/test_mellea_plugin.py ....... [ 69%]
test/plugins/test_payloads.py .......... [ 70%]
test/plugins/test_pluginset.py ......... [ 71%]
test/plugins/test_policies.py ...... [ 72%]
test/plugins/test_policy_enforcement.py .......... [ 73%]
test/plugins/test_priority_ordering.py .............. [ 75%]
test/plugins/test_scoping.py ................................... [ 79%]
test/plugins/test_tool_hooks_redaction.py ....... [ 80%]
test/plugins/test_unregister.py ......... [ 81%]
test/stdlib/components/docs/test_document.py ... [ 81%]
test/stdlib/components/docs/test_richdocument.py ..... [ 82%]
test/stdlib/components/test_chat.py . [ 82%]
test/stdlib/components/test_hello_world.py .. [ 82%]
test/stdlib/components/test_mify.py ........... [ 83%]
test/stdlib/components/test_transform.py .. [ 83%]
test/stdlib/requirements/test_reqlib_markdown.py ...... [ 84%]
test/stdlib/requirements/test_reqlib_python.py .............sss..... [ 87%]
test/stdlib/requirements/test_reqlib_tools.py . [ 87%]
test/stdlib/sampling/test_sofai_graph_coloring.py ...................... [ 89%]
[ 89%]
test/stdlib/sampling/test_sofai_sampling.py .................... [ 91%]
test/stdlib/test_base_context.py ..... [ 92%]
test/telemetry/test_logging.py ........ [ 93%]
test/telemetry/test_metrics.py ....................................... [ 97%]
test/telemetry/test_metrics_plugins.py .... [ 98%]
test/telemetry/test_metrics_token.py .... [ 98%]
test/telemetry/test_tracing.py .......... [100%]
==================================== ERRORS ====================================
_______________________ ERROR at setup of test_think_big _______________________
gh_run = 0
@pytest.fixture(scope="module")
def m_session(gh_run):
"""Start default Mellea's session."""
if gh_run == 1: # on github
m = start_session(
"ollama", model_id=MODEL_ID, model_options={ModelOption.MAX_NEW_TOKENS: 5}
)
else:
> m = start_session("ollama", model_id=MODEL_ID)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
test/stdlib/sampling/test_think_budget_forcing.py:25:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mellea/stdlib/session.py:241: in start_session
backend = backend_class(model_id, model_options=model_options, **backend_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = <mellea.backends.ollama.OllamaModelBackend object at 0x148cc4795df0>
model_id = ModelIdentifier(hf_model_name='openai/gpt-oss-20b', ollama_name='gpt-oss:20b', watsonx_name=None, mlx_name=None, openai_name=None, bedrock_name='openai.gpt-oss-20b', hf_tokenizer_name=None)
formatter = None, base_url = None, model_options = None
def __init__(
self,
model_id: str | ModelIdentifier = model_ids.IBM_GRANITE_4_MICRO_3B,
formatter: ChatFormatter | None = None,
base_url: str | None = None,
model_options: dict | None = None,
):
"""Initialize an Ollama backend, connecting to the server and pulling the model if needed."""
super().__init__(
model_id=model_id,
formatter=(
formatter
if formatter is not None
else TemplateFormatter(model_id=model_id)
),
model_options=model_options,
)
# Run the ollama model id accessor early, so that an Assertion fails immediately if we cannot find an ollama model id for the provided ModelIdentifier.
self._get_ollama_model_id()
# Setup the client and ensure that we have the model available.
self._base_url = base_url
self._client = ollama.Client(base_url)
self._client_cache = ClientCache(2)
# Call once to set up an async client and prepopulate the cache.
_ = self._async_client
if not self._check_ollama_server():
err = f"could not create OllamaModelBackend: ollama server not running at {base_url}"
FancyLogger.get_logger().error(err)
raise Exception(err)
if not self._pull_ollama_model():
err = f"could not create OllamaModelBackend: {self._get_ollama_model_id()} could not be pulled from ollama library"
FancyLogger.get_logger().error(err)
> raise Exception(err)
E Exception: could not create OllamaModelBackend: gpt-oss:20b could not be pulled from ollama library
mellea/backends/ollama.py:97: Exception
---------------------------- Captured stdout setup -----------------------------
=== 22:40:43-ERROR ======
could not create OllamaModelBackend: gpt-oss:20b could not be pulled from ollama library
---------------------------- Captured stderr setup -----------------------------
------------------------------ Captured log setup ------------------------------
ERROR fancy_logger:ollama.py:96 could not create OllamaModelBackend: gpt-oss:20b could not be pulled from ollama library
_____________________ ERROR at setup of test_think_little ______________________
gh_run = 0
E           Exception: could not create OllamaModelBackend: gpt-oss:20b could not be pulled from ollama library
mellea/backends/ollama.py:97: Exception
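Both setup errors come from `start_session` hard-failing when `gpt-oss:20b` cannot be pulled. A minimal sketch of a skip-style guard that fixture code could use to skip rather than error when the required tag is absent locally (the helper name and the tag list are illustrative, not from the repo; `available_tags` stands in for parsed `ollama list` output):

```python
def should_skip_for_missing_model(required_tag: str, available_tags: list[str]) -> bool:
    """Return True when the required ollama model tag is not pulled locally."""
    # Normalize whitespace so "gpt-oss:20b" and "gpt-oss:20b " compare equal.
    normalized = {tag.strip() for tag in available_tags}
    return required_tag.strip() not in normalized

# In the run above only the warmed models were present, so the fixture
# could have skipped instead of raising at session setup.
print(should_skip_for_missing_model("gpt-oss:20b", ["granite4:micro-h", "granite3.2-vision"]))  # True
```

A fixture would then call `pytest.skip(...)` when the helper returns True instead of letting backend construction raise.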
=================================== FAILURES ===================================
________________________ test_find_context_attributions ________________________
backend = <mellea.backends.huggingface.LocalHFBackend object at 0x14894643d550>
@pytest.mark.qualitative
def test_find_context_attributions(backend):
"""Verify that the context-attribution intrinsic functions properly."""
context, assistant_response, documents = _read_rag_input_json(
"context-attribution.json"
)
expected = _read_rag_output_json("context-attribution.json")
result = core.find_context_attributions(
assistant_response, documents, context, backend
)
> assert result == expected
E AssertionError: assert [{'attributio...ne, ...}, ...] == [{'attributio...ne, ...}, ...]
E
E Left contains 5 more items, first extra item: {'attribution_begin': 0, 'attribution_doc_id': None, 'attribution_end': 66, 'attribution_msg_index': 2, ...}
E Use -v to get more diff
test/stdlib/components/intrinsic/test_core.py:105: AssertionError
----------------------------- Captured stdout call -----------------------------
=== 22:30:56-INFO ======
passing in model options when generating with an adapter; some model options may be overwritten / ignored
----------------------------- Captured stderr call -----------------------------
Fetching 1 files: 100%|██████████| 1/1 [00:00<00:00, 9320.68it/s]
Fetching 3 files: 100%|██████████| 3/3 [00:00<00:00, 10960.72it/s]
------------------------------ Captured log call -------------------------------
INFO fancy_logger:huggingface.py:475 passing in model options when generating with an adapter; some model options may be overwritten / ignored
--------------------------- Captured stdout teardown ---------------------------
=== 22:30:59-INFO ======
Cleaning up test_core backend GPU memory...
=== 22:30:59-INFO ======
GPU before cleanup: 58.1GB free / 79.2GB total
=== 22:30:59-INFO ======
Cleared LRU cache
=== 22:30:59-INFO ======
Removed accelerate dispatch hooks
=== 22:31:00-INFO ======
GPU after cleanup: 78.1GB free / 79.2GB total (reclaimed 20.0GB)
---------------------------- Captured log teardown -----------------------------
INFO fancy_logger:conftest.py:342 Cleaning up test_core backend GPU memory...
INFO fancy_logger:conftest.py:349 GPU before cleanup: 58.1GB free / 79.2GB total
INFO fancy_logger:conftest.py:365 Cleared LRU cache
INFO fancy_logger:conftest.py:402 Removed accelerate dispatch hooks
INFO fancy_logger:conftest.py:437 GPU after cleanup: 78.1GB free / 79.2GB total (reclaimed 20.0GB)
_______________________ test_image_block_in_instruction ________________________
m_session = <mellea.stdlib.session.MelleaSession object at 0x148f2d2cdbb0>
pil_image = <PIL.Image.Image image mode=RGB size=200x150 at 0x148F2D255640>
gh_run = 0
def test_image_block_in_instruction(
m_session: MelleaSession, pil_image: Image.Image, gh_run: int
):
image_block = ImageBlock.from_pil_image(pil_image)
# Set strategy=None here since we are directly comparing the object and sampling strategies tend to do a deepcopy.
instr = m_session.instruct(
"Is this image mainly blue? Answer yes or no.",
images=[image_block],
strategy=None,
)
assert isinstance(instr, ModelOutputThunk)
# if not on GH
if not gh_run == 1:
> assert "yes" in instr.value.lower() or "no" in instr.value.lower() # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
E AssertionError: assert ('yes' in '\nthe image is predominantly blue with varying shades creating a mosaic effect.' or 'no' in '\nthe image is predominantly blue with varying shades creating a mosaic effect.')
E + where '\nthe image is predominantly blue with varying shades creating a mosaic effect.' = <built-in method lower of str object at 0x148f2d2c65e0>()
E + where <built-in method lower of str object at 0x148f2d2c65e0> = '\nThe image is predominantly blue with varying shades creating a mosaic effect.'.lower
E + where '\nThe image is predominantly blue with varying shades creating a mosaic effect.' = ModelOutputThunk(\nThe image is predominantly blue with varying shades creating a mosaic effect.).value
E + and '\nthe image is predominantly blue with varying shades creating a mosaic effect.' = <built-in method lower of str object at 0x148f2d2c65e0>()
E + where <built-in method lower of str object at 0x148f2d2c65e0> = '\nThe image is predominantly blue with varying shades creating a mosaic effect.'.lower
E + where '\nThe image is predominantly blue with varying shades creating a mosaic effect.' = ModelOutputThunk(\nThe image is predominantly blue with varying shades creating a mosaic effect.).value
test/backends/test_vision_openai.py:86: AssertionError
---------------------------- Captured stdout setup -----------------------------
=== 22:33:27-INFO ======
Starting Mellea session: backend=openai, model=granite3.2-vision, context=SimpleContext, model_options={'@@@max_new_tokens@@@': 5}
------------------------------ Captured log setup ------------------------------
INFO fancy_logger:session.py:246 Starting Mellea session: backend=openai, model=granite3.2-vision, context=SimpleContext, model_options={'@@@max_new_tokens@@@': 5}
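The assertion in this failure does a plain substring check, so a descriptive reply that never literally says "yes" or "no" fails even though the model did judge the image blue. A hedged sketch of a word-boundary check that makes the yes/no intent explicit (the function name is illustrative, not from the test suite):

```python
import re

def contains_yes_or_no(text: str) -> bool:
    """True when the reply contains 'yes' or 'no' as a whole word."""
    return re.search(r"\b(yes|no)\b", text.lower()) is not None

print(contains_yes_or_no("The image is predominantly blue with varying shades creating a mosaic effect."))  # False
print(contains_yes_or_no("Yes, it is mainly blue."))  # True
```

Word boundaries also avoid false positives from substrings such as "no" inside "nothing".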
____________________ test_run_ollama[answerability_simple] _____________________
yaml_json_combo_for_ollama = YamlJsonCombo(short_name='answerability_simple', yaml_file=PosixPath('/u/ajbozarth/.cache/huggingface/hub/models--ibm-...rability', is_alora=False, repo_id='ibm-granite/granite-lib-rag-r1.0', revision='main', base_model_id='granite4:micro')
def test_run_ollama(yaml_json_combo_for_ollama):
"""
Run the target model end-to-end with a mock Ollama backend.
"""
cfg = yaml_json_combo_for_ollama
# Change base model id to Ollama's version
if cfg.base_model_id == "ibm-granite/granite-4.0-micro":
cfg.base_model_id = "granite4:micro"
else:
pytest.xfail(f"Unsupported base model: {cfg.base_model_id}")
if cfg.arguments_file:
with open(cfg.arguments_file, encoding="utf8") as f:
transform_kwargs = json.load(f)
else:
transform_kwargs = {}
# Load input request
with open(cfg.inputs_file, encoding="utf-8") as f:
model_input = ChatCompletion.model_validate_json(f.read())
model_input.model = cfg.task
# Download files from Hugging Face Hub
try:
> lora_dir = intrinsics_util.obtain_lora(
cfg.task,
cfg.base_model_id,
cfg.repo_id,
revision=cfg.revision,
alora=cfg.is_alora,
)
test/formatters/granite/test_intrinsics_formatters.py:714:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mellea/formatters/granite/intrinsics/util.py:154: in obtain_lora
local_root_path = huggingface_hub.snapshot_download(
.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:114: in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/huggingface_hub/_snapshot_download.py:332: in snapshot_download
thread_map(
.venv/lib/python3.12/site-packages/tqdm/contrib/concurrent.py:69: in thread_map
return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/tqdm/contrib/concurrent.py:51: in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/tqdm/std.py:1181: in __iter__
for obj in iterable:
^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:619: in result_iterator
yield _result_or_cancel(fs.pop())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:317: in _result_or_cancel
return fut.result(timeout)
^^^^^^^^^^^^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:456: in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:401: in __get_result
raise self._exception
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py:59: in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/huggingface_hub/_snapshot_download.py:306: in _inner_hf_hub_download
return hf_hub_download(
.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:114: in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1007: in hf_hub_download
return _hf_hub_download_to_cache_dir(
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1168: in _hf_hub_download_to_cache_dir
_download_to_tmp_and_move(
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1720: in _download_to_tmp_and_move
xet_get(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def xet_get(
*,
incomplete_path: Path,
xet_file_data: XetFileData,
headers: Dict[str, str],
expected_size: Optional[int] = None,
displayed_filename: Optional[str] = None,
_tqdm_bar: Optional[tqdm] = None,
) -> None:
"""
Download a file using Xet storage service.
Args:
incomplete_path (`Path`):
The path to the file to download.
xet_file_data (`XetFileData`):
The file metadata needed to make the request to the xet storage service.
headers (`Dict[str, str]`):
The headers to send to the xet storage service.
expected_size (`int`, *optional*):
The expected size of the file to download. If set, the download will raise an error if the size of the
received content is different from the expected one.
displayed_filename (`str`, *optional*):
The filename of the file that is being downloaded. Value is used only to display a nice progress bar. If
not set, the filename is guessed from the URL or the `Content-Disposition` header.
**How it works:**
The file download system uses Xet storage, which is a content-addressable storage system that breaks files into chunks
for efficient storage and transfer.
`hf_xet.download_files` manages downloading files by:
- Taking a list of files to download (each with its unique content hash)
- Connecting to a storage server (CAS server) that knows how files are chunked
- Using authentication to ensure secure access
- Providing progress updates during download
Authentication works by regularly refreshing access tokens through `refresh_xet_connection_info` to maintain a valid
connection to the storage server.
The download process works like this:
1. Create a local cache folder at `~/.cache/huggingface/xet/chunk-cache` to store reusable file chunks
2. Download files in parallel:
2.1. Prepare to write the file to disk
2.2. Ask the server "how is this file split into chunks?" using the file's unique hash
The server responds with:
- Which chunks make up the complete file
- Where each chunk can be downloaded from
2.3. For each needed chunk:
- Checks if we already have it in our local cache
- If not, download it from cloud storage (S3)
- Save it to cache for future use
- Assemble the chunks in order to recreate the original file
"""
try:
from hf_xet import PyXetDownloadInfo, download_files # type: ignore[no-redef]
except ImportError:
raise ValueError(
"To use optimized download using Xet storage, you need to install the hf_xet package. "
'Try `pip install "huggingface_hub[hf_xet]"` or `pip install hf_xet`.'
)
connection_info = refresh_xet_connection_info(file_data=xet_file_data, headers=headers)
def token_refresher() -> Tuple[str, int]:
connection_info = refresh_xet_connection_info(file_data=xet_file_data, headers=headers)
if connection_info is None:
raise ValueError("Failed to refresh token using xet metadata.")
return connection_info.access_token, connection_info.expiration_unix_epoch
xet_download_info = [
PyXetDownloadInfo(
destination_path=str(incomplete_path.absolute()), hash=xet_file_data.file_hash, file_size=expected_size
)
]
if not displayed_filename:
displayed_filename = incomplete_path.name
# Truncate filename if too long to display
if len(displayed_filename) > 40:
displayed_filename = f"{displayed_filename[:40]}(…)"
progress_cm = _get_progress_bar_context(
desc=displayed_filename,
log_level=logger.getEffectiveLevel(),
total=expected_size,
initial=0,
name="huggingface_hub.xet_get",
_tqdm_bar=_tqdm_bar,
)
with progress_cm as progress:
def progress_updater(progress_bytes: float):
progress.update(progress_bytes)
> download_files(
xet_download_info,
endpoint=connection_info.endpoint,
token_info=(connection_info.access_token, connection_info.expiration_unix_epoch),
token_refresher=token_refresher,
progress_updater=[progress_updater],
)
E RuntimeError: Data processing error: CAS service error : IO Error: Disk quota exceeded (os error 122)
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:626: RuntimeError
----------------------------- Captured stderr call -----------------------------
Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
__________________ test_run_ollama[answerability_answerable] ___________________
yaml_json_combo_for_ollama = YamlJsonCombo(short_name='answerability_answerable', yaml_file=PosixPath('/u/ajbozarth/.cache/huggingface/hub/models--...rability', is_alora=False, repo_id='ibm-granite/granite-lib-rag-r1.0', revision='main', base_model_id='granite4:micro')
E RuntimeError: Data processing error: CAS service error : IO Error: Disk quota exceeded (os error 122)
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:626: RuntimeError
----------------------------- Captured stderr call -----------------------------
Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
_________________ test_run_ollama[answerability_unanswerable] __________________
yaml_json_combo_for_ollama = YamlJsonCombo(short_name='answerability_unanswerable', yaml_file=PosixPath('/u/ajbozarth/.cache/huggingface/hub/models...rability', is_alora=False, repo_id='ibm-granite/granite-lib-rag-r1.0', revision='main', base_model_id='granite4:micro')
_download_to_tmp_and_move(
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1720: in _download_to_tmp_and_move
xet_get(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def xet_get(
*,
incomplete_path: Path,
xet_file_data: XetFileData,
headers: Dict[str, str],
expected_size: Optional[int] = None,
displayed_filename: Optional[str] = None,
_tqdm_bar: Optional[tqdm] = None,
) -> None:
"""
Download a file using Xet storage service.
Args:
incomplete_path (`Path`):
The path to the file to download.
xet_file_data (`XetFileData`):
The file metadata needed to make the request to the xet storage service.
headers (`Dict[str, str]`):
The headers to send to the xet storage service.
expected_size (`int`, *optional*):
The expected size of the file to download. If set, the download will raise an error if the size of the
received content is different from the expected one.
displayed_filename (`str`, *optional*):
The filename of the file that is being downloaded. Value is used only to display a nice progress bar. If
not set, the filename is guessed from the URL or the `Content-Disposition` header.
**How it works:**
The file download system uses Xet storage, which is a content-addressable storage system that breaks files into chunks
for efficient storage and transfer.
`hf_xet.download_files` manages downloading files by:
- Taking a list of files to download (each with its unique content hash)
- Connecting to a storage server (CAS server) that knows how files are chunked
- Using authentication to ensure secure access
- Providing progress updates during download
Authentication works by regularly refreshing access tokens through `refresh_xet_connection_info` to maintain a valid
connection to the storage server.
The download process works like this:
1. Create a local cache folder at `~/.cache/huggingface/xet/chunk-cache` to store reusable file chunks
2. Download files in parallel:
2.1. Prepare to write the file to disk
2.2. Ask the server "how is this file split into chunks?" using the file's unique hash
The server responds with:
- Which chunks make up the complete file
- Where each chunk can be downloaded from
2.3. For each needed chunk:
- Checks if we already have it in our local cache
- If not, download it from cloud storage (S3)
- Save it to cache for future use
- Assemble the chunks in order to recreate the original file
"""
try:
from hf_xet import PyXetDownloadInfo, download_files # type: ignore[no-redef]
except ImportError:
raise ValueError(
"To use optimized download using Xet storage, you need to install the hf_xet package. "
'Try `pip install "huggingface_hub[hf_xet]"` or `pip install hf_xet`.'
)
connection_info = refresh_xet_connection_info(file_data=xet_file_data, headers=headers)
def token_refresher() -> Tuple[str, int]:
connection_info = refresh_xet_connection_info(file_data=xet_file_data, headers=headers)
if connection_info is None:
raise ValueError("Failed to refresh token using xet metadata.")
return connection_info.access_token, connection_info.expiration_unix_epoch
xet_download_info = [
PyXetDownloadInfo(
destination_path=str(incomplete_path.absolute()), hash=xet_file_data.file_hash, file_size=expected_size
)
]
if not displayed_filename:
displayed_filename = incomplete_path.name
# Truncate filename if too long to display
if len(displayed_filename) > 40:
displayed_filename = f"{displayed_filename[:40]}(…)"
progress_cm = _get_progress_bar_context(
desc=displayed_filename,
log_level=logger.getEffectiveLevel(),
total=expected_size,
initial=0,
name="huggingface_hub.xet_get",
_tqdm_bar=_tqdm_bar,
)
with progress_cm as progress:
def progress_updater(progress_bytes: float):
progress.update(progress_bytes)
> download_files(
xet_download_info,
endpoint=connection_info.endpoint,
token_info=(connection_info.access_token, connection_info.expiration_unix_epoch),
token_refresher=token_refresher,
progress_updater=[progress_updater],
)
E RuntimeError: Data processing error: CAS service error : IO Error: Disk quota exceeded (os error 122)
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:626: RuntimeError
----------------------------- Captured stderr call -----------------------------
Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
___________________ test_run_ollama[hallucination_detection] ___________________
yaml_json_combo_for_ollama = YamlJsonCombo(short_name='hallucination_detection', yaml_file=PosixPath('/u/ajbozarth/.cache/huggingface/hub/models--i...etection', is_alora=False, repo_id='ibm-granite/granite-lib-rag-r1.0', revision='main', base_model_id='granite4:micro')
    [test body and traceback identical to test_run_ollama[answerability_unanswerable] above]
E   RuntimeError: Data processing error: CAS service error : IO Error: Disk quota exceeded (os error 122)
_____________________ test_run_ollama[query_clarification] _____________________
yaml_json_combo_for_ollama = YamlJsonCombo(short_name='query_clarification', yaml_file=PosixPath('/u/ajbozarth/.cache/huggingface/hub/models--ibm-g...fication', is_alora=False, repo_id='ibm-granite/granite-lib-rag-r1.0', revision='main', base_model_id='granite4:micro')
    [test body and traceback identical to test_run_ollama[answerability_unanswerable] above]
E   RuntimeError: Data processing error: CAS service error : IO Error: Disk quota exceeded (os error 122)
________________________ test_run_ollama[query_rewrite] ________________________
yaml_json_combo_for_ollama = YamlJsonCombo(short_name='query_rewrite', yaml_file=PosixPath('/u/ajbozarth/.cache/huggingface/hub/models--ibm-granite..._rewrite', is_alora=False, repo_id='ibm-granite/granite-lib-rag-r1.0', revision='main', base_model_id='granite4:micro')
    [test body and traceback identical to test_run_ollama[answerability_unanswerable] above]
E   RuntimeError: Data processing error: CAS service error : IO Error: Disk quota exceeded (os error 122)
______________________ test_run_ollama[context_relevance] ______________________
yaml_json_combo_for_ollama = YamlJsonCombo(short_name='context_relevance', yaml_file=PosixPath('/u/ajbozarth/.cache/huggingface/hub/models--ibm-gra...elevance', is_alora=False, repo_id='ibm-granite/granite-lib-rag-r1.0', revision='main', base_model_id='granite4:micro')
def test_run_ollama(yaml_json_combo_for_ollama):
"""
Run the target model end-to-end with a mock Ollama backend.
"""
cfg = yaml_json_combo_for_ollama
# Change base model id to Ollama's version
if cfg.base_model_id == "ibm-granite/granite-4.0-micro":
cfg.base_model_id = "granite4:micro"
else:
pytest.xfail(f"Unsupported base model: {cfg.base_model_id}")
if cfg.arguments_file:
with open(cfg.arguments_file, encoding="utf8") as f:
transform_kwargs = json.load(f)
else:
transform_kwargs = {}
# Load input request
with open(cfg.inputs_file, encoding="utf-8") as f:
model_input = ChatCompletion.model_validate_json(f.read())
model_input.model = cfg.task
# Download files from Hugging Face Hub
try:
> lora_dir = intrinsics_util.obtain_lora(
cfg.task,
cfg.base_model_id,
cfg.repo_id,
revision=cfg.revision,
alora=cfg.is_alora,
)
test/formatters/granite/test_intrinsics_formatters.py:714:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mellea/formatters/granite/intrinsics/util.py:154: in obtain_lora
local_root_path = huggingface_hub.snapshot_download(
.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:114: in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/huggingface_hub/_snapshot_download.py:332: in snapshot_download
thread_map(
.venv/lib/python3.12/site-packages/tqdm/contrib/concurrent.py:69: in thread_map
return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/tqdm/contrib/concurrent.py:51: in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/tqdm/std.py:1181: in __iter__
for obj in iterable:
^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:619: in result_iterator
yield _result_or_cancel(fs.pop())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:317: in _result_or_cancel
return fut.result(timeout)
^^^^^^^^^^^^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:456: in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:401: in __get_result
raise self._exception
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py:59: in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/huggingface_hub/_snapshot_download.py:306: in _inner_hf_hub_download
return hf_hub_download(
.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:114: in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1007: in hf_hub_download
return _hf_hub_download_to_cache_dir(
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1168: in _hf_hub_download_to_cache_dir
_download_to_tmp_and_move(
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1720: in _download_to_tmp_and_move
xet_get(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def xet_get(
*,
incomplete_path: Path,
xet_file_data: XetFileData,
headers: Dict[str, str],
expected_size: Optional[int] = None,
displayed_filename: Optional[str] = None,
_tqdm_bar: Optional[tqdm] = None,
) -> None:
"""
Download a file using Xet storage service.
Args:
incomplete_path (`Path`):
The path to the file to download.
xet_file_data (`XetFileData`):
The file metadata needed to make the request to the xet storage service.
headers (`Dict[str, str]`):
The headers to send to the xet storage service.
expected_size (`int`, *optional*):
The expected size of the file to download. If set, the download will raise an error if the size of the
received content is different from the expected one.
displayed_filename (`str`, *optional*):
The filename of the file that is being downloaded. Value is used only to display a nice progress bar. If
not set, the filename is guessed from the URL or the `Content-Disposition` header.
**How it works:**
The file download system uses Xet storage, which is a content-addressable storage system that breaks files into chunks
for efficient storage and transfer.
`hf_xet.download_files` manages downloading files by:
- Taking a list of files to download (each with its unique content hash)
- Connecting to a storage server (CAS server) that knows how files are chunked
- Using authentication to ensure secure access
- Providing progress updates during download
Authentication works by regularly refreshing access tokens through `refresh_xet_connection_info` to maintain a valid
connection to the storage server.
The download process works like this:
1. Create a local cache folder at `~/.cache/huggingface/xet/chunk-cache` to store reusable file chunks
2. Download files in parallel:
2.1. Prepare to write the file to disk
2.2. Ask the server "how is this file split into chunks?" using the file's unique hash
The server responds with:
- Which chunks make up the complete file
- Where each chunk can be downloaded from
2.3. For each needed chunk:
- Checks if we already have it in our local cache
- If not, download it from cloud storage (S3)
- Save it to cache for future use
- Assemble the chunks in order to recreate the original file
"""
try:
from hf_xet import PyXetDownloadInfo, download_files # type: ignore[no-redef]
except ImportError:
raise ValueError(
"To use optimized download using Xet storage, you need to install the hf_xet package. "
'Try `pip install "huggingface_hub[hf_xet]"` or `pip install hf_xet`.'
)
connection_info = refresh_xet_connection_info(file_data=xet_file_data, headers=headers)
def token_refresher() -> Tuple[str, int]:
connection_info = refresh_xet_connection_info(file_data=xet_file_data, headers=headers)
if connection_info is None:
raise ValueError("Failed to refresh token using xet metadata.")
return connection_info.access_token, connection_info.expiration_unix_epoch
xet_download_info = [
PyXetDownloadInfo(
destination_path=str(incomplete_path.absolute()), hash=xet_file_data.file_hash, file_size=expected_size
)
]
if not displayed_filename:
displayed_filename = incomplete_path.name
# Truncate filename if too long to display
if len(displayed_filename) > 40:
displayed_filename = f"{displayed_filename[:40]}(…)"
progress_cm = _get_progress_bar_context(
desc=displayed_filename,
log_level=logger.getEffectiveLevel(),
total=expected_size,
initial=0,
name="huggingface_hub.xet_get",
_tqdm_bar=_tqdm_bar,
)
with progress_cm as progress:
def progress_updater(progress_bytes: float):
progress.update(progress_bytes)
> download_files(
xet_download_info,
endpoint=connection_info.endpoint,
token_info=(connection_info.access_token, connection_info.expiration_unix_epoch),
token_refresher=token_refresher,
progress_updater=[progress_updater],
)
E RuntimeError: Data processing error: CAS service error : IO Error: Disk quota exceeded (os error 122)
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:626: RuntimeError
----------------------------- Captured stderr call -----------------------------
Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
__________________________ test_run_ollama[citations] __________________________
yaml_json_combo_for_ollama = YamlJsonCombo(short_name='citations', yaml_file=PosixPath('/u/ajbozarth/.cache/huggingface/hub/models--ibm-granite--gr...itations', is_alora=False, repo_id='ibm-granite/granite-lib-rag-r1.0', revision='main', base_model_id='granite4:micro')
def test_run_ollama(yaml_json_combo_for_ollama):
"""
Run the target model end-to-end with a mock Ollama backend.
"""
cfg = yaml_json_combo_for_ollama
# Change base model id to Ollama's version
if cfg.base_model_id == "ibm-granite/granite-4.0-micro":
cfg.base_model_id = "granite4:micro"
else:
pytest.xfail(f"Unsupported base model: {cfg.base_model_id}")
if cfg.arguments_file:
with open(cfg.arguments_file, encoding="utf8") as f:
transform_kwargs = json.load(f)
else:
transform_kwargs = {}
# Load input request
with open(cfg.inputs_file, encoding="utf-8") as f:
model_input = ChatCompletion.model_validate_json(f.read())
model_input.model = cfg.task
# Download files from Hugging Face Hub
try:
> lora_dir = intrinsics_util.obtain_lora(
cfg.task,
cfg.base_model_id,
cfg.repo_id,
revision=cfg.revision,
alora=cfg.is_alora,
)
test/formatters/granite/test_intrinsics_formatters.py:714:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
mellea/formatters/granite/intrinsics/util.py:154: in obtain_lora
local_root_path = huggingface_hub.snapshot_download(
.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:114: in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/huggingface_hub/_snapshot_download.py:332: in snapshot_download
thread_map(
.venv/lib/python3.12/site-packages/tqdm/contrib/concurrent.py:69: in thread_map
return _executor_map(ThreadPoolExecutor, fn, *iterables, **tqdm_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/tqdm/contrib/concurrent.py:51: in _executor_map
return list(tqdm_class(ex.map(fn, *iterables, chunksize=chunksize), **kwargs))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/tqdm/std.py:1181: in __iter__
for obj in iterable:
^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:619: in result_iterator
yield _result_or_cancel(fs.pop())
^^^^^^^^^^^^^^^^^^^^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:317: in _result_or_cancel
return fut.result(timeout)
^^^^^^^^^^^^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:456: in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/_base.py:401: in __get_result
raise self._exception
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/concurrent/futures/thread.py:59: in run
result = self.fn(*self.args, **self.kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/huggingface_hub/_snapshot_download.py:306: in _inner_hf_hub_download
return hf_hub_download(
.venv/lib/python3.12/site-packages/huggingface_hub/utils/_validators.py:114: in _inner_fn
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1007: in hf_hub_download
return _hf_hub_download_to_cache_dir(
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1168: in _hf_hub_download_to_cache_dir
_download_to_tmp_and_move(
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:1720: in _download_to_tmp_and_move
xet_get(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
def xet_get(
*,
incomplete_path: Path,
xet_file_data: XetFileData,
headers: Dict[str, str],
expected_size: Optional[int] = None,
displayed_filename: Optional[str] = None,
_tqdm_bar: Optional[tqdm] = None,
) -> None:
"""
Download a file using Xet storage service.
Args:
incomplete_path (`Path`):
The path to the file to download.
xet_file_data (`XetFileData`):
The file metadata needed to make the request to the xet storage service.
headers (`Dict[str, str]`):
The headers to send to the xet storage service.
expected_size (`int`, *optional*):
The expected size of the file to download. If set, the download will raise an error if the size of the
received content is different from the expected one.
displayed_filename (`str`, *optional*):
The filename of the file that is being downloaded. Value is used only to display a nice progress bar. If
not set, the filename is guessed from the URL or the `Content-Disposition` header.
**How it works:**
The file download system uses Xet storage, which is a content-addressable storage system that breaks files into chunks
for efficient storage and transfer.
`hf_xet.download_files` manages downloading files by:
- Taking a list of files to download (each with its unique content hash)
- Connecting to a storage server (CAS server) that knows how files are chunked
- Using authentication to ensure secure access
- Providing progress updates during download
Authentication works by regularly refreshing access tokens through `refresh_xet_connection_info` to maintain a valid
connection to the storage server.
The download process works like this:
1. Create a local cache folder at `~/.cache/huggingface/xet/chunk-cache` to store reusable file chunks
2. Download files in parallel:
2.1. Prepare to write the file to disk
2.2. Ask the server "how is this file split into chunks?" using the file's unique hash
The server responds with:
- Which chunks make up the complete file
- Where each chunk can be downloaded from
2.3. For each needed chunk:
- Checks if we already have it in our local cache
- If not, download it from cloud storage (S3)
- Save it to cache for future use
- Assemble the chunks in order to recreate the original file
"""
try:
from hf_xet import PyXetDownloadInfo, download_files # type: ignore[no-redef]
except ImportError:
raise ValueError(
"To use optimized download using Xet storage, you need to install the hf_xet package. "
'Try `pip install "huggingface_hub[hf_xet]"` or `pip install hf_xet`.'
)
connection_info = refresh_xet_connection_info(file_data=xet_file_data, headers=headers)
def token_refresher() -> Tuple[str, int]:
connection_info = refresh_xet_connection_info(file_data=xet_file_data, headers=headers)
if connection_info is None:
raise ValueError("Failed to refresh token using xet metadata.")
return connection_info.access_token, connection_info.expiration_unix_epoch
xet_download_info = [
PyXetDownloadInfo(
destination_path=str(incomplete_path.absolute()), hash=xet_file_data.file_hash, file_size=expected_size
)
]
if not displayed_filename:
displayed_filename = incomplete_path.name
# Truncate filename if too long to display
if len(displayed_filename) > 40:
displayed_filename = f"{displayed_filename[:40]}(…)"
progress_cm = _get_progress_bar_context(
desc=displayed_filename,
log_level=logger.getEffectiveLevel(),
total=expected_size,
initial=0,
name="huggingface_hub.xet_get",
_tqdm_bar=_tqdm_bar,
)
with progress_cm as progress:
def progress_updater(progress_bytes: float):
progress.update(progress_bytes)
> download_files(
xet_download_info,
endpoint=connection_info.endpoint,
token_info=(connection_info.access_token, connection_info.expiration_unix_epoch),
token_refresher=token_refresher,
progress_updater=[progress_updater],
)
E RuntimeError: Data processing error: CAS service error : IO Error: Disk quota exceeded (os error 122)
.venv/lib/python3.12/site-packages/huggingface_hub/file_download.py:626: RuntimeError
----------------------------- Captured stderr call -----------------------------
Fetching 3 files: 0%| | 0/3 [00:00<?, ?it/s]
=============================== warnings summary ===============================
test/backends/test_huggingface.py: 1 warning
test/stdlib/components/intrinsic/test_core.py: 2 warnings
test/stdlib/components/intrinsic/test_guardian.py: 3 warnings
test/stdlib/components/intrinsic/test_rag.py: 5 warnings
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/peft/tuners/tuners_utils.py:285: UserWarning: Already found a `peft_config` attribute in the model. This will lead to having multiple adapters in the model. Make sure to know what you are doing!
warnings.warn(
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/trl/trainer/utils.py:103: DeprecationWarning: This class is deprecated and will be removed in version 0.20.0. To train on completion only, please use the parameter `completion_only_loss` of `SFTConfig` instead.
warnings.warn(
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
test/cli/test_alora_train.py::test_alora_config_creation
test/cli/test_alora_train.py::test_lora_config_creation
test/cli/test_alora_train.py::test_invocation_prompt_tokenization
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/trl/trainer/sft_config.py:257: DeprecationWarning: `max_seq_length` is deprecated and will be removed in version 0.20.0. Use `max_length` instead.
warnings.warn(
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/trl/trainer/sft_trainer.py:678: DeprecationWarning: Failed to apply the formatting function due to the following error: string index out of range. This may be because the function is designed for batched input. Please update it to process one example at a time (i.e., accept and return a single example). For now, we will attempt to apply the function in batched mode, but note that batched formatting is deprecated and will be removed in version 0.21.
warnings.warn(
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/pin_memory.py:57: DeprecationWarning: The argument 'device' of Tensor.pin_memory() is deprecated. Please do not pass this argument. (Triggered internally at /pytorch/aten/src/ATen/native/Memory.cpp:46.)
return data.pin_memory(device)
test/cli/test_alora_train_integration.py::test_alora_training_integration
test/cli/test_alora_train_integration.py::test_lora_training_integration
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/torch/utils/data/_utils/pin_memory.py:57: DeprecationWarning: The argument 'device' of Tensor.is_pinned() is deprecated. Please do not pass this argument. (Triggered internally at /pytorch/aten/src/ATen/native/Memory.cpp:31.)
return data.pin_memory(device)
test/telemetry/test_metrics_backend.py: 8 warnings
test/telemetry/test_metrics.py: 24 warnings
test/telemetry/test_metrics_token.py: 4 warnings
/proj/dmfexp/eiger/users/ajbozarth/mellea/mellea/telemetry/metrics.py:245: UserWarning: Metrics are enabled (MELLEA_METRICS_ENABLED=true) but no exporters are configured. Metrics will be collected but not exported. Set MELLEA_METRICS_PROMETHEUS=true, set MELLEA_METRICS_OTLP=true with an endpoint (OTEL_EXPORTER_OTLP_METRICS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT), or set MELLEA_METRICS_CONSOLE=true to export metrics.
_meter_provider = _setup_meter_provider()
test/telemetry/test_metrics_backend.py: 7 warnings
test/telemetry/test_metrics.py: 28 warnings
test/telemetry/test_metrics_token.py: 4 warnings
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/importlib/__init__.py:131: UserWarning: TokenMetricsPlugin already registered: Plugin token_metrics.generation_post_call already registered
_bootstrap._exec(spec, module)
test/backends/test_vision_openai.py::test_image_block_construction
/proj/dmfexp/eiger/users/ajbozarth/mellea/test/backends/test_vision_openai.py:48: DeprecationWarning: 'mode' parameter is deprecated and will be removed in Pillow 13 (2026-10-15)
random_image = Image.fromarray(random_pixel_data, "RGB")
test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
test/backends/test_litellm_ollama.py::test_generate_from_raw
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_litellm_ollama.py::test_async_avalue
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[non-streaming]
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/aiohttp/connector.py:993: DeprecationWarning: enable_cleanup_closed ignored because https://github.com/python/cpython/pull/118960 is fixed in Python version sys.version_info(major=3, minor=12, micro=12, releaselevel='final', serial=0)
super().__init__(
test/backends/test_litellm_ollama.py::test_litellm_ollama_chat
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='The answ...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Subject:...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='yes', ro...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_litellm_ollama_instruct_options
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="Subject:...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_gen_slot
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='{\n "...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/streaming_handler.py:1855: PydanticDeprecatedSince20: The `dict` method is deprecated; use `model_dump` instead. Deprecated in Pydantic V2.0 to be removed in V3.0. See Pydantic V2 Migration Guide at https://errors.pydantic.dev/2.12/migration/
obj_dict = processed_chunk.dict()
test/backends/test_litellm_ollama.py::test_async_parallel_requests
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="Goodbye!...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_litellm_ollama.py::test_async_parallel_requests
test/backends/test_litellm_ollama.py::test_async_avalue
test/backends/test_litellm_ollama.py::test_async_avalue
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='Hello! H...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/backends/test_tool_calls.py::test_tool_called_from_context_action
<frozen abc>:106: DeprecationWarning: Use BaseMetaSerializer() instead.
test/backends/test_vision_ollama.py::test_image_block_construction
/proj/dmfexp/eiger/users/ajbozarth/mellea/test/backends/test_vision_ollama.py:38: DeprecationWarning: 'mode' parameter is deprecated and will be removed in Pillow 13 (2026-10-15)
random_image = Image.fromarray(random_pixel_data, "RGB")
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[non-streaming]
test/telemetry/test_tracing.py::test_session_with_tracing_disabled
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content="I'm here...ields={'refusal': None}), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...ider_specific_fields={}), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/model_response_utils.py:206: PydanticDeprecatedSince211: Accessing the 'model_computed_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
or callable(getattr(delta, attr_name))
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/litellm/litellm_core_utils/model_response_utils.py:206: PydanticDeprecatedSince211: Accessing the 'model_fields' attribute on the instance is deprecated. Instead, you should access this attribute from the model class. Deprecated in Pydantic V2.11 to be removed in V3.0.
or callable(getattr(delta, attr_name))
test/telemetry/test_metrics_backend.py::test_litellm_token_metrics_integration[streaming]
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/pydantic/main.py:464: UserWarning: Pydantic serializer warnings:
PydanticSerializationUnexpectedValue(Expected 10 fields but got 5: Expected `Message` - serialized value may not be as expected [field_name='message', input_value=Message(content='As an AI...er_specific_fields=None), input_type=Message])
PydanticSerializationUnexpectedValue(Expected `StreamingChoices` - serialized value may not be as expected [field_name='choices', input_value=Choices(finish_reason='st...r_specific_fields=None)), input_type=Choices])
return self.__pydantic_serializer__.to_python(
test/helpers/test_event_loop_helper.py::test_event_loop_handler_with_forking
/u/ajbozarth/.local/share/uv/python/cpython-3.12.12-linux-x86_64-gnu/lib/python3.12/multiprocessing/popen_fork.py:66: DeprecationWarning: This process (pid=1137938) is multi-threaded, use of fork() may lead to deadlocks in the child.
self.pid = os.fork()
test/stdlib/components/docs/test_richdocument.py::test_richdocument_basics
test/stdlib/components/docs/test_richdocument.py::test_richdocument_markdown
test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
test/stdlib/components/docs/test_richdocument.py::test_richdocument_save
/proj/dmfexp/eiger/users/ajbozarth/mellea/.venv/lib/python3.12/site-packages/docling_core/transforms/serializer/markdown.py:490: DeprecationWarning: Field `annotations` is deprecated; use `meta` instead.
for ann in item.annotations
test/telemetry/test_logging.py::test_otlp_logging_enabled_without_endpoint_warns
/proj/dmfexp/eiger/users/ajbozarth/mellea/mellea/telemetry/logging.py:97: UserWarning: OTLP logs exporter is enabled (MELLEA_LOGS_OTLP=true) but no endpoint is configured. Set OTEL_EXPORTER_OTLP_LOGS_ENDPOINT or OTEL_EXPORTER_OTLP_ENDPOINT to export logs.
_logger_provider = _setup_logger_provider()
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================ tests coverage ================================
_______________ coverage: platform linux, python 3.12.12-final-0 _______________
Coverage HTML written to dir htmlcov
Coverage JSON written to file coverage.json
=========================== short test summary info ============================
FAILED test/stdlib/components/intrinsic/test_core.py::test_find_context_attributions
FAILED test/backends/test_vision_openai.py::test_image_block_in_instruction
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[answerability_simple]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[answerability_answerable]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[answerability_unanswerable]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[hallucination_detection]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[query_clarification]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[query_rewrite]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[context_relevance]
FAILED test/formatters/granite/test_intrinsics_formatters.py::test_run_ollama[citations]
ERROR test/stdlib/sampling/test_think_budget_forcing.py::test_think_big - Exc...
ERROR test/stdlib/sampling/test_think_budget_forcing.py::test_think_little - ...
= 10 failed, 821 passed, 37 skipped, 19 deselected, 2 xfailed, 1 xpassed, 131 warnings, 2 errors in 2295.03s (0:38:15) =
[23:00:24] Shutting down ollama server...
[23:00:24] Ollama stopped.
Looking into the above issues I've hit a handful of things to address that I will be returning to on Monday, including but not limited to:
Replace deprecated pytest markers with typed predicate functions from
test/predicates.py across all test files and example files:
- requires_gpu → require_gpu(min_vram_gb=N) with per-model VRAM estimates
- requires_heavy_ram → removed (conflated VRAM with RAM; no replacement needed)
- requires_gpu_isolation → removed (GPU isolation is now automatic)
- requires_api_key → require_api_key("VAR1", "VAR2", ...) with explicit env vars
Also removes spurious requires_gpu from ollama-backed tests (test_genslot,
test_think_budget_forcing, test_component_typing) and adds missing
integration marker to test_hook_call_sites.
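The predicate names and signatures above come from the commit; as a rough illustration, the helpers in `test/predicates.py` could look like the following sketch (the internals here are assumptions, not the actual implementation):

```python
import os

import pytest


def require_api_key(*env_vars: str):
    """Skip the test unless every named environment variable is set."""
    missing = [v for v in env_vars if not os.environ.get(v)]
    return pytest.mark.skipif(
        bool(missing),
        reason=f"missing API key env vars: {', '.join(missing)}",
    )


def require_gpu(min_vram_gb: int = 0):
    """Skip the test unless a GPU with at least min_vram_gb of VRAM is found."""
    try:
        import torch

        has_gpu = torch.cuda.is_available()
        vram_gb = (
            torch.cuda.get_device_properties(0).total_memory / 2**30
            if has_gpu
            else 0
        )
    except ImportError:
        has_gpu, vram_gb = False, 0
    return pytest.mark.skipif(
        not has_gpu or vram_gb < min_vram_gb,
        reason=f"requires a GPU with >= {min_vram_gb} GB VRAM",
    )
```

Returning a `pytest.mark.skipif` decorator keeps the call sites close to the old marker syntax (`@require_gpu(min_vram_gb=20)`) while making the hardware requirement explicit per test.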
VRAM estimates computed from model parameter counts using bf16 formula
(params_B × 2 × 1.2, rounded up to next even GB):
- granite-3.3-8b: 20 GB, Mistral-7B: 18 GB, granite-4.0-micro (3B): 8 GB
- Qwen3-0.6B: 4 GB (conservative for vLLM KV cache headroom)
- granite-4.0-h-micro (3B): 8 GB, alora training (3B): 12 GB
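The bf16 formula above (2 bytes per parameter plus ~20% headroom, rounded up to the next even GB) reproduces the listed estimates; a small helper to compute it might look like this (the Qwen3-0.6B value was set manually above for vLLM KV-cache headroom rather than taken from the formula):

```python
import math


def estimate_vram_gb(params_b: float) -> int:
    """Estimate VRAM for bf16 weights: params_B x 2 bytes x 1.2 overhead,
    rounded up to the next even number of GB."""
    raw = params_b * 2 * 1.2
    return math.ceil(raw / 2) * 2


# 8B  -> 20 GB (granite-3.3-8b)
# 7B  -> 18 GB (Mistral-7B)
# 3B  ->  8 GB (granite-4.0-micro)
```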
Add pytest.importorskip() guards to 14 test files that previously
aborted the entire test run with a ModuleNotFoundError when optional
extras were not installed:
- torch / llguidance (mellea[hf]): test_huggingface, test_huggingface_tools, test_alora_train_integration, test_intrinsics_formatters, test_core, test_guardian, test_rag, test_spans
- litellm (mellea[litellm]): test_litellm_ollama, test_litellm_watsonx
- ibm_watsonx_ai (mellea[watsonx]): test_watsonx
- docling / docling_core (mellea[mify]): test_tool_calls, test_richdocument, test_transform
With these guards, `uv run pytest` runs all collectable tests and reports skipped files with a clear reason instead of aborting at the first ImportError.
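The guard pattern described in this commit is a module-level `pytest.importorskip()` call placed before any import of the optional dependency, so a missing extra skips the whole file at collection time instead of aborting the run. A minimal illustration (using the always-present stdlib `json` module so the snippet runs anywhere):

```python
import pytest

# When the module is present, importorskip behaves like a normal import
# and returns the module object.
json_mod = pytest.importorskip("json")

# When the module is missing, importorskip raises pytest's Skipped
# exception at import time, marking every test in the file as skipped,
# e.g. at the top of a test file needing the mellea[hf] extra:
#   torch = pytest.importorskip("torch", reason="requires mellea[hf] extra")
```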
Expand integration to cover SDK-boundary tests (OTel InMemoryMetricReader,
InMemorySpanExporter, LoggingHandler) — tests that assert against a real
third-party SDK contract, not just multi-component wiring. Updates SKILL.md
and MARKERS_GUIDE.md with new definition, indicators, tie-breaker, and
SDK-boundary signal tables.
Applied fixes:
- test/telemetry/test_{metrics,metrics_token,logging}.py: add integration marker
- test/telemetry/test_metrics_backend.py: add openai marker to OTel+OpenAI test,
remove redundant inline skip already covered by require_api_key predicate
- test/cli/test_alora_train.py: add integration to test_imports_work (real LoraConfig)
- test/formatters/granite/test_intrinsics_formatters.py: remove unregistered block_network marker
- test/stdlib/components/docs/test_richdocument.py: add integration pytestmark + e2e/huggingface/qualitative on skipped generation test
- test/backends/test_openai_ollama.py: note inherited module marker limitation
- docs/examples/plugins/testing_plugins.py: add # pytest: unit
- test/plugins/test_payloads.py: importorskip("cpex") — skip module when
mellea[hooks] not installed instead of failing mid-test with ImportError
- test/telemetry/test_metrics_plugins.py: same cpex guard
- docs/examples/conftest.py: extend _check_optional_imports to cover docling,
pandas, cpex (mellea.plugins imports), and litellm; also call the check from
pytest_pycollect_makemodule so directly-specified files are guarded too
- docs/examples/image_text_models/README.md: add Prerequisites section listing
models to pull (granite3.2-vision, qwen2.5vl:7b)
…ards
Replace per-dep import checks in examples conftest with a runtime approach:
ExampleModule (a pytest.Module subclass) is now returned by
pytest_pycollect_makemodule for all runnable example files, preventing
pytest's default collector from importing them directly. Import errors in
the subprocess are caught in ExampleItem.runtest() and converted to skips,
so no optional dependency needs to be encoded in conftest.
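The skip-on-ImportError behaviour can be sketched independently of pytest internals (the helper name and the exact classification are assumptions about what ExampleItem.runtest() does):

```python
import subprocess
import sys

def run_example(path: str) -> str:
    """Run an example file in a subprocess and classify the outcome:
    a missing module becomes 'skip' rather than a hard failure."""
    proc = subprocess.run(
        [sys.executable, path], capture_output=True, text=True
    )
    if proc.returncode == 0:
        return "pass"
    if "ModuleNotFoundError" in proc.stderr:
        # The real item would call pytest.skip(...) here with the
        # missing module's name as the reason.
        return "skip"
    return "fail"
```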
Remove _check_optional_imports entirely — it was hand-maintained and would
need updating for every new optional dep.
Also:
- test/plugins/test_payloads.py: importorskip("cpex")
- test/telemetry/test_metrics_plugins.py: importorskip("cpex")
- docs/examples/image_text_models/README.md: add Prerequisites section
listing models to pull (granite3.2-vision, qwen2.5vl:7b)
Locally running without mellea[telemetry] caused three tests to fail with assertion errors rather than skip cleanly. Add importorskip at module level for test_tracing.py and a skipif decorator for the single OTel-gated test in test_astream_exception_propagation.py.
Metal's recommendedMaxWorkingSetSize is a static device property (~75% of total RAM) that ignores current system load. Replace it with min(total * 0.75, total - 16) so that desktop/IDE memory usage is accounted for. Also removes the torch dependency for GPU detection on Apple Silicon — sysctl hw.memsize is used directly. CUDA path on Linux is unchanged.
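As a sketch, the replacement heuristic (units in GB; the function name is illustrative):

```python
def mps_usable_vram_gb(total_ram_gb: float) -> float:
    # Unified memory: cap GPU-visible memory at 75% of total RAM, but
    # always leave at least 16 GB for the OS, IDE, and desktop apps.
    return min(total_ram_gb * 0.75, total_ram_gb - 16)

# 32 GB machine -> 16.0 usable; 64 GB -> 48.0; 128 GB -> 96.0
```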
…VRAM gate

Training tests need ~2x the base model inference memory (activations, optimizer states, gradient temporaries). The skill now detects training signals (train_model, Trainer, epochs=) and checks that require_gpu min_vram_gb uses the 2x rule. Bump test_alora_train_integration from min_vram_gb=12 to 20 (3B bfloat16: ~6 GB inference, ~12 GB training peak + headroom) so it skips correctly on 32 GB Apple Silicon under typical load.
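The 2x rule reduces to simple arithmetic (a sketch using the same numbers; function names are illustrative):

```python
def inference_vram_gb(params_b: float) -> float:
    return params_b * 2  # bf16: 2 bytes per parameter

def training_peak_vram_gb(params_b: float) -> float:
    # Activations, optimizer state, and gradient temporaries roughly
    # double the inference footprint.
    return 2 * inference_vram_gb(params_b)

# 3B model: ~6 GB inference, ~12 GB training peak; the gate was set to
# 20 GB to leave headroom on shared machines.
```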
get_system_capabilities() was caching the function reference, not the result — causing the Ollama socket check (1s timeout) and full capability detection to re-run for every example file during collection (~102 times). Cache the result dict instead so detection runs exactly once.
The function was called once per test in pytest_runtest_setup (325+ calls) and once at collection in pytest_collection_modifyitems, each time re-running the Ollama socket check (1s timeout when down), sysctl subprocess, and psutil query. Cache the result after the first call.
torch.cuda.empty_cache() is a no-op on Apple Silicon MPS, leaving the MPS allocator pool occupied after each module fixture tears down. The next module then loads a fresh model into an already-pressured pool, causing the process RSS to grow unboundedly across modules. Both calls are now guarded so CUDA and MPS runs each get the correct flush.
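The guard can be sketched with the torch module injected, which also shows the two real entry points involved (torch.cuda.empty_cache and torch.mps.empty_cache); the helper name is an assumption:

```python
def flush_accelerator_cache(torch_mod) -> str:
    """Flush whichever allocator pool is live; returns which path ran."""
    if torch_mod.cuda.is_available():
        torch_mod.cuda.empty_cache()   # existing path on Linux/CUDA
        return "cuda"
    if torch_mod.backends.mps.is_available():
        torch_mod.mps.empty_cache()    # the call this change adds for MPS
        return "mps"
    return "cpu"  # nothing to flush
```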
…asting

AutoModelForCausalLM.from_pretrained without torch_dtype may load weights in float32 on CPU before moving to MPS/CUDA, doubling peak memory briefly and leaving float32 remnants in the allocator pool. torch_dtype="auto" respects the model config (bfloat16 for Granite) for both the CPU load and the device transfer.
…M gates

- Remove --isolate-heavy flag, _run_heavy_modules_isolated(), pytest_collection_finish(), and require_gpu_isolation() predicate — superseded by cleanup_gpu_backend() from PR generative-computing#721
- Remove dead requires_gpu/requires_api_key branches from docs/examples/conftest.py
- Bump min_vram_gb from 8 → 12 on test_guardian, test_core, test_rag, test_spans — correct gate for 3B base model (6 GB) + adapters + inference overhead; 8 GB was wrong and masked by the now-fixed MPS pool leak
- Add adapter accumulation signals to audit-markers skill
- Update AGENTS.md, test/README.md, MARKERS_GUIDE.md to remove --isolate-heavy references
Replace deprecated @pytest.mark.llm, @pytest.mark.requires_gpu, @pytest.mark.requires_heavy_ram, @pytest.mark.requires_gpu_isolation with @pytest.mark.e2e and @require_gpu(min_vram_gb=12) to align with the new marker taxonomy (generative-computing#727/generative-computing#728). VRAM gate set to 12 GB matching the 3B-parameter model loaded across the parametrized test cases.
…tion markers and handlers
…tch watsonx+bedrock markers
Signed-off-by: Alex Bozarth <ajbozart@us.ibm.com>
…skip/resource issues
Thanks!
I had a look at the test. It does test some structural aspects, but also has a few qualitative things in it. I didn't hit the issue in multiple runs, but I agree with your suggested classification. It would benefit from some rewriting to tease out the different aspects. I also looked at the skill, but we're into subtle detail here, so I don't think there's anything else to add. Pushed a fix for this.
My error -- I had left a necessary file in .gitignore for project-specific config (which is why it worked for me!). That is accepted best practice and Claude's intent. I also tweaked the description slightly, as we discussed last week, so it considers the one-off cases (though I'm sure more tweaks are possible).
I've also done a rebase on upstream/main (with no conflicts) - not squashed, but if/when we're ready I can do that to make it easier to track upstream whilst being reviewed. Trying a cluster run with 'uv sync --all-extras --all-groups && uv run test/scripts/run_tests_with_ollama.sh'.

Thanks for the thorough checks and patches.



Marker Taxonomy & Agent Skills
Type of PR
Description
Fixes #727, #728
Introduces a four-tier test marker taxonomy (`unit`/`integration`/`e2e`/`qualitative`), an agent skill to audit and fix markers, and applies the resulting reclassifications across the test suite. Also removes the legacy `--isolate-heavy` process isolation mechanism (superseded by `cleanup_gpu_backend()` from #721).

How we define the tiers

The tiers are `unit`, `integration`, `e2e`, and `qualitative`. `unit` is never written explicitly — conftest applies it automatically to any test that carries none of the other three.

New agent skills (`.agents/skills/`)

Two skills following the agentskills.io standard, discoverable by Claude Code, VS Code/Copilot, and IBM Bob:

- `/audit-markers` — classifies any test as unit/integration/e2e/qualitative using signal detection (imports, fixtures, assertion patterns, decorator shapes). Traces model identifiers to estimate `min_vram_gb` from parameter counts. Report-first by default; `--apply` skips confirmation.
- `/skill-author` — meta-skill for creating new skills with correct frontmatter and structure.

pytest infrastructure changes
- `BACKEND_MARKERS` registry in `conftest.py` — single source of truth for all 7 backend markers; `pytest_configure` registers them automatically. New backends need one dict entry.
- `unit` auto-apply hook — `pytest_collection_modifyitems` applies `unit` to any collected test that has none of `integration`, `e2e`, `qualitative`, `llm`. Enables `pytest -m unit`.
- Removed `--isolate-heavy` and all associated code (`_run_heavy_modules_isolated()`, `pytest_collection_finish()`, `require_gpu_isolation()`). The `cleanup_gpu_backend()` helper from #721 (ci: memory management in tests) handles GPU memory teardown; `--group-by-backend` handles ordering.
- `torch_dtype="auto"` on model load — `LocalHFBackend.from_pretrained` now passes `torch_dtype="auto"` to `AutoModelForCausalLM.from_pretrained`, preventing silent float32 upcasting on CPU during model load. On MPS/CUDA this halves memory use for bfloat16/float16 models.
- `_gpu_vram_gb()` on Apple Silicon now uses `sysctl hw.memsize` with a conservative heuristic (`min(total * 0.75, total - 16 GB)`) instead of returning 0 — leaves headroom for OS and desktop apps.
- `get_system_capabilities()` cached — avoids repeated torch/MPS calls during collection.
- `--ignore-gpu-check`, `--ignore-ollama-check`, `--ignore-api-key-check`, `--ignore-all-checks` removed — unused escape hatches; skips are now unconditional when a capability is missing.
- `require_ollama()` removed — redundant with the `ollama` backend marker + conftest auto-skip.
- `llm` marker deprecated — treated as synonym for `e2e` for backwards compat; 0 remaining uses in `test/` or `docs/examples/`.

Test reclassifications
All changes are marker-only — no test logic was modified.
New `integration` tests (were unmarked/unit):

- test/cli/test_alora_train.py
- test/telemetry/test_metrics.py: InMemoryMetricReader — asserts SDK attribute names
- test/telemetry/test_tracing.py: InMemorySpanExporter — asserts span structure
- test/telemetry/test_metrics_token.py
- test/telemetry/test_metrics_plugins.py
- test/package/test_dependency_isolation.py: `uv` subprocesses — controls its own deps
- test/plugins/, test/core/, test/stdlib/

`e2e` marker additions/corrections:

- test/backends/test_bedrock.py: `bedrock` backend marker; registered in conftest/pyproject
- test/telemetry/test_metrics_backend.py: `e2e` (had backend markers but no tier)
- test/formatters/granite/test_intrinsics_formatters.py: `llm`/`requires_gpu`/`requires_heavy_ram`/`requires_gpu_isolation` replaced with `e2e` + `require_gpu(min_vram_gb=12)`
/audit-markersskill estimatesmin_vram_gbby tracing model identifiers to parameter counts — test authors can override the estimate directly on therequire_gpu()call.test_guardian.py,test_core.py,test_rag.py,test_spans.pyDocs updated
test/MARKERS_GUIDE.md— full rewrite with tier definitions, backend marker table, resource predicate reference, auto-skip logic, and common patternstest/README.md— updated env var table; addedOLLAMA_KEEP_ALIVE=1mtip for unordered runsAGENTS.md/CONTRIBUTING.md— removed--isolate-heavyreferences; added skills discovery tableLocal test run (Mac M1, 32 GB)
Full run (
uv run pytest): 800 passed, 2 failed, 61 skipped, 19 deselected in 17m23s.The 2 failures are
@pytest.mark.qualitativetests (test_find_context_attributions,test_hallucination_detection) — non-deterministic content assertions that can vary between runs; not related to this PR.The 19 deselected are
slowtests excluded by default (-m "not slow"inaddopts).Skips breakdown (61 total — all expected):
test_huggingface.py,test_alora_train_integration.py,test_richdocument.pytest_watsonx.py,test_litellm_watsonx.py,test_bedrock.py,test_watsonx_token_metricstest_openai_vllm.py,test_vllm_tools.pytest_tracing_backend.py— telemetry not initialisedtest_manager.py— requires--disable-default-mellea-pluginsflagtest_reqlib_python.pysandbox testsSlow tests run explicitly (
uv run pytest -m slow):test_dependency_isolation.pygenerative_gsm8k.pymini_researcher/researcher.pypython_decompose_example.pyIssues raised during testing
test_tracing_backend.pytests always skip (Telemetry not initialized): root cause is_tracer_providerset at module import time;MonkeyPatch.setenvhas no effect. Flagged for @ajbozart.python_decompose_example.pyKeyError infinalize_result: constraint strings from two separate model calls don't match exactly. Flagged for @AngeloDanducci.Testing
pytest --collect-only— collection unchangedruff formatandruff checkpasscodespellandmarkdownlintpassCluster test run (IBM BLUEVELA LSF, Linux / Python 3.12.13, p-series GPU node)
Full run using
test/scripts/run-all(starts Ollama, pulls models, warms up, thenpytest --group-by-backend):832 passed, 1 failed, 37 skipped, 19 deselected, 2 xfailed, 1 xpassed in 45m18s (job 737802).
The 1 failure is
test_find_context_attributions(@pytest.mark.qualitative) — same non-deterministic content assertion flake as seen in local runs; not related to this PR.The 1 xpassed is a bonus: a test marked
xfailthat unexpectedly passed.Skips breakdown (37 total — all expected):
skipif not OTEL_AVAILABLE)@pytest.mark.skip(test_richdocument— memory)Compared to a run without the startup script (job 737413): skips dropped from 142 → 37 once Ollama was running and models were warmed up.
Test run summary across environments
| Run | Result |
| --- | --- |
| Local (`uv run pytest`) | 800 passed, 2 failed, 61 skipped |
| Cluster (`uv run pytest`, no services) | 142 skipped (job 737413) |
| Cluster (`test/scripts/run-all`) | 832 passed, 1 failed, 37 skipped (job 737802) |

19 deselected = `slow` tests excluded by default in `pyproject.toml` across all runs.

Skip reduction (142 → 37) is ~95 Ollama-dependent tests that become runnable once the startup script brings services up.
The LSF script run passes ~30 more tests than local because vLLM is available on the cluster.