feat: citation requirement#725
feat: citation requirement#725akihikokuroda wants to merge 9 commits intogenerative-computing:mainfrom
Conversation
|
The PR description has been updated. Please fill out the template for your PR to be reviewed. |
Merge ProtectionsYour pull request matches the following merge protections and will not be merged until they are valid. 🟢 Enforce conventional commitWonderful, this rule succeeded.Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/
|
frreiss
left a comment
There was a problem hiding this comment.
This needs some work. The fundamental calculation does not appear to be valid, and there are a number of puzzling API design decisions.
|
|
||
| # Use constructor documents if provided, otherwise get from message | ||
| if self.documents is not None: | ||
| documents = self.documents |
There was a problem hiding this comment.
Checking against different set of documents than those used to generate the last message is outside the target domain of the Citations intrinsic. Have you done anything to validate the assumption that such a trick would work?
There was a problem hiding this comment.
This implementation is based on the current example and find_citations implementation below
https://github.com/generative-computing/mellea/blob/main/docs/examples/intrinsics/citations.py
mellea/stdlib/requirements/rag.py
Outdated
| all_messages = ctx.as_list() | ||
| if len(all_messages) > 1: | ||
| # Rebuild context without last message | ||
| from ..context import ChatContext |
There was a problem hiding this comment.
Why is this import not at the top of the file?
There was a problem hiding this comment.
It's to avoid circular dependencies
There was a problem hiding this comment.
The fact that you're seeing a circular dependency here is concerning. The only imports at the top of this file are:
from ...backends.adapters import AdapterMixin
from ...core import Backend, Context, Requirement, ValidationResult
from ..components import Document, Message
An import of mellea.stdlib.context.ChatContext ought to be compatible with these imports. What is the root cause of the dependency cycle when that dependency is added to this file?
There was a problem hiding this comment.
Yes, it is right. After looking at them again, there are no cycle dependency for ChatContext. I'll move it at top. Thanks!
mellea/stdlib/requirements/rag.py
Outdated
| citation["response_end"] - citation["response_begin"] | ||
| for citation in citations | ||
| ) | ||
| coverage_ratio = cited_chars / total_chars |
There was a problem hiding this comment.
Checking a ratio of characters is not in line with the original requirement in #503: "requires [sic.] citations for factual claims". You should be checking the fraction of factual claims that are backed by citations.
There was a problem hiding this comment.
Implemented "fraction of factual claims" and also add an option to choose between character or factual base check.
| reason += f"\n\nCitations found ({num_citations}):" | ||
| for i, citation in enumerate(citations[:5]): # Show first 5 | ||
| response_text = citation["response_text"].strip() | ||
| doc_id = citation.get("citation_doc_id", "unknown") |
There was a problem hiding this comment.
Why are you passing default values to get() here and on the next line?
| @pytest.mark.requires_heavy_ram | ||
| async def test_citation_requirement_basic(): | ||
| """Test basic citation requirement functionality.""" | ||
| backend = LocalHFBackend(model_id="ibm-granite/granite-4.0-micro") |
There was a problem hiding this comment.
Every test in this file loads the base model. Use a fixture instead.
There was a problem hiding this comment.
There are both tests, accessing model and using mock. There are test markers that make selectable execution such as @pytest.mark.huggingface.
| @pytest.mark.llm | ||
| @pytest.mark.requires_heavy_ram | ||
| async def test_citation_requirement_threshold_boundary(): | ||
| """Test citation requirement at exact threshold boundary.""" |
There was a problem hiding this comment.
Test case does not do what comment says it does. You would need to construct an input that produces the exact target threshold boundary, probably by mocking the citation intrinsic, to do what this comment says.
There was a problem hiding this comment.
Thanks! I'll fix it.
|
FYI @yannisk2 |
|
@frreiss Thanks for review. I addressed all of your comments so far. |
|
@frreiss I missed one. I'll work on it. |
113ae32 to
0f47f16
Compare
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Signed-off-by: Akihiko Kuroda <akihikokuroda2020@gmail.com>
Misc PR
Type of PR
Description
Testing