Adds ontology support #255

Merged
PGijsbers merged 6 commits into openml:main from Prtm2110:ontology
Mar 16, 2026

Conversation

@Prtm2110
Contributor

Hi, I haven't read the contribution guidelines yet, so pardon me in advance, and please let me know if you need any changes. Fixes #237

A few questions:

  • The POST endpoint does not exist in the Python API at all; it's an admin-only endpoint in PHP used by the evaluation engine to upload extracted features. Should this PR also add that endpoint, or should it be a separate issue/PR?
  • Should I add tests as well?

CC: @PGijsbers

@coderabbitai
Contributor

coderabbitai bot commented Feb 24, 2026

Warning

Rate limit exceeded

@PGijsbers has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 12 minutes and 31 seconds before requesting another review.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 16ea4fc6-99ad-4089-9bb4-696a8849f93f

📥 Commits

Reviewing files that changed from the base of the PR and between 7ef810b and 40caa49.

📒 Files selected for processing (2)
  • src/database/datasets.py
  • tests/routers/openml/migration/datasets_migration_test.py

Walkthrough

Added an async data-layer function get_feature_ontologies(dataset_id: int, connection: AsyncConnection) -> dict[int, list[str]] that queries data_feature_description for description_type = 'ontology' and aggregates ontology strings by feature index. The dataset feature assembly fetches these ontologies and assigns them to each Feature as feature.ontology. The Feature schema was extended with an optional field ontology: list[str] | None so ontology lists are included in serialized API responses.

🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (1 warning, 1 inconclusive)
  • Docstring Coverage (⚠️ Warning): Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
  • Linked Issues check (❓ Inconclusive): The PR implements ontology support in feature descriptions via the get_feature_ontologies function and Feature model updates, partially addressing issue #237. It lacks the POST /data/features endpoint implementation. Resolution: clarify whether that endpoint should be added to complete issue #237 or addressed separately.

✅ Passed checks (3 passed)
  • Title check (✅ Passed): The title 'Adds ontology support' directly and clearly describes the main change in the PR.
  • Description check (✅ Passed): The description explains that the PR adds ontology support to fix issue #237, though it is informal and includes meta-questions about contribution guidelines.
  • Out of Scope Changes check (✅ Passed): All changes relate directly to adding ontology support as specified in issue #237.




Contributor

@sourcery-ai bot left a comment


Hey - I've found 1 issue and left some high-level feedback:

  • In get_feature_ontologies, consider using rows.mappings() (as in get_features) and accessing row["index"]/row["value"] for consistency and to avoid relying on attribute-style access to row fields.
  • When attaching ontologies in get_dataset_features, you can simplify the loop by using feature.ontology = ontologies.get(feature.index) instead of the explicit if feature.index in ontologies check.
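
The second suggestion is a straightforward simplification: `dict.get` already returns `None` for missing keys, which matches the optional ontology field. A minimal before/after with a hypothetical feature object (the real Feature is a Pydantic model):

```python
# Hypothetical minimal feature object, standing in for the real Feature model.
class Feat:
    def __init__(self, index: int) -> None:
        self.index = index
        self.ontology: list[str] | None = None


ontologies = {0: ["http://example.org/onto#a"]}
features = [Feat(0), Feat(1)]

# Before: explicit membership check, leaves absent features untouched.
for f in features:
    if f.index in ontologies:
        f.ontology = ontologies[f.index]

# After: dict.get yields None for absent keys, so one assignment
# covers both cases and matches the `list[str] | None` field type.
for f in features:
    f.ontology = ontologies.get(f.index)
```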
## Individual Comments

### Comment 1
<location path="src/database/datasets.py" line_range="134-143" />
<code_context>
     return [Feature(**row, nominal_values=None) for row in rows.mappings()]


+def get_feature_ontologies(dataset_id: int, connection: Connection) -> dict[int, list[str]]:
+    rows = connection.execute(
+        text(
+            """
+            SELECT `index`, `value`
+            FROM data_feature_description
+            WHERE `did` = :dataset_id AND `description_type` = 'ontology'
+            """,
+        ),
+        parameters={"dataset_id": dataset_id},
+    )
+    ontologies: dict[int, list[str]] = {}
+    for row in rows:
+        ontologies.setdefault(row.index, []).append(row.value)
+    return ontologies
+
</code_context>
<issue_to_address>
**issue (bug_risk):** Accessing `row.index` on a SQLAlchemy Row is likely using the positional index method rather than the `index` column value.

Because `Row` defines `index()` for positional lookup, `row.index` is ambiguous and may not return the `index` column value. Similarly, `row.value` is less explicit than using the mapping interface. To avoid subtle bugs, either iterate over `rows.mappings()` and use `row["index"]` / `row["value"]`, or access `row._mapping["index"]` and `row._mapping["value"]` here.
</issue_to_address>
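
The hazard the reviewer describes can be shown with plain tuples: `index` is a method on tuples (and SQLAlchemy's Row is tuple-like), so attribute access resolves to the positional-lookup method rather than a column named "index". Plain dicts stand in below for the dict-like rows yielded by `rows.mappings()`, since the example has no database:

```python
# `row.index` on a tuple is the built-in positional-lookup method,
# not the data stored in a column that happens to be named "index".
row = (3, "http://example.org/onto#a")  # (index, value)
assert callable(row.index)  # a method, not the value 3

# The mapping interface is unambiguous: columns are looked up by key.
rows = [
    {"index": 3, "value": "http://example.org/onto#a"},
    {"index": 3, "value": "http://example.org/onto#b"},
]
ontologies: dict[int, list[str]] = {}
for row in rows:
    ontologies.setdefault(row["index"], []).append(row["value"])
```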


Copy link
Contributor

@coderabbitai bot left a comment


Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/database/datasets.py`:
- Around line 134-148: The function get_feature_ontologies currently iterates
over SQLAlchemy Row objects and accesses row.index/row.value which can collide
with Row attributes; change the query iteration to use .mappings() (e.g.,
iterate over rows.mappings()) and access columns via mapping["index"] and
mapping["value"] so the ontologies dict is populated safely; update references
in get_feature_ontologies to use the mapping keys instead of attribute access.

ℹ️ Review info

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6a0fed5 and 2f98934.

📒 Files selected for processing (3)
  • src/database/datasets.py
  • src/routers/openml/datasets.py
  • src/schemas/datasets/openml.py

@PGijsbers
Copy link
Contributor

Thanks for taking the time to contribute! And sorry for the delay. Normally, yes, you would need to add tests. However, in this case #262 also implemented this feature and did add tests. Because you were first (and because this implementation doesn't sort ontology results -- something that seems unnecessary given they're returned as a mapping), I will go ahead and allow this and take the tests from the other PR. That way you can both share credit :)

Contributor

@coderabbitai bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/database/datasets.py`:
- Around line 137-151: The get_feature_ontologies function is using an undefined
Connection type and is implemented synchronously while the module and callers
use async connections; change its signature to use AsyncConnection, import
AsyncConnection from sqlalchemy.ext.asyncio (or the project's configured alias),
make the function async def get_feature_ontologies(...), await
connection.execute(...) and iterate the result using async-compatible methods
(e.g., rows = await connection.execute(...); for row in rows.mappings(): or use
rows.fetchall() with await if needed), and update any type hints/imports to
remove the undefined Connection reference so callers passing an AsyncConnection
work correctly.

In `@src/routers/openml/datasets.py`:
- Around line 297-300: The call to get_feature_ontologies is currently invoked
synchronously but should be awaited; update the call in the openml datasets
router to use await when calling
database.datasets.get_feature_ontologies(dataset_id, expdb) (so ontologies
receives the resolved result) after you fix get_feature_ontologies to be async
in src/database/datasets.py; keep the subsequent loop that assigns
feature.ontology = ontologies.get(feature.index) unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 05f04624-fb4e-4782-9cf9-e27bacf3c48c

📥 Commits

Reviewing files that changed from the base of the PR and between 732b326 and 1cef353.

📒 Files selected for processing (3)
  • src/database/datasets.py
  • src/routers/openml/datasets.py
  • src/schemas/datasets/openml.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/schemas/datasets/openml.py

@codecov

codecov bot commented Mar 16, 2026

Codecov Report

❌ Patch coverage is 27.27273% with 8 lines in your changes missing coverage. Please review.
⚠️ Please upload report for BASE (main@f94808c). Learn more about missing BASE report.

Files with missing lines Patch % Lines
src/database/datasets.py 28.57% 5 Missing ⚠️
src/routers/openml/datasets.py 0.00% 3 Missing ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #255   +/-   ##
=======================================
  Coverage        ?   54.52%           
=======================================
  Files           ?       34           
  Lines           ?     1458           
  Branches        ?      118           
=======================================
  Hits            ?      795           
  Misses          ?      661           
  Partials        ?        2           

☔ View full report in Codecov by Sentry.

@PGijsbers PGijsbers merged commit c617d6d into openml:main Mar 16, 2026
9 checks passed

Successfully merging this pull request may close these issues:
  • Add ontology support (#237)

2 participants