Added new checkfunctions from 1.4 to model by mike-finopsorg · Pull Request #141 · finopsfoundation/focus_validator

mike-finopsorg · 2026-05-14T11:47:42Z

Summary

Adds the check functions needed for FOCUS 1.4 requirements modeling, simplifies the FormatJSON check, and tightens datetime parsing in the parquet loader.

New check functions

TypeJSON: column must be of type JSON
CheckJSONSchema: JSON values must conform to a JSON Schema referenced by SchemaId
CheckRegexMatch: column must match a regex pattern
CheckStringEndsWith: column must end with a specific value
CheckGreaterThanValue: column must be greater than a threshold
CheckLessOrEqualThanValue: column must be less than or equal to a threshold
CheckColumnComparison: two columns compared with a configurable Comparator (=, !=, <>, >, >=, <, <=)
CheckNoDuplicates: column must contain no duplicate values

Other changes

Rewrote FormatJSON to use json_valid(CAST(col AS VARCHAR)) instead of the previous TRY_CAST + typeof logic. The predicate now also applies the row-filter condition.
parquet_data_loader.py strategy 6 (auto-format inference) now requires every value to convert successfully, matching strategies 1 to 5. Previously it accepted partial success. Users may see more columns dropped if their data has mixed formats.
Added jsonschema ^4.25.1 as a runtime dependency for CheckJSONSchema.
Bumped version to 2.2.0.

Closes #140.

Signed-off-by: Mike Fuller mike@finops.org

Bumped version Signed-off-by: Mike Fuller <mike@finops.org>

Signed-off-by: Mike Fuller <mike@finops.org>

Matt-Cowsert

Thanks for putting this together. A few notes below.

The main item is CheckColumnComparison: as written it flags rows where either column is null, which disagrees with the convention CheckSameValue already establishes (skip the row when either side is null). Worth aligning those two before merge. The rest are smaller refinements.

I've also drafted a fuller PR description below so the title doesn't carry the whole story by itself. Feel free to tweak.

Proposed PR description

## Summary
Adds the check functions needed for FOCUS 1.4 requirements modeling, simplifies the FormatJSON check, and tightens datetime parsing in the parquet loader.

## New check functions
- `TypeJSON`: column must be of type JSON
- `CheckJSONSchema`: JSON values must conform to a JSON Schema referenced by `SchemaId`
- `CheckRegexMatch`: column must match a regex pattern
- `CheckStringEndsWith`: column must end with a specific value
- `CheckGreaterThanValue`: column must be greater than a threshold
- `CheckLessOrEqualThanValue`: column must be less than or equal to a threshold
- `CheckColumnComparison`: two columns compared with a configurable `Comparator` (=, !=, <>, >, >=, <, <=)
- `CheckNoDuplicates`: column must contain no duplicate values

## Other changes
- Rewrote `FormatJSON` to use `json_valid(CAST(col AS VARCHAR))` instead of the previous `TRY_CAST + typeof` logic. The predicate now also applies the row-filter condition.
- `parquet_data_loader.py` strategy 6 (auto-format inference) now requires every value to convert successfully, matching strategies 1 to 5. Previously it accepted partial success. Users may see more columns dropped if their data has mixed formats.
- Added `jsonschema ^4.25.1` as a runtime dependency for `CheckJSONSchema`.
- Bumped version to 2.2.0.

Closes #140.

Signed-off-by: Mike Fuller <mike@finops.org>

Additional notes

Consolidating threshold checks

CheckGreaterThanGenerator (line 1856), CheckLessOrEqualGenerator (line 1907), and the existing CheckGreaterOrEqualGenerator (line 1790) are nearly identical, differing only in the operator. CheckColumnComparisonGenerator (line 1960) already accepts the operator as a parameter, so the same approach could collapse these three into one. Not in scope for this PR, just flagging for a future cleanup.

Test coverage

The new tests cover happy paths only. Worth adding:

CheckColumnComparison with one or both columns null (catches the null-handling item below)
CheckStringEndsWith with multi-byte characters
CheckJSONSchema with malformed paths and with jsonschema unavailable
CheckNoDuplicates combined with a row-filter condition

Matt-Cowsert · 2026-05-20T03:50:23Z

+        msg_sql = message.replace("'", "''")
+
+        pass_predicate = f"{col_a} IS NOT NULL AND {col_b} IS NOT NULL AND {col_a} {comparator} {col_b}"
+        condition = f"NOT ({pass_predicate})"


Suggested change

condition = f"NOT ({pass_predicate})"

condition = f"{col_a} IS NOT NULL AND {col_b} IS NOT NULL AND NOT ({col_a} {comparator} {col_b})"

The current condition expands to A IS NULL OR B IS NULL OR NOT(A op B), which flags rows where either column is null. CheckSameValueGenerator (line 1590) uses the opposite convention: skip those rows, only flag rows where both columns have values and the comparison fails. Aligning here keeps the two functions consistent.

Matt-Cowsert · 2026-05-20T03:50:23Z

+        col_a = self.params.ColumnAName
+        col_b = self.params.ColumnBName
+        comparator = self.params.Comparator
+        condition = f"NOT ({col_a} IS NOT NULL AND {col_b} IS NOT NULL AND {col_a} {comparator} {col_b})"


Suggested change

condition = f"NOT ({col_a} IS NOT NULL AND {col_b} IS NOT NULL AND {col_a} {comparator} {col_b})"

condition = f"{col_a} IS NOT NULL AND {col_b} IS NOT NULL AND NOT ({col_a} {comparator} {col_b})"

Same null-handling fix in get_sample_sql so the sample violation output doesn't include rows that weren't actually flagged.

Matt-Cowsert · 2026-05-20T03:50:23Z

+        message = self.errorMessage or f"{col} {keyword} end with '{value}'."
+        msg_sql = message.replace("'", "''")
+
+        condition = f"{col} IS NOT NULL AND RIGHT(CAST({col} AS VARCHAR), {value_len}) != '{value_sql}'"


Suggested change

condition = f"{col} IS NOT NULL AND RIGHT(CAST({col} AS VARCHAR), {value_len}) != '{value_sql}'"

condition = f"{col} IS NOT NULL AND NOT ends_with(CAST({col} AS VARCHAR), '{value_sql}')"

DuckDB has a built-in ends_with(string, suffix). Using it removes the manual length calculation. After this and the matching changes on lines 1561 and 1573, value_len = len(str(value)) on line 1542 (and the one in get_sample_sql) can be deleted.

Matt-Cowsert · 2026-05-20T03:50:23Z

+        FROM invalid
+        """
+
+        predicate_sql = f"{col} IS NOT NULL AND RIGHT(CAST({col} AS VARCHAR), {value_len}) = '{value_sql}'"


Suggested change

predicate_sql = f"{col} IS NOT NULL AND RIGHT(CAST({col} AS VARCHAR), {value_len}) = '{value_sql}'"

predicate_sql = f"{col} IS NOT NULL AND ends_with(CAST({col} AS VARCHAR), '{value_sql}')"

Matt-Cowsert · 2026-05-20T03:50:23Z

+        value_sql = str(value).replace("'", "''")
+        value_len = len(str(value))
+
+        condition = f"{col} IS NOT NULL AND RIGHT(CAST({col} AS VARCHAR), {value_len}) != '{value_sql}'"


Suggested change

condition = f"{col} IS NOT NULL AND RIGHT(CAST({col} AS VARCHAR), {value_len}) != '{value_sql}'"

condition = f"{col} IS NOT NULL AND NOT ends_with(CAST({col} AS VARCHAR), '{value_sql}')"

Matt-Cowsert · 2026-05-20T03:50:23Z

+            Draft202012Validator.check_schema(schema)
+            validator = Draft202012Validator(schema)
+            table_name = getattr(self.params, "table_name", "focus_data")
+            sql = query.replace("{{table_name}}", table_name)


Suggested change

sql = query.replace("{{table_name}}", table_name)

sql = query.replace("{table_name}", table_name)

The f-string that builds query on line 1224 uses {{table_name}} to escape a single brace, so query only contains {table_name}. This line looks for {{table_name}} (double-braced) and never matches anything. Line 1240 below does the actual replacement.

Matt-Cowsert · 2026-05-20T03:50:23Z

-            f"{col} IS NOT NULL "
-            f"AND (TRY_CAST({col} AS JSON) IS NOT NULL "
-            f"OR (typeof({col}) = 'VARCHAR' AND json_valid({col}::TEXT)))"
+        predicate_sql = self._apply_condition(


The rewritten FormatJSONGenerator wraps its predicate in _apply_condition. None of the other check functions in the file (CheckSameValueGenerator line 1590, CheckGreaterOrEqualGenerator line 1790, ColumnByColumnEqualsColumnValueGenerator line 1746, others) do this. test_format_json_generator_applies_row_condition_to_predicate shows it's deliberate, but the codebase now has two patterns. Worth either applying the new pattern to the other check functions or leaving a short note here explaining why FormatJSON is the exception.

Co-authored-by: Matt Cowsert <matthew@finops.org>

Added new checkfunctions from 1.4 to model

ea89362

Bumped version Signed-off-by: Mike Fuller <mike@finops.org>

mike-finopsorg self-assigned this May 14, 2026

mike-finopsorg added 2 commits May 14, 2026 21:56

fix linter checks

a44e789

Signed-off-by: Mike Fuller <mike@finops.org>

fix failing test

3010874

Signed-off-by: Mike Fuller <mike@finops.org>

mike-finopsorg mentioned this pull request May 20, 2026

[AI] Update FOCUS Validator to work with FOCUS 1.4 Requirements Model FinOps-Open-Cost-and-Usage-Spec/FOCUS_Spec#1984

Open

Matt-Cowsert requested changes May 20, 2026

View reviewed changes

Update focus_validator/config_objects/focus_to_duckdb_converter.py

970055e

Co-authored-by: Matt Cowsert <matthew@finops.org>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Added new checkfunctions from 1.4 to model#141

Added new checkfunctions from 1.4 to model#141
mike-finopsorg wants to merge 4 commits into
devfrom
140-update-focus-validator-to-support-focus-14-requirements-modelling

mike-finopsorg commented May 14, 2026 •

edited

Loading

Uh oh!

Matt-Cowsert left a comment

Uh oh!

Matt-Cowsert May 20, 2026

Uh oh!

Matt-Cowsert May 20, 2026

Uh oh!

Matt-Cowsert May 20, 2026

Uh oh!

Matt-Cowsert May 20, 2026

Uh oh!

Matt-Cowsert May 20, 2026

Uh oh!

Matt-Cowsert May 20, 2026

Uh oh!

Matt-Cowsert May 20, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

	condition = f"NOT ({pass_predicate})"
	condition = f"{col_a} IS NOT NULL AND {col_b} IS NOT NULL AND NOT ({col_a} {comparator} {col_b})"

	condition = f"NOT ({col_a} IS NOT NULL AND {col_b} IS NOT NULL AND {col_a} {comparator} {col_b})"
	condition = f"{col_a} IS NOT NULL AND {col_b} IS NOT NULL AND NOT ({col_a} {comparator} {col_b})"

	condition = f"{col} IS NOT NULL AND RIGHT(CAST({col} AS VARCHAR), {value_len}) != '{value_sql}'"
	condition = f"{col} IS NOT NULL AND NOT ends_with(CAST({col} AS VARCHAR), '{value_sql}')"

	predicate_sql = f"{col} IS NOT NULL AND RIGHT(CAST({col} AS VARCHAR), {value_len}) = '{value_sql}'"
	predicate_sql = f"{col} IS NOT NULL AND ends_with(CAST({col} AS VARCHAR), '{value_sql}')"

	sql = query.replace("{{table_name}}", table_name)
	sql = query.replace("{table_name}", table_name)

Conversation

mike-finopsorg commented May 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

New check functions

Other changes

Uh oh!

Matt-Cowsert left a comment

Choose a reason for hiding this comment

Proposed PR description

Additional notes

Consolidating threshold checks

Test coverage

Uh oh!

Matt-Cowsert May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Matt-Cowsert May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Matt-Cowsert May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Matt-Cowsert May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Matt-Cowsert May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Matt-Cowsert May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Matt-Cowsert May 20, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mike-finopsorg commented May 14, 2026 •

edited

Loading