
Issue with Databricks Type Export and Validation in CLI Version 0.11.x #1048

@IchEssBlumen

Description


We have encountered an issue with the export and validation of Databricks physical types when upgrading from version 0.10.x to 0.11.x. The types that are written into the data contract YAML format have changed, and this change affects the validation process.

In version 0.11.x, the following changes in type representation were noted:

  1. string is now represented as StringType()
  2. integer is now represented as IntegerType()
  3. etc.

While the validation checks confirm that the fields are present, they report every field's type as None and pass without performing any actual type validation:

Validation Results:
│ passed │ Check that field 'string_test_1' is present
│ passed │ Check that field string_test_1 has type None
│ passed │ Check that field 'bool_test' is present
│ passed │ Check that field bool_test has type None
│ passed │ Check that field 'date_test_1' is present
│ passed │ Check that field date_test_1 has type None
│ passed │ Check that field 'num_test_1' is present
│ passed │ Check that field num_test_1 has type None

Expected Behavior:
Databricks data types should be exported as they were in 0.10.x (e.g., string, integer). Alternatively, validation should recognize the updated type representation and validate the fields accordingly rather than defaulting to None.

Attachments:
Detailed testing results: datacontract-databricks-datatypes-export-and-test.xlsx

Code Snippet for Data Contract Generation:
import yaml

from datacontract.data_contract import DataContract

# Import the table schema via the active Spark session
data_contract_specification = DataContract().import_from_source("spark", 'abc.def.table')
data_contract = DataContract(data_contract=data_contract_specification, spark=spark)
# Export to ODCS; strip non-breaking spaces before parsing the YAML
contract_yaml = yaml.safe_load(data_contract.export("odcs").replace("\xa0", " "))
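As a stopgap until this is fixed upstream, the parsed export could be post-processed to map the Spark type reprs (StringType(), IntegerType(), ...) back to the plain names that 0.10.x produced. A minimal sketch, assuming the Spark names always follow the <Name>Type() pattern; the normalize_spark_types helper below is hypothetical and not part of the datacontract API:

```python
import re

# Matches Spark type reprs such as "StringType()" or "IntegerType()"
_SPARK_TYPE = re.compile(r"^(\w+?)Type\(\)$")

def normalize_spark_types(node):
    """Recursively rewrite Spark type reprs in a parsed YAML structure
    back to lowercase logical type names (StringType() -> string)."""
    if isinstance(node, dict):
        return {k: normalize_spark_types(v) for k, v in node.items()}
    if isinstance(node, list):
        return [normalize_spark_types(v) for v in node]
    if isinstance(node, str):
        match = _SPARK_TYPE.match(node)
        if match:
            return match.group(1).lower()
    return node
```

Applied to the contract_yaml dict from the snippet above, this restores the 0.10.x-style names without assuming anything about where in the ODCS structure the types appear.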


Labels

bug (Something isn't working)
