A better approach to validating Datasets

Problem

Domo reports can only be trusted if the underlying data sets have been validated; however, Domo does not appear to provide much support for validation, so the work has to be done manually.

 

Background

1) Domo provides a rapid build environment (Magic) for joining and transforming data; however, the risk is that the new data set does not accurately reflect its inputs.

 

2) There are existing reports, generated outside of Domo, that we need to validate against.

 

3) Input data sets can introduce problems long after the transforms have been created.

 

Typical best practice is to create a few cards and Sumos, then manually validate the output data set against the inputs or the reports.

 

When doing data transformation in Python, I typically wrote validation tests to automate this and validate my data on an ongoing basis.
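As a minimal sketch of the kind of checks I mean, assuming the input and output data sets are available as pandas DataFrames (the table and column names here are hypothetical):

import pandas as pd

# Hypothetical input and output data sets; in practice these would be the
# source table and the data set produced by the transform.
orders_in = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 25.5, 7.2]})
orders_out = pd.DataFrame({"order_id": [1, 2, 3], "amount": [10.0, 25.5, 7.2]})

# Type check: amount must be numeric.
assert pd.api.types.is_numeric_dtype(orders_out["amount"]), "amount is not numeric"

# Value check: amount must never be NULL and must be positive.
assert orders_out["amount"].notna().all(), "amount contains NULLs"
assert (orders_out["amount"] > 0).all(), "amount contains non-positive values"

# Aggregate check: the total in the output matches the input.
assert orders_out["amount"].sum() == orders_in["amount"].sum(), "amount totals differ"

# Length check: row counts match.
assert len(orders_out) == len(orders_in), "row counts differ"

Running checks like these on every refresh catches problems early, instead of waiting for someone to notice a wrong number on a card.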

 

Suggestion

Create a validation tool that regularly checks a data set against a set of rules/tests such as:

  • Type: the column is a Number, a Date, etc.
  • Value: the column is in [or not in] a range (numeric, date) or a list (text), e.g. 'Value' > 0 or Value NOT NULL.
  • Relationship: the column value is in a related table (FK/PK relationship).
  • Aggregate: an aggregate in this table (or a filtered selection) matches an aggregate in another.
  • Length: the row count of a table (or a filtered selection) matches that of some other table.

 

The validations would be associated with the data set and would probably be based on SQL.
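As a sketch of what an SQL-based rule set could look like, each rule below is a query that returns the number of violations (zero means the rule passes). sqlite3 and the table/column names (orders_in, orders_out, customers, amount, customer_id) are just stand-ins for whatever query layer Domo would actually expose:

import sqlite3

# Each rule maps a description to an SQL query returning a violation count.
rules = {
    "amount is numeric (type check)":
        "SELECT COUNT(*) FROM orders_out WHERE typeof(amount) NOT IN ('integer', 'real')",
    "amount is never NULL (value check)":
        "SELECT COUNT(*) FROM orders_out WHERE amount IS NULL",
    "amount is positive (value check)":
        "SELECT COUNT(*) FROM orders_out WHERE amount <= 0",
    "customer_id exists in customers (relationship check)":
        "SELECT COUNT(*) FROM orders_out o "
        "LEFT JOIN customers c ON o.customer_id = c.customer_id "
        "WHERE c.customer_id IS NULL",
    "totals match the input (aggregate check)":
        "SELECT COUNT(*) FROM (SELECT SUM(amount) AS t FROM orders_out) a, "
        "(SELECT SUM(amount) AS t FROM orders_in) b WHERE a.t <> b.t",
    "row counts match the input (length check)":
        "SELECT COUNT(*) FROM (SELECT COUNT(*) AS n FROM orders_out) a, "
        "(SELECT COUNT(*) AS n FROM orders_in) b WHERE a.n <> b.n",
}

def run_rules(conn: sqlite3.Connection) -> None:
    """Run every rule against the connection and report pass/fail."""
    for name, sql in rules.items():
        violations = conn.execute(sql).fetchone()[0]
        status = "PASS" if violations == 0 else f"FAIL ({violations} violations)"
        print(f"{status}: {name}")

if __name__ == "__main__":
    # Tiny in-memory stand-in for the real input/output data sets.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customers (customer_id INTEGER PRIMARY KEY);
        CREATE TABLE orders_in  (order_id INTEGER, customer_id INTEGER, amount REAL);
        CREATE TABLE orders_out (order_id INTEGER, customer_id INTEGER, amount REAL);
        INSERT INTO customers VALUES (1), (2);
        INSERT INTO orders_in  VALUES (1, 1, 10.0), (2, 2, 25.5);
        INSERT INTO orders_out VALUES (1, 1, 10.0), (2, 2, 25.5);
    """)
    run_rules(conn)

The key idea is that the rules live alongside the data set and run on a schedule, so a broken input shows up as a failed rule rather than a quietly wrong report.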

There are probably more, but this would be a great start for checking an output table against the inputs or an existing report table.
