Use Domo cards to monitor, manage and optimise ETLs and data sets to improve data quality

Problem: Domo has a lot of information about ETLs and data sets, but it is in disparate locations and much of it can't be monitored and optimised using Domo cards and alerts. While there are some custom views (Data Warehouse) and some alerts, we want the full power of Domo to be used to monitor, manage and optimise ETLs and data sets. Without this it's difficult to ensure that the data is timely and accurate in complex sites.

 

Proposal: Existing operational data about ETLs be written to a log file, which can then be used as a data source in Domo to create collections/cards such as:

  • ETL operations: A Gantt chart of selected ETLs showing the start/end time and status. This chart would enable customers to confirm that all the ETLs had run as expected. It would also show sequences of ETLs that may not finish before the end of the day.
  • ETL performance: A chart of the run times for selected ETLs. This would enable customers to create alerts for ETLs that take longer, or shorter, than normal, indicating an underlying problem in the data.
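As a rough illustration of the ETL performance alert, here is a minimal Python sketch. It assumes a hypothetical log with one record per run (ETL name, start, end, status); that layout, the names `RUNS` and `unusual_runs`, and the 50% tolerance are all made up for the example and are not an existing Domo export or API.

```python
from datetime import datetime
from statistics import median

# Hypothetical log records, oldest first: (ETL name, start, end, status).
# This layout is an assumption, not a real Domo log format.
RUNS = [
    ("Campaign Lookup", "2017-10-11 02:00", "2017-10-11 02:10", "Success"),
    ("Campaign Lookup", "2017-10-12 02:00", "2017-10-12 02:11", "Success"),
    ("Campaign Lookup", "2017-10-13 02:00", "2017-10-13 02:09", "Success"),
    ("Campaign Lookup", "2017-10-14 02:00", "2017-10-14 02:45", "Success"),
]

FMT = "%Y-%m-%d %H:%M"


def run_minutes(start, end):
    """Run time in minutes between two timestamps in FMT."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return delta.total_seconds() / 60


def unusual_runs(runs, tolerance=0.5):
    """Flag ETLs whose latest run is much longer or shorter than usual.

    The baseline is the median of the earlier runs; `tolerance` is the
    allowed relative deviation (0.5 means 50%).
    """
    by_name = {}
    for name, start, end, _status in runs:
        by_name.setdefault(name, []).append(run_minutes(start, end))

    flagged = []
    for name, minutes in by_name.items():
        if len(minutes) < 2:
            continue  # no history to compare against
        *history, latest = minutes
        baseline = median(history)
        if baseline > 0 and abs(latest - baseline) / baseline > tolerance:
            flagged.append((name, latest, baseline))
    return flagged


print(unusual_runs(RUNS))
```

The same start/end records would also be enough to draw the Gantt chart described under ETL operations.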

Existing operational data about Data Sets be written to a log file, which can then be used as a data source in Domo to create collections/cards such as:

  • Data Freshness: A chart showing how current select data sets are. Alerts can be used to create a warning if the data has become stale. This can show problems with the ETLs and connections. 
  • Data Growth: A chart showing the number of rows in selected data sets. Alerts can be used to create a warning if the data size changes significantly. A much smaller or larger data set might indicate a problem in the transforms and connectors.
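The freshness and growth checks could look something like the sketch below. It assumes a hypothetical data set log holding, for each data set, a list of (last updated, row count) snapshots; the names `SNAPSHOTS` and `check_data_sets` and the thresholds (24 hours, 25% change) are illustrative only.

```python
from datetime import datetime, timedelta

# Hypothetical data set log: name -> [(last updated, row count), ...], oldest first.
SNAPSHOTS = {
    "Media Spend": [(datetime(2017, 10, 13, 6), 10_000),
                    (datetime(2017, 10, 14, 6), 10_150)],
    "Campaigns": [(datetime(2017, 10, 10, 6), 500),
                  (datetime(2017, 10, 11, 6), 120)],
}


def check_data_sets(snapshots, now, max_age=timedelta(hours=24), max_change=0.25):
    """Return (data set, warning) pairs for stale data or large size changes."""
    warnings = []
    for name, history in snapshots.items():
        updated, rows = history[-1]
        # Data Freshness: the latest update is older than the allowed age.
        if now - updated > max_age:
            warnings.append((name, "stale"))
        # Data Growth: row count moved by more than max_change in either direction.
        if len(history) > 1:
            previous_rows = history[-2][1]
            if previous_rows and abs(rows - previous_rows) / previous_rows > max_change:
                warnings.append((name, "size changed"))
    return warnings


print(check_data_sets(SNAPSHOTS, now=datetime(2017, 10, 14, 12)))
```

Here "Campaigns" would trigger both warnings, since it has not been updated for days and has dropped from 500 to 120 rows.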

 

ETLs (Magic and SQL) be extended to support a validation/assertion logging framework. When an ETL is created, custom validations/assertions for each step can optionally be created, which are written to a log, e.g.,

  1. confirm that the number of rows before and after a transform step is the same.
  2. confirm that an aggregate (e.g., sum) of a column (e.g., media cost) is the same before and after the transform step.
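To make the two checks concrete, here is a small Python sketch of what such assertions might do. It is not how Magic or SQL ETLs work today; rows are modelled as plain dictionaries, and `validate_step`, the `media_cost` column name and the log fields are hypothetical.

```python
import math


def validate_step(step, before, after, column):
    """Compare the row count and the sum of `column` before and after a step.

    Returns a list of log entries (severity, step, check, value, message),
    one for each assertion that failed.
    """
    entries = []
    if len(before) != len(after):
        entries.append(("Error", step, "Size", len(after),
                        f"Row count changed from {len(before)} to {len(after)}."))
    total_before = sum(row[column] for row in before)
    total_after = sum(row[column] for row in after)
    if not math.isclose(total_before, total_after):
        entries.append(("Error", step, "Sum", total_after,
                        f"Sum of {column} changed from {total_before} to {total_after}."))
    return entries


# Made-up example: a JOIN that duplicates one row.
before = [{"campaign": "A", "media_cost": 100}, {"campaign": "B", "media_cost": 50}]
after = [{"campaign": "A", "media_cost": 100}, {"campaign": "A", "media_cost": 100},
         {"campaign": "B", "media_cost": 50}]

for entry in validate_step("LOOKUP Campaign Name", before, after, "media_cost"):
    print(entry)
```

A duplicated row fails both checks at once: the row count goes from 2 to 3 and the media cost total from 150 to 250.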

 

The log would include sufficient details to support understanding the problem, e.g.,

        2017/10/14, Error, "LOOKUP Campaign Name", Size, 150, "The number of rows should remain constant in this transform. An increase indicates an error introduced in the JOIN."
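A line in that shape is easy to turn back into structured fields for a card or alert. The sketch below uses Python's standard `csv` module; the field names (date, severity, step, check, value, message) are my reading of the example line and are an assumption, not a defined format.

```python
import csv
from datetime import datetime

LINE = ('2017/10/14, Error, "LOOKUP Campaign Name", Size, 150, '
        '"The number of rows should remain constant in this transform. '
        'An increase indicates an error introduced in the JOIN."')


def parse_log_line(line):
    """Split one log line into named fields (the field names are assumed)."""
    # skipinitialspace lets quoted fields follow the ", " separator.
    date, severity, step, check, value, message = next(
        csv.reader([line], skipinitialspace=True))
    return {
        "date": datetime.strptime(date, "%Y/%m/%d").date(),
        "severity": severity,
        "step": step,
        "check": check,
        "value": int(value),
        "message": message,
    }


record = parse_log_line(LINE)
print(record["step"], record["check"], record["value"])
```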

3 votes


Comments

  • I second the need to create Gantt Charts that can also be auto sent to clients to show the timeline of a project we are working on for them.

This discussion has been closed.