Dataset overall freshness
Dear MajorDomos,
I have a question / suggestion. Just came back from Domopalooza and saw the soon-to-be-released Lineage function. Great. Now what I want is a status icon (similar to workbench job status icons) for the overall freshness of the dataset. Did all predecessor datasets get refreshed on time? Did all predecessor dataflows run correctly before this dataset was published? Am I looking at really good data or sort of good data? It would be nice to know.
I have a detailed writeup with pictures attached, but I'll try to do it justice here.
Dojo team members,
Re: “Idiot light” indicator for health and status of dataflow dependencies
Summary:
Dataflows are often the lifeblood of any data science result. Each dataflow can be scheduled individually, and its output can be piped into subsequent dataflows to be combined with other datasets. I would like to find a way to create a ‘roll-up’ status light (red, yellow, green) that examines all prerequisite dataflows in any given chain.
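To make the roll-up idea concrete, here is a minimal Python sketch of one possible rule: a dataset's light is the worst light found anywhere in its lineage. The lineage graph, dataset names, and `OWN_STATUS` values are hypothetical stand-ins for whatever the Lineage feature exposes; no Domo API is used or implied.

```python
from enum import IntEnum


class Status(IntEnum):
    # Ordered so that max() returns the worse of two statuses.
    GREEN = 0
    YELLOW = 1
    RED = 2


# Hypothetical lineage: each dataset maps to the datasets its dataflow reads from.
INPUTS = {
    "raw_orders": [],
    "raw_customers": [],
    "orders_clean": ["raw_orders"],
    "customer_orders": ["orders_clean", "raw_customers"],
    "sales_report": ["customer_orders"],
}

# Hypothetical status each dataset/dataflow reports on its own today.
OWN_STATUS = {
    "raw_orders": Status.GREEN,
    "raw_customers": Status.YELLOW,
    "orders_clean": Status.GREEN,
    "customer_orders": Status.GREEN,
    "sales_report": Status.GREEN,
}


def rollup_status(dataset, inputs, own_status):
    """Worst status across `dataset` and all of its ancestors.

    Assumes the lineage graph is acyclic; results are memoized so shared
    ancestors are only evaluated once.
    """
    memo = {}

    def visit(name):
        if name not in memo:
            worst = own_status[name]
            for parent in inputs.get(name, []):
                worst = max(worst, visit(parent))
            memo[name] = worst
        return memo[name]

    return visit(dataset)


print(rollup_status("sales_report", INPUTS, OWN_STATUS).name)  # YELLOW
print(rollup_status("orders_clean", INPUTS, OWN_STATUS).name)  # GREEN
```

Here `sales_report`'s own flow is green, but its roll-up turns yellow because `raw_customers` is yellow two steps upstream. "Worst status wins" is just one design choice; a team could instead weight certain inputs or ignore optional ones.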
Executive Summary:
The idea originates from the main jobs screen in Workbench (Figure 1), where one can clearly see the status of any scheduled Workbench job.
Figure 1: Workbench job status
This is extremely valuable for assessing the status of the data loading processes. The same philosophy carries through to the dataset and dataflow area as well (see Figure 2).
Since dataflows and datasets can be chained together, they can produce sophisticated results that drive a data science project. However, status information for the chain as a whole is lacking. If one dataset in the chain is out of date, its successor datasets carry no indicator of that upstream provenance issue. Now that the Lineage function is becoming available, it may be possible to show the status of the entire flow on one screen.
Figure 2: Data set status
This suggestion is to allow MajorDomos, data scientists, and dataset owners to build a status card covering the constituent datasets and flows. The status data clearly exists; it just needs to be accessible to those concerned with it. Below is a chart of a typical chained dataflow in the EMC instance. Each box lists the input dataset, the name of the dataflow (in purple), and the resulting output dataset. The flows are chained together to produce multiple output datasets used for various reporting purposes. In this scenario, if one dataset is old or stale, its successor datasets are still reported as ‘Good’ (green) because their individual flows have run; there is no indicator that the data was, in fact, stale.
A simple output card listing all the constituent datasets and flows would suffice to address this, at least until the Domo team decides to act on the overall need for data provenance (à la the Lineage diagram for dataflows).
In effect, this is very similar to a piping diagram in a power plant; the same principles apply.
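As a sketch of the freshness check such a card could perform, the Python below walks every ancestor of a dataset and lists the ones whose last refresh is older than an allowed age. The dataset names, timestamps, and per-dataset `MAX_AGE` thresholds are hypothetical; the assumption is that refresh times could be pulled from wherever Domo records them, not that any particular Domo API exists for this.

```python
from datetime import datetime, timedelta

# Hypothetical lineage: each dataset maps to the datasets its dataflow reads from.
INPUTS = {
    "bookings": [],
    "fx_rates": [],
    "bookings_usd": ["bookings", "fx_rates"],
    "exec_dashboard": ["bookings_usd"],
}

# Hypothetical last successful refresh time for each dataset.
LAST_REFRESH = {
    "bookings": datetime(2024, 1, 10, 6, 0),
    "fx_rates": datetime(2024, 1, 7, 6, 0),
    "bookings_usd": datetime(2024, 1, 10, 7, 0),
    "exec_dashboard": datetime(2024, 1, 10, 7, 30),
}

# Hypothetical owner-defined limit on how old each dataset may be.
MAX_AGE = {name: timedelta(days=1) for name in INPUTS}


def stale_sources(dataset, now, inputs, last_refresh, max_age):
    """Return every dataset in the lineage of `dataset` (itself included)
    whose last refresh is older than its allowed maximum age."""
    stale = set()
    seen = set()
    stack = [dataset]
    while stack:  # iterative depth-first walk; `seen` skips revisits
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        if now - last_refresh[name] > max_age[name]:
            stale.add(name)
        stack.extend(inputs.get(name, []))
    return stale


def freshness_light(dataset, now, inputs, last_refresh, max_age):
    """This sketch uses only two lights: green when nothing in the lineage is stale."""
    return "red" if stale_sources(dataset, now, inputs, last_refresh, max_age) else "green"


now = datetime(2024, 1, 10, 8, 0)
print(stale_sources("exec_dashboard", now, INPUTS, LAST_REFRESH, MAX_AGE))  # {'fx_rates'}
print(freshness_light("exec_dashboard", now, INPUTS, LAST_REFRESH, MAX_AGE))  # red
```

In this example every dataflow downstream of `fx_rates` ran on schedule, so each would individually show green, yet `exec_dashboard` still inherits the three-day-old `fx_rates` input, which is exactly the case the per-dataset status misses.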
The Ask: In the short term, identify the mechanism that can be used to create the summary card outlined above.
Comments
Thank you for submitting this, @mcoblentz. I am assigning this to our product manager @ckwright to review and comment.
CC @JonSharp
Dani aka "Mr.Dojo"
Dojo Admin