Append only missing rows

Reply
White Belt

Append only missing rows

I have 2 datasets. I want to append the data togeather and keep all the lines from dataset A but only the lines from dataset B that are not represented in dataset A. I want to do this on a unique identifier. Any hints on how this would best be occoplished. See attached for an example. 

 

Screen Shot 2020-06-02 at 11.49.13 PM.png

 

Tags (1)
Black Belt

In the official Domo literature and training they'd recommend you use a 'Remove duplicates' tile and build a recursive dataflow.

https://knowledge.domo.com/Prepare/DataFlow_Tips_and_Tricks/Creating_a_Recursive%2F%2FSnapshot_ETL_D...

 

Problem with 'remove duplicates' is there's implicit randomness to how 'duplicates' are defined because you cannot explicitly define that you want to keep data from the left.

 

You have a few options.

 

You can APPEND, and then use a Window & Rank tile to assign a row_number.  As part of that process you can apply an ORDER BY clause (date is a popular option) and then PARTITION BY the row_id column.  If you then filter where row_number = 1, you will deduplicate rows.

 

Alternatively, if there is no reasonable column to ORDER BY, you can OUTER JOIN the data together.

FILTER where your LEFT row_id is not null, then SELECT the appropriate columns from the left side.  and if your LEFT row_id is null then SELECT the appropriate columns from the right side.

 

You can do this in Magic or MySQL ... you just have to figure out how to do a row_number with a partition clause in MySQL.

 

Hope that helps!

 

 


Jae Wilson
Check out my Domo Training YouTube Channel

**Say "Thanks" by clicking the heart in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"
Announcements
New & Improved Dojo: Get excited for a new feel and additional features in the online community.

Site will be down February 4-5th for maintenance.

Launch coming February 8th!