I have 2 datasets. I want to append the data togeather and keep all the lines from dataset A but only the lines from dataset B that are not represented in dataset A. I want to do this on a unique identifier. Any hints on how this would best be occoplished. See attached for an example.
Problem with 'remove duplicates' is there's implicit randomness to how 'duplicates' are defined because you cannot explicitly define that you want to keep data from the left.
You have a few options.
You can APPEND, and then use a Window & Rank tile to assign a row_number. As part of that process you can apply an ORDER BY clause (date is a popular option) and then PARTITION BY the row_id column. If you then filter where row_number = 1, you will deduplicate rows.
Alternatively, if there is no reasonable column to ORDER BY, you can OUTER JOIN the data together.
FILTER where your LEFT row_id is not null, then SELECT the appropriate columns from the left side. and if your LEFT row_id is null then SELECT the appropriate columns from the right side.
You can do this in Magic or MySQL ... you just have to figure out how to do a row_number with a partition clause in MySQL.