Magic ETL Exclusion

I'm looking for an efficient way to only keep contacts that appeared in the year 2018 and exclude contacts that appear in the year 2019 AND 2018. It's possible for a contact to appear several times in the same year, so I cannot simply use the Remove Duplicates tile. In Magic ETL, how would I do the equivalent of:

 

SELECT email_addr, year FROM Dataset1

where year = '2018' and email_addr not in

(

SELECT email_addr FROM Dataset1

where year = '2019'

);

 

My current method is using 1 Input DataSet, then using 2 filters. One for 2018 and the other for 2019. Then, I am using Join Data to do Left Outer. Next, I'm using a Filter to exclude any rows where the Year is null for the Datatset with a 2019 filter. It works, but is there a more efficient way to do this?

Best Answer

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤
    Accepted Answer

    You did that correctly as it looks like you are wanting to get contacts that only visited in 2018 and exclude contacts that visited in 2018 and then again in 2019.

Answers

  • Correct me if I'm wrong, but you just want to have 2018 contacts?

    Why not just filter year 2018?

  • I just want contacts who are 12M+ "inactive". In my dataset, it's possible 2018 contacts could also be in 2019 and these would be "active" (recent) contacts.

  • You could also do a calculation to get the max date for each contact ID.  Then perform a calculation to determine how many days ago that was.  Finally filter out any contacts where there have been more than 365 days since the last activity.

     

    This way you wouldn't be restricted to a calendar year (and having to rework it each year) but would always just have a list of contacts that have activity in the last 365 days.