Get unique records from datasets

Reply
Yellow Belt

Get unique records from datasets

Hi,

 

I have two reports for last and current month which has size, folder name, date as fields. There are duplicates between last month's report and current month's report. When I combine both the reports, the output dataset should not double up - meaning if there are same folder name from previous month report, it should just show if there was a difference between last report and this report.  For example, the folder called "Test1" did not change in size so no updates are needed for that folder. Can you please suggest me on this?

 

Thanks in advance


Accepted Solutions
Black Belt

@Khan_Naziya 

 

you're using confusing language sorry, it's not clear to me if you're asking for an UPSERT scheme which keeps only the most recent version of each row or if you're looking for a partition scheme where you only keep the block of data from the most recent month.  OR if you're looking for a way to report if the value changed from one month to the next (slowly changing dimension).

 

I'm sprinkling in what may sound like jargon b/c they are 'terms-of-art' with 20 years of history each.

 

If you need UPSERT or Partitioning, you can look at a recursive dataflow pattern.  https://www.youtube.com/watch?v=JNQFZCj8JcQ

 

The difference between UPSERT or Partitioning would just be how you define your RANK partition, do you make it at the granularity of one row, or month or folder_month.

 

if you need slowly changing dimensions, you still implement the RANK to get the row version number, but then you add a LAG function to get the value last period.


Jae Wilson
Check out my Domo Training YouTube Channel

**Say "Thanks" by clicking the heart in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"

View solution in original post

The complete guide to building recursive dataflows in #Domo using #MagicETL. We'll build 4 recursive datasets, ''normal recursive dataflows", capturing "ful...

All Replies
Black Belt

@Khan_Naziya 

 

you're using confusing language sorry, it's not clear to me if you're asking for an UPSERT scheme which keeps only the most recent version of each row or if you're looking for a partition scheme where you only keep the block of data from the most recent month.  OR if you're looking for a way to report if the value changed from one month to the next (slowly changing dimension).

 

I'm sprinkling in what may sound like jargon b/c they are 'terms-of-art' with 20 years of history each.

 

If you need UPSERT or Partitioning, you can look at a recursive dataflow pattern.  https://www.youtube.com/watch?v=JNQFZCj8JcQ

 

The difference between UPSERT or Partitioning would just be how you define your RANK partition, do you make it at the granularity of one row, or month or folder_month.

 

if you need slowly changing dimensions, you still implement the RANK to get the row version number, but then you add a LAG function to get the value last period.


Jae Wilson
Check out my Domo Training YouTube Channel

**Say "Thanks" by clicking the heart in the post that helped you.
**Please mark the post that solves your problem by clicking on "Accept as Solution"

View solution in original post

The complete guide to building recursive dataflows in #Domo using #MagicETL. We'll build 4 recursive datasets, ''normal recursive dataflows", capturing "ful...
Announcements
New & Improved Dojo: Get excited for a new feel and additional features in the online community.

Site will be down February 4-5th for maintenance.

Launch coming February 8th!