Deduplicating a table based on the content of certain columns

I have a dataset that has about 20 columns in it. The first column contains ID numbers and a lot of the ID numbers are duplicated multiple times. All the other data in the columns are also duplicated multiple times except one column named last_updated. The last_updated column lists a date.

I would like to de-dupe this dataset and keep the rows with the most recent dates in the last_updated columns. Is there a way to do this?

Tagged:

Comments

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🥷

    The easiest way to do this is to use the Group By tile in Magic ETL. Add all your columns in the select except for the last_updated column. Add that one to the aggregated column list and choose Max. This will give you the most recent date for each.




    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.