Magic ETL Remove Duplicates - Configure merge strategy

The Remove Duplicates feature lets me define the composite key used to identify duplicate rows. However, it provides no way to configure which row's values are kept for the remaining columns. This limits the situations in which it is practically useful.

If any value from the other columns will do, an approach based on Group By may be more useful. However, Remove Duplicates seems more appropriate when I want to preserve row integrity, that is, keep all column values from the same source row.
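To make the row-integrity concern concrete, here is a small Python sketch of a Group By that takes the MAX of each column independently. Plain dictionaries stand in for dataset rows, and the column names are hypothetical; this illustrates the general technique, not how Magic ETL implements Group By. The aggregated row can combine values that never appeared together in any source row.

```python
# Two hypothetical rows that share the same key "A".
rows = [
    {"key": "A", "updated": "2021-01-05", "owner": "amy"},
    {"key": "A", "updated": "2021-01-02", "owner": "zoe"},
]

def group_by_max(rows, key):
    """Group by `key` and take the MAX of every other column independently."""
    out = {}
    for row in rows:
        agg = out.setdefault(row[key], dict(row))  # start from a copy of the first row seen
        for col, val in row.items():
            if col != key and val > agg[col]:
                agg[col] = val  # each column picks its own maximum
    return list(out.values())

result = group_by_max(rows, "key")
print(result)  # → [{'key': 'A', 'updated': '2021-01-05', 'owner': 'zoe'}]
```

The latest `updated` comes from the first row and the largest `owner` from the second, so the output row matches neither source row.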

To use Remove Duplicates effectively, I need a way to choose which row wins. For example, the winner could be chosen by checking whether a column from a joined dataset is non-null, or by sorting on a column and keeping the row with the earliest or latest timestamp.
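A minimal Python sketch of the requested behaviour, with rows as dictionaries. The function `remove_duplicates` and its `prefer_non_null`, `order_by` and `keep_latest` parameters are hypothetical and are not options Magic ETL exposes. It assumes a non-null rule takes priority over timestamp ordering, and that the `order_by` column has no nulls. Each composite key keeps one complete row, so values are never mixed across rows.

```python
from typing import Any, Dict, Iterable, List, Optional, Sequence

Row = Dict[str, Any]

def remove_duplicates(
    rows: Iterable[Row],
    key_cols: Sequence[str],
    prefer_non_null: Optional[str] = None,  # hypothetical: column whose non-null value wins
    order_by: Optional[str] = None,         # hypothetical: column to sort on (assumed non-null)
    keep_latest: bool = True,               # True keeps the greatest order_by value
) -> List[Row]:
    """Keep one whole row per composite key, chosen by a configurable rule."""

    def better(candidate: Row, current: Row) -> bool:
        # Rule 1: a non-null value in prefer_non_null beats a null one.
        if prefer_non_null is not None:
            c = candidate.get(prefer_non_null) is not None
            k = current.get(prefer_non_null) is not None
            if c != k:
                return c
        # Rule 2: earliest or latest value of order_by wins.
        if order_by is not None:
            a, b = candidate[order_by], current[order_by]
            if a != b:
                return a > b if keep_latest else a < b
        # Remaining ties: the row seen first is kept.
        return False

    winners: Dict[tuple, Row] = {}
    for row in rows:
        k = tuple(row[c] for c in key_cols)
        if k not in winners or better(row, winners[k]):
            winners[k] = row
    return list(winners.values())

# Usage with hypothetical columns; crm_id stands in for a column from a joined dataset.
rows = [
    {"id": 1, "region": "east", "updated": "2021-03-01", "crm_id": None},
    {"id": 1, "region": "east", "updated": "2021-02-01", "crm_id": "C-9"},
    {"id": 1, "region": "east", "updated": "2021-04-01", "crm_id": None},
    {"id": 2, "region": "west", "updated": "2021-01-15", "crm_id": None},
]
latest = remove_duplicates(rows, ["id", "region"], order_by="updated")
matched = remove_duplicates(rows, ["id", "region"], prefer_non_null="crm_id", order_by="updated")
```

With only `order_by`, key (1, "east") keeps the 2021-04-01 row; adding `prefer_non_null="crm_id"` keeps the 2021-02-01 row instead, because it is the only one with a matched `crm_id`.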

This would be especially useful when Remove Duplicates is used to cleanse existing data.

4 votes

Comments

  • Thanks for submitting this idea @cr1ckt.  Assigning to our product manager @mattchandler for review.

  • Best,
    Matt Chandler
    Domo
  • Thanks @cr1ckt, this is great feedback. I agree that it would be a great improvement and have added it to our roadmap.

    Best,
    Matt Chandler
    Product Manager, Domo

  • This would be very helpful. I have a number of cases where I append to a history file every night and want to replace some of the old rows with data from the new rows. 
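The nightly history scenario in the comment above amounts to a "newest load wins" strategy. A minimal Python sketch under that assumption follows; `append_and_replace` and the `order_id`/`status` columns are hypothetical. On a key collision, the new row replaces the old one as a whole rather than column by column.

```python
from typing import Any, Dict, List, Sequence

Row = Dict[str, Any]

def append_and_replace(history: List[Row], new_rows: List[Row], key_cols: Sequence[str]) -> List[Row]:
    """Append tonight's rows; a new row replaces any history row with the same key."""
    merged: Dict[tuple, Row] = {}
    for row in history + new_rows:  # new rows come last, so they overwrite older ones
        merged[tuple(row[c] for c in key_cols)] = row
    return list(merged.values())

history = [
    {"order_id": 10, "status": "open"},
    {"order_id": 11, "status": "open"},
]
tonight = [
    {"order_id": 11, "status": "shipped"},
    {"order_id": 12, "status": "open"},
]
updated = append_and_replace(history, tonight, ["order_id"])
```

Order 11 ends up with status "shipped", order 10 is kept from history, and order 12 is appended.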

This discussion has been closed.