Remove duplicates from large dataset

Is there a faster way than ETL to remove duplicates from a very large dataset?

Comments

  • A few options you might try:

    1. Depending on the end view you're after on your cards, you could use a DISTINCT operation in a calculated field:

    COUNT(DISTINCT `fieldName`)

    2. Use either the R or Python plugin to pull down the data, run a de-duplication function, and then push the data back into Domo:

    R: unique(yourDataFrame)
    Python (pandas): yourDataFrame.drop_duplicates()


    Stack Exchange Reference:

    R Example

    Python Example
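    To make the pandas route concrete, here's a minimal sketch of option 2's de-duplication step. `df` and the column names are placeholder data for illustration, not part of a real Domo dataset; the pull from and push back to Domo are omitted:

    ```python
    import pandas as pd

    # Placeholder data standing in for the dataset pulled down via the plugin.
    df = pd.DataFrame({
        "id": [1, 1, 2, 3, 3],
        "value": ["a", "a", "b", "c", "c"],
    })

    # Drop rows that are duplicates across every column.
    deduped = df.drop_duplicates()

    # Or de-duplicate on a key column only, keeping the first occurrence.
    deduped_by_id = df.drop_duplicates(subset="id", keep="first")

    print(len(df), len(deduped), len(deduped_by_id))
    ```

    `subset` and `keep` let you control which columns define a duplicate and which copy survives, which matters when rows differ only in incidental columns like load timestamps.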

  • jlazerus

    That's great, thanks. I'll give those a try.