Generate an Average ignoring duplicate rows

I am trying to generate an average value for a single value card. The data has duplicates of this value due to other data within the set. In the example below, I am looking for the average number of seconds per customer.

I need to be able to dedupe Customer 1, then average the remaining rows. Can this be done with a Beast Mode calculation, and if so, how?

In the example above the average should be 72.5, not 61.
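Since the example table itself didn't survive in this thread, here is a minimal sketch of the intended calculation with made-up numbers (chosen so the deduplicated average is 72.5, as in the question; the actual data and column names are assumptions):

```python
from statistics import mean

# Hypothetical rows (customer, timestamp, seconds): the seconds value
# repeats once per extra category row for the same visit.
rows = [
    ("Customer 1", "10:00", 50),
    ("Customer 1", "10:00", 50),  # duplicate created by the extra rows
    ("Customer 1", "10:00", 50),  # duplicate
    ("Customer 2", "11:00", 95),
]

naive = mean(s for _, _, s in rows)       # counts Customer 1 three times
unique = {(c, t): s for c, t, s in rows}  # one row per customer/timestamp
deduped = mean(unique.values())

print(naive)    # 61.25 -> skewed toward the duplicated customer
print(deduped)  # 72.5  -> average of the deduplicated rows
```

The point is that a plain AVG over the raw rows weights customers by their duplicate count; deduping on the identifying columns first removes that skew.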

  • GrantSmith (Indiana)

    Hi @mberkeley

    Is it possible for a customer to have different times, or are the times always the same? You'll likely need to remove the duplicates via a dataflow or data view before bringing it into your card to calculate your average.

  • A customer could have different values at different times. One of the other values that would be used to uniquely identify a row would be a timestamp.

  • I am really looking for AVG('time') over each distinct (or averaged) time per customer/timestamp.
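The "one value per customer/timestamp, then average" aggregation described above can be sketched as a two-step grouping (plain Python with hypothetical data and column order):

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (customer, timestamp, seconds); category rows repeat
# the same seconds value for a given customer/timestamp.
rows = [
    ("Customer 1", "10:00", 50),
    ("Customer 1", "10:00", 50),
    ("Customer 2", "11:00", 95),
]

# Step 1: collapse each customer/timestamp group to a single value.
# Using mean here also tolerates groups whose times differ slightly.
groups = defaultdict(list)
for customer, ts, seconds in rows:
    groups[(customer, ts)].append(seconds)
per_visit = {k: mean(v) for k, v in groups.items()}

# Step 2: average the collapsed values.
print(mean(per_visit.values()))  # 72.5
```

Whether this exact two-level aggregation is expressible in a single Beast Mode formula depends on the card, which is why the replies below push the dedupe upstream into a dataflow.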

  • jaeW_at_Onyx (Budapest / Portland, OR)

    @mberkeley If I had to guess, you have times duplicated because you JOINed two datasets together.

    If that's the case, instead of JOINing them, APPEND them. That way your times don't duplicate and you don't have to do AVG(DISTINCT).
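A toy illustration of the JOIN-versus-APPEND point, with made-up data (one visit, three category rows):

```python
# One 50-second visit, plus three category rows for the same customer.
page_stats = [("Customer 1", 50)]
categories = [("Customer 1", "Home"),
              ("Customer 1", "Pricing"),
              ("Customer 1", "Blog")]

# JOIN: the single time row is repeated once per matching category row.
joined = [(c, sec, cat)
          for c, sec in page_stats
          for c2, cat in categories if c == c2]
print(len(joined))  # 3 -> the visit now counts three times in any AVG

# APPEND: time rows and category rows stay separate; category rows simply
# carry no seconds value, so an AVG over seconds is unaffected.
appended = ([(c, sec, None) for c, sec in page_stats] +
            [(c, None, cat) for c, cat in categories])
times = [sec for _, sec, _ in appended if sec is not None]
print(times)  # [50]
```

The fan-out on the JOIN side is exactly the duplication the original question is fighting.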

  • Unfortunately not; these are actually web analytics, where there are several rows for page categories alongside the 'Time On Page' value that I am trying to average. So I am looking for the average time on page, ignoring all of the extra rows with category info.

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @mberkeley then split your data upstream into two separate streams of data.

    Create one stream that excludes / dedupes the category info and add a constant "Activity Type" = "page_stats".

    Create a second stream that includes the category info but not the "time in seconds" metric, with "Activity Type" = "category_stats", then append them.

    Now you have a dataset where you don't have to do complex math on the page_stats columns.
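The split-and-append flow described above can be sketched as follows; all column names, the raw layout, and the sample values are assumptions for illustration:

```python
# Raw rows as they come out of the fanned-out join:
# (customer, timestamp, category, seconds)
raw = [
    ("Customer 1", "10:00", "Home",    50),
    ("Customer 1", "10:00", "Pricing", 50),  # seconds repeated by fan-out
    ("Customer 2", "11:00", "Blog",    95),
]

# Stream 1 ("page_stats"): drop the category info and dedupe, keeping one
# seconds value per customer/timestamp.
page_stats = [
    {"customer": c, "timestamp": t, "seconds": s,
     "activity_type": "page_stats"}
    for c, t, s in sorted({(c, t, s) for c, t, _, s in raw})
]

# Stream 2 ("category_stats"): keep the category info, drop the time metric.
category_stats = [
    {"customer": c, "timestamp": t, "category": cat,
     "activity_type": "category_stats"}
    for c, t, cat, _ in raw
]

# Appended dataset: a card can now filter activity_type = 'page_stats'
# and take a plain AVG of seconds, with no DISTINCT gymnastics.
combined = page_stats + category_stats
times = [r["seconds"] for r in combined
         if r["activity_type"] == "page_stats"]
print(sum(times) / len(times))  # 72.5
```

In Domo this split would live in a dataflow/ETL rather than Python, but the shape is the same: one deduped metrics stream, one category stream, appended with a constant type column.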