Summing unique values based on a separate column

Having issue trying to sum values that have a one to many relationship without over-counting the value. 

I have marketing events that I'm trying to attribute opportunity created from and get the overall ROI while maintaining drill down functionality within the dataset itself. This is leading to marketing events that resulted in multiple opportunities being over counted when i sum up the marketing event costs. 

 

Is there a way to sum the distinct values of my marketing events based on the marketing ID? Using the sum(distinct (EVENT COST)) formula doesn't work unless every marketing event has a different cost, otherwise I'll actually undercount marketing costs when calculating ROI.

 

Is the only way to get around this to sum up the value of opportunity created by event with no granularity as a separate data flow and use this as the first layer of any card and a more granular data set for drill downs? I'd rather avoid something like this if possible.

Best Answer

  • Tomo
    Tomo 🟢
    Accepted Answer

    Hi.

     

    Add "EventSequenceNumber" as DataFlow as below.

     

    MarketingIDEventIDEventSequenceNumberEventCost
    m01e011110000
    m01e021210000
    m01e022310000
    m02e100130000
    m02e300230000
    m02e500330000

     

    BEAST MODE: MarketingCost

    sum(
    case
    when `EventSequenceNumber` = 1 then `EventCost`
    else 0
    end
    )

Answers

  • Any chance you could provide a snippet of your data?  I'm not sure exactly what you are looking for, and it may help provide some suggestions.

  • Hi, thanks for the solution, it should work for me too.

     

    But could you please tell me how to add that "EventSequenceNumber" column using dataflow?

     

    Thanks!

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    In Tomo's example they are doing a row_number() over the marketing_id column (this can be done in SQL or Magic ETL, but given what you described, I don't think that's what you're looking for.  @ST_-Superman-_ said it best, for problems like this, a sample of data would be most beneficial.

     

    if i had to guess, you have two sets of data, event_costs, and event_attendees, and you're trying to ascertain cost per attendee by event.  or something along those lines.

     

    instead of JOIN'ing the data together, APPEND / UNION the data together, and then use a beast mode to construct the math.

     

    sum(Cost) / COUNT(Attendees).    You can now solve your question without any DISTINCT shenanigans.

     

    Capture.PNG

  • Hi. I have a similar case but I'm not sure how to handle this. Scenario is something like this:

    Let's say I want to build 2 cards using Tomo's sample data table (assuming all data is for 1 month):

    1) Single Value card showing Total EventCost for the month (10,000 + 30,000 = 40,000) since we are assuming other rows are duplicate; and

    2) Filter card (where one can select either MarketingID or EventID).

    If I filter the EventID Filter card = 'e011', the Total EventCost in the single value card should show 10,000.

    If I filter the EventID Filter card = 'e011' AND 'e021', the Total EventCost is still 10,000.

    If I don't have anything selected in the filter card, Total EventCost should show 40,000.

    Is this possible?