Sum of Distinct Count Variance

Hi

 

I need help on below beast mode not working.  I am trying to get the sum of all the variance using distinct count of customers.  I am trying to get my total points of distribution gained. Thank you in advance.

 

SUM(COUNT(DISTINCT
CASE WHEN `Year`= 2020 AND `Month #` <=`Month # for Today`
THEN `Customer Name`
END
))
-
SUM(COUNT(DISTINCT
CASE WHEN `Year`= 2019 AND `Month #` <=`Month # for Today`
THEN `Customer Name`
END
))

Best Answer

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤
    Accepted Answer

    So ... while window functions are amazing for many things, it won't work when you're trying to do a count distinct.

     

    why?  

     

    SUM(

    COUNT(DISTINCT

    CASE WHEN `Year`= YEAR(CURRENT_DATE()) AND `Month #` <=`Month # for Today` THEN `Customer Name`
    END
    ))
    OVER ()

     

    keep in mind what is happening under the hood.  first Domo does a COUNT(DISTINCT) of customer names, for whatever is on the Axis.  (set of items) 

    then it adds up all the results.  

    so if i had item 1 = 15 distinct customers, item2  = 12 distinct customers and item3 = 31 distinc customers, i can't just sum 15,12,31... because presumeably i have overlap in the customers who bought items.

     

    SOLUTION

    APPEND your data onto itself (UNION) but change the Item number in the second UNION to be a constant 'Total Items'.

     

    now you can do a basic count(distinct customer) and you'll have a 'Total Items' ... item.

     

Answers

  • GrantSmith
    GrantSmith Indiana 🔴

    Hi @user033690 

     

    You might be able to solve your problem using a windowing function however I'd recommend restructuring your data. I've done a previous write up regarding customized period over period data models. You can refer to https://dojo.domo.com/t5/Card-Building/show-percent-on-period-over-period-graph/m-p/50540/highlight/... for an in dept description of what needs to happen.

     

    The basics being you'd have a period type column for Current and Last Year (which you'd filter on just these two) which you can then use a beast mode to calculate the total for that period and then subtract the two numbers. I'd highly recommend this method as it gives you a lot more flexibility in your analysis.

     

     

    If you must use a beast mode you could try something like:

    SUM(COUNT(DISTINCT
    CASE WHEN `Year`= YEAR(CURRENT_DATE()) AND `Month #` <=`Month # for Today`
    THEN `Customer Name`
    END
    ))
    OVER ()
    -
    SUM(COUNT(DISTINCT
    CASE WHEN `Year`= YEAR(CURRENT_DATE())-1 AND `Month #` <=`Month # for Today`
    THEN `Customer Name`
    END
    ))
    OVER ()

    I've also replaced 2020 and 2019 with YEAR(CURRENT_DATE()) functions to automatically calculate the current year and last year (hence the -1) so you don't have to worry about changing the year value next year and in future years. (Unless you specifically want 2020 and 2019 then use those values)

  • Thank you for the response Grant.  

     

    I tried the BM you suggested. unfortunately it gives me a different result.  Please review the table below.  I am looking to get the FY20-FY19 variance both by item and as a total.  I hope you can help me figuring this out.  Thank you in advance.

     

    Item NumberFY20 Customer CountFY19 Customer CountBM resultShould be result
    1012B96848912
    1042B9183898
    10449084896
    1022B80698911
    1014B6764893
    1092B5852896
    Total48243653446