Using aggregation in case statements

I have a data set of sales where a sale is either a "new sale" or an "add sale".  This value is stored in `calcSaleType`.  I would like to graph the ratio of new sales to add sales over time but keep turning up null values.

 

The value of a sale is stored in `total_cost`.

 

I've tried the following with no luck:

 

(case when `sale_type`='new sale' then sum(`total_cost`) end) /
(case when `sale_type`='add sale' then sum(`total_cost`) end)

 

Even when I try simply graphing the following, I get mostly null values even though every row contains a `calcSaleType` value.  What am I missing here?

Comments

  • GrantSmith
    GrantSmith Indiana 🔴

    Hi @user060355 

     

    You're getting NULL values because of your denominator case statement. SQL returns NULL if your denominator is NULL. You don't have an ELSE clause on your CASE statement so it's returning null because there isn't an add sale record.

     

    You can default your null values to 0 in your case statement and bring out your SUM aggregate to surround each case statement. For example:

    CASE WHEN SUM(case when `sale_type`='add sale' then `total_cost` else 0 end) = 0 THEN 0
    ELSE
      SUM(case when `sale_type`='new sale' then `total_cost` else 0 end) /
      SUM(case when `sale_type`='add sale' then `total_cost` else 0 end)
    END

    Also added another case to check for a possible divide by 0 error and handle gracefully.

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    while @GrantSmith 's statement that you can't divide by 0 is true, that's not the root cause of your problem. (i suspect).

     

    you can't write this.  even if you ignore the divide by 0 issue,  it's still logically wrong because your beast mode will only ever return either 0 or NULL.  Because after aggregation has been applied, the sale_type can only be either 'new sale' OR 'add sale'  so it's only possible for either the numerator OR the denominator to be populated.

    (case when `sale_type`='new sale' then sum(`total_cost`) end) /
    (case when `sale_type`='add sale' then sum(`total_cost`) end)

     

    it's impossible to have both a numerator AND A denominator because you're evaluating the case statement AFTER aggregation.

     

    so ... what if you DO want to evaluate the CASE statement after the metric?  then rewrite your function.

    CASE

    WHEN 'Sale_Type' = 'option 1' then SUM(col1)

    WHEN 'Sale_type' = 'option 2' then SUM(col1) / SUM(col2)

    ELSE ( SUM(col1) + sum(col2) ) / sum(col3)

    END

    you can of course swap out col1 with a CASE statement ... which we'll see in a second, but the point is, in the above example the metric present is selected AFTER aggregation. 

     

    This is an edge case and usually this isn't what people want.

     

    WHAT YOU PROBABLY WANT.

    As @GrantSmith  described, run your CASE statement INSIDE the Aggregation, so that you evaluate each row BEFORE you aggregate.

     

    In it's root form, you probably want this:

     

    SUM(case when `sale_type`='new sale' then `total_cost` end) /
    SUM(case when `sale_type`='add sale' then `total_cost` end)

     

    The rest of Grant's case statement is error checking to avoid dividing by 0 or NULL which we know is mathematically impossible.

     

    HOW SQL HANDLES NULLS IN MATH

    @GrantSmith 's ELSE 0 clause is related to confusion around how SQL aggregates NULLS.   This is how normal SQL works... i'm told it's counter-intuitive :P.

     

    When you add values at the row-level you can't sum NULLs.

    ex. sum( 5+ 2 + null )  will yield null.

    WHY?  ... well assum NULL means 'i don't know.'  If you took 5 + 2 + "i don't know" you can't assume you'll get 7 right?  

     

    You can however SUM() a COLUMN that contains nulls.

    ex. SELECT SUM (col1) where col1 has rows containing the values 5, 4, 0, and NULL will yield 9.

     

    NULL and 0 aren't the same.  Assume my above example was weekly temperatures from a sensor.  And in the last week, there was a sensor failure so the temperature wasn't measured.

    If you're reporting average temperature would you expect the result to be 9 / 3 or 9/4?

     

    Remember, if you say 9/4 then you assume that on the last week the temperature was 0 degrees.... and that's just not intuitively what we'd expect from an average measurement.  

     

    ... SO BACK TO THE BEAST MODE

    In the main part of the calculation, the ELSE 0 clause in both the numerator and denominator are unnecessary because you can SUM(colContainingNulls) but you cannot SUM(col1+col2ContainsNulls)

     

    CASE WHEN SUM(case when `sale_type`='add sale' then `total_cost` else 0 end) = 0 THEN 0
    ELSE
      SUM(case when `sale_type`='new sale' then `total_cost` ) /
      SUM(case when `sale_type`='add sale' then `total_cost` )
    END

     

    FINAL CLEANUP.

    Grant leads with a CASE statement to check the denominator for zero or NULL (to short circuit the beast mode if it would produce a divide by 0 error)

     

    You have two choices:

    SUM(case when `sale_type`='add sale' then `total_cost` else 0 end ) = 0

    OR

    IFNULL(SUM(case when `sale_type`='add sale' then `total_cost` end ),0) = 0

     

    I like the second method because it means my check is exactly the same as the denominator in the main beast mode and therefore doesn't look like a typo.  I don't like the second method, because now i'm introducing the IFNULL() function (i.e. if my denominator is null replace with 0)

     

    CASE 
    WHEN SUM(case when `sale_type`='add sale' then `total_cost` else 0 end ) = 0 THEN 0
    ELSE
      SUM(case when `sale_type`='new sale' then `total_cost`) /
      SUM(case when `sale_type`='add sale' then `total_cost`)
    END

     

     

    Hope that helps.