Rolling Average for Each Category

I have a dataset containing total monthly units by month and by person. I'd like to calculate the 3 month rolling average for each person. How should I group the data so that the rolling average calculation goes by person and not by date?

Tagged:

Answers

  • JR_S1L
    JR_S1L ⚪️

    This is how my dataset looks so far

  • GrantSmith
    GrantSmith Indiana 🥷

    Hi @JR_S1L

    You'd need to keep your data as is to partition based on name to get the lag for the prior 3 months to get your rolling average. You can use a rank & window tile in Magic ETL to calculate the lag just make sure you're partitioning by the name and that you're sorting/ordering by the month



    **Was this post helpful? Click Agree or Like below**

    **Did this solve your problem? Accept it as a solution!**
  • JR_S1L
    JR_S1L ⚪️

    Hi @GrantSmith ....I am new to lag functions, can you explain a bit further? This is what I have so far when calculating the lag using Magic ETL...how would I then calculate the rolling average?


  • GrantSmith
    GrantSmith Indiana 🥷

    I've done a writeup of lag functions and rolling averages before utilizing Beast Modes but the idea at the end is the same concept. You just need to calculate the Lag for 1 month ago and 2 months ago into three separate fields. Then add all 3 together (current month, 1 month ago and 2 months ago) and divide the total by 3 using a formula tile. That will get your rolling 3 month average.

    This is depending on how you want to calculate the rolling three months. Is that the current month and the prior 2 months or the prior 3 months excluding the current month? If you want the prior 3 months just calculate the lag for 3 months ago and add that value instead of the current value.


    If you're interested here's the writeup using a beast mode (but if your rolling average won't need to have filters dynamically applied to it I'd recommend keeping it in the magic etl): https://dojo.domo.com/discussion/52679/domo-ideas-conference-beast-modes-rolling-averages#latest



    **Was this post helpful? Click Agree or Like below**

    **Did this solve your problem? Accept it as a solution!**
  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    +1 on Grant's solution using Lag(1) .. Lag(2), Lag(3) IF YOU DON"T HAVE GAPS IN YOUR DATA.

    Only thing I strongly dislike about this approach is that you have to create 3 functions (not efficient) AND it must be true that there are no gaps in data.


    Computationally, @GrantSmith i would say this is probably the only good use case for MySQL b/c it allows you to calculate LAG without the assumption of contigusous data (cc. @MarkSnodgrass


    NOTE THIS IS ALL UNTESTED CODE there are probably missing commas, and when you set variables (as opposed to display you must use := and variable names are case sensitive.


    if you write

    SELECT 
    t.*
    , r.*
    FROM 
    table t 
    , (SELECT @person := '' , @date := null FROM ) r
    ORDER BY t.Date
    

    You are CROSS APPLYing a generated table with one row (r) to every row of table t where table r contains variables person and date which are insantiated as blank and null respectively


    from there if you write

    SELECT 
    @date as prevDate,  
    t.*,
    @date := t.date as _setDateVariable
    
    FROM 
    table t 
    , (SELECT @person := '' , @date := null FROM ) r
    
    ORDER BY t.Date
    

    when MySQL reads this function it will process each column of my SELECT statement in order (top to bottom). so when if i was on row 3 of my table...

    prevDate is the value of assigned to @date (which happens to be the value of t.date from row 2 -- i'll explain in a second), then it reads all the values for that row of table t, then it assigns @date the value of t.date for the current row (row 3)

    when i SELECT row 4...

    prevDate is the value of @date (row 3) , t.* is all the contents of row 4, then i assign ( using := ) the value of t.date (row 4) to @date.

    and so on.


    implement partitioning and test for previous day

    SELECT 
    -- execute tests. testing @person implements my Partition clause
    -- testing @lag1Date ensures the no gaps rule (this is the key part that Magic doesn't give you)
    
    case when t.person = @person and @lag1Date = date_add(t.date, interval -1 day) then @lag1Value end as prevDayValue,
    t.*,
    
    -- paritioning assumes an ORDER BY clause
    -- OR reset my variables to null
    @lag1Value := case when t.person = @person then t.value else null end as _setLag1Value,
    @lag1Date := case when t.person = @person then t.date else null end as _setLag1Date,
       
    @person := t.person as _setPersionVariable
    
    FROM 
    table t 
    , (SELECT @person := '' , @date := null , @lag1Value:=null, @lag1Date:=null, FROM ) r
    
    ORDER BY t.person, t.Date
    


    Implement Cascading testing

    SELECT 
    -- execute tests. testing @person implements my Partition clause
    -- testing @lag1Date ensures the no gaps rule (this is the key part that Magic doesn't give you)
    
    case when t.person = @person and @lag1Date = date_add(t.date, interval -1 day) then @lag1Value end as prevDayValue,
    case when t.person = @person and @lag2Date = date_add(t.date, interval -2 day) then @lag2Value end as prevDayValue,
    
    t.*,
    
    -- cascade the values of lag1 to lag2 or reset to null
    @lag2Value := case when t.person = @person then @lag1Value else null end as _setLag2Value,
    @lag2Date := case when t.person = @person then @lag2Value else null end as _setLag2Date,
    
    @lag1Value := case when t.person = @person then t.value else null end as _setLag1Value,
    @lag1Date := case when t.person = @person then t.date else null end as _setLag1Date,
       
    @person := t.person as _setPersionVariable
    
    FROM 
    table t 
    , (SELECT @person := '' , @date := null , @lag1Value:=null, @lag1Date:=null, @lag2Value := null lag2Date:=null FROM ) r
    
    ORDER BY t.person, t.Date
    


    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤
    edited April 4

    Thinking it through another set based way of approaching this without variables would be to do what SQL has to do under the covers (generate windows) for each row in your dataset we'll generate the 4 rows, then we'll assign the value of the transaction for those lagged rows if they exist, then we'll aggregate.

    this assumes we have a numbers table, sys_num, that contains integers between 1 and 3 in a column called n.


    NOTE this is untested code but i think it works.

    P.S. this code you could implement in Magic if you wanted to avoid writing SQL. it would perform faster since we didn't index MySQL.


    duplicate (using CROSS APPLY) the rows in t.table 4 times

    CREATE TABLE referenceTable as 
    
    SELECT distinct
    date_add(t.date, interval n day) as report_date
    t.person
    t.value as lag_value
    n.num as lag_num
    t.date as actual_transaction_date
    FROM
    table t
    , (SELECT num from sys_num where num beteen 0 and 3) n
    


    create your windowed data

    for each row in table t, you should have duplicate rows if table has a matching 0, 1, 2, or 3 day offset (report_date).

    CREATE TABLE blowOut as 
    
    SELECT
    t.person,
    , t.date,
    , r.lag_num
    , r.lag_value
    ,  case when lag_num = 1 then r.lag_value as lag1day end
    , case when lag_num =2 then r.lag_value as lag2day end
    , case when lag_num=3 then r.lag_value as lag3day end
    , case when lag_num=0 then r.lag_value as lag0day end
    FROM
    table t
    JOIN
    referenceTable r
    
    on t.person = r.person
    and t.date = r.report_date
    

    deduplicate rows

    create table lag_table as
    
    SELECT 
    max(lag_1day) as lag1day,
    max(lag_2day) as lag2day,
    max(lag_3day) as lag3day,
    max(lag4_day as lag4day,
    
    date,
    person
    
    from blowOut
    
    GROUP BY
    date,
    person
    

    then join in the data for that person and that row

    select
    t.* 
    <lt.columns>
    
    from table t
    inner join
    lag_table lt
    on t.person = lt.person
    and t.date = lt.date
    
    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • JR_S1L
    JR_S1L ⚪️

    @GrantSmith , @jaeW_at_Onyx thank you both so much!!! I realized it's because we don't have window functions enabled in our domo instance that this was proving impossible for me...I've reached out to our rep to have them enabled so that I can apply the options that you provided! Again, thank you!!

  • JR_S1L
    JR_S1L ⚪️

    @GrantSmith , @jaeW_at_Onyx just to confirm...in order to be able to dynamically filter by name, these calculations need to be done in beast mode correct?