Beast Mode for Calculating the Average of Values for Rows That Meet Certain Conditions

Hi Everyone,

 

I have looked through several posts to see if this was already discussed, but couldn't quite find what I was looking for.

 

I have a dataset with a date column representing the week the data is for. For every row in my dataset, I would like to calculate the average of the values that fall within the 6-week period following that row's date and that meet certain conditions.

 

Example:

For data in row 1: I would like to get the average of the values from rows 2 - 7 (not because of the sequential row numbers, but because those are the weeks within the 6-week period), excluding rows 4 and 7 (because the Event field = Y).

For data in row 2: I would like to get the average of the values from rows 3 - 8, again excluding rows 4 and 7 from the calculation.

...

For data in row 8: I would get the average of the values from rows 9 - 14, with no exclusions, since the Event field = N for all of those records.

| Number | Location | Product | Sales | Week | Event? | Average | Formula |
|---|---|---|---|---|---|---|---|
| 1 | 123 | ABC | 100 | 14-Oct-2018 | N | 111.25 | =AVERAGE(D4,D5,D7,D8) |
| 2 | 123 | ABC | 60 | 21-Oct-2018 | N | 116.25 | =AVERAGE(D5,D7,D8,D10) |
| 3 | 123 | ABC | 200 | 28-Oct-2018 | N | 88.75 | |
| 4 | 123 | ABC | 560 | 4-Nov-2018 | Y | 91 | =AVERAGE(D7,D8,D10,D11,D12) |
| 5 | 123 | ABC | 110 | 11-Nov-2018 | N | | |
| 6 | 123 | ABC | 75 | 18-Nov-2018 | N | | |
| 7 | 123 | ABC | 600 | 25-Nov-2018 | Y | | |
| 8 | 123 | ABC | 80 | 2-Dec-2018 | N | 96.16667 | =AVERAGE(D11,D12,D13,D14,D15,D16) |
| 9 | 123 | ABC | 90 | 9-Dec-2018 | N | | |
| 10 | 123 | ABC | 100 | 16-Dec-2018 | N | | |
| 11 | 123 | ABC | 110 | 23-Dec-2018 | N | | |
| 12 | 123 | ABC | 85 | 30-Dec-2018 | N | | |
| 13 | 123 | ABC | 88 | 6-Jan-2019 | N | | |
| 14 | 123 | ABC | 104 | 13-Jan-2019 | N | | |
| 15 | 123 | ABC | 90 | 20-Jan-2019 | N | | |
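To make the requested logic concrete, here is a small pure-Python sketch of the calculation (my own illustration with made-up variable names, not Beast Mode or Magic ETL syntax) that reproduces the averages in the example table:

```python
# Pure-Python sketch of the requested calculation. For each row: average
# `Sales` over the 6 weekly rows that follow it, skipping rows where
# Event = 'Y', and leave the result blank (None) when fewer than 6
# trailing weeks of data exist.
from datetime import date, timedelta

# (Sales, Week, Event) rows from the example table above
rows = [
    (100, date(2018, 10, 14), "N"), (60,  date(2018, 10, 21), "N"),
    (200, date(2018, 10, 28), "N"), (560, date(2018, 11, 4),  "Y"),
    (110, date(2018, 11, 11), "N"), (75,  date(2018, 11, 18), "N"),
    (600, date(2018, 11, 25), "Y"), (80,  date(2018, 12, 2),  "N"),
    (90,  date(2018, 12, 9),  "N"), (100, date(2018, 12, 16), "N"),
    (110, date(2018, 12, 23), "N"), (85,  date(2018, 12, 30), "N"),
    (88,  date(2019, 1, 6),   "N"), (104, date(2019, 1, 13),  "N"),
    (90,  date(2019, 1, 20),  "N"),
]

def next_6_week_average(i):
    start = rows[i][1]
    # all rows strictly after this week, up to 6 weeks out
    window = [r for r in rows if start < r[1] <= start + timedelta(weeks=6)]
    if len(window) < 6:
        return None  # not enough trailing data: leave blank
    no_event = [sales for sales, week, event in window if event == "N"]
    return sum(no_event) / len(no_event)

averages = [next_6_week_average(i) for i in range(len(rows))]
print(averages[0])  # 111.25, matching row 1 of the table
```

For row 1 the window is rows 2 - 7, rows 4 and 7 are dropped because Event = Y, and the average of 60, 200, 110, and 75 is 111.25.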

 

Can someone provide some guidance on how I might approach this in Beast Mode? Thanks!

 

Comments

  • This kind of calculation would need to be handled from inside a dataflow.  Are you always going to want to look at the trailing 6 week average (where event='N')?

     

    What do you do when you don't have 6 weeks of trailing data?  For example, if the data is from three weeks ago, there would only be 2 weeks of trailing data. Do I take the average of those two weeks, or would I leave it blank?

     

     

  • What is your recommendation for handling it within a dataflow? Can I manage it via Magic ETL?

     

    Are you always going to want to look at the trailing 6 week average (where event='N')? Yes, we will always look at the next 6 weeks of data, but only take those rows where the Event = 'N'. So if I end up with 2 weeks, I will only average the value of those 2 weeks. 

     

    For those records without 6 weeks of trailing data, I will keep them blank, as we don't have sufficient data to do the calculation. 

  • Last question before I get working on this.  Are you looking to aggregate this per location?  Per product (company-wide)?  Per product and location?  Or are you looking for company-wide metrics (across all products and locations)?

     

    Because the calculation is being done in the dataflow, it forces the aggregation to be static.  

     

  • Aggregated at a location...

  • First of all... yes, you can do this in an ETL.  However, I am not the best at creating some of these more complex dataflows via ETL.  So I'm sure that there are probably more elegant ways of performing this within an ETL, but this is the best I could do.  I decided to keep the product detail in the output dataset because I imagined that would be one of the follow up questions, "What products are driving these trends?"

     

    I will post my steps here, let me know if you get lost anywhere along the way (apologies in advance because, as I mentioned, this got a little sloppy on me)

    Overview

     

    Data Prep

     

    2019-05-24_11-45-07.png

     

    2019-05-24_11-45-51.png

     

    2019-05-24_11-46-24.png

     

    2019-05-24_11-47-02.png

     

    2019-05-24_11-48-23.png

     

    2019-05-24_11-49-17.png

     

    2019-05-24_11-50-11.png

     

    2019-05-24_11-50-41.png

     

    Data Joins

     

    2019-05-24_11-52-41.png

     

    2019-05-24_11-54-35.png

     

    Calculations

     

    2019-05-24_12-00-13.png

     

    2019-05-24_12-00-47.png

     

    2019-05-24_12-01-21.png

     

    2019-05-24_12-02-07.png

     

    2019-05-24_12-06-48.png

     

    2019-05-24_12-07-16.png

     

    2019-05-24_12-07-55.png

     

    2019-05-24_12-08-45.png

     

    Here is the RegEx that I used:

    ^(\s|\S)*(\S)+(\s|\S)*$
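That pattern matches any string containing at least one non-whitespace character, so it effectively filters out blank values. A quick check in Python (my own illustration, not part of the dataflow itself):

```python
import re

# Matches any string with at least one non-whitespace character.
blank_filter = re.compile(r"^(\s|\S)*(\S)+(\s|\S)*$")

print(bool(blank_filter.match("2019-05-24")))  # True: has non-whitespace
print(bool(blank_filter.match("   ")))         # False: whitespace only
```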

     

    2019-05-24_12-11-09.png

     

    2019-05-24_12-11-46.png

     

    2019-05-24_12-12-27.png

     

    2019-05-24_12-13-17.png

     

    2019-05-24_12-13-55.png

     

    2019-05-24_12-14-37.png

     

    Output

     

    2019-05-24_12-17-03.png

     

    2019-05-24_12-17-35.png

     

    2019-05-24_12-18-11.png

     

     

    You end up with a data set like this:

    1.png

     

    2.png

     

    3.png

     

    4.png

     

     

    I used a Beast Mode for the average sales:

    sum(`Sales over next 6 with no event`) / max(`Number of weeks with no Event`)
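In other words, the dataflow precomputes a windowed sum and a matching count per row, and the Beast Mode just divides them. A hypothetical illustration using row 1 of the example (the variable names here stand in for the two ETL output fields referenced in the Beast Mode above):

```python
# Hypothetical illustration of the final division for row 1 of the example:
# the ETL output carries a windowed sum and a count of non-event weeks.
sales_over_next_6_with_no_event = 60 + 200 + 110 + 75  # rows 2, 3, 5, 6
number_of_weeks_with_no_event = 4

average_sales = sales_over_next_6_with_no_event / number_of_weeks_with_no_event
print(average_sales)  # 111.25
```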

     

    Hope that helps.  This was a fun exercise in forcing myself to use ETL!