CAlculated Average to include in my datasource

Hi All, I need help since I am not familiar with sql, but I need to a column with the calculated average for a particular product which does not have averages available in my datasource.

This is a small sample of my data:

 

As you can see the AUM product does not have an average balance in the source system hence it has to be a calculated field where the calculation is based on the spots balances of that particular account and  product from prior month and current month / 2.

 

Thanks in advance for your help! have a great day.

 

 

 
 

 

Comments

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    yikes... this is ... not the ideal way to structure your data ... but we can make it work.

     

    Goal:  identify the record that = the first day of the month and the last day of the month then caculate the average between the two.

     

    If it were me, I would recommend adding these collumns to a date dimension and then joining the date dimension to your dataset, so that you can recycle your code everywhere, but most people take the lazy solution and just build the logic into the card... 

    https://stackoverflow.com/questions/3960049/create-date-from-day-month-year-fields-in-mysql 

    https://www.oreilly.com/library/view/mysql-cookbook-2nd/059652708X/ch06s14.htmlhttps://blog.sqlauthority.com/2014/04/09/mysql-finding-first-day-and-last-day-of-a-month/

     

     

    sum(
    CASE
    -- is it the start of the month??
    WHEN 'date' = DATE_SUB(date, INTERVAL DAYOFMONTH(date)-1 DAY) then 'Amount'
    -- is it the end of the month??
    WHEN 'date' = LAST_DAY(date) then 'Amount'
    ) / 2

     

  • Thank you!

    What I had in mind was not to add this calculation as a beast mode but to add the column in my current database, since once it is done I have to append it to the databases from other countries. That is what I thought about addressing it via sql. 

    I think I got lost when you say identify the first day of the month, since my data only has the last day of the reporting month.

     

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    oh ... well that makes it easier i guess.  sorry i'm half looking at this at 9PM ?

     

    You can do this with a LAG function in a beast mode.

     

    Contact your CSM to get windowed functions enabled on your instance.

     

    https://www.youtube.com/watch?reload=9&v=cnc6gMKZ9R8

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    You can actually get this field populated pretty easily using Magic ETL. Here's a picture of what your ETL would generally look like:

    ETLRankAvg.PNG

    Here's a breakdown of the steps in the ETL

    1. Split your input dataset into to two datasets by using the filter tiles. Filter one to Product equals AUM. Filter the other one to not equal to AUM. 

    2. Not pictured above, but you need to add a select columns tile on your AUM filtered data set and add all columns except for the average balance column.

    3. Add a Rank & Window tile on the AUM filtered data. Use the AVG function and sort by date ascending and use the preceding and following values of 1 and 1. This will create a rolling average based off the month and the prior month. Name the column average balance (the same name that is in your main dataset).

    4. Append the data back together using shared columns.

    5. Add and name your output dataset.

     

    This should do it for you.

  • Thanks Mark! I added a partition to include the account and it work perfectly. Quick question, do you have any tip to overcome the months with no data.

     

    Ie: I have cases that the account haven't had any balance for more that 1 month hence is is not being fed into the datasource. The average then it is being calculated since the last month that particular account had balance.

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Yes, you can overcome this with a few additional steps. Here is what you would want to do:

    1. Add a 3rd connection from the input to dataset to a Select Columns tile and just select the month column.

    2. Connect a Remove Duplicates tile to that tile and choose Month to get a distinct list of months from your dataset.

    3. Add a join tile to connect the Remove Duplicates tile with the Filter to AUM tile and do a Left Join with Remove Duplicates on the left and join on Month.

    4. Add a Value Mapper tile and replace nulls with 0 on the spot balance column.

    5. Connect to the Rank & Window tile you already configured.

    It should look something like this:

    ETLRankAvg.PNG

     

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @MarkSnodgrass what if you have NO data for any account in August?  Do you want to skip the month of August from your percentage calc?