Calculating the Running Maximum

Is there some way to calculate a running maximum either through ETL/SQL?

 

I have employee hiring data and I want to check if an employee newly hired into a position is a backfill or a new growth. This classification is part of others that are used in existing charts showing how many active/hired/job change/terminations employees we have each month. My strategy at the moment is the calculate the count of employees in each position each month. I want to use this count in finding the running maximum which I will compare to the current count to determine if it a new position or a backfill position. 

 

I'm open to hearing other strategies to approach this problem as well. 

 

 

Image from Gyazo

Comments

  • GrantSmith
    GrantSmith Indiana 🔴

    Hi @Crisocir 

    You can utilize a windowing function to calculate the rolling maximum.

     

    MAX(MAX(`JDL Count`)) OVER(ORDER BY `Batch Date`)

     

    Window functions in analyzer require a feature switch. Talk to your CSM if you don't already have them enabled in your instance. 

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @Crisocir  be super careful with the aggregate functions particularly MAX() and COUNT(DISTINCT)  OVER ...

     

    they will work for ONE card with an axis but as soon as you change the axis it'll probably yield the wrong result.

     

    So, while you may get the 'right number' when you look at MAX(MAX()) by Department and Year, if you wanted to do a summary of just MAX(MAX()) by Year (across all departments) or MAX(MAX()) by Region, you'll get the 'wrong result' because of how Window functions work and calculate. 

     

    I know that sounds vague, but window functions are complex and if you've never worked with them before they do have a lot of caveats.  I did a long form webinar on the topic here.  https://www.youtube.com/watch?v=eifSYZIcPzg&t=2247s

     

    Your best bet is to avoid aggregating the data and instead try to capture your data at the POSITION level (not employee) where your granularity is one row per month per position.  Then add an 'isNewHire' flag to capture if it's the first month an employee was with the company, and an 'isNewPosition' flag for if a new position was created.

     

    If you have to use window functions, you can use sum(sum()) of the binary flags, and that will ALWAYS give the 'right answer'.

     

    Hope that helps.