Finding the latest date with specific criteria

I work in the training area at my organization.  We are trying to determine "What is the latest date they have completed all of their required training?"  For example, a new agent comes in, and they are assigned Program A, Program B, and Program C.  We want to know what the latest completion date is when they have completed all 3.  I have tried ETL (image attached) and SQL and cannot seem to get the output I need.  We would also add a few more "filters" such as "Is licensed = Y."

 

SQL:

select `Agent Code`, `Program Title`, MAX(`Program Enrollment Completion Date`) as "RTS Date"

from `bridge_program_enrollment_status`
WHERE `Program ID` = '16'
and `Program ID` = '12'
and `Program ID` = '64'
and `Program Enrollment Status 3` = 'Complete'
group by `Agent Code`

 

Any help would be much appreciated.

Comments

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Without seeing the actual data (or a mockup of it) it is hard to give complete guidance, but here are a few things to note, followed by a cleaned up SQL statement:

    1) Any field this in your select statement that is not aggregated, must be included in your group by clause. I noticed Program Title was not included in the group by clause.

    2) I'm guessing you aren't returning any rows with this select statement. This is likely due to the fact that you are requiring each row to be equal to 3 different program id's. This would not be possible. You should use the IN statement instead if you are looking for rows with these 3 id's. 

    Here's a suggested SQL statement:

    SELECT `Agent Code`, `Program Title`, MAX(`Program Enrollment Completion Date`) as "RTS Date"
    FROM `bridge_program_enrollment_status`
    WHERE `Program ID` IN ('16','12','64')
    AND `Program Enrollment Status 3` = 'Complete'
    GROUP BY `Agent Code`, `Program Title`

    Hope this helps

  • Yes, @MarkSnodgrass , that's what I was missing was the "IN" statement.  I haven't worked with SQL in several years and completely forgot about "IN." Thank you SO much!

  • @MarkSnodgrass when I'm validating the data, I'm seeing that the "in" statement is allowing "any of the program completions" to count instead of they must have "all of the programs complete."  Any thoughts?

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    @DrGWright 

    What constitutes all three being complete? Do you have a 'Program Enrollment Status 1' and 'Program Enrollment Status 2' fields as well that say Complete or Not Complete? If so, I would add to your WHERE clause that 

    WHERE `Program ID` IN ('16','12','64')
    AND `Program Enrollment Status 3` = 'Complete'
    AND `Program Enrollment Status 2` = 'Complete'
    AND `Program Enrollment Status 1` = 'Complete'

    Again, hard to say for sure without knowing what your data actually looks like. If you mock up some sample data of what an incomplete record and a complete record looks like that would help.

  • Thanks, @MarkSnodgrass , and example would be as follows:

    Agent #     Program ID     Completion Date

    1                 12                     1/1/2020

    1                  16                     1/7/2020

    1                  84                      1/5/2020

     

    What I want to show is:

    Agent #     Latest Completion Date

    1                 1/7/2020

     

    I'm wondering if I took out the "Program Title" from the SQL if that would take the duplicate records away, because I don't really care what prorgram they finished last, I just want to know the latest date.  Does that help?

     

    As far as the "Completion Status" field, there is a 1, 2, and 3, but they are all built within the ETL and show "Complete" as the status, so there's no change in status that should affect this.

     

     

  • Also, the Program Enrollment Status 1-3 do not correspond to each specific program, it's just an overall status within our LMS.

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Yes, dropping the Program Title from your select and group by statement will definitely help.

  • @MarkSnodgrass what I'm getting in the SQL results is any of the program's completion dates, not necessarily having all of the programs complete.  

     

    So, for example, what I'm seeing is:

    Agent 1 is completes Program 1 on 2/2/2020, and hasn't completed Program 2 or 3, the data is still giving me the completed date of "Any program" instead of "all programs."  Any additional thoughts?

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    EDIT:: video response (20 min) https://youtu.be/9uNv1_0XXao

     

    EDIT:: oops sorry ... just saw that this was a magicETL dataflow.  read my SQL example and i'll post the Magic version below.

     

    OOOOH a fun brain teaser!

     

    so step 1) build a query that just extracts the last time i completed the 3 programs.  we GROUP BY userID and Program, just in case it's possible to complete a survey multiple times.

     

    in step 2) we'll use a HAVING clause to only keep the rows where you have 3 responses, indicating that you completed 'the right number' of training.

    SELECT
    userID
    , count(*) as NumberOfCompletedPrograms
    , max(CompletionDate) as 'Date all 3 completed'

    FROM (
    // step 1 find the last date you completed each certification
    SELECT
    userID
    , max(completion date) as CompletionDate
    , programID
    FROM
    table
    WHERE
    programID in (list of programs)
    GROUP BY
    programID, userID
    ) as result

    // group by user b/c ... i want one row per user.
    GROUP BY
    UserID

    // only keep the rows where you completed a total of 3 programs
    HAVING
    count(*) = 3

    side note... a lazy developer would build a lookup table for all the trainings with a 1 or a 0 for 'isRequired.  That way you don't have to maintain the code, all you have to do is add new ProgramIDs to the list and change the isRequired flag..

     

    then you could rewrite the query as 

    SELECT
    userID
    , count(*) as NumberOfCompletedPrograms
    , max(CompletionDate) as 'Date all 3 completed'

    FROM (
    // step 1 find the last date you completed each certification
    SELECT
    userID
    , max(completion date) as CompletionDate
    , programID

    FROM
    training_table tt
    JOIN
    lu_listOfPrograms lu
    ON
    tt.ProgramID = lu.programID
    WHERE
    lu.isRequired = 1

    GROUP BY
    programID, userID
    ) as result

    // group by user b/c ... i want one row per user.
    GROUP BY
    UserID

    // only keep the rows where you completed a total of 3 programs
    HAVING
    count(*) = (select count(*) from lu_listOfPrograms where isRequired = 1)

     

     

    Magic Version

    So actually, for Magic the process is still the same.

    your HAVING clause is just a FILTER after you apply a GROUP BY.

     

    If it were me, I would avoid aggregating the data like this because you can't answer 'what am i missing from my training'?

    OPTION 1)

    Use the PIVOT / rows-to-columns transform to spread all the required courses as a columnName with the Date Completed as the Value.  That way you can create a visualization that shows the ones you haven't completed.

     

    OPTION 2)

    Just Output my stage 1 subquery.  That way you can create a filtered (and actionable) list of people and the courses they still need to complete.  You can do most of what you initially asked for in Cards without pre-aggregating the data.  (Pre-aggregating like your requirement is not recommended because your data won't respond to filters (which users completed training 1 but did not complete training 2.)