I work in the training area at my organization. We are trying to determine "What is the latest date they have completed all of their required training?" For example, a new agent comes in, and they are assigned Program A, Program B, and Program C. We want to know what the latest completion date is when they have completed all 3. I have tried ETL (image attached) and SQL and cannot seem to get the output I need. We would also add a few more "filters" such as "Is licensed = Y."
select `Agent Code`, `Program Title`, MAX(`Program Enrollment Completion Date`) as "RTS Date"
WHERE `Program ID` = '16'
and `Program ID` = '12'
and `Program ID` = '64'
and `Program Enrollment Status 3` = 'Complete'
group by `Agent Code`
Any help would be much appreciated.
Without seeing the actual data (or a mockup of it) it is hard to give complete guidance, but here are a few things to note, followed by a cleaned up SQL statement:
1) Any field this in your select statement that is not aggregated, must be included in your group by clause. I noticed Program Title was not included in the group by clause.
2) I'm guessing you aren't returning any rows with this select statement. This is likely due to the fact that you are requiring each row to be equal to 3 different program id's. This would not be possible. You should use the IN statement instead if you are looking for rows with these 3 id's.
Here's a suggested SQL statement:
SELECT `Agent Code`, `Program Title`, MAX(`Program Enrollment Completion Date`) as "RTS Date" FROM `bridge_program_enrollment_status` WHERE `Program ID` IN ('16','12','64') AND `Program Enrollment Status 3` = 'Complete' GROUP BY `Agent Code`, `Program Title`
Hope this helps
Yes, @MarkSnodgrass , that's what I was missing was the "IN" statement. I haven't worked with SQL in several years and completely forgot about "IN." Thank you SO much!
@MarkSnodgrass when I'm validating the data, I'm seeing that the "in" statement is allowing "any of the program completions" to count instead of they must have "all of the programs complete." Any thoughts?
What constitutes all three being complete? Do you have a 'Program Enrollment Status 1' and 'Program Enrollment Status 2' fields as well that say Complete or Not Complete? If so, I would add to your WHERE clause that
WHERE `Program ID` IN ('16','12','64') AND `Program Enrollment Status 3` = 'Complete'
AND `Program Enrollment Status 2` = 'Complete'
AND `Program Enrollment Status 1` = 'Complete'
Again, hard to say for sure without knowing what your data actually looks like. If you mock up some sample data of what an incomplete record and a complete record looks like that would help.
Thanks, @MarkSnodgrass , and example would be as follows:
Agent # Program ID Completion Date
1 12 1/1/2020
1 16 1/7/2020
1 84 1/5/2020
What I want to show is:
Agent # Latest Completion Date
I'm wondering if I took out the "Program Title" from the SQL if that would take the duplicate records away, because I don't really care what prorgram they finished last, I just want to know the latest date. Does that help?
As far as the "Completion Status" field, there is a 1, 2, and 3, but they are all built within the ETL and show "Complete" as the status, so there's no change in status that should affect this.
Also, the Program Enrollment Status 1-3 do not correspond to each specific program, it's just an overall status within our LMS.
Yes, dropping the Program Title from your select and group by statement will definitely help.
@MarkSnodgrass what I'm getting in the SQL results is any of the program's completion dates, not necessarily having all of the programs complete.
So, for example, what I'm seeing is:
Agent 1 is completes Program 1 on 2/2/2020, and hasn't completed Program 2 or 3, the data is still giving me the completed date of "Any program" instead of "all programs." Any additional thoughts?
EDIT:: video response (20 min) https://youtu.be/9uNv1_0XXao
EDIT:: oops sorry ... just saw that this was a magicETL dataflow. read my SQL example and i'll post the Magic version below.
OOOOH a fun brain teaser!
so step 1) build a query that just extracts the last time i completed the 3 programs. we GROUP BY userID and Program, just in case it's possible to complete a survey multiple times.
in step 2) we'll use a HAVING clause to only keep the rows where you have 3 responses, indicating that you completed 'the right number' of training.
SELECT userID , count(*) as NumberOfCompletedPrograms , max(CompletionDate) as 'Date all 3 completed' FROM (
// step 1 find the last date you completed each certification SELECT userID , max(completion date) as CompletionDate , programID FROM table WHERE
programID in (list of programs) GROUP BY programID, userID ) as result
// group by user b/c ... i want one row per user.
GROUP BY UserID
// only keep the rows where you completed a total of 3 programs HAVING
count(*) = 3
side note... a lazy developer would build a lookup table for all the trainings with a 1 or a 0 for 'isRequired. That way you don't have to maintain the code, all you have to do is add new ProgramIDs to the list and change the isRequired flag..
then you could rewrite the query as
SELECT userID , count(*) as NumberOfCompletedPrograms , max(CompletionDate) as 'Date all 3 completed' FROM ( // step 1 find the last date you completed each certification SELECT userID , max(completion date) as CompletionDate , programID FROM training_table tt JOIN lu_listOfPrograms lu ON tt.ProgramID = lu.programID WHERE lu.isRequired = 1 GROUP BY programID, userID ) as result // group by user b/c ... i want one row per user. GROUP BY UserID // only keep the rows where you completed a total of 3 programs HAVING count(*) = (select count(*) from lu_listOfPrograms where isRequired = 1)
So actually, for Magic the process is still the same.
your HAVING clause is just a FILTER after you apply a GROUP BY.
If it were me, I would avoid aggregating the data like this because you can't answer 'what am i missing from my training'?
Use the PIVOT / rows-to-columns transform to spread all the required courses as a columnName with the Date Completed as the Value. That way you can create a visualization that shows the ones you haven't completed.
Just Output my stage 1 subquery. That way you can create a filtered (and actionable) list of people and the courses they still need to complete. You can do most of what you initially asked for in Cards without pre-aggregating the data. (Pre-aggregating like your requirement is not recommended because your data won't respond to filters (which users completed training 1 but did not complete training 2.)