bucket courses by count of grade items, visualize count of courses in each bucket by department
I have a large data set with 29 columns and over a million rows. The rows are each grade item in a course. Courses with no grade items are listed as a single row with a null for the grade item. The data set also includes key details like instructor, semester, department, type of activity associated with the grade item, as well as associated attributes with these fields.
I'm looking to count grade items by course so that courses can be put into buckets based on defined thresholds of quantity of grade items. From there, I want to visualize this by department, showing the count of courses that are in each bucket for that department. It will be important to view things by department because there are over 200 departments and over 85,000 courses. This is one one visualization in a set of other visualizations I already have working.
I do have a beast mode of:
CASE when `GradeObjectId` IS NULL then 'No grade items' when (COUNT(`GradeObjectId`)) = 1 then '1 grade item' when (COUNT(`GradeObjectId`)) < 5 then '2-4 grade items' when (COUNT(`GradeObjectId`)) >= 5 then '5 or more grade items' End
Some key fields in the data set are:
What is is
Faculty User Name
Name of grade item
Activity type associated with grade item
Unique ID for grade item
Unique ID for course
Where I am falling short is I can get a visualization that shows the bucket identifier by the course, but once I try to visualize it by department, it is counting things for the entire department (therefore all departments are tagged as buckets of 5). I've experimented with pivot tables, tables with heatmaps, and horizontal bars (to allow for a long scrolling page). I have also tried doing four beast modes to essentially have a column for each bucket something like:
[beast mode 1] CASE when `GradeObjectId` IS NULL then 1 else 0 End [beast mode 2] CASE when (COUNT(`GradeObjectId`)) = 1 then 1 else 0 End
When used in a pivot table, it is not allowing me to sum things like the beast mode 2 column but is allowing me to sum beast mode 1 column. For things like the beast mode 2 column, it is showing more like a T/F for if this department has an item that meets that criteria.
One method I am debating on using is creating a new data flow based on the original flow, grouping by the course identifier and aggregating for count of grade item, then doing a join to bring back in all of the details I want to use in this one visualization back into the new data set. There are drawbacks. The major one is that each course may have multiple faculty and will have multiple types of grade items. I will lose the ability to report on those types of details if I aggregate (there will no longer be separate rows of the details). I can keep both data flows for different purposes, but then I have two data flows.
I wanted to see if anyone had any thoughts on how I could go about tackling this without creating a separate flow. I am flexible with the type of visualization used and am willing to experiment.