I have a large data set with 29 columns and over a million rows. The rows are each grade item in a course. Courses with no grade items are listed as a single row with a null for the grade item. The data set also includes key details like instructor, semester, department, type of activity associated with the grade item, as well as associated attributes with these fields.
I'm looking to count grade items by course so that courses can be put into buckets based on defined thresholds of quantity of grade items. From there, I want to visualize this by department, showing the count of courses that are in each bucket for that department. It will be important to view things by department because there are over 200 departments and over 85,000 courses. This is one one visualization in a set of other visualizations I already have working.
I do have a beast mode of:
CASE when `GradeObjectId` IS NULL then 'No grade items' when (COUNT(`GradeObjectId`)) = 1 then '1 grade item' when (COUNT(`GradeObjectId`)) < 5 then '2-4 grade items' when (COUNT(`GradeObjectId`)) >= 5 then '5 or more grade items' End
Some key fields in the data set are:
What is is
Faculty User Name
Name of grade item
Activity type associated with grade item
Unique ID for grade item
Unique ID for course
Where I am falling short is I can get a visualization that shows the bucket identifier by the course, but once I try to visualize it by department, it is counting things for the entire department (therefore all departments are tagged as buckets of 5). I've experimented with pivot tables, tables with heatmaps, and horizontal bars (to allow for a long scrolling page). I have also tried doing four beast modes to essentially have a column for each bucket something like:
[beast mode 1] CASE when `GradeObjectId` IS NULL then 1 else 0 End [beast mode 2] CASE when (COUNT(`GradeObjectId`)) = 1 then 1 else 0 End
When used in a pivot table, it is not allowing me to sum things like the beast mode 2 column but is allowing me to sum beast mode 1 column. For things like the beast mode 2 column, it is showing more like a T/F for if this department has an item that meets that criteria.
One method I am debating on using is creating a new data flow based on the original flow, grouping by the course identifier and aggregating for count of grade item, then doing a join to bring back in all of the details I want to use in this one visualization back into the new data set. There are drawbacks. The major one is that each course may have multiple faculty and will have multiple types of grade items. I will lose the ability to report on those types of details if I aggregate (there will no longer be separate rows of the details). I can keep both data flows for different purposes, but then I have two data flows.
I wanted to see if anyone had any thoughts on how I could go about tackling this without creating a separate flow. I am flexible with the type of visualization used and am willing to experiment.
to tie it all up, it sounds like you want to do a visualization that counts the bucketed aggregation of a metric. Unfortunately you can't do that in Domo, because effectively you're asking anayzer to do a GROUP BY twice.
WHEN count(*) BETWEEN 1 and 2 then '1-2 items'
ELSE 'no Items' END as Buckets
WHEN .. then '1-2 items'
ELSE 'no Items' END as Buckets
) as firstGroup
GROUP BY Buckets, Department
It's not possible in Analyzer.
What I would try.
You can try using the new Dataset View beta to create the subQuery where you produce 'firstGroup' but it may not work because Analyzer AND Dataset Views execute queries at runtime, and it's unclear to me how domo's query engine handles queries passed from analyzer to a dataset view.
If the first attempt fails, I'd ask Support to optimize the view, hoping that in doing so they materialize the data view. If the view is materialized it should be have the same as any dataset and therefore 'just work.'
If that fails, I'd consider using 'dataset copy' connector to force Domo to materialize my dataset view, which guaranteed will work.
WHAT OTHER PEOPLE WOULD DO.
If you have time, (no ETL pipeline constraints) just run a MagicETL. Magic2.0 should be VERY fast with the described requirement.