My Domo totals are way higher than they should be when I sum them: for example, the plays total should be 3,075, but it comes out over 48,000. When I try using SUM DISTINCT instead, the values are much lower than they should be. Additionally, when I use SUM DISTINCT I'm getting duplicates.
Does anyone have ideas on how to fix these errors in the ETL or a Beast Mode?
A little more information about how your data is structured would help, or some sample anonymized data would be great too. Without knowing how your data is structured, it's difficult to pinpoint the problem.
How is the data being populated? Is it being done via a join? What does a single row in your dataset represent?
I've merged two datasets in the ETL using a right join. I'm including an anonymized version of how it looks along with the ETL.
Below is how it looks with the sum and how it looks with the distinct sum:
I would bet a year's salary that the problem is in your JOIN clause.
There are two possibilities:
1) You're JOINing on a column that sometimes contains NULL.
You should never JOIN on NULL because it creates a Cartesian product: in other words, every NULL row on one side of the JOIN ultimately gets multiplied by every NULL row on the other side.
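A minimal sketch of this NULL fan-out, using pandas as a stand-in for the ETL join (pandas, like some ETL join tools, matches NULL/NaN join keys to each other; the column names and values here are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the two datasets being joined in the ETL.
# pandas matches NULL (None/NaN) join keys to each other, so every NULL
# row on one side pairs with every NULL row on the other.
left = pd.DataFrame({"station": ["KABC", None, None], "plays": [10, 20, 30]})
right = pd.DataFrame({"station": ["KABC", None, None], "song": ["s1", "s2", "s3"]})

joined = left.merge(right, on="station", how="right")

# 1 matched row for "KABC", plus 2 NULL rows x 2 NULL rows = 4, so 5 rows
# come out of a join of two 3-row tables.
print(len(joined))            # 5
# The plays column is now double-counted, which is exactly the kind of
# inflated SUM described above: 10 + (20 + 30) + (20 + 30) = 110, not 60.
print(joined["plays"].sum())  # 110
```

The row count is small here, but the same multiplication across thousands of NULL rows is how a 3,075 total balloons into 48,000.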
2) You have JOIN criteria that are not unique. (e.g., if my JOIN column has two rows containing 'A' on the LEFT and six rows containing 'A' on the right, I'll end up with 12 rows.)
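The same multiplication can be sketched for non-unique keys (again using pandas as a stand-in, with made-up data):

```python
import pandas as pd

# Non-unique JOIN keys: 2 matching rows on the left times 6 matching
# rows on the right yields 2 x 6 = 12 output rows.
left = pd.DataFrame({"key": ["A"] * 2, "plays": [1, 2]})
right = pd.DataFrame({"key": ["A"] * 6})

joined = left.merge(right, on="key")
print(len(joined))            # 12
# Each plays value is repeated 6 times, so SUM is inflated 6x:
print(joined["plays"].sum())  # (1 + 2) * 6 = 18, not 3
```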
If your JOIN criteria were correct you would not need Remove Duplicates.
To solve this problem:
1) Ensure that you are not JOINing on columns that contain NULL. If in doubt, replace NULL with an empty string.
2) To test my theory, put a GROUP BY on the smaller table and group by the JOIN columns. Your row count will go down to the expected values.
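The GROUP BY fix in step 2 can be sketched in pandas as a stand-in for the ETL tile (the station/song columns and values are made up, but they mirror the composite key discussed in this thread): collapse the smaller table to one row per JOIN key before joining, so the join is many-to-one instead of many-to-many.

```python
import pandas as pd

# Fact table: one row per play count we want to keep intact.
plays = pd.DataFrame({
    "station": ["KABC", "KABC", "KXYZ"],
    "song":    ["s1",   "s1",   "s2"],
    "plays":   [5,      7,      3],
})
# Lookup table with duplicate key rows (KABC/s1 appears twice).
meta = pd.DataFrame({
    "station": ["KABC", "KABC", "KXYZ"],
    "song":    ["s1",   "s1",   "s2"],
    "genre":   ["pop",  "pop",  "rock"],
})

# Without deduplication, the duplicate keys fan out: KABC/s1 is 2 rows
# on each side, so those rows become 2 x 2 = 4, and SUM(plays) inflates.
naive = plays.merge(meta, on=["station", "song"])
print(len(naive), naive["plays"].sum())   # 5 rows, sum 27 instead of 15

# Fix: collapse the smaller table to one row per JOIN key first
# (drop_duplicates here plays the role of the GROUP BY in the ETL).
meta_unique = meta.drop_duplicates(subset=["station", "song"])
fixed = plays.merge(meta_unique, on=["station", "song"])
print(len(fixed), fixed["plays"].sum())   # 3 rows, correct sum of 15
```

The key design point is that the deduplication happens before the join, on exactly the JOIN columns; deduplicating after the join can't recover the correct sums because the play counts have already been multiplied.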
@jaeW_at_Onyx Yes, those both seem to be it. I can replace the NULLs with an empty string. But I have to join on criteria that are not unique, i.e. joining on a station name and a song name, and duplicate values exist in both columns. How can I avoid that? Just by removing the duplicates? THANK YOU!