Hello, I would appreciate any help on this one, I'm trying to achieve this in a beast mode to maintain speed.
I have a data set,
Order_date | Item _ID | Order_id | Source | Addon |
Date | 1 | O_1 | site 1 | addon 1
Date | 1 | O_1 | site 1 | addon 2
Date | 2 | O_2 | site 1 | addon 1
Date | 3 | O_3 | site 1 |
I can output this in a table fine to show the correct count of each addon, but I need to show what % each addon is split by source of the total count of item_id. The problem is total count would count each item_id for every addon within that same order. I need to count all unique item_ids to get the total orders, then use the count of the addon to create the % of.
My thinking was an over window function may be able to achieve this by getting the total of all orders first to use to calculate the %, but not having any joy as this will not include any orders that did not have addons.
SUM( COUNT(DISTINCT `item_id`) ) OVER( PARTITION BY `source` )
The desired output would be in a pivot table with Source as a column
Addon | % of Total
Addon 1 | 50%
Addon 2 | 25%
Any ideas to solve this problem is greatly appreciated. Apologies if this is unclear but any questions I will try my best to elaborate.
Simon you cannot do a SUM(COUNT(Distinct()) b/c you want a count distinct across all Sources not the sum of Count(Distinct()) for EACH source.
It's a bit of a process to understand and setup, but I've tackled the problem you've described in this tutorial video. https://www.youtube.com/watch?v=Xb4QgKYgaqg&t=507s
Hope that helps.
Thanks Jae for your reply and video, I watched your windowed function video a few weeks back which proved useful.
I think this solution will work, I've recreated as per your video but it fails on build. The preview seems to work yet order_site is only shown when the addon is present otherwise its null, not sure if the columns I've assumed from your example are causing issues. It's a large dataset, so not sure if that could be at play.
Content id (order_date)
Library ID (order_site)
user id (id)
Any ideas what this could be?
Try limiting your data with a filter to a few topics so you can visually inspect and understand what's happening in your data. If the data is small enough you could probably export it into Excel.
In table form, use rowCount to understand how many rows are being aggregated and then do the math by hand to know if you're getting the desired / expected results.
Be sure you can state, "for each x, how many different opportunities were there." -- in this case 'opportunities' would be what you're trying to do a count distinct of, and X would be your partition by clause.
Know in advance how many permutations / rows you should have in your denominator and make sure you're getting that.
For brain teasers like this, I'll run the math on a piece of paper first.
If you'd like one on one support, I do have an hourly rate for consulting services.
The build which did work as a preview failed after 3 hours. I'll try limiting the data, but I've got a feeling a big dataset is getting exponentially bigger.