Getting a Sum of Values for X Days When There are Gaps in the Dates
We have our dataset of transactions for a large number of SKUs. Dataset is over 1 Million rows. We want to examine the date of each transaction and go back 30 days to get a 30 day average of sales for each SKU based on the date of that record. Normally I would be thinking Row and Window tile in Magic ETL. But here's the catch. Not every SKU sells every day and we don't have records for a SKU if there were no transactions on that day. So if we use Row/Window with a SUM for 30 leading rows, we will go beyond the 30 days.
The table below shows an example. For each transaction of each SKU we want to go back to the previous transactions for the past 30 days. But even if we go just 7 days back you can see the problem. Look at last transaction of each SKU. If we go up 7 rows, we will get a history of more than 7 days because there are missing days; days when no transactions for a SKU occurred. This means that using a SUM with leading row in Row/Window will not work because we exceed the time frame needed.
Row/Window functions aggregate without any logic capability. There's no Case statement capability in a Row/Window aggregate. So I'm down to a SQL cursor type operation or a LOOP where for each SKU and Date in a given record, find the transactions for that SKU where the dates of those transactions are 30 days before the date of the record currently being examined.
Is this even possible in Magic ETL? Or am I looking at using R or Python scripting?