Hello, Can someone give me some suggestions on how to deal with my case:
I have a table with project start date and end date, I also know $ of each project, now I need to get the total $ for all project for each month and also compare with the previous month $.
This is what I was thinking: get avg $/month for each project, write a beast mode to indiciate if a project is fall in Jan, Feb, Mar etc by comparing start date and End date. For example if I want to get the total $ for March, then Sum up $ when indicator for Mar is 1. However, I do not know how to get this work. If you have better idea, please let me know.
Below is a sample dataset:
For example, I want to to get total spend on March is $1400, Feb:1400, Jan: 1100, Dec 2018: 400.
|Project||Start Date||End Date||Spend||Avg Monthly Spend|
I have a few questions around this one. First, it looks like you are calculating the average spend by just dividing by the number of calendar months that a project spans. For example, project C would indicate 700 dollars for March even though it is only going through the 15th. Would it be better to calculate daily spend?
`spend` / DATEDIFF(`end_date`,`start_date`)
The next question is around how you want to display the data. If you create a beast mode for each month, you will need to display the data in a table, or possibly a stacked series bar chart. If you run these calculations through in a data flow, you could prepare the data to allow more visualization options. Could you make a sketch or an example graph of how you want to see this data?
Thanks for suggesting to get the daily cost, but a monthly is good enough for us.
As for how I want to use the data:
1. provide the spend for last month,
2. Compare Mom andYoY for spend.
3. Provide spend /visit for last month
4. Compare MoM and YoY for spend/visit.
Currently, I used EndDate as the date to sum up the $, it works fine for the last month(in this case for Apr), but when I compare April $ with March $, I missed $ Spend for projects that run through March and April, becuase the enddate is not March.
I thought of creating the indicators in data flow, then I will need to generate many new columns each for each month_year, we already have 50M rows of data and at the edge of failing our dataflow because of 24 hours limits, therefore, do not want to add more in the dataflow.
There maybe a better way to do it, any suggestions?
Hope I expalined it. Thank you for your help.
Have you had any Domo consultants look through your dataflow to make sure that it is optimized?
I am surprised to hear that 50M rows is taking a full day to process. Are you using MySQL or ETL?
I know 50 M is not "Big Data" at all, it does not make sense to me too. We've been talking with Domo for a while, but they said not much can be done. I use Redshift, I looked at histry of running: The excution time (meaning left join, some calculations) is about 4 hrs, but it takes 5 hours to load data to AWS and 7 hours to get it back. How long does it take your data to upload to AWS and how large is your dataset?
I don't have much experience with redshift, but I did find this article that may point you to some better performance:
I have seen similar issues and the resolution in our scenario was to do the left joins on the data before bringing into Domo. You may want to look at exlcuding unneccessary columns and filtering the data down as much as possible. Good luck!