Looking at optmizing Domo's FLOW performance.
We've added all necessary indexes to the datasets.
1) Is it better to have multiple different transforms that add new calculated columns and logic or lump it all togeher in one transform?
2) As data grows in our tables, is it better for performance within FLOWS (or does it even make a difference) if we have our workbench job set to append in stead of replace? I understand there will be a performance gain for reading and trasmitting the data to Domo, but once its in the cloud is there a performance improvement on the back end that it can just "reuse" the old data and not have to re-calculate and only calculate for new data?
Any thoughts are appreciated. Also any tricks you found for increasing flow performance!
In response to item 1), this might just be up to your preference, as I'm not aware of any performance gain by splitting out transforms. Separate transforms would make it easier to debug if necessary.
For item 2), as far as I know, dataflows still have to load the entirety of each input dataset so I don't think appending vs replacing would affect that. Domo Technical Support might be able to shine more light on that though.
But adding the proper indeces like you mentioned have yielded the largest performance gains.
I'd also like to briefly share our experience which seems similar. We had one gnarly dataflow that grew and grew until it was taking 90+ minutes to complete one run and relied on 5 other downstream dataflows to complete, which really is untenable and was a nightmare to debug and maintain.
I'm not aware of your data structure, but moving as much logic into a SQL environment could greatly increase the performance of your flow. We recently completed this transition and the run time went from 90+ minutes down to 10 so it was definitely worth the investment.
Taking a step back, we're trying to use Domo as a prototyping environment for new dataflows, but when putting them into production, we run that through (in our case) a MSSQL environment, specifically an SSIS package.
Thanks for the reply. I've noticed the same type of thing. Instead of loading all the data we have and filtering within Domo, we need to optimize the SQL query on the front end so we only send Domo the bare min of what is needed. Arghs... Kinda defeats the purpose of a "cloud based" BI tool.. other than the accessiblity aspect of it.
Is there any way to perform advanced debugging of the transform to see what is the most expensive operation? Something similiar to SQL execution planner / details? I have a large transformation with what I think are the needed indexes, but it still runs slow.
Beyond going into the dataflow history and clicking on the status of a particular run, such as "Successful" which shows you the time associated with each statement, there's no great way to debug the dataflows. At least not that I've been able to find.
In each transform, you can get some additional SQL details (see attached), but that only gets you so far into how to optimize a given flow.
Hopefully someone else has stumbled on some other tricks they can share though!