Checklist to diagnose elongated runtimes from adding one simple join

I have a complex data flow (A) that was taking about 7 minutes to run. I have another simple data flow (B) that is taking 2 minutes to run. Both have outputs that are properly indexed.

I created a new data flow that creates an inner join between two of the output datasets from each data flow (A & B). This inner join is on one field. The new data flow is now taking over 25 minutes (and running). What should I investigate to understand the potential problem? Are there alternative ways of efficiently creating a new dataset from two different dataflows?

Tagged:

Comments

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🥷

    Are you using Magic ETL 2 for you dataflow? That will run faster than MySQL or Redshift dataflows, so I would switch to that if you aren't already using it.

    You can also investigate the individual steps of a dataflow by going to the history tab of your dataflow and then going to the far right of the latest run and click on the 3 dots that appear when you hover over the area and click on view details. This will show you the individual steps of your dataflow and how long they each took. This will help you troubleshoot.




    **Make sure to like any users posts that helped you and accept the ones who solved your issue.**
  • I am using MySQL so the issue is embedded somewhere in the processing of the complex SQL statements. I will also check out the history tab.

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🥷

    Hopefully the history tab will be helpful for you. I would also suggest looking into rebuilding your dataflow in Magic ETL. Complex joins can be accomplished by using filter tiles after the join statements and other additional tiles, but the end result typically results in a much faster runtime.




    **Make sure to like any users posts that helped you and accept the ones who solved your issue.**
  • @MarkSnodgrass Thanks for the help. I've confirmed with other work that the Magic ETL is indeed much more performant. Now I will talk my more complex case that started this thread.