I added a new dataset as an input to a Redshift data flow. Since that change, the data flow's average runtime has gone from 10 minutes to 40 minutes. How can I shorten the runtime so it's back to an average of 10 minutes?
I suppose that depends on the size of the dataset and how you are joining the data. Can you provide more details? I have found that joining three or more tables in the same SELECT statement can sometimes add significant runtime. If any of your transforms do that, you may want to consider splitting that step into multiple steps that each join only two tables.
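Here is a minimal sketch of the "two tables at a time" idea. It uses Python's built-in `sqlite3` module as a stand-in for Redshift, which is an assumption made only so the example runs anywhere. In the actual data flow, each step would be its own SQL transform. The table and column names (`orders`, `customers`, `regions`, `orders_customers`) are hypothetical. The sketch only checks that the two-step version returns the same rows as the single three-table join. It says nothing about how much faster either version would run in Redshift.

```python
import sqlite3

# Hypothetical tables standing in for the data flow's inputs;
# "regions" plays the role of the newly added dataset.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
CREATE TABLE orders (order_id INTEGER, customer_id INTEGER, amount REAL);
CREATE TABLE customers (customer_id INTEGER, region_id INTEGER);
CREATE TABLE regions (region_id INTEGER, region_name TEXT);
INSERT INTO orders VALUES (1, 10, 99.5), (2, 11, 12.0), (3, 10, 5.25);
INSERT INTO customers VALUES (10, 100), (11, 200);
INSERT INTO regions VALUES (100, 'East'), (200, 'West');
""")

# Original transform: a single SELECT that joins three tables.
one_step = cur.execute("""
SELECT o.order_id, o.amount, r.region_name
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
JOIN regions r ON c.region_id = r.region_id
ORDER BY o.order_id
""").fetchall()

# Split step 1: join only two tables and materialize the result
# as an intermediate table (a separate transform in the data flow).
cur.execute("""
CREATE TABLE orders_customers AS
SELECT o.order_id, o.amount, c.region_id
FROM orders o
JOIN customers c ON o.customer_id = c.customer_id
""")

# Split step 2: join the intermediate table to the third table.
two_step = cur.execute("""
SELECT oc.order_id, oc.amount, r.region_name
FROM orders_customers oc
JOIN regions r ON oc.region_id = r.region_id
ORDER BY oc.order_id
""").fetchall()

print(one_step == two_step)  # both approaches should return the same rows
print(two_step)
```

The intermediate table carries forward only the columns the next join needs (`region_id` here), which keeps each step's input narrow.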