Does a dataflow get rerun if its input source gets updated while the dataflow is running?

I have a dataflow with 2 input sources, and I have configured it to run whenever either input source is updated.

If one of the dataflow's input sources is updated while the dataflow is running, will that immediately trigger a rerun of the dataflow? If not, will it at least trigger a rerun once the current run has completed?

I was hoping it would abandon the current run and trigger a rerun immediately; otherwise the output dataset will contain stale data. In testing this scenario, it looks like the dataflow doesn't get retriggered at all.

I tested this by adding a small, redundant dataset (dataset_A) to the inputs of a dataflow (dataflow_B) that takes more than 15 minutes to run, and configuring dataflow_B to rerun whenever dataset_A is updated. I then manually triggered a run of dataflow_B, waited a few minutes, and refreshed dataset_A, which completed within seconds. Refreshing dataset_A didn't stop or otherwise affect the run of dataflow_B that was in progress, and it didn't cause dataflow_B to rerun once that run finished. That is not ideal behaviour, because it leaves stale data in the output dataset.
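
The stale-output condition in the test above can be expressed as a simple timestamp check. This is only a sketch: the function name, the dictionary of update times, and the clock values are hypothetical and are not part of any Domo API. It just shows when a follow-up run would be needed.

```python
from datetime import datetime, timedelta


def inputs_updated_during_run(run_started, run_finished, input_updates):
    """Return the names of inputs whose latest update landed while the run was in progress.

    input_updates maps a dataset name to the time of its most recent update.
    All names and timestamps here are hypothetical, for illustration only.
    """
    return [
        name
        for name, updated_at in input_updates.items()
        if run_started < updated_at <= run_finished
    ]


# Timeline from the test: dataflow_B runs for 15 minutes,
# and dataset_A is refreshed a few minutes after the run starts.
run_started = datetime(2024, 1, 1, 9, 0)  # assumed clock time
run_finished = run_started + timedelta(minutes=15)
input_updates = {"dataset_A": run_started + timedelta(minutes=3)}

stale_inputs = inputs_updated_during_run(run_started, run_finished, input_updates)

# A non-empty list means the output may not reflect the latest inputs,
# so a rerun would be needed to bring it up to date.
print(stale_inputs)
```

If the list is non-empty once the run finishes, the output dataset may be missing the newest input data, which is the situation described above.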

Comments

  • DataJake

    @danielj, I have also run into this issue. The dataflow will not reset or kick off a second time upon completion if another input updates while the dataflow is running.

    Using a DataFusion is one solution, as an update to any input is automatically reflected in the output.

  • I have contacted DomoSupport, and they say it is an issue that their development team is currently looking into.

  • @danielj, please keep us updated!

  • Did you ever receive an update on this? It seems like it's still an issue.