We have a dataflow that runs daily to update an issue history dataset. The intent is for the dataflow to start when one of its input datasets is updated. Everything works when the dataflow is started manually, but when I select the option to kick off the dataflow when one of the input datasets changes, the error message "This DataFlow is incomplete." appears for the dataflow on the DataFlows page, and the dataflow does not start when the specified input dataset is updated.
Here is the error: https://drive.google.com/open?id=1b9DP_a2ODmvhLqYaRokMbGlNFWQZob6z
Can anyone shed some light on what is causing the error? Thanks in advance for your attention.
Can you share a screenshot of the scheduling settings? You need to make sure that you select which datasets the dataflow should trigger off of, which means checking two boxes: one to indicate that you want the dataflow to fire when a dataset updates, and at least one more to indicate which of the input datasets the dataflow should watch for updates. I recommend selecting only one dataset: if you have multiple datasets that each update once per day, you want the dataflow to fire after the last of them updates. You may want to stagger the input dataset updates so that the one that triggers this dataflow is the last one to update, ensuring that all of the other data being used is also current.
Here is a screenshot of the Trigger settings. ghzh_issue and ghzh_batchid are updated by a Python script, then the dataflow is started to append the data in ghzh_issue to ghzh_issue_history. The update to ghzh_batchid is the last step in the Python script.
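In case it helps illustrate the ordering advice above, here is a minimal sketch of how the Python script could guarantee that the trigger dataset is always pushed last, regardless of how the list of inputs is maintained. The `push_update` callable is a hypothetical placeholder for whatever API call the script already uses to upload each dataset; the function only controls the ordering.

```python
def run_updates(datasets, trigger, push_update):
    """Update every dataset, deferring the trigger dataset to the end.

    datasets    -- names of all input datasets to update
    trigger     -- the one dataset the dataflow is watching
    push_update -- callable that performs the actual upload (placeholder)
    """
    # Push all non-trigger datasets first, then the trigger dataset last,
    # so the dataflow only fires once every other input is current.
    order = [d for d in datasets if d != trigger] + [trigger]
    for name in order:
        push_update(name)
    return order
```

For example, with the datasets from this thread, `run_updates(["ghzh_batchid", "ghzh_issue"], "ghzh_batchid", upload)` would call `upload` on ghzh_issue first and ghzh_batchid last, matching the "trigger dataset updates last" recommendation.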
Does anyone have a suggestion for resolving this issue? I'd like the job to run automatically, but I cannot schedule it as long as manual intervention is required.
Thanks in advance for your attention.