Multiple input datasets for a dataflow to start

I have a dataflow that has 3 input datasets.  The dataflow has to run when all 3 have updated.  The 3 input datasets update at various times and cannot be predecessors to one another. 

 

Is there a way to allow my main dataflow to run ONLY once ALL dataflows have completed?  At present Domo runs the dataflows as soon as 1 of the 3 input datasets has updated.  

Best Answer

  • rahul93
    rahul93 NY 🟠
    Accepted Answer

    Do it the same way you were going to update the dataflow every 15 minutes: instead of scheduling the Excel job to update every 15 minutes, schedule it to update once every day at a set time.


    Please comment if you find a better solution.

Answers

  • rahul93

    Hey @Abs,

     

    You can do this in 2 ways:

    1) Make the dataflow update based only on the one dataset that loads last (meaning the one that loads after the others).

     

    2) Create a dummy dataset and make the dataflow update every 15-20 minutes. This dataset exists in the dataflow only to trigger the dataflow update. For this dataset, you would want to check the "Upload even if data hasnt changed" box in the additional settings in Workbench.

     

     

    Let me know if you have doubts,

  • Hi Rahul

     

    1) This isn't an option for us, as the datasets all have their own dependencies and update at different times. Some days one finishes later than the others, and other days it is the other way round.

     

    2) Can you explain in more detail what you mean here, specifically referring to the datasets as "dummy dataset" and "main dataset" or "dummy dataflow" and "main dataflow", so that it is clear.  

    (I am currently building a dummy dataset based on batch_run_dates for each of the loads, but I think this is different to what you are referring to).  
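
    For illustration, the batch_run_dates idea could be sketched as below: the gate passes only when every load reports the expected batch run date. This is a minimal sketch in plain Python, not Domo functionality; the function name, the input names, and the dates are all hypothetical.

```python
from datetime import date


def all_inputs_loaded(batch_run_dates, expected_date):
    """Return True only if every input reports the expected batch run date.

    batch_run_dates maps a (hypothetical) input name to the date of its
    latest completed load. An empty mapping counts as "not loaded".
    """
    return bool(batch_run_dates) and all(
        run_date == expected_date for run_date in batch_run_dates.values()
    )


# Hypothetical example: two loads have finished today's batch, one has not.
today = date(2024, 1, 15)
partial = {
    "input_a": date(2024, 1, 15),
    "input_b": date(2024, 1, 15),
    "input_c": date(2024, 1, 14),
}
complete = {name: today for name in partial}

ready_partial = all_inputs_loaded(partial, today)
ready_complete = all_inputs_loaded(complete, today)
```

    Here ready_partial is False and ready_complete is True, so downstream logic would act only once all three loads have caught up.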

     

    Thanks

     

    Abs

  • rahul93

    Abs,

     

     

    1) You can change the times at which the datasets load so that you know for certain which dataset will load last.

     

    2) Create an Excel sheet with one record in it. Load it through Workbench, add a schedule to it (say, "every 15 minutes"), and check the "Upload even if data hasnt changed" option in additional settings. 

    Add this dataset to your dataflow and check the box next to it so that the dataflow runs whenever this dataset updates (P.S.: uncheck the boxes for the other datasets). Now the dataflow will update on that schedule, every 15 minutes in this example. 
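
    If it helps, the one-record file could be generated by a small script rather than maintained by hand. The sketch below writes a CSV instead of an Excel sheet, which is a substitution on my part; the function name, column name, and file path are hypothetical, and loading the file into Domo is still done through the Workbench job described above.

```python
import csv
from datetime import datetime, timezone


def write_heartbeat_file(path, now=None):
    """Write a one-record CSV holding a single timestamp.

    The file exists only to give the scheduled upload job something to
    load. Returns the timestamp that was written.
    """
    now = now or datetime.now(timezone.utc)
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["heartbeat_at"])   # header row (hypothetical name)
        writer.writerow([now.isoformat()])  # the single record
    return now
```

    For example, write_heartbeat_file("dummy_trigger.csv") would refresh the file just before each scheduled upload.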

    Let me know if you still have doubts,

  • 1) All datasets need to start at the same time, but they complete at different times, so this cannot be an option. 

     

    2) I understand what you are saying now: basically, forcing my "main dataflow" to run every 15 minutes.  This also cannot happen, as the "main dataflow" currently takes approximately 3 hours to run (this is another topic for another day), so this solution is not practical.  

     

    Thanks for the suggestions though. 

  • rahul93

    Abs, 

    What if you use the second point to "force" the dataflow to trigger once every day, at a time when you are certain all 3 of your datasets have loaded?

     

    So, allow about 20-30 minutes (maybe more, depending on how long they take) after all 3 of them are scheduled to run, and then force the dataflow to load.
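
    As a rough sketch of how that time could be worked out: take the latest expected finish across the three loads and add the buffer. The start times, durations, and names below are made-up numbers for illustration, not values from your instance.

```python
from datetime import datetime, timedelta


def forced_trigger_time(scheduled_starts, expected_durations, buffer):
    """Return the latest expected finish across all loads, plus a buffer."""
    latest_finish = max(
        start + expected_durations[name]
        for name, start in scheduled_starts.items()
    )
    return latest_finish + buffer


# Hypothetical example: all three loads start at 02:00 but take
# different amounts of time to complete.
start = datetime(2024, 1, 15, 2, 0)
scheduled_starts = {"input_a": start, "input_b": start, "input_c": start}
expected_durations = {
    "input_a": timedelta(minutes=40),
    "input_b": timedelta(minutes=75),
    "input_c": timedelta(minutes=55),
}

trigger_at = forced_trigger_time(
    scheduled_starts, expected_durations, buffer=timedelta(minutes=30)
)
# The slowest load ends at 03:15, so trigger_at is 03:45 on the same day.
```

    The dummy dataset would then be scheduled to upload once a day at trigger_at instead of every 15 minutes.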

     

    Hope this helps,

  • 2) Again, this is not the ideal solution but one we have considered as an interim solution.  How do you "force" the "main dataflow" to run based on time? The only options I have within the settings are to run it when one of the input datasets updates.  

  • Apologies, yes, I had discounted the "dummy dataset" in my thinking.  Thanks!