Workbench Job Runtime

Ashleigh Florida 🟒

Does anyone have any tips or best practices for improving runtime on workbench jobs? We have a job that updates almost 82 million rows of data daily and takes about 3 hours to run. It varies day to day anywhere from 2 hours to 3 hours. I do not have access to workbench as it is controlled by my IT but I am looking for tips I could share with them to help improve the runtime of this job. The job was created several years ago so I am sure some changes could be done to make it more efficient but I am just looking for some ideas (quick wins or perhaps features of new WB releases) that you all have come across that usually help run time.



  • GrantSmith
    GrantSmith Indiana πŸ”΄

    @Ashleigh - What type of source is the data being pulled from? A database?

  • Ashleigh
    Ashleigh Florida 🟒

    @GrantSmith yea I think so, its one of our financial reporting systems.

  • GrantSmith
    GrantSmith Indiana πŸ”΄

    Typically when dealing with databases you'll want to narrow down your data as soon as possible to limit the amount of data being processed. Also, you'll want to make sure you're utilizing table indexes in your join and where clauses to improve querying response time. If they don't have any the might want to think about adding some but that's a larger discussion with your DBA.

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟀

    @Ashleigh Workbench 5.1 introduced partition support which might be effective for your use case. You can read about it here:

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟀

    @Ashleigh , @MarkSnodgrass and @GrantSmith are on the money for very different reasons.

    1) as your query get's larger, workbench execution times are 99% subject to the performance of your source database (how long does it take to retrieve your data from SQL). If you're parsing 82million rows on your source database you are murdering your system if it's not properly indexed (work with your DBA to optimize your workbench query in the source database via indexes and database stats). make sure you're not doing SELECT * because no DBA likes seeing that ...

    2) Mark nailed it on the head. If you can minimize how much data you have to update in Domo (via partitioning or UPSERT instead of a full REPLACE) then you can shrink how much data you have to pull out of your source database.