How to periodically REPLACE in addition to APPEND jobs?

I have a VERY large dataset that I've uploaded to Domo. I initially did a REPLACE to load the whole dataset. I then changed the workbench job to APPEND and use an 'update date' in the table to run HOURLY while checking back the past couple days for any new/missed data.

Due to some bugs on our side, sometimes very old data needs to be corrected, so I have to do a REPLACE again, which can take hours. Is there any way to establish a regular REPLACE job, perhaps monthly, which will run automatically without messing up the hourly APPEND job on that same dataset?

Answers

  • GrantSmith
    GrantSmith Indiana 🥷

    You'd need two separate jobs, one for hourly append and one for monthly replace updating the same dataset. There is some risk involved with this as it can cause them both to run at the same time and get unexpected results. What you can do is limit the hourly job to run in between certain hours and then once a month or whatever you schedule the replace job to be within the append job downtime.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**
  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🔴

    @JunkDoom have a close look at Partitioning.

    Many Domo connectors are starting to support partition schemes based on Date. If your data is not partitioned by Date you can use the CLI to create your own partition tags as you upload data.



    the in-Domo ETL version of "Partitioning" can be recreated with Recursive Dataflows. https://domohelp.domo.com/hc/en-us/articles/360057087393-Creating-a-Recursive-Snapshot-DataFlow-in-Magic-ETL


    There is a beta (or is it GA @GrantSmith ) for Magic ETL + Partitions.


    You want to be careful mixing Partition with Replace. Replace will Replace your entire dataset WITHOUT partition_tags. meaning that if you tried to mix Replace with Append+Partition you would duplicate data.

    Jae Wilson
    Check out my 🎥 Domo Training YouTube Channel 👨‍💻

    **Say "Thanks" by clicking the ❤️ in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
  • @GrantSmith Thanks! After some trial and error and confusing schema warnings, here's what's working for me so far:

    -Create a new REPLACE job, and RUN it.

    -Change the job to APPEND, which will then allow you to pick an UPSERT key in the "schema" tab. (may need to uncheck "Protect" on that column.

    -Change the query to look back X days on a date field

    -Create a separate REPLACE job without any date range in the query

    -In the "Overview" tab, under "Domo Details" under "Dataset Name", click the "Browse" button to pick the proper dataset for this to load into. If you TYPE the name, it will create a separate dataset with the SAME EXACT NAME. (yuck!)

    -Schedule your new replace and append jobs as appropriate.

  • GrantSmith
    GrantSmith Indiana 🥷

    Workbench currently supports partitioning. Magic ETL currently does not.

    **Was this post helpful? Click Agree or Like below**
    **Did this solve your problem? Accept it as a solution!**