Doing UPSERTS via Redshift

As my luck would have it, the genius architects back in IT do an awful lot of INSERTS and UPDATES.  At least, it looks that way to me.  Out of a 63M row DB, I find about 600K INSERTS and 1M UPDATES weekly.  

 

What's a MajorDomo to do?  The support for UPSERTS via MySQL and Domo isn't all that good.  I tried but just could not make it work.  

 

I'm having better luck with Redshift at the moment.  With a dataflow, I basically encapsulate the INSERTS and REPLACEs in a transaction, as follows:

 

begin transaction;

--- delete all rows from the source table where the primary key (row_wid) matches those of the rows to be INSERTED.  

DELETE from w_quote_f
USING w_quote_f_inserts
WHERE w_quote_f.row_wid = w_quote_f_inserts.row_wid;

--- do the inserts
--- first id the fields to be changed.  Since the primary key doesn't match (this is an INSERT), primary key is listed in the field listing.  This proved to be very helpful in troubleshooting the next issue, field ordering.  Turns out that the fields need to be listed in the exact order so that the field type definitions (character, bigint, etc.) all line up correctly.  

INSERT INTO w_quote_f
(
list of fields between parentheses
)

--- then get the replacement values
SELECT list of fields in the same order
FROM  w_quote_f_inserts;

end transaction;

The really strange part is that this takes in about 29GB of data but spits out about 49GB.  I have no idea what's going on behind the scenes.  The row counts match.   

 

Weird.  Naturally, because I am lazy, I do the exact same thing for UPDATES.  

 

Anyway, I thought this might help someone.  

Matthew O Coblentz
Tagged:

Comments

  • @DaniBoyI think this might be better placed in Best Practices.  How does one move a post?  

    Matthew O Coblentz
  • We used this method before in every upsert situations. Nowadays when the datasets origin from Workbench we are using it's native Upsert.

     

    Now would be great if the DOMO API could also upsert a dataset. 

    Ricardo Granada 

    [email protected]

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • @RGranada  Where is that in Workbench?  

    Matthew O Coblentz
  • @mcoblentz I think you have to talk to your account manager in order to enable it.

    Ricardo Granada 

    [email protected]

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • Aha!

    Matthew O Coblentz
  • DaniBoy
    DaniBoy

    domo

    💎

    @mcoblentz

     

    I moved it for you!

    Regards,

    Dani

    Dani aka "Mr.Dojo"

    Dojo Admin
    **Say "Thanks" by clicking the "heart" in the post that helped you.
    **Please mark the post that solves your problem by clicking on "Accept as Solution"
    **You can update your Dojo Community name and avatar by clicking on your avatar then the "My Profile" button.
  • In an attempt to measure the difference, I kept the same flow as above and created a second version like this:

     

    begin transaction;

    DELETE FROM w_quote_f
    WHERE row_wid IN(SELECT row_wid FROM w_quote_f_inserts);

    DELETE FROM w_quote_f
    WHERE row_wid IN(SELECT row_wid FROM w_quote_f_updates);

    insert into w_quote_f
    (select * from w_quote_f_inserts);

    insert into w_quote_f
    (select * from w_quote_f_updates);

    end transaction;

    VACUUM w_quote_f to 100 percent;
    ANALYZE w_quote_f;

    We will see if this runs better than the first version.   

    Matthew O Coblentz
  • And I created a third version, which I thought would be really good but Domo is yelling at me that one of the columns (a text column no less) is different.  Hmm.  

     


    DELETE FROM w_quote_f
    WHERE row_wid IN(SELECT row_wid FROM w_quote_f_inserts);

    DELETE FROM w_quote_f
    WHERE row_wid IN(SELECT row_wid FROM w_quote_f_updates);


    ALTER TABLE w_quote_f APPEND FROM w_quote_f_inserts;

    ALTER TABLE w_quote_f APPEND FROM w_quote_f_updates;


    VACUUM w_quote_f to 100 percent;
    ANALYZE w_quote_f;

    I checked workbench - definitley uploaded as a string type.  I checked Domo's version and all the ETL boxes say it's 'TEXT'.  I have no idea what it is screaming at me about.   

    Matthew O Coblentz
  • I tried using 

     

    COPY w_quote_f
    FROM w_quote_f_INSERT;
    COPY w_quote_f
    FROM w_quote_f_UPDATE;

    instead of ALTER TABLE but I couldn't get Domo to run with that either.  :( 

    Matthew O Coblentz
  • Can you see considerable performance improvments in using "alter table append" vs "insert into"?

    Ricardo Granada 

    [email protected]

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • The tests are running now.  I will post as I get data.  

     

    the source dataset is about 53M rows.  The UPDATES and INSERTS are about 500K each.  

    Matthew O Coblentz
  • Ok, keep us posted on those results,

     

    Thanks.

    Ricardo Granada 

    [email protected]

    **If the post solves your problem, mark it by clicking on "Accept as Solution"
    **You can say "Thank you" by clicking the thumbs up in the post that helped you.
  • Great tip! Thank you @mcoblentz 

This discussion has been closed.