Doing UPSERTs via Redshift
As my luck would have it, the genius architects back in IT do an awful lot of INSERTs and UPDATEs. At least, it looks that way to me. Out of a 63M-row DB, I find about 600K INSERTs and 1M UPDATEs weekly.
What's a MajorDomo to do? The support for UPSERTS via MySQL and Domo isn't all that good. I tried but just could not make it work.
I'm having better luck with Redshift at the moment. With a dataflow, I basically wrap the DELETEs and INSERTs in a transaction, as follows:
--- delete all rows from the target table where the primary key (row_wid) matches those of the rows to be INSERTed.
--- Redshift needs a USING clause to reference the staging table here.
DELETE FROM w_quote_f
USING w_quote_f_inserts
WHERE w_quote_f.row_wid = w_quote_f_inserts.row_wid;
--- do the inserts
--- first, list the fields to be inserted. Since the primary key won't match an existing row (this is an INSERT), the primary key is included in the field list. This proved very helpful in troubleshooting the next issue: field ordering. It turns out the fields must be listed in exactly the same order in the INSERT and the SELECT so that the type definitions (character, bigint, etc.) all line up correctly.
INSERT INTO w_quote_f
(list of fields between parentheses)
--- then pull the replacement values
SELECT list of fields in the same order
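Putting the pieces together, the whole delete-then-insert pass might look something like this. This is just a sketch: `w_quote_f_inserts` is the staging table holding the incoming rows, and the three columns are hypothetical stand-ins for the real field list.

```sql
BEGIN;

-- remove any existing versions of the incoming rows from the target table
DELETE FROM w_quote_f
USING w_quote_f_inserts
WHERE w_quote_f.row_wid = w_quote_f_inserts.row_wid;

-- insert the new versions; the column order in the INSERT list
-- must match the SELECT list exactly so the types line up
-- (row_wid, quote_amt, created_dt are hypothetical columns)
INSERT INTO w_quote_f (row_wid, quote_amt, created_dt)
SELECT row_wid, quote_amt, created_dt
FROM w_quote_f_inserts;

COMMIT;
```

Running both statements inside one transaction means readers never see the window between the DELETE and the INSERT.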
The really strange part is that this takes in about 29GB of data but spits out about 49GB. I have no idea what's going on behind the scenes. The row counts match.
Weird. Naturally, because I am lazy, I do the exact same thing for UPDATES.
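Since delete-then-insert doesn't care whether the key already exists, the same pattern covers the UPDATEs unchanged. A sketch, assuming the changed rows land in a hypothetical staging table `w_quote_f_updates` with the same (hypothetical) columns as before:

```sql
BEGIN;

-- drop the old versions of the changed rows
DELETE FROM w_quote_f
USING w_quote_f_updates
WHERE w_quote_f.row_wid = w_quote_f_updates.row_wid;

-- re-insert the updated versions, same field order as the target
INSERT INTO w_quote_f (row_wid, quote_amt, created_dt)
SELECT row_wid, quote_amt, created_dt
FROM w_quote_f_updates;

COMMIT;
```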
Anyway, I thought this might help someone.