How to replace rows imported from Salesforce using the last modified field?

The Salesforce connector lets me choose to replace or append data and it lets me identify date or date/time fields to use for identifying what to import.

 

Here is what I want to happen:

 

  1. Import the entire dataset (once, since it's big)
  2. Update the dataset frequently by replacing rows that have been modified since the last import, using an Id column as the key

Here is what actually happens:

  1. Replacing simply replaces the entire dataset instead of just rows that were modified.

It would be nice if this could be done through the connector, but if it can't, is there a way to do it with Magic ETL, or does it have to be SQL? In Magic ETL the data join options seem like I'd just be adding columns instead of replacing data in the modified rows.

 

Comments

  • What is your data structure like? Do you have a SOQL select statement so I can see what you grab? 

     

    Ex. I have a separate Dataset per object in Salesforce (Cases, Accounts, Contacts, Leads, Opportunities, etc, etc). I do a full replace each time. My largest object is cases (130k+ rows). 

     

    Basically I'm trying to see if you can organize it better so that you can just do a full replace each time and it's not way too many rows, versus trying to do fancy ETLs and other things.

  • Thanks for the reply. It's all coming from a single custom object, but there is an unavoidably large number of columns (156). The total dataset is 390k rows. Right now I'm using the object picker.

  • How fast does that number grow? In 1 year will it be 450k or 1 million+? Just trying to get a range. As long as you're not going over your API limit in Salesforce, and you set the import to start early enough that it's done by the time you need the data, I don't see too much issue with a full replace, provided it won't jump to millions of rows very soon. However, I haven't come up against this exact issue myself. I have 10 Salesforce objects (tables) totaling 330k rows, all doing full replaces currently.

  • The dataset grows in fits and starts, but it isn't likely to hit the million mark too soon. I expect it to grow by an additional 100-200k over the next year, depending on data-sharing arrangements the project staff may make. One reason I was interested in figuring this out is that I'd like to increase the frequency of the updates, since users tend to load/enter large amounts of data all at once and then immediately want to run reports. One big nightly update is not the end of the world, but ideally this would run on an hourly schedule during normal business hours.

     

    From what I'm hearing, it seems like MySQL/Redshift transforms would be the only way to accomplish this. Does that seem right?

  • Unfortunately I haven't dabbled enough in MySQL to answer for sure; maybe someone else can come along and offer their experience.

  • Another approach: import your big dataset once, and keep it as the ongoing whole-SF data store in Domo.

     

    Have a second daily/frequent dataset which uses the date modified field to pull only the rows updated since your last run (you can overlap the date ranges slightly for redundancy, so a missed update gets picked up in the next run at the cost of some wasted duplicate updates). Use Domo's replace setting on this connector. Then trigger a dataflow off this small dataset which updates/replaces rows in your big dataset based on the Id of the row/object.
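    A sketch of the incremental pull on the Salesforce side, as SOQL (the object and field names here are placeholders for your own custom object):

    ```sql
    -- Hypothetical SOQL for the small "recent updates" dataset:
    -- only rows modified in the last two days, which overlaps
    -- prior daily runs so a missed update is picked up next time.
    SELECT Id, Name, LastModifiedDate
    FROM My_Custom_Object__c
    WHERE LastModifiedDate = LAST_N_DAYS:2
    ```

    For an hourly schedule you would tighten the window accordingly; the overlap just needs to be longer than the gap between runs.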

     

     

    In the dataflow SQL, if MySQL's REPLACE INTO is available, it will delete the existing row if one exists (matched on the primary/unique key) and insert if there is no matching row.

     

    https://dev.mysql.com/doc/refman/5.7/en/replace.html
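    A minimal sketch of that transform, assuming the big dataset lands as a table called `main_dataset`, the incremental one as `recent_updates`, and `Id` is the key (all names illustrative):

    ```sql
    -- REPLACE INTO needs a PRIMARY KEY or UNIQUE index to detect matches.
    ALTER TABLE main_dataset ADD PRIMARY KEY (Id);

    -- For each incoming row: delete any existing row with the same Id,
    -- then insert the new version. New Ids are inserted normally.
    REPLACE INTO main_dataset
    SELECT * FROM recent_updates;
    ```

    The final output of the dataflow would then just be `SELECT * FROM main_dataset`.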

     

    Otherwise you need to do inserts for the new rows and updates for the existing rows.
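    In MySQL that insert-or-update pattern can also be written in one statement with `INSERT ... ON DUPLICATE KEY UPDATE` (again assuming illustrative table/column names and a unique key on `Id`):

    ```sql
    -- Insert new rows; for Ids that already exist, overwrite the
    -- listed columns with the incoming values instead of failing.
    INSERT INTO main_dataset (Id, Name, LastModifiedDate)
    SELECT Id, Name, LastModifiedDate
    FROM recent_updates
    ON DUPLICATE KEY UPDATE
      Name = VALUES(Name),
      LastModifiedDate = VALUES(LastModifiedDate);
    ```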

     

    Or, as an alternate MySQL approach: drop the matching rows from the main dataset, then union the main set and the update rows together.
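    That drop-and-union variant might look like this (same illustrative table names as above; no unique key required):

    ```sql
    -- Remove any row from the main set that has a newer version
    -- in the incremental set...
    DELETE m
    FROM main_dataset m
    JOIN recent_updates r ON m.Id = r.Id;

    -- ...then stack the survivors with the update rows to form
    -- the dataflow's output dataset.
    SELECT * FROM main_dataset
    UNION ALL
    SELECT * FROM recent_updates;
    ```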