Best practice for storing historical day to day data?

There are cases where i need to store transactional data on a day to day basis. does domo have a documented best practice that i can base all my ETL processes on?

Comments

  • This is a great question! We are more than happy to assist you with this request. Are you looking to just append the new data on to the historical data each day? Will there ever be data added that replaces previous data that was added?

    Thank you.

  • I am looking to create a snapshot of data at the end of each day, regardless of whether the data changed or not. My concern, is the size limit and impact on performance.

  • Currently Domo does a combination of replacing fields if they change, or inserting a completely new field regardless of change/or a change and this varies with every dataset. Meaning that there is no consistency, hence the creation of my own MySQL ETL.

  • I will be reaching out you offline to work through this due to needing additional information.

  • Waitng on your call, i messaged you my personal landline as well as my instance

  • For Salesforce data I do a daily append so I can track daily forecast snapshots, it works quite well, my executive team loves this feature for forecasting. See image below:

     

    Screen Shot 2015-12-16 at 9.27.32 AM.png

     

    The colors represent Salesforce stages (legend not shown in this image) and shows how the pipeline has evolved over the quarter. It allows the ability for historical snapshots of what the pipeline was on any given day in the quarter/year/whatever. You can't get that out of SFDC directly.

     

  • Exactly Nick, the goal is to be able to represent all the data as at any given day. This would allow me to roll up to weeks/month/years. Are you using the built-in domo dataset append method to capture this daily change? or are you using a Dataflow to capture this data? Do you run into issues in cases were the job refresh is ran multiple times within a day? how large is your current dataset size and have you ran into any issues? 

  • Yes I am using the built in append feature. In this example I am appending the Salesforce opportunites object. You can set it up to append only once a day so you don't run into that issue of multiplying the data. I have had no problems with errors with that and I've been doing this since May. If you needed to refresh it more you could but that wouldn't make sense for obivous reasons. My dataset isn't too large given were a smaller company but the size of the dataset shouldn't matter from what I'm told. I append multiple other objects as well such as lead and account data so I can do similiar analysis on different pipeline attributes. Obviously it only collects historical data once you start running the append but I can see awesome potential a year from now with year over year type analysis once I have the data.

  • thank you for this, I too am using salesforce and have my dataset set to append. However, i have had instances where a job ran twice in one day creating duplicate data for a single day. to avoid such senarios, i have created an etl process that only captures todays data once. Thank you for your response

  • For anyone else with questions about scheduling I use the basic scheduling and have it setup like this:

     

    Screen Shot 2015-12-16 at 10.02.10 AM.png

  • @CoreVest did @nick_datasift reply work for you?

     

    Please mark as a solution if it did.

     

    Thanks!
    Dani

  • Something else you might consider is the "Remove Duplicates" action within Magic ETL. This may be what you're already using, but if you set your DataSet to Append and remove duplicates based on the necessary unique values, you can eliminate the duplicates that result from multiple runs in a day.