Merging new and legacy datasets

Hello, Experts!


I am reaching out in hopes that I get some guidance on the best approach for building out an ETL.


Our company is undergoing a point of sale transition and things are going well. We currently have legacy data in one dataset and new data flowing into Domo separately. My CEO is wanting to see week by week sales data for the year for both datasets stitched together. Essentially, he wants visibility for data all time by week regardless of data coming from different datasets.


What I have done:


I have created a stored proc on my legacy dataset to crank out the weekly data from the beginning of this year to now. When a store stops using the legacy system no new data is written to the database so the data does not appear in the result set of the query in my stored proc. I have this dataset pumping into Domo via WB.


I have instructed our new POS partner to create the same stored proc but from the time we went live with the new system and basically mimic the same data. We are outputting this via MySQL via SSH connector.


I have also created an ETL where I am bringing in the legacy dataset as an input and the new POS dataset as input then joining them together after I filter out then join from our new POS dataset  if a store is live (a flag in new dataset) and then joining to the legacy dataset to output the data I want to see. 


My two joins are both inner joins.


Am I headed in the right direction or is there a better way to do this? Ultimately, what I want to see is something like this (or maybe it needs to be presented differently):



| legacy.store_no | legacy.week_no | legacy.sales | newPOS.sales |


| 001                    |              1              |    14,000         |            0       |


| 001                    |              ...              |          ...          |               ...   |


| 001                    |                 45           |      0              |   20,000       |



Best Answer

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🔴
    Accepted Answer

    For append, you have a few options, but the most common method is to combine the datasets where the column name and types are the same. You'll see the various options when you configure the append rows tile. You can make them the same by using the Set Column Type tile and the Select Columns tile. This allows you to change the type of any column that needs to match the type from the other dataset if it is different and then use the Select Columns to rename the column to make the column names match. Then you can append them together.


  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🔴


    Typically when you are merging a historical dataset with a current dataset, you would want to use the append method in the ETL, especially if your columns are the same, which it sounds like they are since you made sure to use the same stored proc. Try using the append method instead of the join method in the ETL to merge that data together.

  • @MarkSnodgrass when you say the structure is the same, are you saying the columns are supposed to be named the same? Or are you saying that the data types are supposed to match? I have attached a screenshot of my current ETL.