Appending several datasets in Workbench

Hello Dojo Community!

 

I have 15 large datasets that I need to append into a single output dataset that I pull into Domo using Workbench (version 4). I would like to know whether I can create a single Workbench job that always appends the same 15 datasets, so that one job uploads one dataset to my Domo instance. I know I could create 15 different jobs in Workbench and append them all in Domo using Magic ETL, but ideally I would like to have only 1 job and 1 dataset in my Domo instance.

 

Thanks!

 

Best Answers

  • n8isjack-ret
    Accepted Answer

    @iresanjogar In Workbench you may be able to load all 15 files in a single job. The base requirements are:

     

    1. All the files must have the same schema (column 1 on all 15 files must be the same data type, column 2 also, and so forth for all columns). Workbench will assume column positions match, not column names.
    2. The files must be in the same folder (and should be the only files in that folder).
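The positional-schema requirement in point 1 can be sanity-checked before pointing the job at the folder. A minimal sketch, run outside Workbench (the folder path is a placeholder; standard library only):

```python
# Pre-check: confirm every CSV in the folder has the same number of
# columns before running the folder-based Workbench job.
import csv
from pathlib import Path

def field_counts(folder):
    """Return {filename: number of header fields} for each CSV in folder."""
    counts = {}
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as f:
            counts[path.name] = len(next(csv.reader(f)))
    return counts

counts = field_counts("C:/data/exports")  # hypothetical folder
if len(set(counts.values())) > 1:
    print("Column-count mismatch:", counts)
```

This only compares field counts, not data types, but it catches the most common cause of a failed folder import.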

     

    When you add the file in the settings, remove the filename so that the path points to the folder, like this:

     

    [Screenshot: Screen Shot 2016-11-02 at 3.55.05 PM.png]

     

     

  • Kurbz
    Accepted Answer

    Hi,

    When I created a new CSV job, it worked!

    (Note: it did not work when trying to change an existing Excel job to CSV.)

    Many thanks!

    Sorted.

    K.

  • cwalliser
    Accepted Answer

    My bad. I found a structure difference in one of the files. When I removed that file from the folder, the update worked.

    A lovely thing...

Answers

  • @iresanjogar

     

    Thanks for your question, we are taking a closer look at it.

     

    In the meantime have a look at the FAQ document for Workbench 4.

     

    Regards,

     

  • You are able to combine different datasets in a Workbench job to load them as one dataset into Domo. The basic way to do this is to use a UNION or UNION ALL between the SELECT statements for each dataset. With unions, you will need to make sure the columns are aligned across your datasets or within the SELECT statements.

     

    Example:

     

    SELECT
      `columnA`
      ,`columnB`
    FROM dataset1

    UNION ALL

    SELECT
      `columnA`
      ,`columnB`
    FROM dataset2

  • Thanks for your reply! 

    However, could you specify where I should write that code within Workbench? Is this a custom transformation that I can add to my job? If so, could you explain the process in a bit more detail? The types of transformations allowed seem pretty limited, and I am not sure which option I should select to do this.

     

    Also, if I understood properly, I will still need to create 15 jobs in Workbench (one for each of the 15 datasets I want to append), and then another one that appends all those datasets and loads them as one dataset into Domo. Am I wrong?

     

    Thanks once again! 

  • Can you provide more context on the type of data sources you are pulling from? Database, Excel, CSV?

     

    In the job settings, you will see the "Query" tab as an option in the "Data Source Property" section. Here you can input custom SQL to pull the data you want. If you are connected to a database, you can simply create one job that references the different datasets available in that database:

     

    [Screenshot: wb4.png]

     

    If you want to load 15 different files (Excel files, for example), then you will need to create a separate job for each, since you can only browse to one local file at a time. In that case you would combine them once they are in Domo, which I know you are wanting to avoid.

     

    Let me know if you need additional details and include more information about your datasources.

  • This was really useful! Thank you very much!

     

    The files I need to append are CSVs, which means that I need to create 15 different jobs and perform the transformation in Domo. However, now I know that if I am connected to a database I will be able to do this type of operation! Thanks! It would be great, though, if the "Transport Method" subsection of the "Source" settings allowed you to add several files.

  • @iresanjogar, did any of the above replies help you out?

  • Thank you @n8isjack-ret for your reply! This is exactly what I needed (cc: @LizWR)! Since my files fulfill the requirements, I can use this method without a problem. Thanks again!

  • @iresanjogar, can you mark the appropriate reply "accept as solution" so others can benefit from this conversation?

     

    Thanks!

    Dani

  • Hi Folks,

    I am trying this method; however, when I remove the filename and save, it saves with a red dot. When I run the job it fails with:

     

    Could not find a part of the path 'E:\MyDataFolder\'.

     

    I have run the job successfully with one selected file, but I cannot execute it successfully with no filename and just the folder path.

     

    Am I missing something?

     

    Thanks, Dojo!

  • @Kurbz, please feel free to open a new thread for better exposure to your question.

     

  • Can you confirm that you are using a .csv file import? This trick is only possible with .csv files.

  • I tried this approach to load 3 csv files with different names in the same folder.

    [Screenshot: Domo Upload Source Folder.PNG]

    The preview test failed with the error message below. I notice that only the Q1 and Q2 files show in the log.

    Is there a file size limit? Some other factor I'm missing? I don't see any structure differences in any of the files.

     

    [08.28.17 03:45:48 PM] Requesting CSV source file from the Local File Provider
    [08.28.17 03:45:48 PM] Loading local file provider properties
    [08.28.17 03:45:48 PM] Loading CSV file: C:\Users\cwalliser\OneDrive - Gigamon Inc\DOMO\SvcMetricsFiles\
    [08.28.17 03:45:48 PM] Parsing file: 'C:\Users\cwalliser\OneDrive - Gigamon Inc\DOMO\SvcMetricsFiles\ServiceMetric_2017_Q1.csv'
    [08.28.17 03:45:48 PM] Parsing file: 'C:\Users\cwalliser\OneDrive - Gigamon Inc\DOMO\SvcMetricsFiles\ServiceMetric_2017_Q2.csv'
    [08.28.17 03:45:49 PM] The number of fields in the record is greater than the available space from index to the end of the destination array.
    Parameter name: array

     

     
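The error in this log usually means one file (or one row) has more fields than the schema inferred from the first file parsed, which matches the structure difference cwalliser later found. A quick diagnostic sketch, run outside Workbench (folder path hypothetical; standard library only):

```python
# Locate rows whose field count differs from the first row seen across
# all CSVs in the folder -- the usual cause of the "number of fields ...
# greater than the available space" error.
import csv
from pathlib import Path

def find_field_mismatches(folder):
    """Return (filename, line_number, field_count) for every row whose
    field count differs from the very first row encountered."""
    expected = None
    problems = []
    for path in sorted(Path(folder).glob("*.csv")):
        with open(path, newline="") as f:
            for lineno, row in enumerate(csv.reader(f), start=1):
                if expected is None:
                    expected = len(row)
                elif len(row) != expected:
                    problems.append((path.name, lineno, len(row)))
    return problems
```

Running this over the source folder and removing (or fixing) any reported file should let the folder import succeed.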

  • Hi,

     

    I am trying to import multiple CSVs in Workbench in date-modified order (oldest first), but it seems Workbench imports the multiple CSVs ordered by filename.

     

    Our developer used GUIDs when generating the CSV names, so any help on how to load by date modified will be appreciated.

     

    Thanks.

  • Ok, help me understand the impact of the load order. My first thought is that the order in which they are loaded will not affect the end result in Domo.

     

    Is it because you don't want to load the old files ever again once reloaded?

     

    Are you loading all files every time?

     

    Does the same record appear in multiple files?

  • Hi,
    Let me explain further.
    We are generating delta loads as CSV files, so currently the files are loaded in the following order by default:
    1) ACCOUNT_380f10d4-3cd9-46a2-8196-76dcd85ac4dc.csv   --date modified as of 03/10/2016 6:08 AM (DD/MM/YYYY)
    2) ACCOUNT_3f771f53-9ca4-4251-ae94-96f9284ba306.csv   --date modified as of 03/10/2016 6:02 AM
    3) ACCOUNT_5caaaf2d-ca50-43ff-a56c-b1358b80ebb6.csv   --date modified as of 03/10/2016 6:00 AM
    4) ACCOUNT_5caaaf2d-ca50-43ff-a56c-b1358b80ebb6_1.csv --date modified as of 03/10/2016 6:00 AM
    5) ACCOUNT_5e131c62-0c7d-4d02-9d9f-8e2d5f40ae53.csv   --date modified as of 03/10/2016 6:15 AM
    Since it's a delta load, we would like the files to be loaded in date-modified order, so I am expecting them to be loaded in the following order:
    3) ACCOUNT_5caaaf2d-ca50-43ff-a56c-b1358b80ebb6.csv     --date modified as of 03/10/2016 6:00 AM
    4) ACCOUNT_5caaaf2d-ca50-43ff-a56c-b1358b80ebb6_1.csv --date modified as of 03/10/2016 6:00 AM
    2) ACCOUNT_3f771f53-9ca4-4251-ae94-96f9284ba306.csv     --date modified as of 03/10/2016 6:02 AM
    1) ACCOUNT_380f10d4-3cd9-46a2-8196-76dcd85ac4dc.csv    --date modified as of 03/10/2016 6:08 AM
    5) ACCOUNT_5e131c62-0c7d-4d02-9d9f-8e2d5f40ae53.csv     --date modified as of 03/10/2016 6:15 AM
    To answer your questions: yes, the order in which they are loaded does affect the data, so the oldest csv needs to be loaded first. We are loading different files in every run. Yes, the same records appear in multiple files, which are named as above with the GUID in the file name. For various reasons we are not able to name the files Account_1.csv, Account_2.csv, and so on.
    Thanks.
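Since Workbench appears to read the folder in filename order, one workaround is a pre-processing step outside Workbench that renames the files with a sortable prefix derived from modification time, so filename order matches modification order. A sketch under that assumption (hypothetical folder; standard library only):

```python
# Rename CSVs so that sorting by filename reproduces oldest-first
# modification order, before Workbench reads the folder.
from pathlib import Path

def files_by_mtime(folder, pattern="*.csv"):
    """Return CSV paths sorted oldest-first by modification time."""
    return sorted(Path(folder).glob(pattern), key=lambda p: p.stat().st_mtime)

def rename_in_mtime_order(folder):
    """Prefix each file with a zero-padded sequence number so that
    filename order matches modification order."""
    for i, path in enumerate(files_by_mtime(folder)):
        path.rename(path.with_name(f"{i:04d}_{path.name}"))
```

This leaves the GUID names intact after the prefix, so the files remain traceable to the original delta exports.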
  • Hi Dojo Community,

     

    I have a follow-up question on this topic. When you upload different csv files with the method described in this post, is there an option to add an extra column recording the name of the csv file each row came from?

     

    I would like to be able to distinguish between the different csv files because it is important for my analysis to know which csv file each row comes from.

     

    The other option is to upload them separately and add that column within the Workbench transformation tool, but given that all the csv files have the same structure, I would like to use this trick, which simplifies the process a lot.

     

    Thanks in advance!
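The folder-append trick itself does not appear to add a source-file column (at least in the Workbench version discussed here). If a pre-processing step outside Workbench is acceptable, the files can be concatenated into one CSV with the filename recorded per row, and the job pointed at that single file instead. A sketch, assuming identical headers across files (paths hypothetical; standard library only):

```python
# Concatenate all CSVs in a folder into one file, appending a
# 'source_file' column with each row's originating filename.
import csv
from pathlib import Path

def combine_with_source(folder, out_path):
    """Write out_path containing every data row from folder's CSVs,
    plus a trailing 'source_file' column; assumes identical headers."""
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        for i, path in enumerate(sorted(Path(folder).glob("*.csv"))):
            with open(path, newline="") as f:
                reader = csv.reader(f)
                header = next(reader)
                if i == 0:
                    writer.writerow(header + ["source_file"])
                for row in reader:
                    writer.writerow(row + [path.name])
```

Note that out_path should live outside the source folder (or not end in .csv), so the combined file is not picked up on the next run.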

This discussion has been closed.