ETL Select Columns "Shortcut"

Hello, I work with Datasets with high counts of columns. When I create ETL's where I have to Select Columns, I can be going through 150+ columns at a time to choose the columns I need. If I only have a handful to add, scrolling through the list and adding one by one is not really an issue, but when I need to add 30+, it get's very time consuming scrolling through the list over and over to add each column. Adding all columns, and then going through the list to remove the columns I don't need, takes just as long when I'm only selecting, say, 40 columns out of the 150+. 

 

Is there a faster way to add multiple columns (30+), than going through the whole list of 150+ fields and selecting columns one by one? For example, is there a way to select multiple columns at once to add, and then once added, go down the selected list and set the properties for each? 

 

If not, having selection boxes next to the column names, or some other way of choosing multiples to add, would be a great enhancement for future upgrades. Thank you!

 

Note: Due to the size of our company, count of users, and the amount of data in some of our Datasets (Millions of rows), access to Workbench is limited to our IT Dept, so I do not have access.

Best Answer

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤
    Accepted Answer

    If you are just doing SELECT statements, it is not too daunting. I would encourage you to review this KB article that should give you enough basic knowledge to do what you want to do.

    https://knowledge.domo.com/Prepare/Magic_Transforms/SQL_DataFlows/01Creating_an_SQL_DataFlow

    You would select your source dataset as the input dataset and then add a transform and choose table to then select the columns that you want.

     

    A basic SELECT statement would look like this

    SELECT `date`, `value`, `series`
    FROM input_dataset_1

    This would return just those 3 columns from your dataset, for example. While it won't let you drag and drop columns into your SELECT statement, you could highlight the columns from the list on the left when you are in the transform editor and then paste them into a text editor or even Excel and then remove the columns you don't want and add the necessary comma after each column name. 

    You would then add an output dataset and then you are done. 

    Hope this helps.

Answers

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    In the select columns, you can use type-ahead to slightly ease the pain of selecting your columns, but you are still limited to do it one at a time. This has sped it up for me as I have a dataset with nearly 600 columns. 

    Perhaps a better option would be to use a SQL dataflow whose only purpose is to select the columns that you want as you could then take advantage of copy and paste functionality and other text editor type features. You would then use the output dataset as the new input dataset in the ETL. 

  • I usually create a separate dataset that has only the most commonly-used fields. For example, a dataset containing all 300+ cols related to a Salesforce activity, then, in the same dataflow that produces the larger set, a second output set containing only the 50 most popular cols. That's the one I generally join to via ETL. Then you can just add all columns.

  • Thank you for your input Mark! Unfortunately, I am not well versed in SQL (slowly learning though), which limits me quite a bit right now.
    I am new to Domo and we have hundreds of datasets and billions of rows of data, so I am learning as I go.
  • You can also use 

    ALTER TABLE your_table DROP COLUMN `Column1`,`Column2`,...;

    If you are only looking to remove a few columns.  (This would be in a MySQL dataflow as well)