How do you extract or select a date from a column header?

For an ETL dataset that I want to update weekly, the date and time period (e.g., "Period : Latest 4 Wks - W/E 09/07/19...") is in the first cell of the Excel file.   When I create the initial dataset (via Box), this field comes in as a column header.  Since it is not a unique column name each week (because of the changing date), in ETL I can't automate selecting this column to collapse it or to split it.  Ultimately, I want to create two constant columns, one column for "Latest 4 Weeks" and one column for the date "09/07/19".  How do I push this column header into a row so I can extract these parts of the string?



Best Answer

  • ST_-Superman-_
    Answer ✓

    Good news and bad news for you @helenh 


    There is a solution to this problem.  However, you are going to have to use some MySQL to do it.  I will show you how to set it up.  I am just going to bring in the header of the first cell into a column.  Then you can put this dataset into and ETL and make any changes you need to.


    I had to make an assumption that essentially cell A1 contained this "Period and W/E" field in it and that your data table really started in cell B1.


    This is a preview of my starting dataset:1.png

    You will need to create a new MySQL dataflow and use your dataset as the input file.  (I named my dataset 'dojo_helenh' so you will need to replace that with the name of your dataset)


    step 1 (a table transform):

    'SELECT ',QUOTE(COLUMN_NAME),' as `Period and Wk`, `Field 1`, `Field 2` FROM dojo_helenh')
    SEPARATOR ' UNION ALL ') as transpose_statement
    WHERE TABLE_NAME='dojo_helenh'
    AND COLUMN_NAME NOT IN ('Field 1','Field 2');

    You will also need to replace the `Field 1`, `Field 2`, [`Any additional fields`] in both lines where they are used.  You will notice that the first time they are used it is with the single back tick ` and when they are called in the WHERE clause (last line), you use a single quote '


    When you create this transform, there is a spot to name the table it is creating in the upper left corner:




    Next step (this is a SQL transform)

    CREATE PROCEDURE transpose()


    SELECT `transpose_statement` INTO @sql FROM `step_1`;

    SET @str=CONCAT('create table new_table as ',@sql);

    PREPARE q FROM @str;
    EXECUTE q;


    Next step (another SQL transform):

    call transpose;

    Then you can add your output dataset:

    SELECT * FROM new_table


    Use this output dataset in your ETL and you will have the values of that first cell in a field called `Period and Wk`



    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman


  • Hello @ST_-Superman-_ !   Thanks for providing the detailed solution.  Good to know that it can be done and to see the logic used to create this.  (And, yes, your assumptions about my data table structure are correct.)  I have two questions, if you don't mind me asking. ?


    While I am familiar with SQL basics, I have not used Procedures.  Can you explain why a procedure is necessary instead of straight SQL?   


    Also,  it looks like you select the dated column with this line:  "COLUMN_NAME NOT IN ('Field 1','Field 2').  Is that right?  Very interesting. I love it.


    Thanks so much.  Looking forward to trying it out now (with just simple substitutions.)

  • Hi Helen,


    Thanks for the questions.  I'll answer your first question as best as I can, but I am actually not all that good with dynamic SQL.  


    Everytime that I have had to write a dynamic SQL query, I have just modified an example that I stole from someone else online.  I don't recall the website now, but it had to do with creating a table with dynamic field names.  


    That is, essentially what we are doing with the code I provided.  We don't know the exact name of the field so we cannot call it out in a statement.


    As for your second question... You are correct, I am essentially identifying the unknown column name by telling MySQL to grab all of the column names that are not `Field 1` or `Field 2`.  I then concat that list (in this case it was only one) with the rest of the select statement that I plan on creating the final table with.  That's the part in single quotes from step 1.  


    The procedure then takes that concated statement created in step one and injects that into a "create table as" statement.


    Hope that answers your question.  Unfortunately, most of the ingenuity was stolen from someone else who was nice enough to post online.  I'll try to do some digging to see if I can find that article, but for now, I hope my answers were sufficient.

    “There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
This discussion has been closed.