Converting text to date in ETL

Hi all,

 

I'm trying to convert a text into a date field in magic ETL, and I'm running into errors and unexpected outputs.  Screenshots below.

When trying to just convert the (numbers as text) into dates, I'm receiving the error "An invalid date was found".  When I conver that column first from text to numbers there's no issue, but then when I go to convert that new column to dates, I'm receiving all dates as 1/1/1970.

I've already accounted for non-numerical text fields in the filters previously.

Any help would be appreciated.

 

Structure of Data

Capture.JPG

Output of changing the above field to date

Capture2.JPG

 

Output of changing field to number first then to date

Capture3.JPG

 

Best Answer

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤
    Accepted Answer

    Ok. This should work for you. I've added some additional columns just for learning purposes so you can see how it is put together. The column that I named NewDate is really the only column that you would need in your output. I've added some comments to explain what each part is doing.

    SELECT "datelen",
    --Gets the year
    RIGHT("datelen",4) Yr,
    --Gets the day
    LEFT(RIGHT("datelen",6),2) Dy,
    --Gets the month
    --check to see if this is a 2-digit month or not
    CASE WHEN LENGTH("datelen") = 8 THEN
    LEFT("datelen",2)
    ELSE
    --add a leading zero for one-digit month
    LPAD(LEFT("datelen",1),2,'0')
    END Mo,
    --concatenate the parts together to a convertible format
    CONCAT(
    CASE WHEN LENGTH("datelen") = 8 THEN
    LEFT("datelen",2)
    ELSE
    LPAD(LEFT("datelen",1),2,'0')
    END,
    CONCAT(LEFT(RIGHT("datelen",6),2),RIGHT("datelen",4))) FullDate,
    --put it all together and convert it to a date format
    to_Date(
    CONCAT(
    CASE WHEN LENGTH("datelen") = 8 THEN
    LEFT("datelen",2)
    ELSE
    LPAD(LEFT("datelen",1),2,'0')
    END,
    CONCAT(LEFT(RIGHT("datelen",6),2),RIGHT("datelen",4))),'MMDDYYYY') NewDate

    FROM "df_ga_int_emarsys_campaign_attribution_sql_prep"

    Hopefully, this works for your data, but it depends if there is some bad data that isn't taken into account.

Answers

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    I seem to recall running into a similar issue and I could never get the ETL to cooperate. I ending up doing a SQL dataflow to get the dates in an actual date format and then used that output dataset as my new input dataset in the ETL. Here is an example of what I did in my SQL dataflow 

    CASE WHEN "EXTRACTDTE" < 19000101 THEN NULL ELSE date(to_char(TRIM("EXTRACTDTE"),'99999999')) END AS DateEXTRACTDTE

    I had about 10 different date fields in my dataset and I applied this same logic to each field. 

    Hope this helps.

     

  • Thanks - I'm trying that and running into an error:

    The database reported a syntax error: FUNCTION to_char does not exist
    See attached screenshot.

     

     

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    It looks like I was using the Redshift SQL for this conversion. Do you have the option to use that? If not, we should be able to track down the equivalent functions in MySQL.

  • Using redshift, this is my code:

    SELECT
    `Campaign`,
    `Date`,
    `Revenue`,
    CASE WHEN `DateLen` < 19000101 THEN NULL ELSE date(to_char(TRIM(`DateLen`), "99999999")) END AS DateEXTRACTDTE
    FROM
    `df_ga_int_emarsys_campaign_attribution_sql_prep`

    And I'm getting the error:  

    The database reported a syntax error: [Amazon](500310) Invalid operation: syntax error at or near "`" Position: 231;
    Thanks for your help, I'm not very familiar with using these types of dataflows.
  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Redshift and MySQL have different preferences on their use of single quotes and double quotes. 

    Try this:

    SELECT
    "campaign",
    "date",
    "revenue",
    CASE WHEN "datelen" < 19000101 THEN NULL ELSE date(to_char(TRIM("datelen"), '99999999')) END AS DateEXTRACTDTE
    FROM
    "df_ga_int_emarsys_campaign_attribution_sql_prep"
  • The SQL worked but that field just came back blank...Capture.JPG

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    I would say that means it is falling into the first part of the case statement and it thinks the values are less than 19000101. Can you add another screenshot that shows the datelen column without any logic applied to it?

  • Works perfectly, thank you for your help!