SQL/Beast Mode to extract substrings and output to multiple new columns

Reply
JJK
Visitor

SQL/Beast Mode to extract substrings and output to multiple new columns

I have a column in my data with strings that look like this:

 

Clusters affected:
|Finance, Services & Innovation| |Planning & Environment| |Transport & Infrastructure|
|Finance, Services & Innovation| |Planning & Environment|
|Finance, Services & Innovation| |Health| |Industry| |Planning & Environment| |Premier & Cabinet|

 

What I would like to do is extract the substring between each "| |" delimiter and save the output to a new column. Each row has a different number of entries in the string. The optimal output for the first row would look like:

First ClusterSecond ClusterThird Cluster
Finance, Services & InnovationPlanning & EnvironmentTransport & Infrastructure

 

Greatly appreciative for any solutions or guidance!

Black Belt

Re: SQL/Beast Mode to extract substrings and output to multiple new columns

Have you tried using the Magic ETL "Split Column" tile?

 

You should be able to delimit the column by "| |" and then do some more clean up for the first and last column


______________________________________________________________________________________________
“There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
______________________________________________________________________________________________
Black Belt

Re: SQL/Beast Mode to extract substrings and output to multiple new columns

This is how I would do this in MySQL:

 

Transform 1:

SELECT
    `Clusters affected:`
    ,substring_index(`Clusters affected:`,'| |',1) as `First Cluster`
    ,substring_index(substring_index(`Clusters affected:`,'| |',2),'| |',-1) as `Second Cluster`
    ,substring_index(substring_index(`Clusters affected:`,'| |',3),'| |',-1) as `Third Cluster`
    ,substring_index(substring_index(`Clusters affected:`,'| |',4),'| |',-1) as `Fourth Cluster`
    ,substring_index(substring_index(`Clusters affected:`,'| |',5),'| |',-1) as `Fifth Cluster`
    ,substring_index(substring_index(`Clusters affected:`,'| |',6),'| |',-1) as `Sixth Cluster`
FROM `dojo_jjk`

Transform 2:

SELECT
    `Clusters affected:`
    ,trim(BOTH '|' FROM `First Cluster`) as `First Cluster`
    ,trim(BOTH '|' FROM `Second Cluster`) as `Second Cluster`
    ,trim(BOTH '|' FROM `Third Cluster`) as `Third Cluster`
    ,trim(BOTH '|' FROM `Fourth Cluster`) as `Fourth Cluster`
    ,trim(BOTH '|' FROM `Fifth Cluster`) as `Fifth Cluster`
    ,trim(BOTH '|' FROM `Sixth Cluster`) as `Sixth Cluster`
FROM `transform_1`

Transform 3:

SELECT
    `Clusters affected:`
    ,`First Cluster`
    ,case when `Second Cluster` = `First Cluster` then null else `Second Cluster` end as `Second Cluster`
    ,case when `Third Cluster` = `Second Cluster` then null else `Third Cluster` end as `Third Cluster`
    ,case when `Fourth Cluster` = `Third Cluster` then null else `Fourth Cluster` end as `Fourth Cluster`
    ,case when `Fifth Cluster` = `Fourth Cluster` then null else `Fifth Cluster` end as `Fifth Cluster`
    ,case when `Sixth Cluster` = `Fifth Cluster` then null else `Sixth Cluster` end as `Sixth Cluster`
FROM `transform_2`

This will give you a dataset like this:1.png

 


______________________________________________________________________________________________
“There is a superhero in all of us, we just need the courage to put on the cape.” -Superman
______________________________________________________________________________________________
Announcements
Dojo Day is Coming! Mark your calendars for May 30th 0900-1700 MST Click here for more information