Split varying full names into First Name, Middle Name, and Last Name

Hi there,

Back with another question.


I have a Full Name column in varying format:

Alex McCandless

Alexander David Thoreau

Bobby Del Rio

Cameron Diaz Camilo Carter

Dave O'Donald

Jack E. Topez

...

and I need to parse this into three columns:

  1. First Name - the first word of the string, seems easily accomplished with the split column tile
  2. Last Name - the last word of the string, how can this be done?
  3. Middle Name - anything in between First Name and Last Name, I was thinking substring of the Full Name up to LENGTH of the First Name and ends with a space, but unsure how to accomplish

Thank you in advance for any advice!

Answers

  • @tlammie1806

    The best way to get 100% of the way there is to fix the data input. If that isn't a realistic option, splitting each word into columns and saying that the first word is FN the last word is LN and everything in between is middle might be the best option. Without knowing which name is exactly which, there is no 100% answer.

    If this helps, feel free to agree, accept or awesome it!

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🥷

    Agree with @damen that there is not a perfect solution since you have to assume a single first name and single last name, but here are some things you can do to get close if you can't fix the source data. Leveraging the formula tile in Magic ETL, you can create formula fields and then reference within that same tile. Here are some useful ones to create.

    Start by eliminating multiple spaces between words and beginning and trailing spaces with the squash_whitespace function

    full_name = squash_whitespace(`full_name`)
    

    find the length of the name

    str_length = length(`full_name`)
    

    remove all spaces and find the length

    str_length_no_space = LENGTH(REPLACE(`full_name`,' ',''))
    

    subtract the two to find the number of spaces

    num_spaces = `str_length` - `str_length_no_space`
    

    first name is straightforward using the split_part function

    first_name = SPLIT_PART(`full_name`,' ',1) 
    

    last name will utilize the num_spaces with the split_part function

    last_name = SPLIT_PART(`full_name`,' ',`num_spaces`+1)
    

    middle name is going to be a little trickier. I would suggest utilize a case statement using the num_spaces field and then concat accordingly

    CASE WHEN `num_spaces` = 2 THEN 
    	SPLIT_PART(`full_name`,2)
    WHEN `num_spaces` = 3 THEN 
    	CONCAT(SPLIT_PART(`full_name`,2),' ',SPLIT_PART(`full_name`,3))
    END
    

    You could extend this out further if there are additional spaces.

    Hope this helps.

    **Check out my Domo Tips & Tricks Videos

    **Make sure to <3 any users posts that helped you.
    **Please mark as accepted the ones who solved your issue.