DOMO Regex Extraction ETL

rwalsh25
rwalsh25 ⚪️
edited September 8 in Dataflows

I'm looking to extract a work order number which contains the letter W followed by 15 digits (Ex. W571191476628566).

The order number lies within a string containing notes content that has been exported into domo. (Ex. "Customer called in to discuss work order W571191476628566, and asked about order status").

In the domo ETL I can use regex to locate the work order number, but the tiles only allow me to replace the regex or use it as a delimiter. Is there a way to extract the regex and keep it in a populate it in a column/field?

Tagged:

Answers

  • GrantSmith
    GrantSmith Indiana 🔴
    edited September 8

    Hi @rwalsh25

    If you're wanting to keep the original text and split work order number into a new column you'd need to split your dataset, pull just the key fields and the note field, then use the regex and then join the data back together using your key fields. If you're using ETL 1.0


    If you're using ETL 2.0 you can just use a formula tile to define your new column name and run the regex_replace function on your notes field.

    REGEXP_REPLACE(`note`, '^.*(W\d{15}).*$', '$1')
    


    ^ = start of the string

    .* = 0 or more of any characters

    (W\d{15}) - Captures the work order number in the format of W followed by 15 digits.

    $ = end of the string

    Simply match the entire string and replace it with only the work order number

  • Thank you @GrantSmith ! It looks like the formula tile is able to extract the work order on occasions, but I am still noticing a few situations where the formula is not extracting just the work order:


    Using the Formula tile REGEXP_REPLACE(`Note content`, '^.*(W\d{15}).*$', '$1') ) to create Work Order Number column:

    Example 1: Does not extract WO# only

    Note Content:

    R261485821 [url link to page being referenced that contains work order number] W144650992267872 [url link to page being referenced that contains work order number]

    Work Order Number:

    R261485821 [url link to page being referenced that contains work order number] W144650992267872 [url link to page being referenced that contains work order number]


    Example 2: Does extract WO# only

    Note Content:

    Replacement for W820711785371520 [Url link that contains work order Number]

    Work Order Number:

    W820711785371520


    My goal is to have a new column that contains only the work order number so that I could join this dataset to other datasets.


    Thanks again!

  • Even a stranger occurrence that I found:


    Note Content:

    Row 344: W662956731987580 [https:/work_orders##queue.filter.work_order_number_starts_with=W662956731987580;queue]

    Row 345: W907563388633513 [https:/work_orders##queue.filter.work_order_number_starts_with=W907563388633513;queue] ​


    Work Order Number:

    Row 344: W662956731987580 [https:/work_orders##queue.filter.work_order_number_starts_with=W662956731987580;queue]

    Row 345: W907563388633513


    Same exact character and format length in the notes content, yet for some reason the work order number was only extracted for one, while the other kept the entire note content.

  • GrantSmith
    GrantSmith Indiana 🔴

    You might have some line breaks in your string causing issues. We can try and pass in the 'm' modifier flag to treat it as a multiline string:

    REGEXP_REPLACE(`note`, '^.*(W\d{15}).*$', '$1', 'm')