How populate a new column with data based on data in other rows

I have a dataset consisting of Google Analytics data. One of the columns identifies their Userid (for our ecommerce portal) if they authenticated. Another column identifies their GA session ID (during the visit to our site before and after they authenticate). I am interested in capturing all the activity for a user before they authenticated. I can connect this by referencing the session ID.

Event -- session ID -- UserID -- New column

A -- 12345 -- NULL -- {Pre-Authenticated}

B -- 9876 -- NULL -- {NULL}

C -- 12345 -- 9385 -- {Authenticated}

How can I set up in ETL a way of populated a new column to identify whether the event row was authenticated, pre-authenticated, or null?

Tagged:

Answers

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    I would create an ETL that splits your data into two using two filter tiles. One filter where userid is null, which is your pre-authenticated data, the other where userid is not null is your authenticated data. You can add a constant tile to create a column called authtentication status. You can then use a join tile to join them back together and join on session id. You would then have your user id next to your pre-authenticated data when it matched on session id.

  • GrantSmith
    GrantSmith Indiana 🔴

    Unless you know each of those events are for the same user you won’t be able to tie different sessions together because each visitor to your website will get a new session ID if they haven’t visited your site within 30 minutes or whatever you have your Google analytics configured to. I am not certain if you can use googles user identification to tie all of the different sessions together but that might be something you can also look into. If that is the case you can use the rank and window tile to partition based on googles user identifier and then order by event time and use the lag too pull your user id field

  • Thanks guys. Grant - you're correct in that a user will get a new session id for each of their sessions. I just want to identify all their activity for sessions where they did authenticate.

  • GrantSmith
    GrantSmith Indiana 🔴

    @User_32265

    If you're simply wanting to do a translation stating they're authenticated you can use a Beast Mode on a card or a Formula Tile in a new Magic ETL dataflow to define a new column. I'm not certain what you logic would be to determine if they're "Pre-Authenticated" but assuming the user ID is populated means they're authenticated you can do something like this:

    CASE WHEN `UserID` IS NOT NULL THEN 'Authenticated' END