Regards to transforming data prior to joining with another data set.

Attached are 2 different samples of data. Both are in similar format, and have plenty of same columns.

The price list is where the bulk of what I am going to use will come from. It is the full inventory for a site. the total number of units, number of occupied, number of vacant etc... It is rolled up in to a single row for each unit type at every site. The vacant unit sample is only of those units that are not occupied, and it gets broken out where every single unoccupied unit gets it's own row. This is where things are getting complicated. From the Vacant unit file I just need the info from the daysvacant column, since there is say 7 rows of 5x5 drive ups at the same site all with different amounts of 'daysvacant' I was thinking the best option would be to a min, max, and med, columns for 'daysvacant'. The bigger aspect is i am not sure how to counter the multiple rows of the same units, and have it be able to cooperate with the price list that has single rows for single unit types.

Any suggestion or direction is greatly appreciated!


Best Answers

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤
    Accepted Answer

    @user048760 ,

    you've stumbled onto the age old problem. :P


    1) i assume that the price list report changes / updates regularly (let's assume weekly). at some point you'll probably want to ask "what did these metrics look like 5 weeks ago." to do that you'll need to have accumulated the history of this snapshot report.

    add a "report_date" column and then build a recursive dataflow that captures the history of those changes this video has a thorough explanation of the pitfalls of recursive dataflows, but if you start the video where linked, you'll see a much easier design pattern implemented than the KB article

    https://youtu.be/JNQFZCj8JcQ?t=567


    2) build another recursive dataflow for the vacant units (make sure they get the same report_date)


    3) UNION the datasets together. You only want to overlap on columns that ARE NOT metrics ( so location is fine, Site ID is has to overlap).


    In visuzalization even though the data are on separate rows, you don't have to worry about weird MIN / MA Average because you did not JOIN the data together.

    https://www.youtube.com/watch?v=PVbOeLSae9o

  • user048760
    user048760 ⚪️
    Accepted Answer

    @jaeW_at_Onyx both these reports will refresh on a daily basis. Transactions take place, people move in, and move out daily, so each day will be a fresh update. I will give this a try. Thanks for the guide line!