Aggregating Rows

I have a dataset that breaks everything everything out by rows. any time there is a rental, or vacate it adds a value in to a new row with a siteID for a store. the data refreshes on a daily basis, so it site 3196 has 4 rentals in a day it would repeat site 3196 on 4 different rows with a value of 1 in the rental column for each row. What I want it to show is site 3196 show 4 rentals on a single row for that day. Attached is an example of what the data looks like right now.

So far have pulled in the dataset in to the ETL connected it to the select columns tile so I am only working with the data I need, This next step is to clean it up and have aggregated line item data at the site level.

Tagged:

Answers

  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    In the ETL, add a Group By tile and put SiteID and Batch Last Run in your select columns and then choose Count and SiteID for the columns to aggregate. In your example, this would give you a count of 4 for Site 3196 for that day. If you want it to have a count of 1 for Site 3196 for that day, choose Count Distinct values.

  • @MarkSnodgrass By doing this will it also aggregate the data in the movein and moveout columns? so that for example 3196 would go from looking like this:

    To:


  • MarkSnodgrass
    MarkSnodgrass Portland, Oregon 🟤

    Doing the aggregate I suggest will drop the movein, moveout, and transfer columns and just leave you with the SiteID, Batch Last Run and your aggregate count.

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @user048760 , why do you need to aggregate rows in ETL?

    If you just put SiteID on an axis in a card and then take the sum of those columns it would do the same thing without the extra step.


    Is there more to your use case? (where i'm going with this is generally you want to avoid aggregating your data in ETL).

  • @jaeW_at_Onyx This is the first step to create a data set that will house many columns of daily KPI's that are most frequently used aggregated at a store level. To start, I have a webform that has unique identifiers for each site from every different source we pull data from, unfortunately every system we use, uses a different nomenclature to identify sites. there are 50 stores, so 50 rows and I want to pull in these different KPI's in columns in that data set. So, for this example dealing with Rentals and Vacates on a daily basis get split out in to individual rows would want them to be rolled up in to one value so it could be on a single row correlating to that site.

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤
    edited April 28

    I stand by my previous recommendation


    @user048760 instead of having one row per site with KPIs in columns i would recommend the schema


    Site, Date, KPI Name, KPI value


    This will be a much more flexible model that won't force you to pre-aggregate your data and allow you to build more extensible models at different granularities.


    If you get into the habit of pre-aggregating your data, when users ask "i want to know the rentals at Site A for a customer, now you have to build a new dataflow. If you avoid pre-aggregation, one dataset can do it all AND still meet your visualization requirements.


    I know budgets and forecasts seems a far cry from what you're currently trying to accomplish, but the concept of UNION'ing different KPI datasets instead of JOIN'ing them is the same.


    There's a whole conference session on scaling your Domo Implementation.

    https://www.youtube.com/watch?v=rS2e2_fv5yk&list=PLUy_qbtzH0S6-5oDbx3BsIv2Xk-JxJxWi&index=17