Need help getting pointed in the right direction for what might be causing inflating counts

I have a table that has details of inventory at each site that illustrates KPI's like Occupancy, vacancy, days vacant, and occupied units. It is put together using 3 different data sources.

  1. Price list - This is a majority of it. it is all the units aggregated row by row with those occupied, vacant, and total unit details.
  2. Web discount detail. This is just a list of web discount names that gets pulled in to the site and units they correlate with from the price list.
  3. vacant units - This is where it gets a little weird. this only shows all the units that our currently vacant. it includes details like days vacant and excl. online. it should match with what shows on the price list. it gets joined with the price list by making a key from combing columns site,size, and unit type, then also join on unittypeID and the price. all are shared between both reports.

For this example there seems to be something inflating some of the counts in the price list.

The price list is everything left starting from the 'Vac' column. everything to the right is from the vacant units report. All the stuff to the right is reflecting correctly with where it should compared to the price list. but if you you look at 2 of the 0x0 lockers they are getting doubled the one showing a total of 10 should show 5, and the first one showing a total of 4 should show 2. Everything else looks correct.

This dataflow does have a recursive aspect. the VΔ column is showing week over week activity at the unit level. I'm not sure maybe that is the culprit.

After checking and rechecking the vacant units piece that I thought was the culprit because the one unit available was being applied to all 4 0x0 lockers and somehow caused an inflation to the total and occ columns didn't fix it I am not sure what could be causing it.


Best Answer

  • nick_dessanti
    nick_dessanti ⚪️
    Answer ✓


    If I were to be going through this I would first look in the ETL and run a preview to see the output of each ETL tile (depending on the amount of data you have because it only shows a preview). You could spot duplicates or something looking off and know which tile needs to be edited.

    I would also go to your data output of this ETL in the data center and look at the data, filter it down to a specific case where you saw an issue in the pivot. For example, the 0x0 locker that is showing 10 instead of the desired 5. This may help you spot the issue.

    If you still cannot pinpoint the error, I would also look in the pivot table you have created and make sure in each value aggregation type you are taking a sum where you need or maybe you want to be taking the count instead. It is possible you may want to be filtering by a distinct count of an attribute as well, this is where a beast mode would come into play.

    Another note: You did mention a recursive aspect to this ETL and typically, you want some logic in the ETL to make sure you are not counting duplicates. Hopefully this gives you some more ideas to help you troubleshoot and pinpoint the issue.