Improving Dataset performance by removing unused columns

Hi there,

I have a dataset that has become sluggish. Cards and dashboards built on it increasingly take a very long time to load (or fail to load at all), and it is slow in Analyzer as well.

To try to improve this dataset's performance, I am going to remove a lot of unused columns.

The dataset is a "View", but has many ETLs leading up to it.

What I am wondering is whether, for the purpose of this cleanup, it makes a difference if I remove the unused columns within the final View or earlier in the lineage. Would one approach be more likely than the other to make the View perform faster for people?


Thanks

Comments

  • MarkSnodgrass (Portland, Oregon)

    This is a long article, but definitely worth a read to understand the Domo architecture.

    Data Fundamentals: Understanding Relational Data, Domo Architecture, and Data Pipeline Optimization – Domo

    I would start with the view itself and trim things there. Beyond the number of columns, I would look at the types of joins and aggregations you are doing to make sure they are as efficient as possible. To refine further, you should only need to go back "one level", to the datasets the view uses, and limit the columns there.

    Hope this helps.

  • Ashleigh (Florida)

    @Jbrorby I have noticed Domo has been slower than normal this week. I would check how many Beast Modes are on the card, as they can sometimes slow down performance. You might also try driving the cards from an ETL output dataset rather than a view (some people have been noticing issues with views). As Mark mentioned, the biggest factors in a dataflow's run time are joins and aggregations (specifically Group By tiles). The number of inputs and outputs in the dataflow can also affect run time, so make sure everything you output is needed.
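To make the trade-off discussed in this thread concrete, here is a toy model in plain Python. It is not Domo's engine, Magic ETL, or any Domo API; the tables, column names, row counts, and the "cells" metric are all hypothetical. It joins two inputs once with every column carried through and trimmed only at the end (like trimming in the final View), and once with each input trimmed before the join (like trimming one level upstream), then compares how much data the join step produces. It does not show whether Domo prunes unused view columns on its own.

```python
# Toy model of a two-step pipeline: join two inputs, then keep only the
# columns the final output needs. Plain Python, NOT Domo's engine; the
# tables, column names, and the "cells" metric are hypothetical.

def project(rows, columns):
    """Keep only the named columns in each row (rows are dicts)."""
    return [{c: row[c] for c in columns} for row in rows]

def hash_join(left, right, key):
    """Inner join two lists of dicts on `key`, merging matching rows."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left:
        for match in index.get(row[key], []):
            joined.append({**row, **match})
    return joined

def cells(rows):
    """Count row-by-column cells, a rough proxy for data moved."""
    return sum(len(row) for row in rows)

# Hypothetical inputs: 1,000 orders carrying 20 unused columns each,
# and 50 customers carrying 10 unused columns each.
orders = [
    {"order_id": i, "customer_id": i % 50, "amount": i * 1.5,
     **{f"unused_{n}": n for n in range(20)}}
    for i in range(1000)
]
customers = [
    {"customer_id": c, "region": f"R{c % 4}",
     **{f"extra_{n}": n for n in range(10)}}
    for c in range(50)
]
needed = ["order_id", "region", "amount"]

# Option A: carry every column through the join, trim at the very end.
late_join = hash_join(orders, customers, "customer_id")
late_result = project(late_join, needed)

# Option B: trim each input to the columns it contributes, then join.
early_join = hash_join(
    project(orders, ["order_id", "customer_id", "amount"]),
    project(customers, ["customer_id", "region"]),
    "customer_id",
)
early_result = project(early_join, needed)

print("cells produced by join, trim late: ", cells(late_join))
print("cells produced by join, trim early:", cells(early_join))
print("same final result:", late_result == early_result)
```

With these made-up sizes, the join produces 34,000 cells when trimming late versus 4,000 when trimming early, and the final rows are identical. The general point is that columns dropped upstream never have to be read, joined, or written by later steps; how much that matters for a particular Domo View is something to measure rather than assume.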