Best strategy for appending data from multiple sites to one DataSet?
I posted a question earlier that had way too many parts, so I'm breaking out individual questions. Hopefully, these are easier to answer now and easier for other people to discover later. I can figure much of this out by testing myself, but am hoping others already know and can answer without much effort.
Currently, we're storing a lot of custom data in a central server. To get it into Domo, we summarize it, generate spreadsheets, and feed these into Domo via Workbench. It works really well, partly because we've been able to do full DataSet replacements rather than "appends." As we're accumulating more data, the process is slowing down. Also, as we build out, we'll have multiple servers, not a single consolidated servers. This information is the background for what we're looking at doing next.
We've got our DatSets, page filters and PDP rules set up to allow us to reuse pages and cards to show very different slices of data. Our DataSets use consistent column names which makes it easy to write straightforward PDP rules that make Page Filters super nice. Rather than rework cards or pages, we've got consolidated DataSets + filters for custom views for each site/department/user. What we're hoping to do is continue along these lines, but also have multiple sites appending data to a single DataSet.
Is this possible, or does each site need a distinct DataSet that we then combine on the back-end? If we do need to combine DataSets, should I look at a DataFlow or a DataFusion? The columns will be identical for all DataSets. (It's been a few months since I tried out DataFlows and I've never tried DataFusion.)
Hopefully, we wont' need to clutter up our system with lots of little DataSets + DataFlow/Fusion and can append directly the master DataSet. I'm sure others have faced exactly the situation we're in and would be grateful for advice. I've had a look at several options for how to handle multiple sites pushing into Domo:
- Workbench + append.
- Centralizing everything in Postgres and pulling from there in Domo directly, without Workbench.
- ODBC calls from Workbench
- Using the DataSet API to perform custom appends.
- Using the Stream API to perform custom appends.
- ? What have I missed?
Of these options, the DataSet API seems like the most straightforward. I've used a bunch of the "read" APIs in the past to pull down DataSet, etc. details for auditing and review. The APIs were fine to work with. I just tried a little "hello world" code and was able to create a DataSet easily enough. Below are my initial observations on each option I've found, using a nightly update file on each site as a sample scenario.
Workbench + append
Problematic. Workbench looks for a specific, exact file path and does not manage a watch folder. Therefore, ff Workbench fails to process a nightly job, it's lost. (It gets overwritten by our code the next night.) I can't see a sensible way to automate detecting when our pushes have gaps. Also, the forums have many mentions of bugs/problems with APPEND from Workbench. This may no longer be active bugs, I don't know.
Centralizing everything in Postgres and pulling from there in Domo directly, without Workbench.
Love Postgres, don't really want to do this. Also, the Domo PostgreSQL Connector is....not super efficient. I've watched the raw SQL coming through and it generated ~25,000 log lines to pull 10 simple rows. So, I'd rather not lean too hard on the Domo PostgresSQL connector. (It looks to be using some kind of Java framework that uses cursors and excessively iterative code.)
ODBC calls from Workbench
No. Just no. I'm totally alergic to ODBC.
Using the DataSet API to perform custom appends
Using the Stream API to perform custom appends
Looks good...but more complicated. We can do this way if necessary.
Thanks very much for any help.