DataFusion Reuse DataSet Under Different Aliases

The general use case is a single DataSet with multiple columns that constitute foreign keys referencing different records on the same table. DataFusion does not support this right now because each DataSet can only be included once.

 

Here's a concrete example:
Tables

  1. Users
    1. id
    2. name
  2. BlogPost
    1. id
    2. content
    3. author_id (FK into Users.id)
    4. editor_id (FK into Users.id)
    5. manager_id (FK into Users.id)

If I want to make the author, editor, and manager names available in a card using a DataFusion combining these two tables, I cannot simply join multiple instances of the Users table in the DataFusion definition. I am forced to either denormalize the data before I suck it into Domo or use some kind of misdirection with another DataFusion/ETL process within Domo to present the exact same Users table as a "different" DataSet to feed into the DataFusion.

 

It would be wonderful if DataFusion supported aliasing so that this situation just works.

1
1 votes

· Last Updated

Comments

  • Thank you for posting this idea.  I am assigning to our product manager @mattchandler for review.

  • Hi @cr1ckt,

     

    Thanks for the question! I'm sure there's a reason that joining once on ID isn't giving you the results you want (or else you'd just do that), but with my limited understanding of the data I'm having trouble understanding why that is. Could you explain further? If it's easier, we could also set up a call through private messages to discuss futher.

     

    Best,

    Matt Chandler

    Product Manager

    Domo

    Best,
    Matt Chandler
    Domo
  • Best,
    Matt Chandler
    Domo
  • Sure Matt,

     

    The problem is that I need to fuse one BlogPost record with three different records from the user table (the author, the editor, and the manager). In most SQL engines, you achieve this by joining the user table three times with different foreign keys and aliases for each join.

    select * from BlogPost b
    join users authors on b.author_id = authors.id
    join users editors on b.editor_id = editors.id
    join users managers on b.manager_id = manager.id

    That would let me retrieve the names of my author, editor, and manager for each blog post.

    The DataFusion interface only lets each source get used at most once, so it can only support the following:

    select * from BlogPost b
    left outer join users u on

             b.author_id = u.id
    AND b.editor_id = u.id
    AND b.manager_id = u.id

    This will only return me the name of the user if all three foreign keys happen to point to the same user. Converting to an inner join will throw out the BlogPost entirely.

    It is possible to get around this by creating an ETL job to copy the user table into other DataSets so you have three "unique" input DataSets for the DataFusion, but this requires some extra configuration and processing time to maintain and lacks the elegance of utilizing the capability of the underlying SQL engine.

  • @cr1ckt That makes sense—thanks for all the details! I'll add an item to our roadmap to address this use case. In the mean time, you could also use a MySQL-powered DataFlow in Domo, and just write the SQL statements there. Do you have access in your instance? If not, we could get it turned on for you. If you do have them turned on, it would be an option when you click "New DataFlow" on the DataFlows tab of the DataCenter.

    Best,
    Matt Chandler
    Domo
  • Yes I do have access to the MySQL DataFlow.

    Thanks for taking a look at this.

This discussion has been closed.