SQL dataflow help

Need some help from the SQL guru's out there.  Table as seen below.
 
    User Name    User Org      Client Name    Client Org

      Phil              ABC Comp         Jack          DEB Comp

      Phil              ABC Comp         Jack          EFT Comp

      Phil              ABC Comp         Jack          ABC Comp

      Phil              ABC Comp         Jack          OKA Comp

      Phil              ABC Comp         Jack          IPA Comp

      Phil              ABC Comp         Jill             CDE Comp

      Phil              ABC Comp         Jill             XYZ Comp

 

The situation is Jack is the same client associated to many companies.  If Jack is associated to the company that Phil is associated to, I don't want it to show up in my output of the data flow.  In the above example.  Jill would show up in my output because she is not associated to the company that Phil is associated to.  Output as seen:

 

    User Name    User Org      Client Name    Client Org

      Phil              ABC Comp         Jill             CDE Comp

      Phil              ABC Comp         Jill             XYZ Comp

 

Thanks for the help, Wendi

Best Answer

  • RGranada
    RGranada 🟢
    Accepted Answer

    I can't see why it should not work. 

     

    Try this version, it has minor changes:

     

    Select * from tst_dj_clients_poc
    WHERE `Client Name` NOT IN (SELECT DISTINCT `Client Name` From tst_dj_clients_poc WHERE `User Org`=`Client Org`)

     

    Can you share a sample of your real data?

     

    Regards,

Answers

  • Hi,

     

    Those are two rather similar ways of approaching your problem:

     

    First approach: 

    Select * from your_table_name 
    WHERE [Client Name] NOT IN (SELECT [Client Name] From your_table_name WHERE [User Org]=[Client Org])

     

    Second approach: 

    SELECT a.* FROMyour_table_nameAS a INNER JOIN 
    (SELECT [Client Name] FROM your_table_name WHERE ([User Org] = [Client Org])) AS b ON a.[Client Name] <> b.[Client Name]

     

    These two queries are very similar in terms of performance, personally, I would go with the second one. They were tested in RedShift but should work in MySql with no problems.

     

    Here's some great info on how to create a dataflow,  in case you need:

    http://knowledge.domo.com?cid=createsqldf

     

    Hope this helps, don't hesitate to ask if you have any problems with this.

  • Thanks Ricardo however it didn't work.  The first approach gave me the results (using my example table):  It still provided Jack as a client name only where Client Org didn't = User Org.  I don't want to see Jack at all.

    User Name    User Org      Client Name    Client Org

          Phil              ABC Comp         Jack          DEB Comp

          Phil              ABC Comp         Jack          EFT Comp

          Phil              ABC Comp         Jack          OKA Comp

          Phil              ABC Comp         Jack          IPA Comp

          Phil              ABC Comp         Jill             CDE Comp

          Phil              ABC Comp         Jill             XYZ Comp

     

    The 2nd approach did not run.  It was running for over a day. I have 14 million rows in one dataset and 8 million rows in another dataset and so the dataflow would not work.  Any other thoughts?  Thanks, Wendi

  • Hi @wgechter,

     

    It's strange, the first approach runs for me. The following example table : 

     

    2017-10-31_094905.png Using the following query :

     

    Select * from tst_dj_clients_poc
    WHERE `Client Name` NOT IN (SELECT `Client Name` From tst_dj_clients_poc WHERE `User Org`=`Client Org`)

     

    Results in:

    2017-10-31_094835.png

    We should be missing something... I'm testing in DOMO with a MySql dataflow.

     

    Basically what the query does is select all the records from the example table where the client name is not contained in the set of client names where the user org is the same as Client Org.

     

    Please check your example table and query against mines. Tell me your conclusions, please.

     

    Best regards.

     

  • It may be that I have more columns than just the 4.  I actually have several more about the client and about the user.  This may be causing the discrepancy.  

  • Can you share the schema?

     

    Regards.

  • event date, event, user login, username, user org, client name, client dob, client number, client org, client org number.

     

    The static fields here are; user login, username, user org, client name, client dob, client number.  The rest of the fields may vary but if the client org = the user org I don't want the client displayed at all for any event.  Does that make sense?

  • Actually this worked!!  I was looking incorrectly at my data but this has worked!  Such an easy query for the fix.  Thank you so much, Wendi