pydomo export_dataset - exporting Domo dataset with millions of rows failing

I have the below that exports a dataset to a string, I'm trying to export a dataset with 257 million rows but I don't see any logs as to why it fails It just fails!

When I try the below with a smaller dataset (250k rows) I'm able to export, is there a way to programmatically break out the dataset into chunks?

domo = Domo(client_id, client_secret, logger_name='foo', log_level=logging.INFO, api_host=API_HOST, **kwargs)
 
dataset = domo.dataset
 
include_csv_header = True
datasets_parent = dataset.data_export(dataset_id, include_csv_header)

logging.info(f'datasets_parent: {datasets_parent}')


Tagged:

Answers

  • Have you tried using the ds_query function of pydomo? It allows you to run a query against the dataset and return the results. If you have something like a sequential id or a date you could use that to pull sections of the data and export them.

    It might look something like this:

    data = domo.ds_query(dataset_id, 'SELECT `field1`, `field2` FROM table WHERE id >= ' + min + ' AND id < ' + max)

    You could then write the code to iterate through mins and maxes to get through all the records. If you wanted to pull the count first you could make it iterate intelligently until it gets all the records.

    I hope that helps!

  • Oh!! this would work, thank you @zcameron !

  • KristiKovacs
    KristiKovacs ⚪️
    edited November 10

    @zcameron - I keep seeing the following error when I try to use the ds_query method:

    'Domo' object has no attribute 'ds_query'
    

    I tried with

    domo = Domo(client_id, client_secret, logger_name='foo', log_level=logging.INFO, api_host=apihost)
    
    
    domo.query(...)
    


    but I see the error:

    AttributeError: 'Domo' object has no attribute 'query'
    

    I've also tried with

    dataset = domo.datasets
    dataset.query()
    
    # and dataset.query()
    
    
    AttributeError: 'DataSetClient' object has no attribute 'query'
    


    but same object doesn't have query or ds_query attribute

  • I tried the following code this morning with a successful outcome:

    from pydomo import Domo
    
    client_id = '<client_id goes here>'
    client_secret = '<client_secret goes here>'
    api_host = 'api.domo.com'
    dataset_id = '<dataset_id goes here>'
    outputFileName = 'output.csv'
    
    domo = Domo(client_id, client_secret, api_host)
    data = domo.ds_query(dataset_id, 'SELECT * FROM table WHERE `id` > 100')
    data.to_csv(outputFileName)
    
    


    Any chance that highlights what you were missing? In your first example, I'm not seeing the "ds_" prefix, but I wasn't sure if that was because you had tried that before and were trying other variations.


    Let me know if that helps!

  • Ahh @zcameron I found out its actually because I had a really old version of pydomo and just updated to the newest one!. ty!

  • @zcameron - Since my dataset is really big, 257 mill rows, I have it set to query for only 3 days of data at a time (we have data back from 2018!) so the API keeps timing out. :( What is the rate limit for the API?

  • The query command will export up to 1M rows at a time. I guess that's still a lot of iterations to get to all your data if you have 257M. Is this a one-time operation or something that will be ongoing?


    Something else to consider might be the Domo CLI tool. It has an export-data command that can handle larger datasets. I recently exported a 4.5GB csv file with it. You can also do queries with that command if you need to chunk it for time/memory considerations as well. Have you worked with the CLI tool before?

  • KristiKovacs
    KristiKovacs ⚪️
    edited November 15

    @zcameron - I've just downloaded the jar file and I have the cli open, I tried to run:

    export-data -i {my_dataset_id} -f test.csv

    and I see Unrecognized command


    I then tried to run the connect statement thinking perhaps that would work, I am an admin but I see the unauthorized error:

    connect -s {domain_name} -t {my_token}

    Failed to get connected User. Error: Unauthorized

    You are not connected to a Domo server. Check your domain, username/password or token.

  • Did you create an access token to use with the connect command? You can do that in the admin section of Domo. Once you have that, you can use the connect command like the following example:


    connect -s myinstance.domo.com -t aflasdkfn0923ro2ffaoafo23oawefoe2

    Note that the domain contains ".domo.com".

    After you're connected, you should be able to run the export-dataset command.


    Did that get you any further?

  • @zcameron Thanks for responding, that worked! I'm now connected but I came across another error when running the export-data I see Could not find acceptable representation

    export-data -i {my_dataset} -f test.csv

    ---------

    An error occured while executing the command: Not Acceptable

    RequestFailedException{failureType=CLIENT_ERROR, failureBody='{"status":406,"statusReason":"Not Acceptable","message":"Could not find acceptable representation","toe":"C3U952WAGX-ABF16-5VRO6"}'}

    at com.domo.client.DomoClient.processResponseStatus(DomoClient.java:1800)

    at com.domo.client.DomoClient.getDataFile(DomoClient.java:619)

    at com.domo.client.manager.DatasetManager.exportData(DatasetManager.java:2232)

    at com.domo.client.manager.DatasetManager.exportData(DatasetManager.java:2213)

    at com.domo.util.commands.ExportDataCommand.execute(ExportDataCommand.java:110)

    at com.domo.util.Main.main(Main.java:78)

  • Does it do that if you try it with a different, smaller dataset? Also, what type of dataset is the the one you're trying to export (dataflow output, view, connector dataset, etc.)?

  • KristiKovacs
    KristiKovacs ⚪️
    edited November 15

    @zcameron - That dataset is a Google BigQuery

    I tried with another dataset that's a Google Sheet Connector and saw the same error response. This dataset had 30k rows unlike the Google BigQuery which is 257 million rows

  • For the sake of being thorough, you're grabbing the dataset ID from the URL when viewing the details page of the dataset in Domo, right?

    Any chance you could post a screenshot of the command and response? Don't post anything showing your connection credentials, of course. Just the export-data command.