Please I need help with setting up a retention chart

Hi Domo gurus. I'm trying to create a "line + grouped bar chart" similar to the ones attached below. My goal is to create a chart that shows monthly cohorts of users who have made '100 api calls" and showing x% who returned to make other 100 api calls at the interval: 31 - 60 days, 61 - 90days, 91 - 120days, 121 - 150days. I'm not sure how to create a beast mode to make this happen. I intend to put "Count of users" in the Y axis of the chart, while I put "100th Api call date" on the X axis and then use the calculated beast mode column(s) for the series.

How do you reckon I go about this?



Answers

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @Johnson i really don't understand what you're trying to accomplish. can you explain your goals in 'plain english'?

    how are you defining a cohort?

    what are you trying to track? how many made 100API calls?

    regarding "100 other api calls" what does that mean?

  • @jaeW_at_Onyx thanks for your response. I am trying to understand how many percentage &/or number of our active users come back to take our key activation action. That key action is "an api call" and we have defined "100th api calls" as our activation metric( meaning the 100th time our users make an api call). So what I'm working on now is to build a monthly cohort with that and then try to get the percentage of returning users from that. In our dataset, we have a column named "100th API call date" which includes the datetime for when our users have made their 100th api calls. We also have other columns as well such as the UserId.

  • GrantSmith
    GrantSmith Indiana 🔴

    Hi @Johnson

    So do you have several rows for each customer when they make their 100th API call? Do you have a date for when the customer was created and you're calculating the time difference between those dates or do you only have the 100th api call date? Are you calculating the days between the 100th API call dates or from when the customer was created? Can you post an anonymized sample of your data so we can get an understanding of how your data is structured?

  • Johnson
    Johnson ⚪️
    edited August 31

    Hi @GrantSmith . To answer your questions yes, there are several rows for each customer. To be clear, my primary objective isn't to calculate the time difference between the date of sign up and 100th api call date. Want I want to achieve is to break down the monthly cohorts in the first picture (check attached below) to something similar to the last picture to see the percentage of each monthly cohort users that came back to make another 100th api request at the intervals ; 31- 60days after their first 100th api call date( i.e within the 2nd month), 61 - 90 days (within 3rd month), 91 - 120days (within 4 month), and more than 120days (After 4 months).

    I also attached a sample of how the data looks like in the second picture below. The third picture shows how I approached doing it using the 'line + grouped bar chart' , but I'm obviously doing something wrong. I think it has something to do with beast mode calculations for the series. For the series, I have Month X (which means "within 2nd month after first 100 api call), Month Y (which means "within 3rd month after first 100 api call), Month Z (which means "within 4th month after first 100 api call).

    I used separate beast mode calculations for each of the series respectively:

    Month X = CASE When DATEDIFF(CURRENT_DATE(), `dt_api_request_100`) > 30 AND DATEDIFF(CURRENT_DATE(), `dt_api_request_100`) < 60 Then 'Within 2nd month' end

    Month Y = CASE When DATEDIFF(CURRENT_DATE(), `dt_api_request_100`) > 60 AND DATEDIFF(CURRENT_DATE(), `dt_api_request_100`) < 90 Then 'Within 3rd month' End

    Month Z = CASE When DATEDIFF(CURRENT_DATE(), `dt_api_request_100`) > 90 AND DATEDIFF(CURRENT_DATE(), `dt_api_request_100`) < 120 Then 'Within 4th month' End

    From the retention chart I did (third picture), the month of May has 116 users in its cohort, out of which 487 users came back to make 100 api calls in the 4th month. That doesn't make sense.





  • GrantSmith
    GrantSmith Indiana 🔴

    Hi @Johnson

    Your beast modes are calculating the difference between when they 100th api call happend to today. Not necessarily time between API calls so when you're calculating metrics you're seeing a higher number for future months instead of the first month.

    Assuming you have multiple records / dates for each customer you an get the first 100 api call ate utilizing a window function and then calculate the difference between that date and the second 100 API call date.

    MIN(MIN(`dt_api_request`)) OVER (PARTITION BY `userid`)
    


    CASE When DATEDIFF(MIN(MIN(`dt_api_request`)) OVER (PARTITION BY `userid`), `dt_api_request_100`) > 30 AND DATEDIFF(MIN(MIN(`dt_api_request`)) OVER (PARTITION BY `userid`), `dt_api_request_100`) < 60 Then 'Within 2nd month' end
    


    Window functions are a feature switch - talk with your CSM if you don't have it enabled already.

  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @GrantSmith the downside of this approach is that in order to show the result, each userID would have to be displayed on the axis.

    it might be cleaner to actually build the cohort identifier (we're assuming the min(dt_api_request) into the dataset. this can either be done in ETL OR JOIN in the attribute using a dataset view.

  • Thanks for your quick response @jaeW_at_Onyx @GrantSmith . To be honest, I'm very confused now. Disregarding my previous questions and comments, how do you reckon I go about creating a cohort analysis chart for users with "100 Api Calls date"? I want to know how many came back to do another "100 Api calls" after x days. Check the sample dataset below.


  • jaeW_at_Onyx
    jaeW_at_Onyx Budapest / Portland, OR 🟤

    @Johnson i personally would try to avoid this aggregated view of the data ... but let's set that aside.


    1) you need to add a column to your dataset where you define your cohort. ex ("March Start" or "April Start" ... use last_day() if you're defining cohorts by the month of their first call)

    2) then you need to define a column for 'what makes them interesting (ex. completed 100 calls within 10 days of their first call" -- case when datediff() <= 100 then 1 else 0 end ). keep adding columns for 'things that make them interesting (ex. completed 200 calls OR completed 200 calls within 10 days).

    3) profit.