"Column referenced but not found" Python error creating date-sensitive columns in ETL
To take an existing dataset that shows event start and end dates, and a total value and parse that info into a daily value that appears in columns representing all dates between start and end.
Using Domo Dimensions Calendar, select only rows from current date to +2 years.
Use the result in a Python script that appends a new column to an existing table, one per calendar row. i.e. todays script will add columns '2022-12-21' to '2024-12-21' to the data table.
However, the script regularly fails with "Column referenced but not found: 2022-12-09" (This is today's example) when writing the output dataframe.
I used to think that this was maybe a timezone problem (I am n GMT) and that the referenced column represented yesterdays date in my timezone, but today on the servers timezone.
However, today's error referenced a date 12 days ago!
This problem is not consistent though and tends not to occur in the afternoon GMT. Again, hinting at a timezone problem. I am now not so sure and need expert guidance as I am a Python noob
# Import the domomagic package into the script from domomagic import * import pandas as pd import numpy as np from datetime import * # read data from inputs into a data frame cal = read_dataframe('CAL.WHERE') eventdf = read_dataframe('EVENTS.APPEND') # for each value in the dt column of the calendar # create an array of length = number of calendar columns cols = [None] * cal.shape # create an array of default values for each new column to be added to the dataframe vals =  * cal.shape rng = range(cal.shape) ind = [str(x) for x in rng] today = date.today() for i, row in cal.iterrows(): dt = row['dt'].date() cols[i]= str(dt) # now we need to turn our cols (columns) and vals (rows) into a dictionary # so that we can then in turn convert that into a new dataframe zip_iterator = zip(cols, vals) dictionary = dict(zip_iterator) # create a new empty dataframe from the dictionary new_df = pd.DataFrame(dictionary, index=ind) # create a new dataframe from our two input dataframes: df and new_df output_df = pd.concat([eventdf, new_df], axis=1, ignore_index=False) # iterate through each row in the dataframe to populate date columns between # start and end dates with the daily value rowcount = 0 skip = 0 for i, row in output_df.iterrows(): eventid = row["ID"] if type(eventid ) != "<class 'str'>": eventid = str(eventid) start_dt = row["Start Date"].date() end_dt = row["End Date"].date() # if either of the date values are NaT, then skip this row if pd.isnull(start_dt): print('Row ' + str(rowcount) + '. ' + eventid + " skipping as start date is null") skip = skip + 1 continue if pd.isnull(end_dt): print('Row ' + str(rowcount) + '. ' + eventid + " skipping as end date is null") skip = skip + 1 continue # we are only interested in future values, so ignore any dates prior to today if start_dt < today: print(eventid + " begins before today. Setting start date to today") start_dt = today # if the event ends before today, skip it if end_dt < today: print('Row ' + str(rowcount) + '. ' + eventid + " skipping as end date before today") skip = skip + 1 continue effort = row["Story Daily Effort"] dates = pd.date_range(start=start_dt, end=end_dt) #print(story_id + ': ' + str(start_dt) + ' - ' + str(end_dt)) col_nm = str(start_dt) j = output_df.columns.get_loc(col_nm) output_df.iloc[i, j:j+len(dates)] = effort rowcount = rowcount + 1 print("Skipped " + str(skip)) print("Successfully processed " + str(rowcount)) # write a data frame so it's available to the next action write_dataframe(output_df)
- 7.3K All Categories
- 13 Getting Started in the Community
- 141 Beastmode & Analytics
- 1.8K Data Platform & Data Science
- 54 Domo Everywhere
- 2K Charting
- 1K Ideas Exchange
- 903 Connectors
- 236 Workbench
- 342 APIs
- 77 Apps
- 19 Governance & Productivity
- 234 Use Cases & Best Practices
- 50 News
- 473 Onboarding
- 572 日本支部