chore(sinan DAGS): create DAG to fetch dengue data from SINAN #201
base: main
89c1f65 to c0cee37
except ProgrammingError as error:
    if str(error).startswith("(psycopg2.errors.UndefinedColumn)"):
        # Include new columns to table
        column_name = str(error).split('"')[1]
I don't think we should obtain the missing column name from the error message: if psycopg2 changes the wording of its error messages, our code will break. Instead, we should compare the column names of the parquet files with the columns in the current schema. From the difference between these lists, which can be obtained efficiently as list(set(cols1)-set(cols2)), we can build the ALTER TABLE query that adds the new columns to the database table. With this approach we don't need to rely on an exception being raised at all, since the missing columns can be determined before the first insert.
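A minimal sketch of this suggestion, not the PR's actual code. The helper names (missing_columns, quote_ident, alter_statements), the TEXT default type, and the example table and column names are all assumptions made for illustration. In practice the parquet column names could come from something like pyarrow's pq.read_schema(path).names and the table columns from SQLAlchemy's inspect(engine).get_columns(table), but the sketch takes plain lists so it stays self-contained. Unlike the bare set difference, it keeps the parquet column order so the generated statements are deterministic.

```python
from typing import Iterable, List


def missing_columns(parquet_cols: Iterable[str], table_cols: Iterable[str]) -> List[str]:
    """Return parquet columns absent from the table, keeping parquet order."""
    existing = set(table_cols)
    seen = set()
    missing = []
    for col in parquet_cols:
        # Skip columns the table already has and duplicates in the input.
        if col not in existing and col not in seen:
            seen.add(col)
            missing.append(col)
    return missing


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def alter_statements(table: str, columns: Iterable[str], col_type: str = "TEXT") -> List[str]:
    """Build one ALTER TABLE ... ADD COLUMN statement per missing column.

    col_type defaults to TEXT, which is an assumption; the real type
    would have to be derived from the parquet schema.
    """
    return [
        f"ALTER TABLE {quote_ident(table)} ADD COLUMN {quote_ident(col)} {col_type}"
        for col in columns
    ]


# Illustrative inputs only; not taken from the PR.
parquet_cols = ["ID_AGRAVO", "DT_NOTIFIC", "NEW_FIELD"]
table_cols = ["ID_AGRAVO", "DT_NOTIFIC"]
statements = alter_statements("sinan_dengue", missing_columns(parquet_cols, table_cols))
print(statements)
```

The comparison here is case-sensitive, so if the database stores lowercase column names while the parquet files use uppercase, both lists would need to be normalized first. Running this check before the first insert removes the need to catch ProgrammingError for this case.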
logging.debug(f"{file} inserted into db")
try:
    insert_parquets(parquets.path, year)
except ProgrammingError as error:
Same comment as above
I wonder whether it would make sense to merge these three DAGs into a single SINAN DAG that takes the disease name as a parameter, much like PySUS has a single function to fetch all the "agravos" (notifiable diseases).
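One way to picture the single parameterized DAG is to isolate everything that varies per disease into a small spec derived from the disease name. This is only a sketch under stated assumptions: SinanDagSpec, make_spec, the "sinan_" naming scheme, and the disease list are all hypothetical, and the Airflow DAG definition itself is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SinanDagSpec:
    """What differs between the per-disease DAGs (hypothetical structure)."""
    disease: str
    dag_id: str
    table: str


def make_spec(disease: str) -> SinanDagSpec:
    # Derive the DAG id and target table from the disease name, so one
    # definition can serve every disease instead of one file per disease.
    slug = disease.strip().lower().replace(" ", "_")
    return SinanDagSpec(disease=disease, dag_id=f"sinan_{slug}", table=f"sinan_{slug}")


# Illustrative list; the diseases actually covered by the PR are an assumption.
DISEASES = ["dengue", "chikungunya", "zika"]
SPECS = {disease: make_spec(disease) for disease in DISEASES}
```

A single DAG-building function could then loop over SPECS, or accept the disease name as a run parameter, rather than duplicating the fetch and insert logic in three places.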
f6f0cd7 to c76c431
No description provided.