Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

chore(sinan DAGS): create DAG to fetch dengue data from SINAN #201

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

luabida
Copy link
Contributor

@luabida luabida commented Oct 9, 2023

No description provided.

@luabida luabida force-pushed the create-sinan-dags branch from 89c1f65 to c0cee37 Compare October 9, 2023 14:22
@luabida luabida marked this pull request as ready for review October 10, 2023 14:26
@luabida luabida requested a review from fccoelho October 10, 2023 14:26
except ProgrammingError as error:
if str(error).startswith("(psycopg2.errors.UndefinedColumn)"):
# Include new columns to table
column_name = str(error).split('"')[1]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that obtaining the missing column name from the error message is not a good approach, because if psycopg2 changes the wording in their error messages it will break our code. I think we should instead look at the list of column names of the parquet files and compare them with the columns in the current schema. From the difference in these lists, which can be efficiently obtained as list(set(cols1)-set(cols2)), we can then create the alter table query adding the new columns to the database table. With this approach, we don't even need to rely on an exception being raised. This determination of the missing columns can be done before the first insert.

logging.debug(f"{file} inserted into db")
try:
insert_parquets(parquets.path, year)
except ProgrammingError as error:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as above

@luabida luabida requested a review from fccoelho October 11, 2023 19:03
Copy link
Contributor

@fccoelho fccoelho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am thinking if it would make sense to merge these three DAGs into a Single SINAN DAG, which would take the disease name as a parameter, much like we have in PySUS, as a single function to fetch all the "agravos"

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants