Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update code style for airflow db commands to SQLAlchemy 2.0 style #31486

Merged
merged 5 commits into from
May 26, 2023

Conversation

ephraimbuddy
Copy link
Contributor

This commit introduces changes to the code styles of airflow db commands to remove 'RemovedIn20Warning' and ensure compatibility with SQLAlchemy 2.0.

To see these warnings, you need to set SQLALCHEMY_WARN_20=True when using the db commands

Part of #28723

@uranusjr
Copy link
Member

Hmm, db init is failing with

sqlalchemy.exc.IntegrityError: (psycopg2.errors.NotNullViolation) null value in column "created_at" violates not-null constraint
DETAIL:  Failing row contains (1, {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log, {dag_id}-{task_id}-{execution_date}-{try_number}, null).

But LogTemplate.created_at has default=timezone.utcnow, which should kick in to add a default value on the client side to the insert query, according to documentation. This has been working fine in 1.x and I don’t see any notes indicating anything changed in 2.0. Something is off.

@ephraimbuddy ephraimbuddy force-pushed the removed-in-20-dbreset branch from 9619d4b to d5943f4 Compare May 24, 2023 06:22
@ephraimbuddy
Copy link
Contributor Author

Hmm, db init is failing with

sqlalchemy.exc.IntegrityError: (psycopg2.errors.NotNullViolation) null value in column "created_at" violates not-null constraint
DETAIL:  Failing row contains (1, {{ ti.dag_id }}/{{ ti.task_id }}/{{ ts }}/{{ try_number }}.log, {dag_id}-{task_id}-{execution_date}-{try_number}, null).

But LogTemplate.created_at has default=timezone.utcnow, which should kick in to add a default value on the client side to the insert query, according to documentation. This has been working fine in 1.x and I don’t see any notes indicating anything changed in 2.0. Something is off.

It should be fine now. I think I went overboard and used the metadata we use in creating DB models to reflect tables

@ephraimbuddy ephraimbuddy force-pushed the removed-in-20-dbreset branch from d5943f4 to 43d43cb Compare May 24, 2023 07:29
@ephraimbuddy ephraimbuddy marked this pull request as ready for review May 24, 2023 07:43
@ephraimbuddy ephraimbuddy force-pushed the removed-in-20-dbreset branch 2 times, most recently from be4a85a to fdeaf05 Compare May 24, 2023 14:56
@uranusjr uranusjr added use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing) and removed use public runners Makes sure that Public runners are used even if commiters creates the PR (useful for testing) labels May 24, 2023
@ephraimbuddy ephraimbuddy force-pushed the removed-in-20-dbreset branch from fdeaf05 to f7bef4f Compare May 25, 2023 05:42
else:
for tbl in tables:
try:
table_name = tbl if isinstance(tbl, str) else tbl.__tablename__
metadata.reflect(only=[table_name], extend_existing=True, resolve_fks=False)
metadata.reflect(bind=bind, only=[table_name], extend_existing=True, resolve_fks=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the bind=bind part still needed? You already set metadata.bind above.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. It was surprising to me too. Here's the warning that it throws:

/opt/airflow/airflow/utils/db.py:1023 RemovedIn20Warning: The ``bind`` argument for schema methods that invoke SQL against an engine or connection will be required in SQLAlchemy 2.0. (Background on SQLAlchemy 2.0 at: https://sqlalche.me/e/b8d9)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the metadata.bind line also then? Seems odd to need both.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have removed it. I remember having some issues around the metadata but let's see if the CI passes cause I can't reproduce locally

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happened was that the metadata that was returned by the function was not bound to any database. The other binds are for the reflect methods. Despite the session being bound to metadata, they still want the bind to happen on its method

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe instead we can add bind=bind explicitly to the method calls instead? Not sure how feasible that is without actually digging into the call stack.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seeing this now...taking a look

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See 8a08c05, let me know what you think

else:
for tbl in tables:
try:
table_name = tbl if isinstance(tbl, str) else tbl.__tablename__
metadata.reflect(only=[table_name], extend_existing=True, resolve_fks=False)
metadata.reflect(bind=bind, only=[table_name], extend_existing=True, resolve_fks=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need the metadata.bind line also then? Seems odd to need both.

@ephraimbuddy ephraimbuddy force-pushed the removed-in-20-dbreset branch 2 times, most recently from 35991d5 to c2423ca Compare May 25, 2023 18:17
@ephraimbuddy
Copy link
Contributor Author

Please don't merge yet, investigating MSSQL failure locally(it's canceled on CI)

@ephraimbuddy ephraimbuddy force-pushed the removed-in-20-dbreset branch from c2423ca to 98643ee Compare May 26, 2023 08:58
@ephraimbuddy
Copy link
Contributor Author

Please don't merge yet, investigating MSSQL failure locally(it's canceled on CI)

Resolved: 98643ee

This commit introduces changes to the code styles of `airflow db` commands to remove 'RemovedIn20Warning'
and ensure compatibility with SQLAlchemy 2.0.

To see these warnings, you need to set SQLALCHEMY_WARN_20=True when using the db commands
@ephraimbuddy ephraimbuddy force-pushed the removed-in-20-dbreset branch from 98643ee to edbbb7e Compare May 26, 2023 10:37
@ephraimbuddy ephraimbuddy merged commit afa9ead into apache:main May 26, 2023
@ephraimbuddy ephraimbuddy deleted the removed-in-20-dbreset branch May 26, 2023 19:30
@ephraimbuddy ephraimbuddy added this to the Airflow 2.7.0 milestone Jul 6, 2023
@ephraimbuddy ephraimbuddy added the type:improvement Changelog: Improvements label Jul 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:core type:improvement Changelog: Improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants