Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(utils/decorators): rewrite remove task decorator to use ast #43383

Draft
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

josix
Copy link
Contributor

@josix josix commented Oct 25, 2024

due to #42766 the dropping of python3.8 support, I rewrite the remove_task_decorator to achieve the same purpose through AST.

TODO:

  • implementation
  • update unit tests

^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@josix josix force-pushed the refactor/remove_task_decorator branch 2 times, most recently from 991cccc to 4e926d5 Compare October 25, 2024 16:02
@josix josix marked this pull request as ready for review October 25, 2024 16:14
@josix josix marked this pull request as draft October 25, 2024 16:18
@Lee-W Lee-W self-requested a review October 25, 2024 16:19
@josix josix force-pushed the refactor/remove_task_decorator branch 2 times, most recently from d5e6684 to 606b2ea Compare October 25, 2024 20:02
Comment on lines 128 to 131
def test_remove_decorator_including_comment(self):
py_source = "@task.virtualenv\ndef f():\n# @task.virtualenv\nimport funcsigs"
res = remove_task_decorator(python_source=py_source, task_decorator_name="@task.virtualenv")
assert res == "def f():\n# @task.virtualenv\nimport funcsigs"
Copy link
Contributor Author

@josix josix Oct 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unsure if it is fine to remove the comments after executing ast.unparse. If it is not an allowed behavior, we might not be able to adopt ast as a solution, since it would not preserve comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be better to use libcst https://pypi.org/project/libcst/ - which we alredy consider to use for our "upgrade check" tooling -> #41641 (comment)

this way we could preserve comments, line numbers etc. I think the important thing we might want to keep here is debuggability and particularly line number references, and with AST we will likely loose it - not only the comments.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no expertise in this matter so I can not judge this.

In general I'd favor is we refactor to ensure pytests are staying the same. I am also not sure why the line feed was removed in the other tests. Is this a compatibility limitation or just a cleanup?

Copy link
Member

@potiuk potiuk Oct 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the difference of AST vs. CST.

  • (AST) Abstract Syntax Tree - > is the tree of syntax of Python code that contains only the "meaningful" tokens - that represent a "runtime" python bytecode (with stripped out comments and other irrelevant source code tokens - such as punctuations, idents, parentheses etc.) - https://en.wikipedia.org/wiki/Abstract_syntax_tree

  • (CST) Concrete Syntax Tree -> is the tree of syntax of Python code that represent the source code - not Python runtime bytecode (CST includes EOL characters and comments, indents and parentheses and punctuation marks) - but also all other non-runtime tokens - https://en.wikipedia.org/wiki/Parse_tree

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(that's a bit simplified definition of course - but I think it describes the difference well).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we're doing code modification instead of checking only, we probably should go with libcst I think 🤔

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I happened to share this topic at PyCon Taiwan this year. If anyone is interested, this was the slide I used. The content around page 67 is the most relevant.

https://speakerdeck.com/leew/unleash-the-chaos-developing-a-linter-for-un-pythonic-code?slide=67

Copy link
Contributor Author

@josix josix Oct 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I think it's better to leave dag authors' code information as much as possible in the rendered script.py of virtualenv operator, ideally the unit tests should not be changed, let me try LibCST instead of AST here.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you need any example, I used it to check default_deferrable here https://github.com/apache/airflow/blob/main/scripts/ci/pre_commit/check_deferrable_default.py

@josix josix force-pushed the refactor/remove_task_decorator branch from 1ec8a0a to 28d8f2c Compare October 26, 2024 03:45
@josix josix marked this pull request as ready for review October 26, 2024 05:09
@Lee-W Lee-W marked this pull request as draft November 26, 2024 07:13
Copy link
Member

@Lee-W Lee-W left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As we decide to use CST, just mark it as draft

@josix josix force-pushed the refactor/remove_task_decorator branch from 28d8f2c to 2a82c10 Compare December 16, 2024 16:58
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants