Type mismatch in email.utils.decode_rfc2231 #10431

Closed
cushionbadak opened this issue Jul 10, 2023 · 1 comment · Fixed by #10437

Comments

@cushionbadak
Contributor

Summary

Modification Target: decode_rfc2231 in stdlib/email/utils.pyi
Current type hint: (s: str) -> tuple[str | None, str | None, str]
Proposed type hint: (s: str) -> tuple[str | None, str | None, str] | list[str]

See the current type hint here: utils.pyi#L63

Description

The function decode_rfc2231 can return a list rather than the tuple its type hint promises when the string argument s contains two or more single-quote characters (').

def decode_rfc2231(s):
    """Decode string according to RFC 2231"""
    parts = s.split(TICK, 2)
    if len(parts) <= 2:
        return None, None, s
    return parts

If the string argument s includes two or more single-quote characters (TICK = "'"), s.split(TICK, 2) produces a three-element list, so the len(parts) <= 2 check is false and parts is returned unchanged, as a list. As a result, the decode_rfc2231 return type hint should include the list type.
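To make the two return shapes concrete, here is a minimal sketch that runs a local copy of the function body quoted above rather than the installed stdlib version. The name decode_rfc2231_copy and the sample strings are made up for illustration.

```python
# Local copy of the function body quoted in this issue, so the example does
# not depend on which CPython version is installed.
TICK = "'"


def decode_rfc2231_copy(s):
    """Decode string according to RFC 2231 (copied from the snippet above)."""
    parts = s.split(TICK, 2)
    if len(parts) <= 2:
        return None, None, s
    return parts


# Hypothetical inputs with zero, one, and two single quotes.
no_ticks = decode_rfc2231_copy("plain value")
one_tick = decode_rfc2231_copy("it's plain")
two_ticks = decode_rfc2231_copy("us-ascii'en'This is a test")

print(type(no_ticks).__name__, no_ticks)    # tuple (None, None, 'plain value')
print(type(one_tick).__name__, one_tick)    # tuple (None, None, "it's plain")
print(type(two_ticks).__name__, two_ticks)  # list ['us-ascii', 'en', 'This is a test']
```

Only the third call takes the `return parts` path, and that is the one that yields a list.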

This mismatch (returning a list instead of tuple) was identified in the CPython email unit test, Test8BitBytesHandling.test_get_rfc2231_params_with_8bit found in test/test_email/test_email.py. The minimized example below demonstrates the type mismatch:

import email
import textwrap
msg = email.message_from_bytes(textwrap.dedent("""\
            Content-Type: text/plain; charset=us-ascii;
             title*=us-ascii'en'This%20is%20not%20f\xa7n"""
                                               ).encode('latin-1'))
msg.get_param('title')

In the above example, decode_rfc2231 returns ['us-ascii', 'en', 'This is not f�n']. It is reached through the following call chain:

email.message.Message.get_param
-> email.message.Message._get_params_preserve
-> email.utils.decode_params
-> email.utils.decode_rfc2231

P.S. All CPython links refer to commit hash 0481b8... from the current CPython 3.12 branch.

srittau added the "stubs: false positive" label (Type checkers report false errors) on Jul 10, 2023
@srittau
Collaborator

srittau commented Jul 10, 2023

Thanks! So the return type is either tuple[None, None, str] or list[str], where the list has exactly three components. Unfortunately there is no way to annotate fixed-size lists yet (see python/typing#592), so I believe that tuple was used as a hack in the typeshed annotations. Without this hack, the following code would not type check, although it's perfectly legal at runtime:

cs, lang, s = decode_rfc2231(...)

The only alternative I see is using Any as the return type, but then we would lose all type checking. Overall, I think we should keep the hack, although a comment explaining it wouldn't go amiss.
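For callers who want a real tuple regardless of which branch ran, one possible workaround is a thin wrapper that unpacks the three components and repacks them. This is only a sketch: decode_rfc2231_as_tuple is a hypothetical helper name, not something provided by the stdlib or typeshed, and it relies on the result always having exactly three components, as described above.

```python
from __future__ import annotations

from email.utils import decode_rfc2231


def decode_rfc2231_as_tuple(s: str) -> tuple[str | None, str | None, str]:
    """Hypothetical helper: normalize the result to a 3-tuple."""
    # Unpacking works at runtime for both shapes, because the tuple branch
    # and the list branch each yield exactly three components.
    charset, language, value = decode_rfc2231(s)
    return charset, language, value


print(decode_rfc2231_as_tuple("us-ascii'en'This is a test"))
print(decode_rfc2231_as_tuple("plain value"))
```

The wrapper keeps the three-way unpacking pattern shown above and hands the caller a value whose runtime type matches the annotation.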
