gh-128136: Possibility to customize hardcoded xml declaration #128095
Added possibility to customize hardcoded xml declaration
fixed lint error trailing spaces
unchanged default declaration to pass tests
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the
Please add a title tag, e.g.
And you also need to add a NEWS to describe your fix. We can use the |
Lib/xml/etree/ElementTree.py
@@ -679,6 +679,7 @@ def iterfind(self, path, namespaces=None):
def write(self, file_or_filename,
This should add a unit test for the new behavior, to verify that the requirements are met.
Added some unit tests, but there isn't an issue to reference in the title,
and I think the change has little impact on Python users.
test cases for custom xml declaration
Features should be announced using a What's New entry and a NEWS entry (a "blurb"). Hence, this requires an issue and possibly a discussion for that feature. In particular, adding a new parameter may break existing code (and also, XML is a C extension module, so you should also port those changes to the C interface, if any).
fix: unknown r in code :)
Marking it as a draft until the CI is green (this is to indicate that the work is currently ongoing).
Note: the NEWS entry shouldn't contain newlines IIRC. A more verbose explanation should be put in
IMO only the version should be allowed to be modified. In addition, the XML declaration should be a valid one according to the XML spec, so this is probably a bit more tricky. For the default one, we should instead use None and internally hardcode the default format string. Otherwise, formatting it correctly should be left to the caller, and we only say that it's used through .format(version=version, encoding=encoding).
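The suggestion above can be sketched as follows. This is a minimal illustration, not the code from the PR: `render_declaration` and `_DEFAULT_DECLARATION` are hypothetical names, and the default template simply mirrors the single-quoted form quoted elsewhere in this thread.

```python
# Hypothetical helper, NOT part of xml.etree.ElementTree.
# Assumption: the default template mirrors the single-quoted
# declaration shown in this thread.
_DEFAULT_DECLARATION = "<?xml version='{version}' encoding='{encoding}'?>\n"


def render_declaration(template=None, *, version="1.0", encoding="us-ascii"):
    """Return the declaration line produced from a format-string template.

    None selects the internally hardcoded default; any other template is
    the caller's responsibility and is only passed through str.format().
    """
    if template is None:
        template = _DEFAULT_DECLARATION
    return template.format(version=version, encoding=encoding)


# Default form, unchanged for existing callers.
default_line = render_declaration()

# Caller-supplied template using double quotes.
custom_line = render_declaration(
    '<?xml version="{version}" encoding="{encoding}"?>\n',
    encoding="UTF-8",
)
```

With this shape, callers who pass nothing keep the current output, and a caller who passes a malformed template gets exactly what they wrote, which is the validity concern raised in the comment.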
@@ -0,0 +1,8 @@
New parameter added to the write function that allows you to customize the xml declaration, which was previously hard-coded.
As I mentioned, the NEWS entry must be shorter (one sentence in general).
Examples should be put in the documentation instead. In addition, a What's New entry must be created for new features.
I don't think there is a "standard declaration" in this sense. The following examples are provided here.
<?xml version="1.0"?>
<?xml version="1.0" encoding="UTF-8" ?>
<?xml version="1.0" standalone='yes'?>
<?xml encoding='UTF-8'?>
<?xml encoding='EUC-JP'?>
<?xml version='1.0'?>
The corresponding rule in the DTD is:
The VersionNum differs at most between 1.0 and 1.1. From my experience so far, this is only necessary because there are programs which, for whatever reason, can only cope with a certain variant of the declaration. This variant represents a non-breaking change and makes it easiest for callers to choose the declaration variant they need.
move new parameter to end of argument list to non positional parameters.
It does. For instance:
file_or_filename = ...
encoding = ...
xml_declaration = ...
default_namespace = ...
method = ...
# before: "default_namespace" is mapped to the parameter "default_namespace" (OK)
x.write(file_or_filename, encoding, xml_declaration, default_namespace, method)
# after: "default_namespace" is mapped to the parameter "xml_declaration_definition" (not OK)
x.write(file_or_filename, encoding, xml_declaration, default_namespace, method)
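The positional shift described above can be reproduced with stand-in functions. This is only a sketch: the three `write_*` functions below are hypothetical stand-ins that report how their arguments were bound, and `xml_declaration_definition` is the parameter name used in this thread.

```python
# Stand-ins that only report how their arguments were bound.
# They are NOT the real ElementTree.write implementation.

def write_before(file_or_filename, encoding=None, xml_declaration=None,
                 default_namespace=None, method=None):
    return {"default_namespace": default_namespace, "method": method}


def write_after(file_or_filename, encoding=None, xml_declaration=None,
                xml_declaration_definition=None,
                default_namespace=None, method=None):
    # New parameter inserted in the middle of the positional list.
    return {"xml_declaration_definition": xml_declaration_definition,
            "default_namespace": default_namespace, "method": method}


def write_keyword_only(file_or_filename, encoding=None, xml_declaration=None,
                       default_namespace=None, method=None, *,
                       xml_declaration_definition=None):
    # New parameter appended as keyword-only: positional calls are unaffected.
    return {"xml_declaration_definition": xml_declaration_definition,
            "default_namespace": default_namespace, "method": method}


args = ("out.xml", "us-ascii", True, "http://example.com/ns", "xml")

before = write_before(*args)          # default_namespace bound as intended
after = write_after(*args)            # namespace lands in the new parameter
fixed = write_keyword_only(*args)     # same binding as before the change
```

Moving the new parameter behind `*` keeps every existing positional call bound exactly as before, which is what the later commit in this PR does.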
What I meant by standard is that we don't want to break existing code that relies on this specific form. The declaration should keep the shape <?xml field=... field=... field=... ?>, so IMO callers should only be able to specify the individual fields.
Which is exactly the reason why this should be documented (not at the code level but at the level of https://docs.python.org/3/library/xml.etree.elementtree.html#xml.etree.ElementTree.ElementTree.write). To be clear: I don't oppose the addition of this feature but I'd like some additional confirmations (btw, I don't think we need a C implementation since this part appears to be purely in Python):
This should no longer be necessary with the last change.
This is not primarily about the parameters themselves.
Yes, it should be documented anyway :)
That's wrong, what we actually get is:
And if we add the declaration with
<?xml version='1.0' encoding='us-ascii'?>
<data>
<country name="Liechtenstein">
<rank updated="yes">2</rank>
<year>2008</year>
<gdppc>141100</gdppc>
<neighbor name="Austria" direction="E"/>
<neighbor name="Switzerland" direction="W"/>
</country>
<country name="Singapore">
<rank updated="yes">5</rank>
<year>2011</year>
<gdppc>59900</gdppc>
<neighbor name="Malaysia" direction="N"/>
</country>
<country name="Panama">
<rank updated="yes">69</rank>
<year>2011</year>
<gdppc>13600</gdppc>
<neighbor name="Costa Rica" direction="W"/>
<neighbor name="Colombia" direction="E"/>
</country>
</data>
Shouldn't we leave it up to the developer? If they give an incorrect encoding like "HAHA", then they should know why they are doing this.
It is correct that version 1.1 is rarely used, but should we limit the possibility of using it for this reason? I would expect something like that from closed source software, but not from open source.
If I understand correctly, in your opinion a That sounds feasible and an alternative approach, but would greatly increase the need to adapt the current code and the associated unit tests. Another nice variant would be to remember the original declaration tag and continue to use it, but this would only work for parsed documents. Perhaps in the future someone with in-depth Python knowledge will be found who has the time to do this.
If it were a 3rd-party library, no, but here we are talking about the standard library, which sometimes makes design choices and restricts features on purpose. The reason is maintenance cost vs usability vs use-cases. If we don't have a lot of use-cases, we usually don't include a feature in the standard library and this is best left to PyPI packages.
In this case, we should still document this.
A separate PR (no need for an issue) for updating the docs would be appreciated!
Yes, what I meant is that we only allow version, encoding and standalone to be specified but not the format itself. The format and field names are standard but the field values may be customized.
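A fields-only design along these lines might look like the sketch below. `build_xml_declaration` is a hypothetical helper, not an existing or proposed stdlib API. Restricting `version` to 1.0 and 1.1 is an assumption taken from this discussion, and the encoding check follows the XML specification's EncName production (a letter followed by letters, digits, `.`, `_` or `-`).

```python
import re

# EncName from the XML spec: [A-Za-z] ([A-Za-z0-9._] | '-')*
_ENCODING_NAME = re.compile(r"[A-Za-z][A-Za-z0-9._-]*")


def build_xml_declaration(version="1.0", encoding=None, standalone=None):
    """Build a declaration from field values only (hypothetical helper).

    The layout and field names are fixed; only the values can be chosen.
    """
    # Assumption from the discussion: only 1.0 and 1.1 are accepted.
    if version not in ("1.0", "1.1"):
        raise ValueError(f"unsupported XML version: {version!r}")
    parts = [f"version='{version}'"]
    if encoding is not None:
        if not _ENCODING_NAME.fullmatch(encoding):
            raise ValueError(f"invalid encoding name: {encoding!r}")
        parts.append(f"encoding='{encoding}'")
    if standalone is not None:
        parts.append("standalone='yes'" if standalone else "standalone='no'")
    return "<?xml " + " ".join(parts) + "?>"


minimal = build_xml_declaration()
full = build_xml_declaration(encoding="UTF-8", standalone=True)
```

Because the caller never supplies the surrounding syntax, the result is always a well-formed declaration, at the cost of not offering a choice of quote style.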
How come? Existing tests should not be changed. For instance, if we don't specify
If this is only for
OK, that now exceeds my willingness to invest any more time in this topic. I hope that it will be possible at some point to achieve an adequate result in the standard library and in the meantime I will continue to use a manual write function in which I write my declaration and then return the string of the element tree. I wish you blessed holidays and a good transition into the new year.
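A manual workaround of the kind mentioned here might look roughly like this. It is a sketch under assumptions, not the commenter's actual code: `to_string_with_declaration` is a hypothetical name, and it relies on `ET.tostring(..., encoding="unicode")` returning the serialized tree as `str` without a declaration.

```python
import xml.etree.ElementTree as ET


def to_string_with_declaration(
        root, declaration='<?xml version="1.0" encoding="UTF-8"?>'):
    """Serialize `root` and prepend a caller-chosen declaration line.

    Hypothetical helper: the declaration is used verbatim and is not
    validated, so the caller is responsible for its correctness.
    """
    # encoding="unicode" yields a str and no automatic declaration.
    body = ET.tostring(root, encoding="unicode")
    return declaration + "\n" + body


root = ET.Element("data")
ET.SubElement(root, "country", name="Panama")
text = to_string_with_declaration(root)
```

When writing `text` to a file, the file's encoding has to match the one named in the declaration (UTF-8 in this sketch), since nothing ties the two together.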
I understand that the process can be tiring but I'm afraid we need to be conservative when changing things or improving things for a standard library.
In general, a good indication of whether something is worth adding to the standard library is how many use cases there would be (at least, feature-wise).
You too! Considering this, I will close this PR and the dedicated issue as "not planned" for now. Ping me if you want to re-open it and continue the discussion. Note that you can also build up support on Discourse: https://discuss.python.org/c/ideas/6.
Added the possibility to customize the hardcoded XML declaration,
as the RFC and DTD allow both single and double quotes in the declaration tag.
The manual at https://docs.python.org/3/library/xml.etree.elementtree.html shows double quotes, but that output can never be produced with the hardcoded declaration in the code.
The default behaviour is unchanged, so existing tests still pass.