Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Legacy XML reports should preserve new line characters in message attributes #4174

Closed
gtach2o opened this issue Dec 5, 2024 · 16 comments · Fixed by #4186
Closed

Legacy XML reports should preserve new line characters in message attributes #4174

gtach2o opened this issue Dec 5, 2024 · 16 comments · Fixed by #4186

Comments

@gtach2o
Copy link

gtach2o commented Dec 5, 2024

Hello!

After switching test execution from using gradle with junit platform to console launcher I noticed that
value of message attribute in JUnit xml report loosing new line characters.
e.g.
Using gradle xml :

<error message="line 1&#10; line 2&#10; line 3&#10;"...

Using Console launcher xml:

<error message="
line 1
line 2
line 3
"

Steps to reproduce

  1. Run tests using gradle with junit platform. Throw multiline exception in test.
  2. Run tests using console launcher, throw multiline exception in test.

Context

org.junit.platform:junit-platform-console-standalone:1.11.3
org.junit.jupiter:junit-jupiter-engine:5.11.3
JRE 23

As I understand console launcher uses new open/xml report generator and gradle platform launcher uses legacy report generator, right?

@marcphilipp
Copy link
Member

@gtach2o I'm not sure what you mean. In your example, there are newline characters in the attribute value. While Gradle seems to replace \n using &#10;, the Console Launcher includes them as-is. This is well-formed XML. Does it cause any issues downstream?

@gtach2o
Copy link
Author

gtach2o commented Dec 6, 2024

@gtach2o I'm not sure what you mean. In your example, there are newline characters in the attribute value. While Gradle seems to replace \n using &#10;, the Console Launcher includes them as-is. This is well-formed XML. Does it cause any issues downstream?

Well, that xml is processed by some other services and message value extracted with javax.xml.bind.annotation.XmlAttribute and stored in DB, to be later shown on html dashboard.
There is no problem with tests executed with gradle and their message is displayed correctly in the browser, with newlines. The message from tests executed with the console launcher does not displayed with line breaks. It's just one long string because there is no \n in the original value.

While Gradle seems to replace \n using &#10;

Can console launcher do this as well?

@marcphilipp marcphilipp self-assigned this Dec 6, 2024
marcphilipp added a commit that referenced this issue Dec 6, 2024
While XML attribute values may contain whitespace such as line breaks,
the XML spec [1] dictates that XML processors must replace them with
spaces which causes downstream tools to misrepresent the original value.

[1] https://www.w3.org/TR/xml/#AVNormalize

Resolves #4174.
@marcphilipp
Copy link
Member

I've looked into it in #4180. However, the closest I can get with the current STaX-based infrastructure is this:

<failure message="line 1&amp;#10; line 2">...</failure>

That's because when passing "line 1&#10; line 2" to javax.xml.stream.XMLStreamWriter#writeAttribute(String, String) it will replace all & characters with &amp;.

@gtach2o Could you please check whether that kind of "double-escaping" works with your XmlAttribute-based processing?

@gtach2o
Copy link
Author

gtach2o commented Dec 6, 2024

That's because when passing "line 1&#10; line 2" to javax.xml.stream.XMLStreamWriter#writeAttribute(String, String) it will replace all & characters with &amp;.

I think we only need to replace \n with &#10;.
Thrown exception (error or failure) contains \n not &#10; (it is usually just a text with a stacktrace) after processing with XMLStreamWriter xml will have

<failure message="line 1&#10; line 2&#10;">...</failure>

One more note: if message contains quotes " they already escaped with &quot; so there is some escaping mechanism applied. Just need add &#10; for \n

@gtach2o
Copy link
Author

gtach2o commented Dec 6, 2024

How gradle with

test {
    useJUnitPlatform()
}

does it?

@gtach2o
Copy link
Author

gtach2o commented Dec 6, 2024

Maybe some double-escape avoiding mechanism can be applied

String escapeXmlAttribute(String input) {
        StringBuilder result = new StringBuilder();
        int length = input.length();
        for (int i = 0; i < length; i++) {
            char c = input.charAt(i);
            if (c == '&' && input.startsWith("&#10;", i)) {
               result.append("&#10;");
                i += 4;
            } else if (c == '\n') {
              result.append("&#10;");
            } else {                

result.append(StringEscapeUtils.escapeXml10(String.valueOf(c)));
            }
        }
        return result.toString();
}

or something like that, but to not reinvent the wheel would be better to look up how it s done during useJUnitPlatform()

@marcphilipp
Copy link
Member

I did some more digging. The problem is that the default implementation of XMLStreamWriter does not replace \n, \r, and \t in attribute values while it does do so for "/' and &. I haven't found a way to customize that behavior.

However, adding Woodstox to the classpath fixes the problem since it does escape those characters by default. It would even allow to configure a custom escaping strategy since it's STaX2-compatible but we don't need that.

I don't think we should add Woodstox as a dependency to junit-platform-reporting, though. Doing so could affect the code under test which would suddenly use a different XML parser/writer implementation. Thus, I've added test coverage and documentation in #4184.

@gtach2o Could you please verify that adding Woodstox to your test runtime classpath solves the problem for you as well?

@marcphilipp
Copy link
Member

How gradle with

test {
useJUnitPlatform()
}

does it?

FWIW, Gradle uses its own XML writer.

@gtach2o
Copy link
Author

gtach2o commented Dec 7, 2024

@gtach2o Could you please verify that adding Woodstox to your test runtime classpath solves the problem for you as well?

There is no way to do it without adding third party dependency ?

@gtach2o
Copy link
Author

gtach2o commented Dec 8, 2024

Can you create a custom wrapper class around XMLStreamWriter that overrides the writeAttribute method to handle escaping these characters? The rest of methods can use the original writer.

public class CustomXMLStreamWriter implements XMLStreamWriter {
    private final XMLStreamWriter delegate;

    public CustomXMLStreamWriter(XMLStreamWriter delegate) {
        this.delegate = delegate;
}

@Override
    public void writeAttribute(String localName, String value) throws XMLStreamException {
       
        String escapedValue = value
            .replace("\n", "&#10;")
            .replace("\r", "&#13;")
            .replace("\t", "&#9;");
        delegate.writeAttribute(localName, escapedValue);
}
. . .

Then use it in legacy.xml.XmlReportWriter

      XMLStreamWriter xmlWriter = new CustomXMLStreamWriter(factory.createXMLStreamWriter(out)); 

Maybe this and no need in woodstox?

@marcphilipp
Copy link
Member

marcphilipp commented Dec 8, 2024

No, that wouldn't help since the delegate would then escape & so we'd end up with &amp;#10;.

But maybe I can wrap the underlying Writer and replace them there. I'll give that a try.

marcphilipp added a commit that referenced this issue Dec 8, 2024
While XML attribute values may contain whitespace such as line breaks,
the XML spec [1] dictates that XML processors must replace them with
spaces which causes downstream tools to misrepresent the original value.

[1] w3.org/TR/xml#AVNormalize

Resolves #4174.
@marcphilipp marcphilipp linked a pull request Dec 8, 2024 that will close this issue
@marcphilipp marcphilipp changed the title console-standalone: junit xml report doesn't preserve new line characters in message attribute Legacy XML reports should preserve new line characters in message attributes Dec 8, 2024
@marcphilipp
Copy link
Member

@gtach2o
Copy link
Author

gtach2o commented Dec 8, 2024

Seems to work:

* [Escape whitespace chars in XML attribute values #4186](https://github.com/junit-team/junit5/pull/4186)

Cool!
So, woodstox is there only for tests?
Are there jar files somewhere I can download to try it out?

@marcphilipp
Copy link
Member

Yes, Woodstox is only used for testing (as an additional test task).

I've now merged the PR and you can consume 5.12.0-SNAPSHOT versions from https://oss.sonatype.org/content/repositories/snapshots/

@gtach2o
Copy link
Author

gtach2o commented Dec 9, 2024

@marcphilipp Thank you! It works as expected! I presume 5.12 will be released in ~Feb 2025 ?

@marcphilipp marcphilipp modified the milestones: 5.12 M1, 5.11.4 Dec 9, 2024
@marcphilipp
Copy link
Member

We will include this in 5.11.4 which will be released toward the end of this week.

marcphilipp added a commit that referenced this issue Dec 11, 2024
While XML attribute values may contain whitespace such as line breaks,
the XML spec [1] dictates that XML processors must replace them with
spaces which causes downstream tools to misrepresent the original value.

[1] w3.org/TR/xml#AVNormalize

Resolves #4174.

(cherry picked from commit 432d556)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment