Fundamental Flaws inherent in the design of AtlasEngine #16252
Replies: 5 comments
-
Correct me if something has changed in the implementation, but the general concept underlying AtlasEngine is that a series of Unicode code points (an extended grapheme cluster) is treated as an independent unit for rendering. The basic assumption inherent to the design is that far-away Unicode code points cannot affect the rendering of other code points. However, this just isn't the way Unicode/OpenType text rendering is specified. Some exceptions from some brief research:
[Table: Input | Becomes (example rows not included)]
The crux of the issue is that the concept of AtlasEngine fundamentally violates the documented requirement of Uniscribe to operate on "entire paragraphs" of text at a time, as the smallest possible unit. Some of these issues likely can't be fixed without rewriting Uniscribe. Others might be worth fixing by reimplementing small parts of Uniscribe outside of it (like range-tracking the RTL stack, as is done for colors now). This will also be an ongoing issue: each new version of Unicode will need to be reviewed for new exceptions the Terminal needs to implement, rather than being transparently taken care of by Uniscribe.
However, if the performance benefits are deemed worth the non-conformance with OpenType / Uniscribe in certain scenarios, it should be documented exactly which features are missing from Windows Terminal's custom implementation of OpenType and Uniscribe, so users can make an informed decision about whether AtlasEngine is best for their use case or whether their use case demands higher correctness.
[1] https://unicode.org/reports/tr29/
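To make the "range tracking the RTL stack" idea a bit more concrete, here is a rough Python sketch. It is only an illustration under stated assumptions: it handles just the three explicit controls LRO (U+202D), RLO (U+202E) and PDF (U+202C), it is not the full Unicode Bidirectional Algorithm, and `override_runs` is a hypothetical helper name, not anything from the Terminal or Uniscribe. The point is that an override opened early in the text stays in effect for every later character until it is closed, no matter how far away that is.

```python
# Simplified sketch: tracks only LRO / RLO / PDF, not the full bidi algorithm.
LRO, RLO, PDF = "\u202d", "\u202e", "\u202c"


def override_runs(text):
    """Return (start, end, direction) runs over the visible characters.

    direction is "LTR", "RTL", or None when no override is active.
    Indices count visible characters only (control characters are skipped).
    """
    stack = []      # currently open overrides, innermost last
    runs = []       # finished runs
    visible = 0     # number of non-control characters seen so far
    run_start = 0   # visible index where the current run began
    for ch in text:
        if ch in (LRO, RLO, PDF):
            # Close the run that was in effect before this control character.
            if visible > run_start:
                runs.append((run_start, visible, stack[-1] if stack else None))
            run_start = visible
            if ch == PDF:
                if stack:
                    stack.pop()
            else:
                stack.append("RTL" if ch == RLO else "LTR")
        else:
            visible += 1
    # Close the final run at the end of the text.
    if visible > run_start:
        runs.append((run_start, visible, stack[-1] if stack else None))
    return runs


if __name__ == "__main__":
    sample = "\u202eABC\u202c_\u202e" + "ABC" * 3
    print(override_runs(sample))
```

For the sample string this yields three runs: the first three characters under an RTL override, one character with no override, and the remaining nine under RTL again. A renderer that stored such ranges alongside the text buffer (the way color attributes are stored) could look up the active direction for any cell, even when the control character that opened it is many rows away.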
-
Can you share what version of the Terminal you're using, and your settings.json file? We're pretty sure this is supposed to work 😄
-
(For the rest of your notes that don't pertain specifically to Numderline but to Unicode clustering, shaping, and our compliance as a whole, thanks for writing them up so concisely! We'll need to wait until @lhecker is back from his time off before we have a comprehensive response though.)
-
Sorry, I wasn't able to come up with a good minimal repro yet of the purest form of what I wanted to demonstrate, as I was running into other bugs (design choices?). Playing with this more, I think the current implementation turns real lines from the file into pseudo-lines that break on N bytes of data instead of N glyphs of data (or some measured width). This breaks the rendering in the middle of glyphs (regardless of AtlasEngine).
1 - Line Break on Bytes
    1..100 | ForEach-Object { Write-Host "a" -NoNewLine }; 1..10 | ForEach-Object { Write-Host "`u{0364}`u{0365}" -NoNewLine }
[Screenshots: Actual | Expected]
Breaking the combining glyph is definitely undesirable. The stacking of all the combining marks on top of each other is presumably due to some flag passed to Uniscribe, perhaps designed to constrain line height, but Word shows it could be changed to Chrome-style rendering with anti-aliased text and transparency.
2 - No Unicode Line Breaking
The next related issue is that line breaks do not use anything close to the Unicode line breaking rules. So this happens:
    1..10 | ForEach-Object { Write-Host "111000" -NoNewLine }; Write-Host " " -NoNewLine; 1..10 | ForEach-Object { Write-Host "111000" -NoNewLine }
[Screenshots: Actual | Expected]
I think this could more arguably be justified, or perhaps offered as an option so users can choose whether to use proper Unicode line breaking. But the main issue, which becomes obvious, is that the underlines no longer underline the expected sets of 3 digits.
3 - Irreversible window resizes
Also, the way lines are recombined when resizing the window feels quite janky if the user has no scrollback buffer: as the user widens and narrows the window, they lose data, because the resize operation is not reversible. To me, the notion of a true logical line understood by the system would feel more natural. The user has no way to guarantee they can scroll back, since they might not be able to control whether one long line consumes their whole 1000 lines of scrollback buffer.
4 - Scoped Control Characters
I tested with the RTL override and it didn't seem to be supported at all by the terminal. But these seem like a pretty scary open question.
    Write-Host "`u{202E}ABC`u{202C}_`u{202E}" -NoNewLine; 1..100 | ForEach-Object { Write-Host "ABC" -NoNewLine }
[Screenshots: Actual | Expected]
U+206E (National Digit Shapes) also seems to be ignored. This requires changing Control Panel -> Regional Format -> Arabic (Saudi Arabia).
    Write-Host "1234567890 `u{206E}1234567890"
[Screenshots: Actual | Expected]
5 - Wide Spanning OpenType lookup tables
The first repro example I gave doesn't hold up, as the assumption I wrote about the engine doesn't seem to be true at the moment (but perhaps that's the next step in the works?). I think the terminal currently gets lucky that it has not yet implemented #1860 with support for infinitely wide lines, because that will expose the full extent of this bug, assuming whole lines need to be shaped all at once and will sometimes be too big to hold in memory or process at once. But I think it is reasonable for users to expect paragraphs of text they output on the terminal to still support contextual alternates (like Numderline) and other shaping within their paragraph (a long line in this case) without the terminal injecting its own formatting or breaking the user's formatting. I'd argue this is quite a common occurrence, more so than it seems at first glance: if 1 of every 80 characters is a forced line break, that looks like a rare 1.25% occurrence, but users with small, resized, or actively resizing windows will go through every size and hit all of those edge cases.
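To illustrate points 1 and 3 together, here is a small Python sketch of wrapping a stored logical line on clusters rather than on bytes. It is only a sketch under loud assumptions: `clusters` and `wrap` are hypothetical helpers, the clustering is a simplification of UAX #29 that only attaches combining marks (general categories Mn, Mc, Me) to the preceding character, and every cluster is assumed to occupy exactly one column (wide characters are ignored).

```python
import unicodedata


def clusters(text):
    """Group each character with the combining marks that follow it.

    Assumption: a simplified stand-in for extended grapheme clusters that
    only looks at the general category (Mn, Mc, Me) of each code point.
    """
    out = []
    for ch in text:
        if out and unicodedata.category(ch) in ("Mn", "Mc", "Me"):
            out[-1] += ch   # attach the mark to the previous cluster
        else:
            out.append(ch)  # start a new cluster
    return out


def wrap(logical_line, columns):
    """Split one logical line into rows of at most `columns` clusters."""
    cl = clusters(logical_line)
    return ["".join(cl[i:i + columns]) for i in range(0, len(cl), columns)]


if __name__ == "__main__":
    # Same shape as the repro: 100 x "a" followed by 20 combining marks.
    line = "a" * 100 + "\u0364\u0365" * 10
    for width in (80, 40, 120):
        rows = wrap(line, width)
        # Re-joining the rows always gives back the logical line.
        print(width, len(rows), "".join(rows) == line)
```

Because the rows are always derived from the stored logical line, widening and narrowing the window cannot lose data, and no row can start in the middle of a base character plus its marks. Whether the Terminal's buffer could be organized this way is a separate question that this sketch doesn't try to answer.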
-
Sorry for responding late. I forgot to set myself a reminder for responding to this.
1 - Line Break on Bytes
That issue is fixed in the latest AtlasEngine version in Windows Terminal 1.18 and later:
2 - No Unicode Line Breaking
That is unfortunately something we do intentionally. Terminals traditionally do not adhere to many parts of the Unicode spec since they were designed before Unicode was a thing. For instance,
3 - Irreversible window resizes
We're tracking this at #15976. It'll unfortunately take a while to get this addressed.
4 - Scoped Control Characters
RTL overrides are tracked in #12711. I'll look into the U+206E support.
5 - Wide Spanning OpenType lookup tables
I'm not entirely sure I understand you there... Are you saying we should shape entire lines of text at a time, without the terminal breaking them into lines to fit them into the viewport width? (This might be difficult to achieve due to the previous "No Unicode Line Breaking" point.)
All in all, none of the above are related to AtlasEngine specifically yet, apart from the U+206E support. We could open smaller, more specific issues instead.
-
Windows Terminal version
No response
Windows build number
No response
Other Software
No response
Steps to reproduce
Expected Behavior
The underlines properly display under digits in the thousands places as configured by the font.
Actual Behavior
The text displays as 123456 with no underlines.
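For anyone unfamiliar with why this kind of font feature is sensitive to where a run of digits gets cut, here is a toy Python model. It is an assumption-laden sketch, not the font's actual lookup tables: it assumes every other group of three digits is underlined, counting groups from the right and starting with the thousands group, and `underlined_positions` is a hypothetical helper.

```python
def underlined_positions(digits):
    """Indices a Numderline-style font would underline in a digit run.

    Assumption: groups of three are counted from the right, and every
    other group (thousands, billions, ...) is underlined.
    """
    n = len(digits)
    return [i for i in range(n) if ((n - 1 - i) // 3) % 2 == 1]


if __name__ == "__main__":
    print(underlined_positions("123456"))

    # The same 12 digits shaped as one run versus cut after 7 columns.
    number = "111000" * 2
    whole = underlined_positions(number)
    head, tail = number[:7], number[7:]
    split = underlined_positions(head) + [7 + i for i in underlined_positions(tail)]
    print(whole)
    print(split)
```

Under this model "123456" gets underlines at indices 0 to 2 (the thousands group). For the 12-digit run, shaping it whole underlines indices 0, 1, 2, 6, 7, 8, while shaping the two pieces separately underlines 1, 2, 3, 7, 8. The grouping depends on the distance to the end of the run, so cutting the run anywhere changes the result.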