Some functions mishandle strings that contain higher-plane Unicode characters (i.e., those with code points at U+10000 and above, including most emoji), treating each such character as two separate, unintelligible characters.
Split("🦓🦊🐺𐊀","")
→ Table({Value:"�"},{Value:"�"},{Value:"�"},{Value:"�"},{Value:"�"},{Value:"�"},{Value:"�"},{Value:"�"})
(Should be Table({Value:"🦓"},{Value:"🦊"},{Value:"🐺"},{Value:"𐊀"}))
Left("xyz🅰🅱🅲", 6)
→ "xyz🅰�" (note the ending character which is apparently half of the 🅱 emoji; this should just evaluate to the same string as was passed in)
I believe something is going awry with how PowerFx is handling characters wider than 16 bits, and strings aren't being kept in a consistent translation format, which is leading to these errors.
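To illustrate the suspected cause, here is a minimal TypeScript sketch of the difference between slicing by UTF-16 code units and slicing by code points. The helper names (`leftByCodeUnits`, `leftByCodePoints`, `splitByCodePoints`) are hypothetical and are not PowerFx's implementation; they only reproduce the symptom and show one way it could be avoided.

```typescript
// Hypothetical helpers for illustration only; not taken from PowerFx.

// Naive version: counts UTF-16 code units, so it can cut a surrogate pair in half.
function leftByCodeUnits(text: string, count: number): string {
  return text.substring(0, count);
}

// Code-point-aware version: Array.from iterates a string by code point,
// so both halves of a surrogate pair always stay together.
function leftByCodePoints(text: string, count: number): string {
  return Array.from(text).slice(0, count).join("");
}

// Splitting on the empty string, one element per code point.
function splitByCodePoints(text: string): string[] {
  return Array.from(text);
}

const sample = "xyz🅰🅱🅲";
console.log(sample.length);                          // 9 code units, not 6
console.log(leftByCodeUnits(sample, 6) === sample);  // false: result ends in a lone surrogate
console.log(leftByCodePoints(sample, 6) === sample); // true
console.log(splitByCodePoints("🦓🦊🐺𐊀").length);     // 4
```

Note that this only keeps surrogate pairs intact. It works per code point, so it would still break apart emoji that are built from several code points.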
In a UTF-16 string, which is what JS uses internally, it does take two 16-bit "characters" to make a single code point from a higher plane. It can be even more than that; emoji like 👩🏾💻 which are formed from a base character, a skin-tone, a zero-width joiner, and another emoji, are seven whole 16-bit "characters" wide (the base, skin-tone, and following emoji count as 2 each, and the ZWJ is the additional 1), but they consist of 4 real code points and display as a single unit. I'm arguing that the intuitive representation here should be what PowerFx uses; both 🌻 and 👩🏾💻 should be treated as 1 character each, both to conform to what a user would expect functions like Split or Left to do and to prevent characters from being improperly split.
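The three ways of counting described above can be checked with a short TypeScript sketch. It assumes a runtime that provides the standard `Intl.Segmenter` API (recent Node.js and browsers do); the `countUnits` helper is a hypothetical name used only for this illustration.

```typescript
// Hypothetical helper for illustration: reports the three different "lengths" of a string.
function countUnits(text: string): { codeUnits: number; codePoints: number; graphemes: number } {
  // The cast avoids depending on TypeScript lib typings that may lack Intl.Segmenter.
  const segmenter = new (Intl as any).Segmenter("en", { granularity: "grapheme" });
  const clusters: string[] = Array.from(segmenter.segment(text), (s: any) => s.segment);
  return {
    codeUnits: text.length,              // UTF-16 code units
    codePoints: Array.from(text).length, // Unicode code points
    graphemes: clusters.length,          // user-perceived characters
  };
}

// Built from explicit code points so the zero-width joiner is visible:
// woman, medium-dark skin tone, ZWJ, laptop.
const technologist = String.fromCodePoint(0x1f469, 0x1f3fe, 0x200d, 0x1f4bb);
const sunflower = String.fromCodePoint(0x1f33b);

console.log(countUnits(technologist)); // codeUnits: 7, codePoints: 4, graphemes: 1
console.log(countUnits(sunflower));    // codeUnits: 2, codePoints: 1, graphemes: 1
```

Counting by grapheme cluster is the only one of the three that gives 1 for both strings, which matches the behavior argued for here.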
Right and Mid show the same problem:
Right("🅰🅱🅲def", 6)
→ "�🅲def"
Mid("🅰🅱🅲def", 2, 4)
→ "�🅱�"