Result difference between uniseg.GraphemeClusterCount and Android android.icu.text.BreakIterator #60
Comments
What versions of uniseg and android.icu do you use?
So the codepoints you posted are assigned the following grapheme break properties according to this table:
The way I see it, the following rules apply:
So basically, I have no experience with android.icu, so I can't comment on its functionality.
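As a rough illustration of how both counts can arise, here is a minimal sketch in Go. It assumes the UAX #29 property values for the four code points in question (hardcoded in a hypothetical `props` table; nothing here calls uniseg or ICU) and applies only GB9 ("no break before Extend or ZWJ") plus, optionally, GB9c (the Indic conjunct rule added in Unicode 15.1). Whether either library applies GB9c depends on the Unicode version it implements, which is not confirmed in this thread.

```go
package main

import "fmt"

// prop holds the two property values this sketch needs.
type prop struct {
	gcb  string // Grapheme_Cluster_Break
	incb string // Indic_Conjunct_Break (Unicode 15.1)
}

// props is a hypothetical lookup table covering only the four code points
// from this issue. Any other rune maps to empty values, which the sketch
// treats as "always break" (it is not a general segmenter).
var props = map[rune]prop{
	0x0915: {"Other", "Consonant"}, // DEVANAGARI LETTER KA
	0x094D: {"Extend", "Linker"},   // DEVANAGARI SIGN VIRAMA
	0x200D: {"ZWJ", "Extend"},      // ZERO WIDTH JOINER
	0x0937: {"Other", "Consonant"}, // DEVANAGARI LETTER SSA
}

// conjunctBefore reports whether prefix ends with
// Consonant [Extend Linker]* Linker [Extend Linker]*,
// i.e. the left-hand side of rule GB9c.
func conjunctBefore(prefix []rune) bool {
	sawLinker := false
	for j := len(prefix) - 1; j >= 0; j-- {
		switch props[prefix[j]].incb {
		case "Linker":
			sawLinker = true
		case "Extend":
			// keep scanning backwards
		case "Consonant":
			return sawLinker
		default:
			return false
		}
	}
	return false
}

// countClusters counts grapheme clusters using GB9 and, if useGB9c is set,
// GB9c. Every other boundary is treated as a break (GB999).
func countClusters(s string, useGB9c bool) int {
	runes := []rune(s)
	if len(runes) == 0 {
		return 0
	}
	count := 1
	for i := 1; i < len(runes); i++ {
		cur := props[runes[i]]
		// GB9: do not break before Extend or ZWJ.
		if cur.gcb == "Extend" || cur.gcb == "ZWJ" {
			continue
		}
		// GB9c: do not break inside an Indic conjunct.
		if useGB9c && cur.incb == "Consonant" && conjunctBefore(runes[:i]) {
			continue
		}
		count++
	}
	return count
}

func main() {
	text := "\u0915\u094d\u200d\u0937"
	fmt.Println(countClusters(text, false)) // 2
	fmt.Println(countClusters(text, true))  // 1
}
```

Under these assumptions, the count of 2 corresponds to the rule set without GB9c and the count of 1 to the rule set with it, so the difference may come down to which Unicode version each implementation follows.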
The uniseg version is 0.4.7.
@shogo82148 I searched and found that Android may use the official Java implementation of ICU. I wonder whether there is any difference between the uniseg implementation and standard ICU?
In addition, I found that Android seems to consider multiple consecutive "\n" characters as one grapheme cluster, while uniseg considers them multiple clusters.
Hello!
I found that the two libraries give different results for the following text. I'm not sure which one is correct. Could you help confirm whether the uniseg result is expected?
Thank you!
Text:
क्ष
Its Unicode code points are:
"\u0915\u094d\u200d\u0937"
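For reference, a small standard-library-only Go sketch that lists the code points in this string (the helper name `codePoints` is just for illustration):

```go
package main

import "fmt"

// codePoints returns each code point of s formatted as U+XXXX.
func codePoints(s string) []string {
	var out []string
	for _, r := range s { // ranging over a string yields runes (code points)
		out = append(out, fmt.Sprintf("U+%04X", r))
	}
	return out
}

func main() {
	fmt.Println(codePoints("\u0915\u094d\u200d\u0937"))
	// prints: [U+0915 U+094D U+200D U+0937]
}
```

So the text is four code points: KA, VIRAMA, ZERO WIDTH JOINER, and SSA.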
uniseg.GraphemeClusterCount Result:
2
android.icu.text.BreakIterator Result:
1