Support a built-in type for well-formed strings #60765

Open
5 tasks done
hudlow opened this issue Dec 15, 2024 · 1 comment
Labels
Awaiting More Feedback: This means we'd like to hear from more people who would be helped by this feature
Suggestion: An idea for TypeScript

Comments

hudlow commented Dec 15, 2024

🔍 Search Terms

"Unicode", "well-formed Unicode", "valid Unicode", "lone surrogates", "UTF-16", "UTF-8", "isWellFormed()", "toWellFormed()"

✅ Viability Checklist

⭐ Suggestion

ES2024 now has String.prototype.isWellFormed() and String.prototype.toWellFormed(), which are supported in TypeScript's ES2024 type definitions.

But significant value from these functions is not realized in TypeScript because of the lack of a well-formed string type.

What I'd like to see is a "well-formed string" type (itself a sub-type of string) for which isWellFormed() serves as a type guard, and which toWellFormed() (as well as functions like TextDecoder.decode()) returns.

Additionally, string literals could be determined to be of the well-formed string type at compile time.

This way TypeScript developers could get type safety for scenarios where strings need to be guaranteed to be well-formed.
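To make the request concrete, here is a minimal user-space sketch of the guard-and-convert behavior described above. The `WellFormed` brand and the free functions `firstLoneSurrogate`, `isWellFormed`, and `toWellFormed` are hypothetical names for illustration only; they scan UTF-16 code units by hand rather than calling the ES2024 methods, and they are not how a built-in type would be implemented.

```typescript
// Hypothetical user-space brand; TypeScript has no built-in well-formed string type.
type WellFormed = string & { readonly __wellFormed: true };

const isHigh = (u: number): boolean => u >= 0xd800 && u <= 0xdbff;
const isLow = (u: number): boolean => u >= 0xdc00 && u <= 0xdfff;

// Returns the index of the first lone surrogate in `s`, or -1 if there is none.
function firstLoneSurrogate(s: string): number {
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (isHigh(unit)) {
      // A high surrogate must be immediately followed by a low surrogate.
      if (i + 1 < s.length && isLow(s.charCodeAt(i + 1))) {
        i++; // skip the second half of the valid pair
        continue;
      }
      return i;
    }
    if (isLow(unit)) {
      return i; // low surrogate with no preceding high surrogate
    }
  }
  return -1;
}

// Type guard: narrows `s` to WellFormed when the scan finds no lone surrogate.
function isWellFormed(s: string): s is WellFormed {
  return firstLoneSurrogate(s) === -1;
}

// Conversion: replaces each lone surrogate with U+FFFD and brands the result.
function toWellFormed(s: string): WellFormed {
  let out = "";
  for (let i = 0; i < s.length; i++) {
    const unit = s.charCodeAt(i);
    if (isHigh(unit) && i + 1 < s.length && isLow(s.charCodeAt(i + 1))) {
      out += s.slice(i, i + 2); // keep the valid pair intact
      i++;
    } else if (isHigh(unit) || isLow(unit)) {
      out += "\ufffd";
    } else {
      out += s.charAt(i);
    }
  }
  return out as WellFormed;
}

// Hypothetical consumer that only accepts well-formed input.
function consume(a: WellFormed): number {
  return a.length;
}

const input: string = "ab\ud800";
if (isWellFormed(input)) {
  consume(input); // narrowed to WellFormed inside this branch
}
consume(toWellFormed(input)); // always accepted
```

What this sketch cannot do is the part that needs compiler support: a literal such as "hello" is still typed as a plain string, so it has to pass through the guard or the conversion before `consume` accepts it.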

📃 Motivating Example

I'm working on a TypeScript implementation of CEL which requires passing well-formed UTF-8 strings into an evaluation environment. If I want to bridge TypeScript's type safety to CEL's type safety, I'll need a well-formed string type in TypeScript.
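As a small illustration of why well-formedness matters at a UTF-8 boundary, the sketch below encodes a string containing a lone surrogate with the standard TextEncoder. The variable names are made up for the example, and it assumes an environment where TextEncoder and TextDecoder are available as globals (browsers and recent Node.js versions).

```typescript
// A string with a lone high surrogate, which has no valid UTF-8 encoding.
const malformed = "a\ud800b";

// TextEncoder substitutes U+FFFD (bytes EF BF BD) for the lone surrogate.
const bytes = new TextEncoder().encode(malformed);

// Decoding those bytes yields a different string than the one that was encoded.
const roundTripped = new TextDecoder().decode(bytes);

console.log(Array.from(bytes, (b) => b.toString(16)).join(" ")); // 61 ef bf bd 62
console.log(roundTripped === malformed); // false
```

The substitution happens silently, so a component that receives the bytes has no way to tell that the original string was malformed. A well-formed string type would let the producer rule this case out before encoding.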

💻 Use Cases

I can do something like this in my project:

interface WellFormedString extends String {
  __brand: "WellFormed";
}

interface String {
  isWellFormed(): this is WellFormedString;
  toWellFormed(): WellFormedString;
  toUpperCase(): this extends WellFormedString ? WellFormedString : string;
  toLowerCase(): this extends WellFormedString ? WellFormedString : string;
}

interface TextDecoder {
  decode(input?: AllowSharedBufferSource, options?: TextDecodeOptions): WellFormedString;
}

function useWellFormedString(a: WellFormedString) {
  // ...
}

// good -- no error
useWellFormedString("hello".toWellFormed());

// good -- no error
useWellFormedString("hello".toWellFormed().toUpperCase());

// good -- no error
const h = "hello";
if (h.isWellFormed()) {
  useWellFormedString(h); 
}

// good -- no error
// (the decoder coerces a lone "WTF-8" surrogate to "\ufffd\ufffd\ufffd")
useWellFormedString(new TextDecoder().decode(new Uint8Array([0xed, 0xba, 0xad])));

// good -- error
// (malformed string with lone UTF-16 surrogate)
useWellFormedString("\udead");

// bad -- error
useWellFormedString("hello");

// bad -- error
useWellFormedString("hello" as WellFormedString);

But there are some significant disadvantages here:

  1. Well-formed string literals are not recognized as well-formed.
  2. It relies on a branding hack.
  3. The compiler rejects a cast from a string literal to the branded type (maybe this is fixable, but I don't know how).
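On the third point, one possible workaround (a sketch, not something confirmed in this thread) is to brand with an intersection on the primitive `string` instead of an interface that extends the `String` wrapper type. With the interface, neither "hello" nor `WellFormedString` is assignable to the other, which is why the assertion is rejected; with the intersection, the brand is a subtype of `string`, so the assertion is allowed. This does nothing for points 1 and 2.

```typescript
// Hypothetical alternative brand: an intersection with the primitive `string`
// rather than an interface extending the `String` wrapper object type.
type WellFormedString = string & { readonly __brand: "WellFormed" };

function useWellFormedString(a: WellFormedString): number {
  return a.length;
}

// The assertion compiles because WellFormedString is a subtype of string.
// It is unchecked: nothing here verifies that the value really is well-formed.
const asserted = "hello" as WellFormedString;
useWellFormedString(asserted);

// A branded value is still usable anywhere a plain string is expected.
const plain: string = asserted;
```

Because the assertion is unchecked, it is probably best confined to the few places that have already verified well-formedness, such as a wrapper around isWellFormed().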

RyanCavanaugh added the Suggestion and Awaiting More Feedback labels on Dec 16, 2024

@timostamm

I'm not sure how well this fits into the existing world, but this would be fantastic to have for systems that work with UTF-8. It's prohibitively expensive to have every software component check the well-formedness of its input.
