Cute is a fast tokenizer/lexer for TypeScript. It helps you recognize and classify patterns in text, and also provides tools to parse the results.
Cute is heavily inspired by moo: all rules are compiled into a single RegExp, and the ES6 sticky flag is applied to it for optimized matching.
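Before the API itself, here is a minimal sketch of the sticky-flag technique (illustrative only, not cute's actual source): the rules are merged into one RegExp with named groups, and the y flag forces each exec to match exactly at lastIndex, so the lexer never rescans or skips input.

```typescript
// Sketch of single-RegExp sticky lexing; the rule names and shapes
// here are assumptions for illustration, not cute's real internals.
const rules = /(?<number>\d+)|(?<plus>\+)|(?<times>\*)/y;

function* lex(input: string) {
  rules.lastIndex = 0;
  while (rules.lastIndex < input.length) {
    const m = rules.exec(input); // sticky: must match at lastIndex or fail
    if (m === null) throw new Error(`Unexpected character at ${rules.lastIndex}`);
    // Exactly one named group matched; its name is the token type.
    const [type, value] = Object.entries(m.groups!).find(([, v]) => v !== undefined)!;
    yield { type, value };
  }
}

const tokens = [...lex("1+2*3")];
// tokens[0] → { type: "number", value: "1" }
```

Because a sticky RegExp either matches at the current position or fails, unexpected input is detected immediately instead of being silently skipped.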
Use cute.compile to create a lexer:
import cute from "https://deno.land/x/cute/mod.ts";
const tokenizer = cute.compile({
  plus: "+",
  times: "*",
  number: {
    match: /\d+/,
    value: (s) => Number(s),
  },
  whitespace: { match: / +/, ignore: true },
});
Tokenizers are simple functions that return iterators:
const results = tokenizer("1+2*3*4+5");
Like any iterable, you can get your tokens in different ways:
const nextToken = results.next();
// or
for (const token of results) {
  console.log(token);
}
// or
Array.from(results).map((token) => token.type); // or: [...results].map((token) => token.type)
You can use a tokenizer with different rule sets (or states) in more complex scenarios.
// JS-style string interpolation
const tokenizer = cute.states({
  main: {
    strstart: { match: "`", push: "lit" },
    ident: /\w+/,
    lbrace: { match: "{", push: "main" },
    rbrace: { match: "}", pop: 1 },
    colon: ":",
    space: { match: /\s+/ },
  },
  lit: {
    interp: { match: "${", push: "main" },
    escape: /\\./,
    strend: { match: "`", pop: 1 },
    const: { match: /(?:[^$`]|\$(?!\{))+/ },
  },
});
const results = tokenizer("`a${{c: d}}e`");
const types = Array.from(results).map((token) => token.type);
console.log(types); // strstart const interp lbrace ident colon space ident rbrace rbrace const strend
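The push and pop actions above maintain a stack of states. This toy sketch (assumed behavior for illustration, not cute's internals) traces how that stack evolves while tokenizing the string from the example:

```typescript
// push appends a state; pop removes the top n states.
type Action = { push: string } | { pop: number };

function applyActions(actions: Action[]): string[] {
  const stack = ["main"]; // lexing begins in the first declared state
  for (const a of actions) {
    if ("push" in a) stack.push(a.push);
    else stack.splice(stack.length - a.pop, a.pop);
  }
  return stack;
}

// For "`a${{c: d}}e`": strstart pushes lit, interp pushes main,
// lbrace pushes main, then rbrace/rbrace/strend unwind the stack.
applyActions([
  { push: "lit" },  // strstart `
  { push: "main" }, // interp ${
  { push: "main" }, // lbrace {
  { pop: 1 },       // rbrace }
  { pop: 1 },       // rbrace }
  { pop: 1 },       // strend `
]); // → ["main"]
```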
You can find the full documentation on doc.deno.land.
Avoid greedy quantifiers when matching delimited tokens such as strings: a greedy .* consumes everything up to the last delimiter.
// DON'T
const tokenizer = cute.compile({
string: /".*"/, // greedy quantifier *
// ...
});
const results = tokenizer('"foo" "bar"');
results.next(); // -> { type: 'string', value: '"foo" "bar"' }
// DO
const tokenizer = cute.compile({
string: /".*?"/, // non-greedy quantifier *?
// ...
});
const results = tokenizer('"foo" "bar"');
results.next(); // -> { type: 'string', value: '"foo"' }
results.next(); // -> { type: 'space', value: ' ' }
results.next(); // -> { type: 'string', value: '"bar"' }
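The contrast comes straight from RegExp semantics and can be checked without cute:

```typescript
const greedy = /".*"/;  // greedy: .* runs to the LAST closing quote
const lazy = /".*?"/;   // lazy: .*? stops at the FIRST closing quote
const input = '"foo" "bar"';

greedy.exec(input)?.[0]; // '"foo" "bar"'
lazy.exec(input)?.[0];   // '"foo"'
```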
Each rule can provide a value function to transform the matched text; the original match is always kept on token.text.
const tokenizer = cute.compile({
  ws: /[ \t]+/,
  string: {
    match: /"(?:\\["\\]|[^\n"\\])*"/,
    value: (s) => s.slice(1, -1), // function to transform token.value
  },
  literals: {
    match: /`(?:\\[`\\]|[^\n`\\])*`/, // same shape, with backticks as delimiters
  },
});
const results = tokenizer('"hello"`world`');
results.next(); // { value: 'hello', text: '"hello"' }
results.next(); // { value: '`world`', text: '`world`' }
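The string pattern above can be tried on its own to see how escape handling plays out; this is plain RegExp behavior, independent of cute:

```typescript
const stringPattern = /"(?:\\["\\]|[^\n"\\])*"/;

// Escaped quotes are consumed as part of the token body:
stringPattern.exec('"say \\"hi\\""')?.[0]; // matches the whole "say \"hi\""
// An unterminated string simply fails to match:
stringPattern.test('"no closing quote'); // false
```

Keeping \n out of the character class means a string token can never silently span multiple lines, which keeps error positions sane.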
You can run the tests with deno test.
MIT License. See the LICENSE file for more details.