Cute is a fast tokenizer/lexer for TypeScript. It helps you recognize and classify patterns in text, and also provides tools to parse the results.
Cute is heavily inspired by moo: all rules are compiled into a single RegExp, and the ES6 sticky flag is applied to it for optimized matching.
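Before the API itself, here is a minimal sketch of the sticky-flag technique (illustrative only, not cute's actual source): the rules are merged into one RegExp with named groups, and the y flag forces each exec to match exactly at lastIndex, so the lexer never rescans or skips input.

```typescript
// Sketch of single-RegExp sticky lexing; the rule names and shapes
// here are assumptions for illustration, not cute's real internals.
const rules = /(?<number>\d+)|(?<plus>\+)|(?<times>\*)/y;

function* lex(input: string) {
  rules.lastIndex = 0;
  while (rules.lastIndex < input.length) {
    const m = rules.exec(input); // sticky: must match at lastIndex or fail
    if (m === null) throw new Error(`Unexpected character at ${rules.lastIndex}`);
    // Exactly one named group matched; its name is the token type.
    const [type, value] = Object.entries(m.groups!).find(([, v]) => v !== undefined)!;
    yield { type, value };
  }
}

const tokens = [...lex("1+2*3")];
// tokens[0] → { type: "number", value: "1" }
```

Because a sticky RegExp either matches at the current position or fails, unexpected input is detected immediately instead of being silently skipped.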
Use cute.compile to create a lexer:
import cute from "https://deno.land/x/cute/mod.ts";
const tokenizer = cute.compile({
  plus: "+",
  times: "*",
  number: {
    match: /\d+/,
    value: (s) => Number(s),
  },
  whitespace: { match: / +/, ignore: true },
});
Tokenizers are simple functions that return iterators:
const results = tokenizer("1+2*3*4+5");
Like any iterable, you can get your tokens in different ways:
const nextToken = results.next();
// or
for (const token of results) {
  console.log(token);
}
// or
Array.from(results).map((token) => token.type); // or: [...results].map((token) => token.type)
You can use a tokenizer with different rule sets (or states) in more complex scenarios.
// JS-style string interpolation
const tokenizer = cute.states({
  main: {
    strstart: { match: "`", push: "lit" },
    ident: /\w+/,
    lbrace: { match: "{", push: "main" },
    rbrace: { match: "}", pop: 1 },
    colon: ":",
    space: { match: /\s+/ },
  },
  lit: {
    interp: { match: "${", push: "main" },
    escape: /\\./,
    strend: { match: "`", pop: 1 },
    const: { match: /(?:[^$`]|\$(?!\{))+/ },
  },
});
const results = tokenizer("`a${{c: d}}e`");
const types = Array.from(results).map((token) => token.type);
console.log(types); // strstart const interp lbrace ident colon space ident rbrace rbrace const strend
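The push and pop actions above maintain a stack of states. This toy sketch (assumed behavior for illustration, not cute's internals) traces how that stack evolves while tokenizing the string from the example:

```typescript
// push appends a state; pop removes the top n states.
type Action = { push: string } | { pop: number };

function applyActions(actions: Action[]): string[] {
  const stack = ["main"]; // lexing begins in the first declared state
  for (const a of actions) {
    if ("push" in a) stack.push(a.push);
    else stack.splice(stack.length - a.pop, a.pop);
  }
  return stack;
}

// For "`a${{c: d}}e`": strstart pushes lit, interp pushes main,
// lbrace pushes main, then rbrace/rbrace/strend unwind the stack.
applyActions([
  { push: "lit" },  // strstart `
  { push: "main" }, // interp ${
  { push: "main" }, // lbrace {
  { pop: 1 },       // rbrace }
  { pop: 1 },       // rbrace }
  { pop: 1 },       // strend `
]); // → ["main"]
```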
You can find the full documentation on doc.deno.land.
Avoid greedy quantifiers when matching delimited tokens such as strings: a greedy .* consumes everything up to the last delimiter.
// DON'T
const tokenizer = cute.compile({
string: /".*"/, // greedy quantifier *
// ...
});
const results = tokenizer('"foo" "bar"');
results.next(); // -> { type: 'string', value: '"foo" "bar"' }
// DO
const tokenizer = cute.compile({
string: /".*?"/, // non-greedy quantifier *?
// ...
});
const results = tokenizer('"foo" "bar"');
results.next(); // -> { type: 'string', value: '"foo"' }
results.next(); // -> { type: 'space', value: ' ' }
results.next(); // -> { type: 'string', value: '"bar"' }
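The contrast comes straight from RegExp semantics and can be checked without cute:

```typescript
const greedy = /".*"/;  // greedy: .* runs to the LAST closing quote
const lazy = /".*?"/;   // lazy: .*? stops at the FIRST closing quote
const input = '"foo" "bar"';

greedy.exec(input)?.[0]; // '"foo" "bar"'
lazy.exec(input)?.[0];   // '"foo"'
```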
Each rule can provide a value function to transform the matched text; the original match is always kept on token.text.
const tokenizer = cute.compile({
  ws: /[ \t]+/,
  string: {
    match: /"(?:\\["\\]|[^\n"\\])*"/,
    value: (s) => s.slice(1, -1), // function to transform token.value
  },
  literals: {
    match: /`(?:\\[`\\]|[^\n`\\])*`/, // same shape, with backticks as delimiters
  },
});
const results = tokenizer('"hello"`world`');
results.next(); // { value: 'hello', text: '"hello"' }
results.next(); // { value: '`world`', text: '`world`' }
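The string pattern above can be tried on its own to see how escape handling plays out; this is plain RegExp behavior, independent of cute:

```typescript
const stringPattern = /"(?:\\["\\]|[^\n"\\])*"/;

// Escaped quotes are consumed as part of the token body:
stringPattern.exec('"say \\"hi\\""')?.[0]; // matches the whole "say \"hi\""
// An unterminated string simply fails to match:
stringPattern.test('"no closing quote'); // false
```

Keeping \n out of the character class means a string token can never silently span multiple lines, which keeps error positions sane.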
You can run the tests with deno test.
MIT License. See the LICENSE file for more details.