Commit Graph

4 Commits

Author SHA1 Message Date
Shadowfacts 2f18ad3cf4 Process runs of unmodified characters as characterSequence tokens 2023-11-28 20:58:01 -05:00
Shadowfacts f7f35e09f7 Use Unicode.Scalar instead of Character 2023-11-28 11:56:56 -05:00
    All the chars we care about are a single scalar, so this avoids spending time on the grapheme breaking algorithm.
Shadowfacts 134803b72d Faster tokenizing for named character references 2023-11-26 18:26:22 -05:00
Shadowfacts a4d791a995 Add tokenizer 2023-11-24 15:18:37 -05:00
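The f7f35e09f7 message relies on a Swift detail. Iterating a `String` by `Character` runs grapheme-cluster segmentation. Iterating its `unicodeScalars` view does not. The sketch below is a minimal illustration under one assumption: the tokenizer only needs to spot single-scalar ASCII delimiters such as `&` (start of a character reference) and `<` (start of a tag). The function `countMarkupStarts(in:)` is hypothetical and is not taken from the repository.

```swift
// Hypothetical helper: counts '&' and '<' delimiters by walking Unicode
// scalars. Because both delimiters are single scalars, no grapheme-cluster
// segmentation is needed to recognize them.
func countMarkupStarts(in input: String) -> Int {
    var count = 0
    for scalar in input.unicodeScalars {
        switch scalar {
        case "&", "<":
            count += 1
        default:
            break
        }
    }
    return count
}

// "e" followed by U+0301 is one Character but two scalars.
let sample = "caf\u{65}\u{301} &amp; <b>"
print(sample.count, sample.unicodeScalars.count)
print(countMarkupStarts(in: sample))
```

One caveat: the two views can disagree when a delimiter is followed by a combining mark. For example, `"&\u{301}"` is a single `Character` that does not equal `"&"`, but its first scalar is still `&`.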