11 Commits

Author SHA1 Message Date
Shadowfacts c26e3ee53c Fix lowercase hex digit character references not being tokenized
Fixes shadowfacts/Tusker#459
2024-03-17 12:08:56 -04:00
Shadowfacts aed61d84d3 Fix stack overflow when whitespace appears after attribute name and before tag-closing > 2024-02-06 18:46:56 -05:00
Shadowfacts 38b1d2949b Return temporary buffer as a .characterSequence 2023-12-23 11:49:43 -05:00
Shadowfacts 2f18ad3cf4 Process runs of unmodified characters as characterSequence tokens 2023-11-28 20:58:01 -05:00
Shadowfacts f7f35e09f7 Use Unicode.Scalar instead of Character
All the chars we care about are a single scalar, so this avoids spending
time on the grapheme breaking algorithm.
2023-11-28 11:56:56 -05:00
Shadowfacts f412369cf7 Don't use enum with associated values for current token
They prevent in-place modification, resulting in a bunch of extra copies
2023-11-28 10:36:04 -05:00
Shadowfacts 31bd174a69 Use loops instead of recursion in hot path
Small but measurable perf win
2023-11-27 00:04:10 -05:00
Shadowfacts 29a065049e Ditch InlineArray3
Turns out Array is still faster
2023-11-26 22:35:10 -05:00
Shadowfacts 134803b72d Faster tokenizing for named character references 2023-11-26 18:26:22 -05:00
Shadowfacts e22f778f8f Make things public 2023-11-25 22:58:00 -05:00
Shadowfacts a4d791a995 Add tokenizer 2023-11-24 15:18:37 -05:00
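
The reasoning in commit f7f35e09f7 (Use Unicode.Scalar instead of Character) can be illustrated with a minimal sketch. It is not code from the repository: the helper name `scalarOffsets(of:in:)` and the sample input are hypothetical, chosen only to show that markup delimiters such as `<` are single scalars, so a scanner can walk the `unicodeScalars` view without grouping scalars into grapheme clusters.

```swift
// Hypothetical helper for illustration; not taken from the repository.
// Returns the scalar offsets at which `target` occurs in `input`, walking
// the unicodeScalars view rather than the Character (grapheme cluster) view.
func scalarOffsets(of target: Unicode.Scalar, in input: String) -> [Int] {
    var offsets: [Int] = []
    for (offset, scalar) in input.unicodeScalars.enumerated() where scalar == target {
        offsets.append(offset)
    }
    return offsets
}

// "e" followed by U+0301 (combining acute accent) is one Character
// but two Unicode.Scalars.
let html = "<p>cafe\u{301}</p>"

print(html.count)                        // 11 Characters
print(html.unicodeScalars.count)         // 12 scalars
print(scalarOffsets(of: "<", in: html))  // [0, 8]
```

The offsets here count scalars, not Characters, so they only line up with other indices computed over the same `unicodeScalars` view.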