The lexer must always return braces and brackets as separate tokens so
that the parser can decide whether they belong to a construct such as an
array subscript or a variable expansion. That made moving bare-word
tokenization into the external scanner pointless, so I've moved it back
into the normal scanner.

The tricky part is handling the separate '}' and ']' tokens when they
are *not* part of a subscript or an expansion. For example, in code
like `echo {hi}`, the syntax tree should still clearly indicate that
only *one* argument is being passed to echo. For now, we achieve this
by grouping the '{', 'hi', and '}' tokens into a single `concatenation`
node, which is a bit odd, but it's the best we can do.
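
As a rough illustration of the grouping described above (not the actual
grammar), the sketch below splits a command into tokens, always keeping
braces and brackets separate, and then merges tokens that touch each
other into one `concatenation` node. The `Node` class, the `word` and
`command` node kinds, and the function names are hypothetical; only
`concatenation` comes from this change.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


# Whitespace runs, single braces/brackets, or runs of anything else.
TOKEN = re.compile(r"\s+|[{}\[\]]|[^\s{}\[\]]+")


def tokenize(source):
    """Yield (text, start, end); braces and brackets are always separate."""
    for m in TOKEN.finditer(source):
        if not m.group().isspace():
            yield m.group(), m.start(), m.end()


def to_node(group):
    """A lone token stays a word; adjacent tokens become a concatenation."""
    if len(group) == 1:
        return Node("word", group[0])
    parts = [Node("word", text) for text in group]
    return Node("concatenation", "".join(group), parts)


def parse_command(source):
    """Group tokens with no whitespace between them into one argument."""
    groups, prev_end = [], None
    for text, start, end in tokenize(source):
        if prev_end is not None and start == prev_end:
            groups[-1].append(text)  # touches the previous token
        else:
            groups.append([text])  # separated by whitespace: new argument
        prev_end = end
    if not groups:
        raise ValueError("empty command")
    return Node("command", source, [to_node(g) for g in groups])


cmd = parse_command("echo {hi}")
```

Here `cmd` has two children: the command name and a single
`concatenation` argument whose parts are '{', 'hi', and '}'.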

I've moved tokenization of bare words into the external scanner. This
way we can keep the grammar simple, but support some fancy rules that
I've inferred from experimenting with bash:
- Only allow '}' inside of a bare word if '}' isn't a valid lookahead
  token (i.e. we're not inside of a variable expansion).
- Only allow ']' at the *start* of a bare word if neither ']' nor ']]'
  is a valid lookahead token (i.e. we're not inside of a square bracket
  command or an array subscript).
- Parentheses seem to never be allowed in bare words. You have to quote
  them.
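
The three rules above can be sketched roughly as follows. This is a
simplified Python model, not the scanner itself: `scan_bare_word` and
the `valid_symbols` set of token strings are hypothetical stand-ins for
the lookahead information the external scanner receives. It also
assumes that a ']' after the first character ends the word, which the
rules above don't state, and it ignores other delimiters such as quotes,
';', and '|'.

```python
def scan_bare_word(text, pos, valid_symbols):
    """Return the end index of the bare word starting at pos.

    Returns pos itself when no bare word can start there.
    valid_symbols is the set of tokens the parser would accept next.
    """
    end = pos
    while end < len(text):
        ch = text[end]
        # Whitespace ends the word; parentheses are never part of one.
        if ch.isspace() or ch in "()":
            break
        # '}' is only part of the word when it can't close an expansion.
        if ch == "}" and "}" in valid_symbols:
            break
        if ch == "]":
            at_start = end == pos
            closes = "]" in valid_symbols or "]]" in valid_symbols
            # Assumption: ']' is accepted only as the first character.
            if not at_start or closes:
                break
        end += 1
    return end
```

For example, `scan_bare_word("a}b c", 0, set())` consumes all of `a}b`,
while passing `{"}"}` as the valid symbols stops the word after `a`.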
For alphanumeric words, I fall through to the normal scanner so that it
can continue to distinguish reserved words from other words.
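
A minimal sketch of that fall-through, under the assumption that the
external scanner simply declines purely alphanumeric words: the function
names, token names, and the partial reserved-word list below are
illustrative only and don't mirror the real scanner's API.

```python
import re

# Partial list of bash reserved words, for illustration only.
RESERVED = {"if", "then", "else", "fi", "for", "do", "done", "while"}

ALPHANUMERIC = re.compile(r"[A-Za-z0-9]+")


def external_scan(word):
    """Claim words with special characters; decline alphanumeric ones."""
    if ALPHANUMERIC.fullmatch(word):
        return None  # fall through to the normal scanner
    return ("bare_word", word)


def normal_scan(word):
    """The normal scanner can still tell reserved words from other words."""
    return ("reserved", word) if word in RESERVED else ("word", word)


def scan(word):
    """Try the external scanner first, then the normal one."""
    return external_scan(word) or normal_scan(word)
```

With this split, `scan("if")` is still recognized as a reserved word,
while `scan("a}b")` is handled by the external scanner.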

Fixes #5