The method used to read the header from the input was inadvertently
discarding the first non-header line. This happens to be harmless when
an empty line follows the header, but with newer versions of the
ScriptExtensions.txt file it caused the generated code to be missing
the first entry, for U+00B7 (MIDDLE DOT).
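
One way to avoid losing that line is to hand the first non-header line back
to the caller instead of consuming it. The following is only a sketch, not
the actual fix: the name `split_header` and the assumption that header lines
start with "#" are made up for illustration.

```python
import itertools


def split_header(lines):
    """Split an iterable of lines into (header, remaining_lines).

    Hypothetical helper: assumes header lines start with "#".
    """
    it = iter(lines)
    header = []
    for line in it:
        if not line.startswith("#"):
            # This line has already been pulled from the iterator, so chain
            # it back in front of the rest rather than dropping it.
            return header, itertools.chain([line], it)
        header.append(line[1:].strip())
    # The input consisted of header lines only.
    return header, iter(())
```

With this shape, the first data line is returned whether or not an empty
line separates it from the header.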
Without explicit annotation, some type checkers infer that the type of
the 'default' argument can only be type[KeyError]. This was the case
in unicodedata_test.py, where pyright disallowed "LTR". This commit adds
annotations to avoid this, fixing the issue in the test (and external
code dependent on the API).
Some of the other functions in this file have the same semantics and
suffer from the same type error, and so this fix could also be extended
to them as usage requires.
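
As a rough illustration of the kind of annotation involved, the sketch below
uses `typing.overload` so that a caller-supplied default of any type is
accepted. The function name, the tiny script tables and the exact signatures
are assumptions for this example, not necessarily what the commit adds.

```python
from typing import Literal, Type, TypeVar, Union, overload

T = TypeVar("T")

# Hypothetical, deliberately tiny tables for the example.
_KNOWN_SCRIPTS = {"Arab", "Hebr", "Latn"}
_RTL_SCRIPTS = {"Arab", "Hebr"}


@overload
def horizontal_direction(
    script_code: str, default: Type[KeyError] = KeyError
) -> Literal["RTL", "LTR"]: ...
@overload
def horizontal_direction(
    script_code: str, default: T
) -> Union[Literal["RTL", "LTR"], T]: ...
def horizontal_direction(script_code, default=KeyError):
    if script_code not in _KNOWN_SCRIPTS:
        # A KeyError class as default means "raise"; anything else is returned.
        if isinstance(default, type) and issubclass(default, KeyError):
            raise default(script_code)
        return default
    return "RTL" if script_code in _RTL_SCRIPTS else "LTR"
```

Without the overloads, a checker that sees only `default=KeyError` may infer
`type[KeyError]` for the parameter and reject a call such as
`horizontal_direction("Zzzz", "LTR")`.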
* Replaced all from ...py23 import * with explicit name imports, or removed completely when possible.
* Replaced tounicode() with tostr()
* Changed all BytesIO and StringIO imports to from io import ..., replaced all UnicodeIO with StringIO.
* Replaced all unichr() with chr()
* Misc minor tweaks and fixes
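
For reference, the Python 3 spellings that replace the py23 shims look
roughly like this. It is a minimal sketch using only the standard library,
and the variable names are illustrative.

```python
from io import BytesIO, StringIO  # instead of importing these from py23

# chr() replaces unichr(); StringIO replaces UnicodeIO.
text_buffer = StringIO()
text_buffer.write(chr(0x00B7))
middle_dot = text_buffer.getvalue()

# BytesIO is still the right container for encoded data.
byte_buffer = BytesIO(middle_dot.encode("utf-8"))
decoded = byte_buffer.read().decode("utf-8")
```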
Same as HarfBuzz's hb_script_get_horizontal_direction.
We just hard-code the set of RTL scripts here, as it doesn't change often anyway.
The function is just syntactic sugar, as all it does is look up the
constant RTL_SCRIPTS set.
It's nice to have it here in a central place alongside 'script', 'script_name', etc.
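
In essence the lookup amounts to a set membership test. The sketch below
assumes a function name and uses a deliberately incomplete RTL_SCRIPTS set;
the real set contains more scripts.

```python
# Partial, illustrative subset only; the real constant lists more scripts.
RTL_SCRIPTS = frozenset({"Arab", "Hebr", "Syrc", "Thaa"})


def direction_for_script(script_code):
    """Hypothetical helper: return "RTL" or "LTR" for a 4-letter script code."""
    return "RTL" if script_code in RTL_SCRIPTS else "LTR"
```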
`script_code` does the reverse of `script_name`: it takes a long
script name and returns a 4-letter script code.
Both `script_name` and `script_code` raise KeyError by default,
but can optionally return a default value instead.
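
The raise-or-return-default behaviour could be sketched as follows. The
two-entry tables and the `_lookup` helper are assumptions made for
illustration; the real functions draw on the full UCD data.

```python
# Hypothetical two-entry tables; the real data covers every script.
NAMES = {"Latn": "Latin", "Arab": "Arabic"}
CODES = {name: code for code, name in NAMES.items()}


def _lookup(table, key, default):
    try:
        return table[key]
    except KeyError:
        # A KeyError class as the default means "re-raise";
        # any other value is handed back to the caller.
        if isinstance(default, type) and issubclass(default, KeyError):
            raise
        return default


def script_name(code, default=KeyError):
    return _lookup(NAMES, code, default)


def script_code(name, default=KeyError):
    return _lookup(CODES, name, default)
```

Using the KeyError class itself as the sentinel keeps None available as a
legitimate default value.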
The new `fontTools.unicodedata` module re-exports all the public
functions from the built-in `unicodedata` module, and adds further
functions of its own.
The `script` function takes a Unicode character and returns the
script name as defined in the UCD "Scripts.txt" data file.
It's implemented as a simple binary search, plus a memoizing
decorator that caches the results to avoid searching for the same
character more than once.
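
A minimal sketch of that approach, using `bisect` for the binary search and
`functools.lru_cache` as the memoizing decorator. The RANGES/VALUES layout
and the toy table (valid for ASCII only) are assumptions for this example,
not the module's actual data or code.

```python
from bisect import bisect_right
from functools import lru_cache

# Toy table, only meaningful for ASCII: RANGES[i] is the first code point
# of a run whose script is VALUES[i].
RANGES = [0x0000, 0x0041, 0x005B, 0x0061, 0x007B]
VALUES = ["Common", "Latin", "Common", "Latin", "Common"]


@lru_cache(maxsize=None)
def script(char):
    """Return the script name for a single character (sketch only)."""
    code_point = ord(char)
    # Find the last range start that is <= code_point.
    index = bisect_right(RANGES, code_point) - 1
    return VALUES[index]
```

Repeated calls with the same character are then answered from the cache
rather than by another search.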
The unicodedata2 backport is imported if present, otherwise
the unicodedata built-in is used.
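
The fallback can be sketched with a plain try/except around the import. The
`_backend` alias and the handful of re-exported names below are illustrative
choices, not the module's actual layout.

```python
try:
    # Prefer the backport when it is installed.
    import unicodedata2 as _backend
except ImportError:
    # Otherwise fall back to the standard library module.
    import unicodedata as _backend

# Re-export a few public functions from whichever backend was found.
category = _backend.category
name = _backend.name
unidata_version = _backend.unidata_version
```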