12 Commits

Author SHA1 Message Date
Cosimo Lupo
1765ed772a [unicodedata] add script_name and script_code to __all__
and cast to str to avoid error with import * in python2.7

TypeError: Item in ``from list'' must be str, not unicode
2017-11-22 18:37:14 +01:00
Cosimo Lupo
99ea0a3986 [unicodedata] add script_code func and 'default' fallback arg
`script_code` does the reverse of `script_name`: it takes a long
script name and returns a 4-letter script code.

Both `script_name` and `script_code` raise KeyError by default,
but can optionally return a default value instead.
2017-11-22 17:46:44 +01:00
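The behaviour described in this commit can be sketched as follows. This is not the module's actual code: the lookup tables are a tiny hypothetical sample, and the `_MISSING` sentinel is one assumed way to tell "no default given" apart from `default=None`.

```python
# Hypothetical sample data; the real tables are auto-generated from the UCD.
_NAMES = {"Latn": "Latin", "Arab": "Arabic", "Grek": "Greek"}
_CODES = {name: code for code, name in _NAMES.items()}

# Sentinel (an assumption of this sketch) so that None is a usable default.
_MISSING = object()


def script_name(code, default=_MISSING):
    """Return the long script name for a four-letter script code."""
    try:
        return _NAMES[code]
    except KeyError:
        if default is _MISSING:
            raise
        return default


def script_code(name, default=_MISSING):
    """Return the four-letter script code for a long script name."""
    try:
        return _CODES[name]
    except KeyError:
        if default is _MISSING:
            raise
        return default
```

With these definitions, `script_name("Latn")` returns `"Latin"`, `script_code("Arabic")` returns `"Arab"`, and `script_name("Xxxx", default=None)` returns `None` instead of raising `KeyError`.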
Cosimo Lupo
afd2490a6c [unicodedata] add script_name function
Converts four-letter script codes to human-readable long names
2017-11-22 17:41:23 +01:00
Cosimo Lupo
012688ac20 [Tests] adjust unicodedata_test to expect short script codes 2017-11-22 17:41:23 +01:00
Cosimo Lupo
54fa00499e [Scripts] use short codes, add NAMES dict with aliases 2017-11-22 17:41:23 +01:00
Cosimo Lupo
697b8d9af5 [unicodedata] add block and script_extension functions 2017-11-20 18:16:02 +01:00
Cosimo Lupo
8b50ed56d9 add auto-generated Blocks.py and ScriptsExtensions.py 2017-11-20 18:15:09 +01:00
Cosimo Lupo
1ed78b12f5 [unicodedata] rename scripts.py to Scripts.py
let's use the same names as the original UCD data files for simplicity
2017-11-20 17:37:45 +01:00
Cosimo Lupo
b53b878bdc [scripts] update auto-generated module
it now contains two lists, one for the ranges and another for the script names
2017-11-20 13:38:49 +01:00
Cosimo Lupo
3442da1529 [unicodedata] use bisect.bisect_right function
CPython comes with a fast C implementation of bisect module.
This gives a 4 to 5 times speed-up over my pure-Python version.
2017-11-20 13:30:17 +01:00
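A sketch of a range lookup built on `bisect.bisect_right`, assuming two parallel lists as the generated module is said to contain: one of sorted range start points and one of the corresponding script values. The list names and the ASCII-only sample data are hypothetical.

```python
from bisect import bisect_right

# Hypothetical sample covering ASCII only. Each entry in RANGE_STARTS is the
# first code point of a range; the range runs up to the next start point.
RANGE_STARTS = [0x0000, 0x0041, 0x005B, 0x0061, 0x007B]
RANGE_VALUES = ["Zyyy", "Latn", "Zyyy", "Latn", "Zyyy"]


def lookup(codepoint):
    """Return the value of the range that contains `codepoint`."""
    # bisect_right returns the index of the first start point greater than
    # `codepoint`, so the containing range is the one just before it.
    i = bisect_right(RANGE_STARTS, codepoint)
    return RANGE_VALUES[i - 1]
```

For example, `lookup(ord("A"))` returns `"Latn"` and `lookup(ord(" "))` returns `"Zyyy"`. Storing only start points keeps the search to a single list of integers, which is what lets the C implementation of `bisect` do all the work.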
Cosimo Lupo
52d6131525 [unicodedata] add new module and 'script' function
The new `fontTools.unicodedata` module re-exports all the public
functions from the built-in `unicodedata` module and adds a few
functions of its own.

The `script` function takes a unicode character and returns the
script name as defined in the UCD "Scripts.txt" data file.

It's implemented as a simple binary search, plus a memoizing
decorator that caches the results to avoid searching for the same
character more than once.

The unicodedata2 backport is imported if present, otherwise
the unicodedata built-in is used.
2017-11-17 19:17:17 +00:00
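The three ideas in this commit (optional backport import, binary search, memoization) can be sketched together as below. This is an illustration under stated assumptions, not the module's code: the `_RANGES` table is a small hypothetical sample, the `memoize` decorator is one possible implementation, and returning `"Unknown"` for unlisted characters is a placeholder choice.

```python
import functools

# Prefer the unicodedata2 backport when it is installed, else use the built-in.
try:
    import unicodedata2 as _unicodedata
except ImportError:
    import unicodedata as _unicodedata

# Re-export one public function as an example of passing the API through.
name = _unicodedata.name

# Hypothetical sample: (first code point, last code point, script name),
# sorted by first code point. The real data comes from Scripts.txt.
_RANGES = [
    (0x0041, 0x005A, "Latin"),
    (0x0061, 0x007A, "Latin"),
    (0x0391, 0x03A1, "Greek"),
    (0x0410, 0x044F, "Cyrillic"),
]


def memoize(func):
    """Cache results of a one-argument function in a dict."""
    cache = {}

    @functools.wraps(func)
    def wrapper(arg):
        try:
            return cache[arg]
        except KeyError:
            result = cache[arg] = func(arg)
            return result

    wrapper.cache = cache  # exposed only so the cache can be inspected
    return wrapper


@memoize
def script(char):
    """Return the script name for a single character via binary search."""
    cp = ord(char)
    lo, hi = 0, len(_RANGES) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        first, last, value = _RANGES[mid]
        if cp < first:
            hi = mid - 1
        elif cp > last:
            lo = mid + 1
        else:
            return value
    return "Unknown"  # placeholder for characters outside the sample table
```

Calling `script("A")` twice performs the search once; the second call is answered from `script.cache`.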
Cosimo Lupo
96dafe4afc [unicodedata] add auto-generated 'scripts' module
containing the script ranges and names from Scripts.txt
2017-11-17 19:16:45 +00:00