Two small changes that significantly speed up subsetting of large fonts
such as Noto Sans CJK:
1. When emptying a charstring, simply empty its program rather than
attempting to decompile it first. (Only relevant when retaining GIDs.)
2. When reindexing charstrings, swap an accidentally-quadratic
implementation for one that is linear in the number of retained glyphs.
* Add default auto doc options
* Ensure all references are unique
* Use anonymous links to avoid duplicate references
* Remove default options, fix wrong module name
* Don’t index repeated class
* Remove repeated classes included through automodule
* Fix warnings
* We don’t use our own static directory
* Correctly format XML in docs
* Fix indentation
* Fix overline
* Bring TOC to top
* Fix definition list
* Offset definition lists and examples
* Fix erroneous markup
* Fix markup
* Already included in automodule
* Fix args markup
* Correct markup for example
* Don’t reindex repeated module
* Correct XML code block markup
* Fix markup errors, change example to doctest
* Correct list markup
* Make ttx docstring both valid RST and valid help output
* Various other boring markup fixes
* Fix example indenting
* Make docstring valid RST and valid help output
* Mock import for reportlab
* It’s ok if manual links don’t appear in toctrees
* Oops typo, I guess doctests are useful
when decompiled from binary, the SVG.docList contains (unicode) strings, decoded as UTF-8. lxml fromstring accepts either bytes or str, but when given str with the xml header declaring an explicit encoding, it rejects them (since the header is lying). So we encode to bytes before calling fromstring in case the SVG contains an explicit encoding (UTF-8 is the only one allowed anyway). When serializing to XML with tostring, we similarly decode to str as UTF-8. Not only to match SVG decompile (which gives us str), but if we didn't do that, then attempting to dump to XML would fail, because XMLWriter.writecdata expects str, not bytes.
With this I can finally follow xlink:href and url(#...) sort of
references within the SVG doc and subset the elements accordingly so
that only those that are reachable from the initial set of glyph
elements are kept.
support for namespaces and xpath is insufficient in built-in ElementTree; supporting both lxml and ElementTree is too complicated, let's simply require lxml to be able to subset SVG for now
this drops svg document records when they no longer intersect the subset. It keeps them in their entirety (for now) when they still intersect the subset, only renaming all the id='glyphXXX' to point to the new glyph indices after subsetting. Unused, unreferenced elements are not pruned yet.
relying on ClipList.compile to drop unused clips based on updated glyphOrder won't work when font is loaded lazily (default for subsetter), because ClipList gets decompiled too late (after glyphOrder has already been modified) and this produces warnings about missing glyphIDs.
Better to make the subsetter explicilty prune unused clips.
This is a breaking change (but the COLRv1 API was already marked as unstable and subject to change)
The changes in this PR are meant to match the changes from the COLRv1 draft spec at:
https://github.com/googlefonts/colr-gradients-spec/pull/302
This does two things:
1. Intersect subsetter glyphset with the table's Coverage before
passing to ClassDef1 for subsetting. Anything that doesn't
get past Coverage wouldn't ever get to ClassDef1,
2. Never reuse class0 of ClassDef2. There's unspoken assumption
that ClassDef2's class0 is never used for actual kerning, since
that's the unbounded "every other glyph" class. Previously our
ClassDef subsetter was reusing class0 if "every other glyph"
happened to become empty because of the subset glyphset. Don't
do that for PairPos's ClassDef2. As a result of this assumption,
don't keep a PairPosClass2 subtable if only ClassDef2's class0
survived subsetting.
Would be good to add tests for both.
Related to https://github.com/harfbuzz/harfbuzz/issues/2703
* Replaced all from ...py23 import * with explicit name imports, or removed completely when possible.
* Replaced tounicode() with tostr()
* Changed all BytesIO ans StringIO imports to from io import ..., replaced all UnicodeIO with StringIO.
* Replaced all unichr() with chr()
* Misc minor tweaks and fixes
In COLR.closure_glyphs augment the subset with the glyphs rechable from the COLRv1 base glyphs already in the subset.
In COLR.subset_glyphs, subset and rebuild LayerV1List and BaseGlyphV1List with the base glyphs to keep. Drop COLR if emptied
Allow specifying which LangSys tags to accept for each script, using
script.lang form. For example:
--layout-scripts=arab.dflt,arab.URD,latn
To keep DefaultLangSys and “URD” language for “arab” script, and all
languages for “latn” script.
While False does get the job done, the value is not always treated as
a boolean. It is overloaded so that 1 or greater is True, but more than
1 has a different meaning than 1. Hence usage should always be as an
integer as documented so that it's clear(er) this isn't just an on/off
toggle.