This is based on bungeman's https://github.com/fonttools/fonttools/pull/2627
Previously, an entire `SVG ` table would be marked as compressed if any
of the decoded SVG documents in it were compressed. Then on encoding all
SVG documents would be considered for compression. The XML format had no
means to indicate if compression was desired.
Instead, mark each svgDoc with its compression status. When decoding,
mark the svgDoc as compressed if the data was compressed. When encoding,
try to compress the svgDoc if it is marked as compressed. In the XML
format the data itself is always uncompressed, but allow an optional
`compressed` boolean attribute (defaults to false) to indicate the
svgDoc should be compressed when encoded.
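
A minimal sketch of the per-document compression flag described above, using
only the standard `gzip` module. The names `decode_doc` and `encode_doc` are
hypothetical and are not fontTools APIs. Keeping the gzipped bytes only when
they are smaller is an assumption about what "try to compress" means here.

```python
import gzip

# Every gzip stream starts with these two magic bytes.
GZIP_MAGIC = b"\x1f\x8b"


def decode_doc(data: bytes):
    """Return (text, compressed) for one SVG document read from binary."""
    compressed = data.startswith(GZIP_MAGIC)
    if compressed:
        data = gzip.decompress(data)
    return data.decode("utf-8"), compressed


def encode_doc(text: str, compressed: bool) -> bytes:
    """Encode one SVG document, compressing only if it is marked as such."""
    raw = text.encode("utf-8")
    if compressed:
        # mtime=0 keeps the output reproducible between runs.
        gzipped = gzip.compress(raw, mtime=0)
        # Assumption: fall back to the raw bytes when gzip does not help.
        if len(gzipped) < len(raw):
            return gzipped
    return raw
```

With this shape, a document that arrived uncompressed is never compressed on
the way out, and a tiny document marked as compressed can still end up stored
uncompressed if gzip would make it larger.
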
We also try to make sure that older code that relies on docList containing
sequences of three items (doc, startGID, endGID) will continue to work
without modification.
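
One way to keep three-item unpacking working while carrying the extra flag is
a record type that behaves like a sequence of length three. The sketch below
is illustrative only: the class name `SvgDocRecord` and its field names are
assumptions, not necessarily what the code in this change uses.

```python
from collections.abc import Sequence
from dataclasses import dataclass


@dataclass
class SvgDocRecord(Sequence):
    data: str
    startGlyphID: int
    endGlyphID: int
    compressed: bool = False

    def _legacy_tuple(self):
        # Only the original three items take part in sequence behaviour.
        return (self.data, self.startGlyphID, self.endGlyphID)

    def __getitem__(self, index):
        return self._legacy_tuple()[index]

    def __len__(self):
        return 3


# Older code that unpacks three items keeps working unchanged.
doc, startGID, endGID = SvgDocRecord("<svg/>", 3, 5, compressed=True)
```

The `compressed` flag is reachable as an attribute but is deliberately left
out of the sequence view, so `len()` and indexing match the old tuples.
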
When decompiled from binary, SVG.docList contains (unicode) strings,
decoded as UTF-8. lxml's fromstring accepts either bytes or str, but it
rejects a str whose XML header declares an explicit encoding (since the
header is lying). So we encode to bytes before calling fromstring, in
case the SVG contains an explicit encoding declaration (UTF-8 is the
only one allowed anyway). When serializing to XML with tostring, we
similarly decode to str as UTF-8. This not only matches SVG decompile
(which gives us str); without it, dumping to XML would fail, because
XMLWriter.writecdata expects str, not bytes.
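
The round trip can be sketched as follows, assuming lxml is installed. The
helper names `parse_svg` and `serialize_svg` are hypothetical; only
`etree.fromstring` and `etree.tostring` are real lxml calls.

```python
from lxml import etree


def parse_svg(text: str):
    # Encode first so that a document starting with
    # <?xml version="1.0" encoding="UTF-8"?> is handed to lxml as bytes.
    return etree.fromstring(text.encode("utf-8"))


def serialize_svg(root) -> str:
    # tostring returns bytes here; decode so callers always get str.
    return etree.tostring(root, encoding="utf-8").decode("utf-8")


svg = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<svg xmlns="http://www.w3.org/2000/svg"><g id="glyph1"/></svg>'
)
root = parse_svg(svg)
text = serialize_svg(root)
```
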
With this I can finally follow xlink:href and url(#...) references
within the SVG doc and subset the elements accordingly, so that only
those that are reachable from the initial set of glyph elements are
kept.
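
The reachability idea can be illustrated with a small standalone sketch. It
works on a flattened, hypothetical model in which each element id maps to that
element's attributes. A real implementation would walk the XML tree and also
look at descendants. None of the names below come from fontTools.

```python
import re
from collections import deque

# Matches url(#id), as used in attributes such as fill or clip-path.
URL_REF = re.compile(r"url\(\s*#([^)\s]+)\s*\)")


def referenced_ids(attrs):
    """Yield the ids that one element's attributes point at."""
    for name, value in attrs.items():
        # href="#id" or xlink:href="#id"
        if name.endswith("href") and value.startswith("#"):
            yield value[1:]
        for match in URL_REF.finditer(value):
            yield match.group(1)


def reachable_ids(elements, roots):
    """Return the ids reachable from `roots` by following references.

    `elements` maps an element id to its attribute dict (an assumption
    made to keep this sketch independent of any XML library).
    """
    seen = set()
    queue = deque(r for r in roots if r in elements)
    while queue:
        current = queue.popleft()
        if current in seen:
            continue
        seen.add(current)
        for ref in referenced_ids(elements[current]):
            if ref in elements and ref not in seen:
                queue.append(ref)
    return seen
```

Anything not in the returned set is a candidate for removal, since no kept
glyph element refers to it directly or indirectly.
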
Support for namespaces and XPath in the built-in ElementTree is
insufficient, and supporting both lxml and ElementTree is too
complicated. Let's simply require lxml to be able to subset SVG for now.
This drops SVG document records when they no longer intersect the
subset. It keeps them in their entirety (for now) when they still
intersect the subset, only renaming all the id='glyphXXX' attributes to
point to the new glyph indices after subsetting. Unused, unreferenced
elements are not pruned yet.
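
A rough sketch of the drop-or-rename step, written against plain strings with
a regular expression so that it runs without lxml. The function name
`remap_glyph_ids` and the dot-prefix used for glyphs outside the subset are
assumptions made for illustration. The prefix is only there so that a stale
id cannot clash with a newly assigned one.

```python
import re

# Matches id="glyphN" or id='glyphN' and captures the quote and the number.
GLYPH_ID = re.compile(r"""\bid=(["'])glyph(\d+)\1""")


def remap_glyph_ids(svg_text, gid_map):
    """Rename glyph ids using {old_gid: new_gid}.

    Returns None when no glyph in the document survives the subset,
    which signals that the whole document record can be dropped.
    """
    kept = False

    def rename(match):
        nonlocal kept
        quote, old_gid = match.group(1), int(match.group(2))
        new_gid = gid_map.get(old_gid)
        if new_gid is None:
            # Glyph is not in the subset: keep the element, but move its
            # id out of the way (arbitrary choice for this sketch).
            return f"id={quote}.glyph{old_gid}{quote}"
        kept = True
        return f"id={quote}glyph{new_gid}{quote}"

    renamed = GLYPH_ID.sub(rename, svg_text)
    return renamed if kept else None
```

For example, with `gid_map = {7: 1}` a document containing `glyph3` and
`glyph7` is kept and `glyph7` becomes `glyph1`, while a document containing
only `glyph3` is dropped.
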