...by calling split_cubic_into_three() twice. Gives another 5-9% speedup. While higher n values occur less often, the savings per call are also bigger, so the two effects offset each other.
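For illustration, a minimal sketch of what composing the three-way split with itself could look like. This is not the actual implementation: points are assumed to be complex numbers, split_cubic_into_three() is re-implemented here with plain de Casteljau subdivision, and cubic_point(), split_cubic_at() and split_cubic_into_nine() are hypothetical helpers. That the result has nine pieces follows from applying a three-way split twice; which n the change above actually special-cases is not stated here.

```python
def cubic_point(p0, p1, p2, p3, t):
    """Evaluate a cubic Bezier at parameter t (points are complex numbers)."""
    mt = 1 - t
    return mt**3 * p0 + 3 * mt**2 * t * p1 + 3 * mt * t**2 * p2 + t**3 * p3


def split_cubic_at(p0, p1, p2, p3, t):
    """De Casteljau split at t; returns (left, right) control-point tuples."""
    a = p0 + (p1 - p0) * t
    b = p1 + (p2 - p1) * t
    c = p2 + (p3 - p2) * t
    d = a + (b - a) * t
    e = b + (c - b) * t
    f = d + (e - d) * t  # point on the curve at t
    return (p0, a, d, f), (f, e, c, p3)


def split_cubic_into_three(p0, p1, p2, p3):
    """Sketch only: split into three pieces covering equal parameter ranges."""
    first, rest = split_cubic_at(p0, p1, p2, p3, 1 / 3)
    # 'rest' covers t in [1/3, 1]; its midpoint corresponds to t = 2/3.
    middle, last = split_cubic_at(*rest, 1 / 2)
    return first, middle, last


def split_cubic_into_nine(p0, p1, p2, p3):
    """Hypothetical helper: apply the three-way split twice (3 * 3 = 9)."""
    return [
        piece
        for third in split_cubic_into_three(p0, p1, p2, p3)
        for piece in split_cubic_into_three(*third)
    ]
```

Each of the nine pieces covers an equal slice of the original parameter range, so piece k starts at t = k/9 and ends at t = (k+1)/9 on the original curve.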