It seems that sets are hashed (or at least iterated) differently in CPython and PyPy. Because of this, the returned list of class sets may come back in a different order within each class size on the two implementations. For now, I make the test pass on both CPython and PyPy by converting the returned list of sets into a set of frozensets and asserting that its *content* is correct, without checking the *order* of the sets in the list.
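A minimal sketch of that order-insensitive comparison. The helper name `as_set_of_frozensets` and the example data are hypothetical, not taken from the actual test. Plain `set` objects are unhashable, so each class has to be wrapped in a `frozenset` before it can be placed in an outer set.

```python
def as_set_of_frozensets(classes):
    """Hypothetical helper: drop list order by turning each set into a frozenset."""
    return {frozenset(c) for c in classes}


# Hypothetical data: the same classes, listed in a different order within
# each class size, as might happen on CPython vs. PyPy.
result = [{1, 2}, {3, 4}, {5, 6, 7}]
expected = [{3, 4}, {1, 2}, {5, 6, 7}]

# The outer set collapses duplicates, so also compare the number of classes.
assert len(result) == len(expected)
assert as_set_of_frozensets(result) == as_set_of_frozensets(expected)
```

The trade-off is that this comparison no longer verifies that the list is grouped by class size. If that ordering matters, it could be checked separately on `[len(c) for c in result]`.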