The initial subdivision context counting ended up using unnecessarily
complicated logic to count the number of final vertices. In a first pass
it added vertices for every coarse edge. Then it added the same number
of vertices for every loose edge. That adds up to the same number of
vertices for the edges, so the separation is redundant and can be
replaced with a single multiplication.
In a test of a mesh with 2 million faces and 3 million vertices, this
saved 2.8ms (though the whole subdivision process takes around 700ms).