This commit significantly speeds up many of the attribute nodes when multiple threads are available, which helps in linear situations where parallelism cannot be achieved elsewhere.

See the differential for a table of timing comparisons measured on a Ryzen 3700X. For an attribute with 4 million elements, the nodes were about 3 to 9 times faster.

The changes are not exhaustive; other nodes could still be parallelized in the future. It would also be possible to further optimize the grain size in `parallel_for`, but I'd rather make sure it isn't too small. I tested some different values, but also relied on intuition: increasing the grain size for less complex operations and decreasing it for more complex ones.

Differential Revision: https://developer.blender.org/D11139
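
As a rough illustration of the grain-size trade-off described above, here is a minimal standalone sketch. `parallel_for_sketch` and `scale_attribute` are hypothetical names invented for this example; the sketch is built on `std::thread` and is not Blender's `parallel_for` implementation. It only shows the general idea: ranges no larger than the grain size run on the calling thread, and larger ranges are split into grain-sized chunks shared between worker threads.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

/* Hypothetical stand-in for a grain-size-aware parallel loop (an assumption for
 * illustration, not Blender's implementation). Calls fn(begin, end) on consecutive
 * chunks of [0, size), each at most grain_size elements long. */
inline void parallel_for_sketch(int64_t size,
                                int64_t grain_size,
                                const std::function<void(int64_t, int64_t)> &fn)
{
  if (size <= 0) {
    return;
  }
  grain_size = std::max<int64_t>(grain_size, 1);
  if (size <= grain_size) {
    /* Too little work to justify thread overhead: run on the calling thread. */
    fn(0, size);
    return;
  }
  const int64_t chunks = (size + grain_size - 1) / grain_size;
  const int64_t hardware = std::max<int64_t>(1, std::thread::hardware_concurrency());
  const int64_t workers = std::min(chunks, hardware);
  std::vector<std::thread> threads;
  for (int64_t w = 0; w < workers; w++) {
    threads.emplace_back([=, &fn]() {
      /* Each worker handles every workers-th chunk, starting at chunk w. */
      for (int64_t c = w; c < chunks; c += workers) {
        const int64_t begin = c * grain_size;
        fn(begin, std::min(begin + grain_size, size));
      }
    });
  }
  for (std::thread &t : threads) {
    t.join();
  }
}

/* Example of a cheap per-element operation, where a larger grain size is
 * usually the safer choice. */
inline void scale_attribute(std::vector<float> &values, float factor, int64_t grain_size)
{
  parallel_for_sketch(static_cast<int64_t>(values.size()),
                      grain_size,
                      [&](int64_t begin, int64_t end) {
                        for (int64_t i = begin; i < end; i++) {
                          values[i] *= factor;
                        }
                      });
}
```

With a cheap operation like the multiply above, a grain size that is too small means the per-chunk scheduling cost can outweigh the work itself, which is why erring on the larger side is the conservative choice. A more expensive per-element operation can usually tolerate a smaller grain size.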