This replaces the following code (pseudo-code):

    spin_lock();
    update_child_dag_nodes();
    schedule_new_nodes();
    spin_unlock();

with:

    update_child_dag_nodes_with_atomic_ops();
    schedule_new_nodes();

The reason is that scheduling new nodes implies taking a mutex lock, and holding a spin lock around that is a bad idea.

An alternative would have been to hold the spin lock around the child node update only, but that would require either a per-node spin lock or an array to collect the ready nodes. I didn't like either alternative. Using atomic operations makes the code much easier to follow and keeps the data flow on the CPU nice.

The same atomic ops might be used in other performance-critical areas later.

The atomic ops implementation is taken from the jemalloc project.
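
As a rough illustration of the lock-free child update described above, here is a minimal sketch. It uses C11 `<stdatomic.h>` in place of the jemalloc-derived atomic ops, and the `DagNode` layout, the `valency` counter, and the `schedule_node()` and `update_children_and_schedule()` names are all hypothetical, not taken from the actual code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical DAG node: `valency` counts parents that have not finished. */
typedef struct DagNode {
    atomic_uint valency;
    struct DagNode **children;
    size_t num_children;
    int scheduled; /* set by schedule_node(), for illustration only */
} DagNode;

/* Stand-in for the real scheduler call, which may take a mutex. */
static void schedule_node(DagNode *node)
{
    node->scheduled = 1;
}

/* Called when `node` finishes: release its children without a spin lock. */
static void update_children_and_schedule(DagNode *node)
{
    for (size_t i = 0; i < node->num_children; i++) {
        DagNode *child = node->children[i];
        /* atomic_fetch_sub returns the value held before the decrement. */
        unsigned int prev = atomic_fetch_sub(&child->valency, 1);
        assert(prev > 0);
        if (prev == 1) {
            /* Only one thread can observe the transition to zero,
             * so each child is scheduled exactly once. */
            schedule_node(child);
        }
    }
}
```

Because the decrement and the read of the previous value happen as one atomic step, no lock is held while `schedule_node()` runs, so a mutex inside the scheduler is never taken under a spin lock.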