Recent failures requiring investigation have exposed some shortcomings that this change addresses:

- When creating the diff image for offline comparison, use a higher threshold to prevent idiff from printing more output. That extra output often contradicts the primary failure output just above it, which is very confusing.
- For metadata failures, make sure they get printed so it's obvious what kind of failure we're dealing with.

Pull Request: blender/blender#107058