If you replace a junior with #LLM and make the senior review output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface due to LLM "productivity."

That's a cognitively brutal task.

Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear, radiology all have extensive literature on exactly this failure mode.

I propose any productivity gains will be consumed by false negative review failures.