If you replace a junior with an #LLM and have the senior review the output, the reviewer is now scanning for rare but catastrophic errors scattered across a much larger output surface, thanks to LLM "productivity."
That's a cognitively brutal task.
Humans are terrible at sustained vigilance for rare events in high-volume streams. Aviation, nuclear power, and radiology all have extensive literature on exactly this failure mode.
My prediction: any productivity gains will be consumed by false-negative review failures.
Yeap. I've been thinking about this as well. Maybe especially because of my #adhd (which - let's face it - is not rare in the industry) I'm keenly aware that scanning walls of existing code for bugs and existing test suites for holes is way, way harder and more error-prone than writing the same code/test cases yourself. Which is why we ask devs to produce small, scoped change requests and try to spread the review load.