I think the severity of the issue may be exaggerated, simply because the only way to definitively test something like this (without hundreds of samples) is to raise the stat by an arbitrarily large amount. That is fine for testing, but we have to remember to account for the massively inflated stat when analyzing the results.