Thanks for the explanation. I agree that when a shooter relies on consistency within a lot, it doesn't matter a whit whether they think of it as "I got this with two boxes." In that situation the boxes aren't really separate anyway. When two boxes from the same lot perform differently, though, it's an entirely different matter. Hence the original question of this thread.
Well, every single shot performs differently. There's some randomness in every shot, so it's hard to say how the ammo shoots if you fire only one. A larger sample gives you a better idea, because it reduces the error in your estimate of the lot's overall performance.
One box shooting better or worse than the next isn't really two separate samples. They're not the isolated islands you seem to be describing; you're still looking at the same lot. Treating each box as an island unto itself only muddies the water. They're part of the same pool, so if you treat them that way you'll get a clearer view. Saying something like "Box one had an SD of 7.17 and box two had an SD of 6.54, so I wonder why box two performed so differently" doesn't make as much sense as it might seem at first glance.
With a sample size of 50 you've got a 90% confidence interval of (0.883, 1.119), but with a sample size of 100 that changes to (0.918, 1.083). Think of those as your error windows: multiply your measured SD by the first number to get the low end of the window and by the second number to get the high end. So with 50 shots your SD could be off by about +/- 12%, but with 100 shots only by about +/- 8%. The larger your sample size, the more confident you can be that your estimate of the lot's performance is correct.
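For anyone who wants to compute error windows like these, here is a minimal Python sketch of the common textbook approach: the chi-square confidence interval for a standard deviation, which assumes shot-to-shot velocities are roughly normally distributed. To stay within the standard library it approximates the chi-square quantiles with the Wilson-Hilferty transformation, and the function names are hypothetical. This textbook method gives somewhat wider 90% windows than the figures quoted above (roughly 0.86 to 1.20 at 50 shots and 0.90 to 1.13 at 100), so the exact width depends on the method and confidence level used; the narrowing as the shot count grows is the same either way.

```python
from statistics import NormalDist


def chi2_quantile(p, df):
    """Approximate chi-square quantile via the Wilson-Hilferty transformation."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3


def sd_ci_multipliers(n, confidence=0.90):
    """Return (low, high) factors to multiply a sample SD by.

    Textbook chi-square interval for a normal population's SD:
    sqrt((n-1)/chi2_upper) <= sigma/s <= sqrt((n-1)/chi2_lower).
    """
    df = n - 1
    alpha = 1.0 - confidence
    low = (df / chi2_quantile(1.0 - alpha / 2, df)) ** 0.5
    high = (df / chi2_quantile(alpha / 2, df)) ** 0.5
    return low, high


# The window tightens as the number of shots grows.
for shots in (50, 100, 396):
    low, high = sd_ci_multipliers(shots)
    print(f"{shots:4d} shots: ({low:.3f}, {high:.3f})")
```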
With a sample size of 50, looking at those two boxes as separate entities:
7.17 +/- 12% gives you a range from 6.33 to 8.02.
6.54 +/- 12% gives you a range from 5.77 to 7.32.
You can't really say they performed differently because each of your calculated values falls within the other's error range anyway.
There's overlap, and that overlap means you can't rule out that the two boxes are the same. "But 7.17 is worse than 6.54, so they must be different." Not necessarily: given the amount of uncertainty, both boxes are probably similar, at least as far as you can tell with the chosen sample size. You'd need the ranges not to overlap before calling them different with any certainty.
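The overlap check above can be sketched in a few lines of Python. This is an illustration only: `sd_window` and `windows_overlap` are hypothetical helper names, and the 0.883/1.119 multipliers are simply the 50-shot figures quoted in this post.

```python
def sd_window(sd, low_factor, high_factor):
    """Scale a measured SD by confidence-interval multipliers."""
    return sd * low_factor, sd * high_factor


def windows_overlap(a, b):
    """True if two (low, high) ranges share any values."""
    return a[0] <= b[1] and b[0] <= a[1]


# Multipliers quoted in this post for a 50-shot sample.
LOW_50, HIGH_50 = 0.883, 1.119

box_one = sd_window(7.17, LOW_50, HIGH_50)
box_two = sd_window(6.54, LOW_50, HIGH_50)
print(f"box one: {box_one[0]:.2f} to {box_one[1]:.2f}")
print(f"box two: {box_two[0]:.2f} to {box_two[1]:.2f}")
print("overlap:", windows_overlap(box_one, box_two))
```

It prints 6.33 to 8.02 for box one, 5.77 to 7.32 for box two, and reports that the windows overlap, so these two boxes can't be told apart at this sample size.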
With a sample size of 100 (both boxes together), the combined SD works out to 6.83, and the confidence interval narrows:
6.83 +/- 8% gives you a range from 6.27 to 7.40.
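Combining boxes (or strings) into one figure takes more than averaging the SDs: the spread of each box's average around the overall average counts too. Below is a sketch, with a hypothetical `combine_groups` helper, that combines (shots, mean, SD) summaries into one overall SD. The box averages are an assumption here, since the post gives only the SDs; with equal averages the two 50-shot boxes combine to about 6.83, and any difference between the box averages would push the combined SD higher.

```python
import math


def combine_groups(groups):
    """Combine (n, mean, sd) summaries into one overall (n, mean, sd).

    Uses the total sum of squares: the spread within each group plus the
    spread of the group means around the overall mean.
    """
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    sum_sq = sum(
        (n - 1) * sd ** 2 + n * (mean - grand_mean) ** 2
        for n, mean, sd in groups
    )
    return total_n, grand_mean, math.sqrt(sum_sq / (total_n - 1))


# Hypothetical box averages: the post gives only the SDs, so both boxes
# are assumed to have the same average velocity here.
boxes = [(50, 1080.0, 7.17), (50, 1080.0, 6.54)]
n, mean, sd = combine_groups(boxes)
print(f"{n} shots, combined SD {sd:.2f}")
```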
And in this case, an SD of 6.83 with the narrower confidence interval tells you a better story about how consistent that ammo is than you get from looking at the two boxes separately. Looking at them separately means accepting more error, more uncertainty, and that's what the confidence interval tells you: how wide your error window is. The more you test, the smaller that interval gets, and the smaller it gets, the more confident you can be that the answer from your sample applies to the entire pool.
This is why I grouped all 100 shots from each lot together when I was comparing lots, both for group size and for muzzle velocity. Spending $260 on a brick to test made sense before spending $2600 on a case of the stuff. And $260 was enough that I wanted my test to tell me as much as it possibly could, so I could be as confident as possible that the answer I was looking at was accurate. It was going to be a considerable purchase, and if I was going to spend that much money, I wanted to spend it as wisely as possible. I'd rather know the actual answer is probably somewhere between 6.27 and 7.40 than somewhere between 5.77 and 8.02.
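To put numbers on how testing more shrinks the interval, the sketch below searches for the smallest sample size whose 90% window stays within a chosen +/- percentage. It uses the textbook chi-square interval with the Wilson-Hilferty approximation (a swapped-in method, not necessarily the one behind the figures in this post), and `shots_needed` is a hypothetical name. The window shrinks roughly in proportion to one over the square root of the shot count, so each further tightening costs considerably more ammunition.

```python
from statistics import NormalDist


def chi2_quantile(p, df):
    """Approximate chi-square quantile via the Wilson-Hilferty transformation."""
    z = NormalDist().inv_cdf(p)
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3


def shots_needed(max_error, confidence=0.90, limit=10_000):
    """Smallest sample size whose SD window stays within +/- max_error.

    max_error is a fraction, e.g. 0.10 for +/-10%. The upper side of the
    chi-square interval is the wider one, but both sides are checked.
    """
    alpha = 1.0 - confidence
    for n in range(3, limit + 1):
        df = n - 1
        low = (df / chi2_quantile(1.0 - alpha / 2, df)) ** 0.5
        high = (df / chi2_quantile(alpha / 2, df)) ** 0.5
        if high - 1.0 <= max_error and 1.0 - low <= max_error:
            return n
    raise ValueError("target not reached within limit")


for target in (0.20, 0.15, 0.10):
    print(f"+/-{target:.0%}: about {shots_needed(target)} shots")
```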
Taking another look at your RWS numbers, with one rifle you have a sample size of 396 shots. Combining those SDs into one gives an overall SD of 9.544, and a sample size of 396 gives you a pretty nice confidence interval of (0.958833, 1.041577).
9.544 * 0.958833 = 9.1511
9.544 * 1.041577 = 9.9408
So you can be quite confident that your entire lot of ammo, shot from that gun, has an SD somewhere in the range of 9.1511 to 9.9408, with an average velocity of 1099.1515 fps, based on your large 396-shot sample.