TL;DR - Shooting three-shot groups is fine, Anna Nicole married for love, and other lies you tell yourself at night
One question I have for you from your main post: in your illustration you advised against using the 2-shot method early in load dev to eliminate the clearly least useful loads when loading for a hunting rifle, as opposed to a heavier-barreled or competition-type rifle. Would you be willing to expound on that? Is it just that the lower expected consistency of a run-of-the-mill hunting rifle could make such a test give misleading results?
I used a barrel tuner as an example of when to use negative confirmation. The part about not needing to do that test on a hunting rifle is there because shooting 10-shot strings chasing the absolute accuracy and precision offered by a barrel tuner might be (cough almost certainly is cough) an unnecessary waste of barrel life. Hunting rounds tend to be about power and speed, whereas for competition rounds, if there's no power factor, all that counts is precision. I don't want someone to think they need to burn 50 shots through a 26 Nosler trying to fine-tune out the last quarter-inch of a group and end up with a shot-out barrel for no real benefit. In competition good is never good enough, but when hunting, the kill zone of a deer is pretty big, relatively speaking, and doesn't get smaller if someone else shoots better than you. The 2-shot group/barrel tuner example also assumes that you're at a point in load development where the majority of major issues have been addressed - seating depth, charge weight, and brass prep - and you're down to tuning barrel harmonics.
You can use 1-shot or 2-shot tests at many points in development, but only when seeking negative confirmation - that a load ISN'T something. You can say a 2-shot group probably isn't going to get better, but you can't say it IS anything (pedantically, you can't even say for sure that it's worse than other groups), because the weakness of the data set precludes making any inferential conclusions. (I loved AP Stats way back when. Sorry, I'm a nerd.) Basically, two shots might show "that could be good" or "it probably won't get better than how bad that was", but they don't give you a real clue as to what the next shot will do.
And that's really what we're talking about - what will the next shot do. Trying to predict where the next bullet will land is the ultimate goal. An important part of that is how well you need to predict the next shot. In F-Class you need to know within 5" at 1,000 yards to hit the x-ring. When hunting, you have more leeway and only need to know to within 10", out to however far your cartridge makes good killing power/the bullet will expand reliably/whatever your preferred metric is.
The first two major steps in loading are usually seating depth and charge weight, in whichever order you prefer. For one I can accept negative confirmation; for the other I want to infer a conclusion.
When I shoot charge weight ladders, I usually start with one load per step (I'll shoot 3-5 sometimes, but I'm speaking generally here, average type stuff). Those shots on the ladder are really one-shot groups of each charge, which means I can't say for sure which charge is good in relation to a node, but I can find out pretty quickly where I'm not going to spend any more time. If my initial charge was off by 100 FPS out of 2500 FPS for whatever reason (bad charge weight, chrono error), and by the end of the ladder I'm shooting 3100 FPS, I don't really care how badly it was off, because it's so far removed from where I'll end up that it's not relevant. I don't care about any predictive value from that charge weight going forward; I care that it shot safely and I didn't blow up my action by starting with too high a charge. One-shot groups worked great in that case by not wasting components and not rearranging my face, even if they aren't useful in predicting results going forward. I'll pick a "node" based on the logic that there's probably one in there somewhere, and it's probably somewhere around a flat spot on the chrono chart, but I don't expect the FPS readings to repeat very consistently until the load is fully dialed in - charge weight, seating depth, primer, etc. Charge weights should be resilient to at least 0.1gn, if not 0.2gn - absolute certainty that I'm square in the middle of a node after the first test isn't the goal of the coarse charge weight ladder. That comes later, shooting groups around my speed and depth nodes.
For seating depth, I usually shoot five shots per depth, because in this case I am looking to make an inferential conclusion - I want to know that there's a meaningful difference between two groups. You can use two-shot groups here to determine if any seating depth probably isn't good enough (if the first two shots are three inches apart at 100 yards, maybe don't waste the next three shots), but with a decent barrel and components, most likely more than one group will appear to be good enough (less than 1" at 100 yards, maybe even multiple groups clover-leafing). So you're down to splitting hairs, looking for a sequence of good groups at adjacent depths, and picking the one in the middle. When you do that you're applying the concept of confidence to your groups - if your 5-shot group is 1", there's a 90% chance every group at that setting will be between 0.25" and 3", aka "that could be good". If your group is 3", there's a 90% chance no group will ever be smaller than 1", aka "probably won't get better than how bad that was". If you shot 10-shot strings, you could change 90% to 95%. If you shot 20-shot strings you could change 95% to 98%. Maybe. Statistics are funny, and a lot of variables can come into play trying to shoot 20 shots in a row. So you pick the best group of 5 shots that's in the middle of two other decent groups, because logically, if the group you picked really isn't the best, at least the seating depth doesn't make the group fall apart as fast as if you picked the group that's right next to the 3" spread.
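To put a rough picture on how much 5-shot groups wander even when nothing about the load changes, here's a minimal Monte Carlo sketch. It assumes shots scatter as independent normal errors in x and y, which is a simplification, and the sigma of 0.33" is a hypothetical value picked only because it puts a typical 5-shot group somewhere around an inch. The function names are made up for illustration.

```python
import math
import random


def extreme_spread(shots):
    """Largest center-to-center distance between any two shots in a group."""
    return max(
        math.dist(a, b)
        for i, a in enumerate(shots)
        for b in shots[i + 1:]
    )


def simulate_groups(sigma, shots_per_group=5, n_groups=5000, seed=1):
    """Return sorted group sizes for many groups fired from the SAME load.

    sigma is the per-axis standard deviation of shot placement in inches
    (an assumed model, not a measured value).
    """
    rng = random.Random(seed)
    sizes = []
    for _ in range(n_groups):
        shots = [
            (rng.gauss(0, sigma), rng.gauss(0, sigma))
            for _ in range(shots_per_group)
        ]
        sizes.append(extreme_spread(shots))
    sizes.sort()
    return sizes


def percentile(sorted_values, pct):
    """Simple nearest-rank percentile on an already sorted list."""
    idx = min(len(sorted_values) - 1, int(pct / 100 * len(sorted_values)))
    return sorted_values[idx]


sizes = simulate_groups(sigma=0.33)
p5, p50, p95 = (percentile(sizes, p) for p in (5, 50, 95))
print(f'5-shot groups, same load: 5th pct {p5:.2f}", median {p50:.2f}", 95th pct {p95:.2f}"')
```

The gap between the 5th and 95th percentile is the spread you'd see from pure luck with an unchanged load, which is why one good group next to one bad group doesn't tell you much on its own.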
Unpopular opinion time: in reality 5-shot groups aren't enough to "prove" anything. 10-shot groups aren't enough. 20- and 50-shot groups still aren't enough. If the sample size is less than 100, we're still in the t-table space of "disproving the null" rather than proving anything conclusively. So, whether we mean to or not, we lean heavily on the concept of confidence intervals. Whenever anyone says "my 5-shot SD is 9 FPS", people who understand statistics hear "my 5-shot SD is 9 FPS, so there's a 95% chance the true SD of the 100 rounds I loaded last night is between 4-14 FPS." And guess what, a true SD of 14 FPS over 100 rounds is not bad. It's more than good enough for the majority of shooters at ranges out to 1,000 yards, because that ammo will outshoot the shooter's other skills, or even the rifle itself. A 14 FPS SD won't win 1,000-yard F-Class matches or King of 2 Mile, but when you're looking at the 16" kill zone on an elk at 600 yards, shooting from the fetal position behind a log when it's snowing and 17 degrees outside, that 14 FPS SD is going to get it done for you. That hand load is so much better than the 40 FPS SD of a questionably old box of Core-Lokts you found in the back of the closet before shooting two shots at a pie plate and going to whack Bambi's mom under the feeder at 85 yards, like the majority of hunters in this country, that it doesn't even bear comparing. Be proud of that true 14 FPS SD.
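If you want to run the confidence interval on your own chrono numbers, here's a small sketch using the textbook chi-square interval for a standard deviation. It assumes velocities are normally distributed, and it hardcodes the 2.5% and 97.5% chi-square quantiles from a standard table for a few string lengths so it needs nothing outside the standard library. The function and table names are made up for illustration. Under that assumption, a 9 FPS SD from 5 shots works out to roughly 5.4 to 26 FPS, which is wider than the 4-14 quoted above, so treat the exact bounds in this post as ballpark. If anything, it makes the point about small samples harder.

```python
import math

# 2.5% and 97.5% chi-square quantiles from a standard table,
# keyed by degrees of freedom (shots - 1).
CHI2_95 = {
    4: (0.4844, 11.1433),    # 5-shot string
    9: (2.7004, 19.0228),    # 10-shot string
    19: (8.9065, 32.8523),   # 20-shot string
    29: (16.0471, 45.7223),  # 30-shot string
}


def sd_confidence_interval(sample_sd, shots):
    """95% interval for the true SD, assuming normally distributed velocities."""
    df = shots - 1
    low_q, high_q = CHI2_95[df]  # KeyError for string lengths not in the table
    lower = sample_sd * math.sqrt(df / high_q)
    upper = sample_sd * math.sqrt(df / low_q)
    return lower, upper


for sd, shots in [(9, 5), (9, 10), (9, 20), (5, 30)]:
    lower, upper = sd_confidence_interval(sd, shots)
    print(f"{shots}-shot SD of {sd} FPS: true SD likely between {lower:.1f} and {upper:.1f} FPS")
```

The interval tightens slowly as the string gets longer, which is the whole problem with trying to prove anything from a handful of shots.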
Rule of thumb:
- About 22 data points in each group are needed to detect a 1:2 ratio between two standard deviations
- About 35 data points in each group are required for a 2:3 ratio
- About 50 data points in each group are required for a 3:4 ratio
Cite:
https://precisionrifleblog.com/2020/12/05/muzzle-velocity-statistics-for-shooters/
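A rough way to get a feel for why those sample sizes climb so fast is to simulate two loads and count how often the truly tighter load also shows the lower sample SD. To be clear, this is not the formal test behind the rule of thumb in the cited article; it's a simpler "did the better load at least look better" check, assuming normally distributed velocities. The 2800 FPS mean and the SD pairs are hypothetical values, and the function names are made up for illustration.

```python
import math
import random


def sample_sd(values):
    """Sample standard deviation (n - 1 in the denominator)."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (len(values) - 1))


def chance_right_order(sd_a, sd_b, shots, trials=2000, seed=1):
    """Fraction of trials where load A (true SD sd_a, the tighter load)
    shows a lower sample SD than load B (true SD sd_b)."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(trials):
        # 2800 FPS is a hypothetical mean velocity; it doesn't affect the SD.
        a = [rng.gauss(2800, sd_a) for _ in range(shots)]
        b = [rng.gauss(2800, sd_b) for _ in range(shots)]
        if sample_sd(a) < sample_sd(b):
            wins += 1
    return wins / trials


# True SD pairs in the 1:2, 2:3, and 3:4 ratios from the list above,
# each tried at 3 and 5 shots and at the rule-of-thumb sample size.
for sd_a, sd_b, rule_n in [(10, 20, 22), (10, 15, 35), (9, 12, 50)]:
    for shots in (3, 5, rule_n):
        frac = chance_right_order(sd_a, sd_b, shots)
        print(f"true SDs {sd_a} vs {sd_b} FPS, {shots} shots each: "
              f"better load looks better {frac:.0%} of the time")
```

With only a few shots per load and a small real difference, the worse load "wins" a surprising share of the time, which is about what you'd expect from the numbers in the list.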
So basically, to confirm anything, to really narrow down your confidence intervals, you have to shoot. A LOT. And that's exactly what competitors do. They overlay groups across matches and check SD/ES across loads, and home in on exactly which brass prep steps matter and how to seat primers, and what jump is best for absolute precision and accuracy versus what jump is most resilient over 100 consecutive rounds. They're chasing 5 FPS SDs on 30-shot strings, because that means there's a 95% chance the true SD of the population of every load they'll make is under 10 FPS. Those shooters are awesome because they produce real, statistically meaningful results other people can rely on without having to shoot 100-shot strings testing every variable.
Hunters shooting 3-shot groups aren't proving that their loads are awesome. They're relying on statistics to say that, generally speaking, a good bit of the time, in many cases, barring other things being wrong, and if they do everything right, the load won't be the reason that they miss.
Pick your level of crazy and run with it. I'm a nutter, I'll admit that. But at the end of the day, if the freezer is full, the rest is just noise, right?