Rumor has it that Ronald Fisher invented the p-value when a woman, call her Muriel, at his lunch table was drinking tea and claimed she could if the milk was put in before the hot water and vice versa.
Fisher wanted to test this hypothesis, so he got eight cups, put milk then water in half of them and water then milk in the other half. She got them all correct.
Fisher reasoned that if she was guessing at random (the null hypothesis), the chances were 1/70 that she would have gotten them all correct (, Muriel knew there were four of each). Fisher thought this sufficient evidence in her favor.
Note that this example might suffer from the counterfactual worlds problem with p-values (see issues with p-values:Counterfactual reasoning). If Muriel had made 1 mistake, then the p-value is 17/70 — would Fisher have allowed her to run the experiment again? If so, the sample size is not truly fixed and the p-value is invalid.