A blog on statistics, methods, philosophy of science, and open science. Understanding 20% of statistics will improve 80% of your inferences.

Thursday, April 7, 2016

One-sided F-tests and halving p-values

After my previous post about one-sided tests, some people wondered about two-sided F-tests. And then Dr R recently tweeted:




I thought it would be useful to illustrate 1) why the default F-test is never ‘two-sided’, 2) why a one-sided F-test on two means is not possible, but a one-sided t-test for the same two means is possible, and 3) why you cannot halve p-values for F-tests with more than 2 groups.

The F-value and the t-value are related: t² = F. This holds as long as df1 = 1 (e.g., F(1, 100)), because in this case two groups are compared, and thus the F-test should logically equal a t-test. The squared critical t-value of a two-sided t-test with a 5% error rate equals the critical F-value of an F-test, which is always one-sided, with a 5% error rate.
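To make the t² = F relation concrete, here is a minimal Python sketch (not the R script that accompanies this post). The scores are made up, and the helper names `pooled_t` and `anova_f` are my own; the sketch assumes the classic equal-variance Student's t-test and a one-way ANOVA.

```python
from statistics import mean

def pooled_t(a, b):
    """Student's t for two independent groups, assuming equal variances."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / (pooled_var * (1 / na + 1 / nb)) ** 0.5

def anova_f(*groups):
    """One-way ANOVA F: mean square between groups over mean square within."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df1 = len(groups) - 1
    df2 = len(values) - len(groups)
    return (ss_between / df1) / (ss_within / df2)

# Made-up illustrative scores, not data from the post.
control = [4.1, 5.3, 6.0, 5.5, 4.8, 5.9]
treatment = [5.2, 6.8, 6.1, 7.0, 5.9, 6.6]

t_ab = pooled_t(control, treatment)   # control minus treatment
t_ba = pooled_t(treatment, control)   # treatment minus control
f_value = anova_f(control, treatment)

print(f"t (control - treatment) = {t_ab:.4f}")
print(f"t (treatment - control) = {t_ba:.4f}")
print(f"t squared = {t_ab ** 2:.4f}, F = {f_value:.4f}")
```

Swapping the two groups flips the sign of t, but F stays exactly the same: the F-value has no way to record which group had the higher mean.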

If you halve a p-value from an F-test (e.g., a p = 0.06 is reported as a ‘one-sided’ p = 0.03), you don’t end up with a directional F-test with a 5% error rate. It already was a directional F-test with a 5% error rate.

The reason is that t² has no negative values. In an F-distribution, all differences end up as values in the same direction. It’s like closing a book that was open: in the open book (the t-test) the pages lie on both sides of the spine, but in the closed book (the F-test) all pages lie on one side of the spine.

This is visualized in the figure below (see the R script below this post). The black curve is an F(1, 100)-distribution (did you know the F-distribution was named in honor of Fisher?). The light blue area contains the F-values that are extreme enough to lead to p-values smaller than 0.05. The green curve is the right half of a t-distribution, and the light green area contains t-values high enough to return p-values smaller than 0.05. This t-distribution has a mirror image on the other side of 0 (not plotted).



The two curves connect at 1 because t² = F, and 1² = 1. If we square the critical value for the two-sided t-test (t = 1.984), we get the critical F-value (F = 3.936).
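The two critical values can be checked without any statistics library by integrating the densities numerically. This is only a rough sketch under my own assumptions: `t_pdf`, `f_pdf` and `upper_tail` are helper names I made up, the integration uses Simpson's rule, and the tail is cut off at 60, beyond which the remaining area is negligible for these distributions.

```python
import math

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def f_pdf(x, df1, df2):
    """Density of the F distribution with (df1, df2) degrees of freedom, x > 0."""
    log_c = (math.lgamma((df1 + df2) / 2) - math.lgamma(df1 / 2)
             - math.lgamma(df2 / 2) + (df1 / 2) * math.log(df1 / df2))
    return math.exp(log_c + (df1 / 2 - 1) * math.log(x)
                    - (df1 + df2) / 2 * math.log1p(df1 * x / df2))

def upper_tail(pdf, start, stop=60.0, steps=6000):
    """Approximate the area under pdf from start to stop with Simpson's rule.

    steps must be even; stop = 60 is an assumed cut-off for the tail.
    """
    h = (stop - start) / steps
    total = pdf(start) + pdf(stop)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(start + i * h)
    return total * h / 3

df_error = 100
t_crit = 1.984   # two-sided critical t-value quoted in the post
f_crit = 3.936   # critical F-value quoted in the post

t_right = upper_tail(lambda x: t_pdf(x, df_error), t_crit)
f_right = upper_tail(lambda x: f_pdf(x, 1, df_error), f_crit)

print(f"t_crit squared          = {t_crit ** 2:.3f}")
print(f"area right of t = 1.984 = {t_right:.4f} (one of two tails)")
print(f"area right of F = 3.936 = {f_right:.4f} (the only tail)")
```

The right tail of the t-distribution beyond 1.984 holds about 2.5% of the area, with another 2.5% in the mirror image on the left, while the single tail of the F-distribution beyond 3.936 holds about 5%.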

In the F-distribution, all extreme values are part of a single tail. In the t-distribution, only half of the extreme values are in the right tail. It is possible to use a t-test instead of an F-test on the same two means. With the t-test, you can easily separate the extreme values due to differences in one direction from the extreme values due to differences in the other direction, which is not possible in the F-test. When you switch from the F-test to the t-test, you can report a one-sided p-value when comparing two groups, as long as you decided to perform a one-sided test before looking at the data (this doesn't seem likely in Dr R's tweet above!).
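A small sketch of what goes wrong when you halve the p-value of an F-test without looking at the direction of the effect. The t-values of 1.9 and -1.9 are hypothetical, the prediction is assumed to be t > 0, and `t_upper_tail` is my own helper that integrates the t density with Simpson's rule (cut off at 60).

```python
import math

def t_upper_tail(t, df, stop=60.0, steps=6000):
    """P(T >= t) for Student's t, via Simpson's rule on the density."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = (stop - t) / steps
    total = pdf(t) + pdf(stop)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(t + i * h)
    return total * h / 3

df_error = 100
for t_obs in (1.9, -1.9):   # hypothetical results; the prediction is t > 0
    p_f = 2 * t_upper_tail(abs(t_obs), df_error)   # p for F = t^2 with df1 = 1
    p_one_sided = t_upper_tail(t_obs, df_error)    # p for the directional t-test
    print(f"t = {t_obs:+.1f}: F = {t_obs ** 2:.2f}, p(F) = {p_f:.3f}, "
          f"halved = {p_f / 2:.3f}, one-sided t-test p = {p_one_sided:.3f}")
```

Both results give the same F and therefore the same halved p-value, but the one-sided t-test only agrees with the halved value when the difference is in the predicted direction. When the difference goes the other way, the one-sided p-value is close to 1.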

If df1 is larger than 1 (for example, when more than 2 groups are compared), the F-test checks whether there are differences between multiple groups. Now, the relation between the t-test and F-test no longer exists. It is not possible to report a t-test instead of an F-test, and it is thus no longer possible to report a p-value that is divided by two.
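One way to see why there is no 'direction' left to halve into: with three groups there are six possible orderings of the means, and they all produce the same F-value. The sketch below uses made-up residuals and condition means, and `anova_f` is my own helper for a one-way ANOVA.

```python
from itertools import permutations
from statistics import mean

def anova_f(groups):
    """One-way ANOVA F: mean square between groups / mean square within groups."""
    values = [x for g in groups for x in g]
    grand = mean(values)
    ms_between = (sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
                  / (len(groups) - 1))
    ms_within = (sum((x - mean(g)) ** 2 for g in groups for x in g)
                 / (len(values) - len(groups)))
    return ms_between / ms_within

# Made-up residuals and condition means, purely for illustration.
residuals = [-0.8, -0.3, 0.0, 0.4, 0.7]
condition_means = (5.0, 5.5, 6.5)

f_values = []
for order in permutations(condition_means):
    # Assign the three means to conditions A, B and C in this order.
    groups = [[m + r for r in residuals] for m in order]
    f_values.append(anova_f(groups))
    print(f"means for A, B, C = {order}: F = {f_values[-1]:.4f}")
```

With two groups there are only two orderings, which is why halving can correspond to a one-sided t-test. With six orderings behind a single F-value, dividing the p-value by two does not correspond to any specific directional prediction.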

Remember that the F-value is a ratio of two variance estimates: the mean square for the effect, which reflects effect plus error variance, divided by the mean square error, which reflects error variance alone. If there is no effect, we expect this ratio to be close to 1, and if the ratio is sufficiently larger than 1, we conclude there is an effect - which is why we are only interested in the larger-than-1 tail. The larger the degrees of freedom, the more tightly the F-distribution is concentrated around 1. In the graph below, we see the F(100, 1000)-distribution and a t-distribution with df = 1000. It's a very clear (albeit extreme) illustration of the difference between an F-distribution and the t-distribution.


To conclude: When comparing two groups, an F-test is always one-sided, but you can report a (more powerful) one-sided t-test - as long as you decided this before looking at the data. When comparing more than two groups, so that df1 is larger than 1, it makes no sense to halve the p-value (although you can always choose an alpha level of 10% when designing your study).