mirror of
https://github.com/gladstone-institutes/Bioinformatics-Workshops.git
synced 2025-11-30 09:45:43 -08:00
changes
parent 0e93d723f1
commit 97c65d55bf
2 changed files with 29 additions and 27 deletions
@@ -37,7 +37,7 @@ After looking numerically at the data, let us look at it visually. We will plot
 ```{r}
 ## load the library to be used for plotting
 suppressMessages(library(ggplot2))
-ggplot(chickwts, aes(x=feed, y=weight)) + geom_boxplot()
+ggplot(chickwts, aes(x=feed, y=weight)) + geom_boxplot() + geom_jitter(width=0.1)
 ```
 
 ## One-sided, one sample t-test
@@ -58,6 +58,7 @@ LinSeedWeights <- filter(chickwts, feed =="linseed")$weight
 print(LinSeedWeights)
 ##let us again visualize this
 boxplot(LinSeedWeights)
+abline(h=200, lty=2, col="red")
 ```
 
 Now, we will run the one-sample, one-sided t-test.
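As a minimal sketch of the test this hunk leads into, the one-sample, one-sided t-test can be run with base R's `t.test`. The reference value of 200 is taken from the dashed line added above; the direction of the alternative (`"greater"`) and the base-R subsetting used in place of `filter` are assumptions, since the workshop's own call is not shown in this diff.

```r
# Weights of the chicks fed linseed (base-R subsetting instead of dplyr::filter)
LinSeedWeights <- chickwts$weight[chickwts$feed == "linseed"]

# Assumption: H0 is "mean weight = 200", H1 is "mean weight > 200"
oneSided <- t.test(LinSeedWeights, mu = 200, alternative = "greater")
print(oneSided)

# The same t statistic by hand: (sample mean - mu) / standard error of the mean
tByHand <- (mean(LinSeedWeights) - 200) /
  (sd(LinSeedWeights) / sqrt(length(LinSeedWeights)))
print(tByHand)
```

Swapping `alternative = "greater"` for `"less"` tests the opposite direction; the t statistic itself does not change, only the tail used for the p-value.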
@@ -84,7 +85,7 @@ SubChickWts$feed <- droplevels(SubChickWts$feed)
 str(SubChickWts)
 
 ##let us plot this again
-ggplot(SubChickWts, aes(x=feed, y=weight)) + geom_boxplot()
+ggplot(SubChickWts, aes(x=feed, y=weight)) + geom_boxplot() + geom_jitter(width=0.1)
 ```
 
 The distribution of chick weights fed soybean appears to have slightly higher than those fed with linseed. We will compute the mean weights of the checks fed with each kind of feed.
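One way to compute the per-feed means mentioned in the last context line is sketched below with base R's `tapply`. How `SubChickWts` is built here (via `subset` and `droplevels` on the two feeds) is an assumption; the workshop constructs it earlier in the file, outside this diff.

```r
# Assumption: SubChickWts holds only the linseed and soybean rows of chickwts
SubChickWts <- droplevels(subset(chickwts, feed %in% c("linseed", "soybean")))

# Mean weight within each feed group
groupMeans <- tapply(SubChickWts$weight, SubChickWts$feed, mean)
print(groupMeans)

# Observed difference in means (soybean minus linseed)
meanDiff <- unname(groupMeans["soybean"] - groupMeans["linseed"])
print(meanDiff)
```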
@@ -137,7 +138,7 @@ wilcox.test(weight ~ feed, data=SubChickWts)
 ## Statistical power estimates or Sample Size calculations
 We were unable to reject the hypothesis that linseed and soybean feed kept the mean weights of the chicks the same. Let us visualize these data again,
 ```{r}
-ggplot(SubChickWts, aes(x=feed, y=weight)) + geom_boxplot()
+ggplot(SubChickWts, aes(x=feed, y=weight)) + geom_boxplot()+ geom_jitter(width=0.1)
 ```
 It however appears that the soybean feed does increase the mean weight over the linseed feed.
 If the increase is true then we need more samples to conclude that the soybean feed does increase the mean chick weight in a statistically significant manner.
@@ -151,7 +152,7 @@ To do that we can perform something called a statistical power analyses. We will
 
 2. The effect size that you want to have the statistical power to estimate
 
-3. At what Type I error will you be making claims of statistical significance. This is a number between 0 and 1 (typically 0.05) and represents the fraction of times (when you repeat the same experiment over and over again) when you will claim significance when in fact your null hypothesis is true (there is no differernce in the mean weights).
+3. At what Type I error will you be making claims of statistical significance. This is a number between 0 and 1 (typically 0.05) and represents the fraction of times (when you repeat the same experiment over and over again) when you will claim significance when in fact your null hypothesis is true (there is no difference in the mean weights).
 
 4. What is the desired statistical power? This is a number between 0 and 1 and represents the fraction of times (when you repeat the same experiment over and over) you want to claim significance at the chosen Type I error, when there is really a difference as captured by the effect size.
 
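The quantities in this list map onto the arguments of base R's `power.t.test`, which solves for the per-group sample size once the effect size, Type I error, and power are fixed. The sketch below is not the workshop's own calculation, which lies outside this diff. Taking the observed difference in means as the effect size, pooling the two group variances, and using a power of 0.8 are all assumptions.

```r
# Assumption: the two-feed subset is built as below
SubChickWts <- droplevels(subset(chickwts, feed %in% c("linseed", "soybean")))

groupMeans <- as.numeric(tapply(SubChickWts$weight, SubChickWts$feed, mean))
groupVars  <- as.numeric(tapply(SubChickWts$weight, SubChickWts$feed, var))
groupSizes <- as.numeric(table(SubChickWts$feed))

# Effect size: observed difference in means, with a pooled standard deviation
delta    <- abs(groupMeans[2] - groupMeans[1])
pooledSd <- sqrt(sum((groupSizes - 1) * groupVars) / (sum(groupSizes) - 2))

# Leave n unspecified so that power.t.test solves for it
powerResult <- power.t.test(delta = delta, sd = pooledSd,
                            sig.level = 0.05,  # Type I error (item 3)
                            power = 0.8,       # assumed desired power (item 4)
                            type = "two.sample",
                            alternative = "two.sided")
print(powerResult)

# Round up to whole chicks per feed group
print(ceiling(powerResult$n))
```

Lowering `delta` or raising `power` increases the required sample size, which is a quick way to see how sensitive the design is to the assumed effect size.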
@@ -172,10 +173,10 @@ The results says we need to have at least 60 chicks in each feed group to have a
 We will now go back to looking at the distribution of the weights of chicks fed all the diets and not just the above two one. Our null hypothesis is that the mean chick weights is same for all the 6 feeds. Let us visualize the data again,
 
 ```{r}
-ggplot(chickwts, aes(x=feed, y=weight)) + geom_boxplot()
+ggplot(chickwts, aes(x=feed, y=weight)) + geom_boxplot()+ geom_jitter(width=0.1)
 ```
 
-The appropriate test statistic to use here is called the F-statistic, its sampling distribution is called the F-distribution. While the t-distribution captures the sampling distribution of the scaled sample mean or scaled difference of sample means, the F-distribution captures the proportion of variance between all observations within a feed group due to variance in the mean chick weights between feed groups, i.e.,
+The appropriate test statistic to use here is called the F-statistic, its sampling distribution is called the F-distribution. While the t-distribution captures the sampling distribution of the scaled sample mean or scaled difference of sample means, the F-distribution captures the ratio of variance in the mean chick weights between feed groups versus variance between all observations within a feed group , i.e.,
 
 $$F = \frac{between\ feed\ group\ weight\ variance}{within\ feed\ group\ weight\ variance}$$
 So, intuitively when the mean chick weights are not different between the different feed groups, the variance between these mean weights should be similar to variances of weights within a feed group. That is, under the null hypothesis F will hover around 1. Note, when we say, "within a feed group", we don't specify which particular feed group. This should suggest to you the requirement of the assumption that within feed groups variances are same across all groups.
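To make the ratio in the formula concrete, the sketch below computes F by hand as the between-group mean square divided by the within-group mean square, then compares it with the value reported by `anova` on a linear model. The variable names are illustrative and do not come from the workshop file.

```r
k <- nlevels(chickwts$feed)   # number of feed groups
N <- nrow(chickwts)           # total number of chicks

groupMeans <- as.numeric(tapply(chickwts$weight, chickwts$feed, mean))
groupSizes <- as.numeric(table(chickwts$feed))
grandMean  <- mean(chickwts$weight)

# Between-group variance: spread of the group means around the grand mean
betweenMS <- sum(groupSizes * (groupMeans - grandMean)^2) / (k - 1)

# Within-group variance: spread of each observation around its own group mean
# (ave() returns the group mean aligned with every observation)
withinMS <- sum((chickwts$weight - ave(chickwts$weight, chickwts$feed))^2) / (N - k)

fByHand <- betweenMS / withinMS
print(fByHand)

# The same statistic from R's ANOVA table
fFromAnova <- anova(lm(weight ~ feed, data = chickwts))[["F value"]][1]
print(fFromAnova)
```

The single pooled `withinMS` in the denominator is where the equal-variance assumption enters: every feed group contributes to one common estimate of within-group variance.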
@@ -187,7 +188,7 @@ summary(AmodelFit)
 ```
 
 The significance above suggests that there are feeds resulting in differing mean chick weights.
-We don't get information on which pairs are really different from each other. To get this information, we will perform multiple pairwise tests using Tukey's posthoc tests.
+We don't get information on which pairs are really different from each other. To get this information, we will perform multiple pairwise tests using Tukey's post-hoc tests.
 
 ### Multiple testing
 ```{r}
@@ -296,7 +297,7 @@ There are always assumptions to check for. We will visually attempt to test the
 
 2. Normality of residuals (differences between observations and their predictions using the linear model).
 
-3. Homogenity of variances across the fitted/predicted values of distance
+3. Homogeneity of variances across the fitted/predicted values of distance
 
 4. Influence of outliers on slope estimates
 
@@ -309,7 +310,7 @@ plot(lmFit)
 We will now perform a linear model version of the one-way ANOVA test we ran above,
 
 ```{r}
-ggplot(chickwts, aes(x=feed, y=weight)) + geom_boxplot()
+ggplot(chickwts, aes(x=feed, y=weight)) + geom_boxplot() + geom_jitter(width = 0.1)
 lmFit <- lm(weight ~ feed, chickwts)
 print(levels(chickwts$feed))
 summary(lmFit)
@@ -334,7 +335,7 @@ Let us visualize the data now,
 
 ```{r}
 
-ggplot(ToothGrowth, aes(x=dose, y=len, color=supp)) + geom_boxplot()
+ggplot(ToothGrowth, aes(x=dose, y=len, color=supp)) + geom_boxplot()
 ```
 
 We will formulate a linear model to estimate the effects of _dose_ and _supp_.
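A linear model with both predictors might be sketched as follows. The additive form `len ~ dose + supp` and the comparison against an interaction model are assumptions; the formula the workshop actually fits is outside this diff.

```r
# dose is numeric and supp is a two-level factor (OJ, VC) in ToothGrowth
additiveFit <- lm(len ~ dose + supp, data = ToothGrowth)
print(summary(additiveFit))

# Assumption: an interaction term lets the dose slope differ by supplement type
interactionFit <- lm(len ~ dose * supp, data = ToothGrowth)

# F-test comparing the nested models (additive vs. with interaction)
print(anova(additiveFit, interactionFit))
```

In the additive fit, the `suppVC` coefficient is the estimated shift in tooth length for VC relative to OJ at any fixed dose.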
File diff suppressed because one or more lines are too long