# Experimental design and statistical concepts
This section provides a foundational understanding of experimental design principles and essential statistical concepts used in data analysis.
### 1.1 Types of studies
* **Observational study:** Involves making observations and analyzing data without any intervention. An example is assessing the correlation between a person's daily fruit intake and their blood pressure [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Experimental study:** Involves making an intervention to test a hypothesis. An example is regimenting fruit consumption and recording blood pressure changes [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.2 Variables in experimental studies
Experimental studies aim to assess the effect of one variable while controlling others [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Independent/explanatory variable:** The variable that is changed or manipulated, and is hypothesized to cause an effect [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Dependent/response variable:** The variable that is measured and is expected to be affected by the independent variable [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Confounding variable:** A variable that can influence the measurements of both the independent and dependent variables, potentially distorting the observed relationship. For instance, in an experiment measuring gene expression in response to glucose concentrations, other sugars present in the cell culture media could act as confounding variables. Awareness of confounding variables is crucial for interpreting experimental results correctly [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.3 Experimental replicates
* **Technical replicates:** Multiple measurements taken from the same sample to assess the precision and reliability of the experimental technique [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Biological replicates:** Involve using different samples that are biologically distinct but treated identically, helping to account for natural biological variability [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.4 Control groups
* **Negative control:** A condition where no effect is expected; used as a baseline for comparison [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Positive control:** A condition where an effect on the dependent variable is known to occur; used to confirm the assay or experimental setup is working as expected [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.5 Descriptive and inferential statistics
* **Descriptive statistics:** Summarize and describe the main features of a dataset. Examples include mean, median, mode, range, standard deviation, and visualizations like graphs and plots [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Inferential statistics:** Make conclusions or predictions about a larger population based on a sample of data. Examples include t-tests, chi-squared tests, confidence intervals, regression analysis, and hypothesis testing [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.6 Measures of central tendency and spread
* **Mean:** The average of a dataset. It is sensitive to outliers [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Median:** The middle value in a sorted dataset. It is less affected by outliers than the mean [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* If a distribution is skewed left, the mean is typically less than the median [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* If a distribution is skewed right, the mean is typically greater than the median [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* A larger difference between the mean and median indicates stronger skew and/or more extreme outliers [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Standard Deviation (SD):** A measure of the dispersion of data points around the mean [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Standard Error of the Mean (SEM):** Calculated as $SEM = \frac{SD}{\sqrt{n}}$, where $n$ is the sample size. It estimates the variability of sample means around the population mean [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
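As a minimal illustration of these summary measures in base R (the blood-pressure values below are made up):

```r
# Illustrative systolic blood pressure readings (mmHg)
bp <- c(118, 122, 125, 130, 135, 128, 121, 140, 119, 132)

mean(bp)                   # arithmetic mean (sensitive to outliers)
median(bp)                 # middle value (robust to outliers)
sd(bp)                     # standard deviation: spread around the mean
sd(bp) / sqrt(length(bp))  # standard error of the mean, SEM = SD / sqrt(n)
```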
### 1.7 Correlation
Correlation quantifies the strength and direction of a linear relationship between two variables [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Correlation Coefficient ($r$):** Ranges from -1 to +1 [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* $r > 0$: Positive linear association, where both variables increase together [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* $r < 0$: Negative linear correlation, where one variable increases as the other decreases [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* $r = +1$: Perfect positive linear correlation [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* $r = -1$: Perfect negative linear correlation [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Strength of correlation:**
* 0.0 - 0.2: Very weak, negligible [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* 0.2 - 0.4: Weak, low [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* 0.4 - 0.7: Moderate [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* 0.7 - 0.9: Strong, high, marked [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* 0.9 - 1.0: Very strong, very high [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **$R^2$ (coefficient of determination):** The square of the correlation coefficient, indicating the proportion of variation in the dependent variable explained by the independent variable [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Important Note:** Correlation does **not** imply causation [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
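A minimal R sketch of the correlation measures above, using made-up paired values:

```r
# Illustrative paired measurements
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8)

r <- cor(x, y)   # Pearson correlation coefficient, between -1 and +1
r^2              # coefficient of determination for a simple linear fit
cor.test(x, y)   # adds a p-value and a 95% CI for the correlation
```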
### 1.8 Regression
Regression analysis models the relationship between a dependent variable and one or more independent variables by fitting a linear equation [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Linear regression:** A statistical method used to model relationships where a straight line can best represent the data [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Line of best fit:** The line described by the regression equation, used to estimate values of the dependent variable within the range of the observed data [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Goodness of fit:** Assesses how well the regression equation represents the data [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **$R^2$ value:** A key indicator of goodness of fit [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Residuals:** The difference between actual data points and values predicted by the model. Analyzing residual plots helps identify if a linear model is appropriate [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Even distribution:** Points should be evenly distributed vertically and horizontally around the zero line [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Outliers:** Stand out clearly in a residual plot and indicate data points that the model does not predict well [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **No clear shape:** A lack of patterns or shapes in the residuals suggests that a linear model fits well [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Prediction:** A linear regression model can be used to predict values of the dependent variable for given values of the independent variable, often with confidence intervals [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
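A minimal R sketch of fitting and checking a simple linear regression with `lm()` (the data are made-up values):

```r
# Illustrative paired data
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(2.1, 2.9, 4.2, 4.8, 6.1, 6.9, 8.2, 8.8)

fit <- lm(y ~ x)      # fit the line of best fit
summary(fit)          # coefficients, R-squared, p-values
resid(fit)            # residuals: observed y minus fitted y
plot(fit, which = 1)  # residuals vs fitted values; look for no clear pattern

# Predict the response at new x values, with a 95% confidence interval
predict(fit, newdata = data.frame(x = c(4.5, 9)), interval = "confidence")
```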
### 1.9 Probability and risk
* **Uncertainty:** Living systems are complex, leading to variability and uncertainty in results [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Probability ($P$):** The proportion of times a specific outcome occurs from a large number of independent trials; scales from 0 (impossible) to 1 (certain) [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* For mutually exclusive outcomes (outcomes that cannot happen simultaneously), probabilities can be added: $P(A \text{ or } B) = P(A) + P(B)$ [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* For independent events (where the outcome of one does not affect the other), probabilities are multiplied: $P(A \text{ and } B) = P(A) \times P(B)$ [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Risk:** The probability of an undesirable outcome occurring [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.10 Probability distributions
A probability distribution graphically represents the probability of different outcomes [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Binomial distribution:** Applies to situations with a fixed number of independent trials, each with two possible outcomes (success/failure) [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* `dbinom(x, size, prob)`: Calculates the probability of exactly `x` successes in `size` trials with probability `prob` of success [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* `pbinom(q, size, prob)`: Calculates the cumulative probability of `q` or fewer successes in `size` trials [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Discrete vs. Continuous Data:**
* **Discrete data:** Typically follows binomial distributions (use `dbinom`, `pbinom`, `qbinom`) [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Continuous data:** Often follows normal distributions (use `pnorm`) [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.11 Hypothesis testing
Hypothesis testing involves formulating a null hypothesis ($H_0$) and an alternative hypothesis ($H_A$) and using sample data to decide whether to reject $H_0$ [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Null hypothesis ($H_0$):** Assumes no effect, no difference, or no bias [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Alternative hypothesis ($H_A$):** Proposes that there is an effect, difference, or bias [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Significance level ($\alpha$):** The probability of rejecting the null hypothesis when it is actually true (Type I error). Commonly set at 0.05 [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **P-value:** The probability of obtaining results as extreme as, or more extreme than, the observed data, assuming the null hypothesis is true [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* If P-value < $\alpha$, reject $H_0$ [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* If P-value $\geq \alpha$, fail to reject $H_0$ [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Critical region:** The range of values for which the null hypothesis is rejected [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **One-tailed vs. Two-tailed tests:**
* **One-tailed test:** Used when the direction of the effect is clearly defined and justified. The entire critical region lies in one tail, making it easier to reject $H_0$ when the effect is in the hypothesized direction [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Two-tailed test:** Used when the direction of the effect is not specified [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.12 Errors in hypothesis testing
* **Type I error (False positive):** Rejecting the null hypothesis when it is actually true. The probability of a Type I error is equal to $\alpha$ [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Type II error (False negative):** Failing to reject the null hypothesis when it is actually false. The probability of a Type II error is denoted by $\beta$ [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Power:** The probability of correctly rejecting the null hypothesis when it is false (i.e., avoiding a Type II error). Power is calculated as $1 - \beta$. A higher power indicates a greater likelihood of detecting a true effect [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.13 Effect size
Effect size measures the practical significance or meaningfulness of a statistical finding, independent of sample size [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Importance:** A statistically significant result (small P-value) may not be practically important if the effect size is small, especially with large sample sizes [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Measures:** Can be expressed as Cohen's $d$, correlation coefficient ($r$), or $R^2$ [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Cohen's d:** Calculated as $d = \frac{\bar{X}_1 - \bar{X}_2}{\text{pooled SD}}$, where $\bar{X}_1$ and $\bar{X}_2$ are the means of the two groups and the pooled SD is a combined measure of their standard deviations [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Factors increasing effect size:** Larger true difference between groups, lower variability, less measurement error, and well-controlled experimental designs [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.14 Confidence intervals and error bars
* **Confidence Interval (CI):** A range of values that is likely to contain the true population parameter with a certain level of confidence (e.g., 95% CI) [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* For the difference between two means, if the 95% CI does not include 0, it suggests a statistically significant difference [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Error Bars:** Visual representations of variability or uncertainty.
* **Standard Deviation (SD) error bars:** Represent the spread of data points around the mean. Overlapping SD bars do not allow for conclusions about statistical significance [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Standard Error (SE) error bars:** Represent the accuracy of the sample mean as an estimate of the population mean. If SE bars overlap, the difference is likely not significant (P > 0.05); non-overlapping SE bars alone do not establish significance [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **95% Confidence Interval (CI) error bars:** Represent the range where the true population mean is likely to lie. If CI error bars do not overlap, the difference is likely statistically significant (P < 0.05) [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.15 T-tests and ANOVA
* **T-test:** A statistical test used to determine if there is a significant difference between the means of two groups [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Assumptions:**
1. Dependent variable is continuous; independent variable is categorical with two outcomes [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
2. Data are normally distributed in the population [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
3. The two populations have equal variances (homoscedasticity) [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **ANOVA (Analysis of Variance):** A statistical test used to compare the means of three or more groups. It compares the variance within groups to the variance between groups [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Assumptions:** Similar to t-tests: normally distributed data, independent observations, and equal variances across groups [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Post-hoc tests (e.g., Tukey's HSD):** Used after a significant ANOVA result to determine which specific group pairs differ significantly [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.16 Multiple testing
Performing multiple statistical tests increases the probability of obtaining at least one false positive (Type I error) [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Family-Wise Error Rate (FWER):** The probability of making at least one Type I error across a set of tests. Methods like Bonferroni correction control FWER by adjusting significance levels [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **False Discovery Rate (FDR):** The expected proportion of rejected null hypotheses that are actually false positives. Methods like the Benjamini-Hochberg (BH) procedure control FDR and are often used for a larger number of tests [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.17 Improving experimental design
Reducing error and bias is crucial for reliable experimental outcomes [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Types of error:**
* **Sampling error:** Occurs because a sample may not perfectly represent the population. Can be reduced through replication, balance, and blocking [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Bias:** Systematic error that distorts results. Can be introduced by study design, data collection, analysis, or publication practices [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Techniques to control bias:**
* **Simultaneous control groups:** Using negative and positive controls run concurrently with experimental groups [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Blinding:** Preventing participants and/or researchers from knowing who is receiving the treatment or placebo [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Randomization:** Randomly assigning subjects to experimental groups to minimize systematic bias [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 1.18 Questionable research practices (QRPs)
These practices can lead to misleading results and include cherry-picking data, p-hacking, and HARKing (Hypothesizing After the Results are Known). Fabrication and falsification of data are more severe forms of misconduct [1](#page=1) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
---
# Probability and hypothesis testing
This section delves into the foundational concepts of probability and their critical application in statistical hypothesis testing, covering essential elements such as hypothesis formulation, significance levels, critical regions, and the interpretation of p-values.
### 2.1 Core probability concepts
Probability quantifies the likelihood of specific outcomes in uncertain situations. It is represented on a scale from 0 to 1, where 1 indicates certainty and 0 indicates impossibility. The probability of mutually exclusive events (those that cannot occur simultaneously) occurring can be found by adding their individual probabilities. For independent events, where the occurrence of one does not affect the other, the probability of both occurring is found by multiplying their individual probabilities [2](#page=2).
### 2.2 Probability distributions
A probability distribution visually represents the probability of each possible outcome in a given scenario. The area under the curve of a probability distribution corresponds to the probability of observing a particular outcome or range of outcomes [2](#page=2).
#### 2.2.1 Binomial distribution
The binomial distribution is used to model situations with a fixed number of independent trials, each having only two possible outcomes, typically defined as "success" and "failure". These two outcomes must be mutually exclusive and exhaustive, meaning their probabilities sum to 1 [2](#page=2).
The conditions for a binomial distribution are:
* Two outcomes per trial: success and failure [2](#page=2).
* A fixed number of trials [2](#page=2).
* Each trial is independent [2](#page=2).
* The probability of success remains constant across all trials [2](#page=2).
The probability of observing a specific number of successes in a binomial distribution can be calculated using the `dbinom()` function in R. For example, the probability of getting exactly 2 heads in 4 coin tosses (where probability of heads is 0.5) is `dbinom(2, 4, 0.5)` which results in 0.375 [2](#page=2).
#### 2.2.2 Cumulative probability
Cumulative probability refers to the probability of a range of outcomes occurring, up to and including a certain value. In R, the `pbinom()` function is used to calculate cumulative binomial probabilities. For instance, `pbinom(2, 4, 0.5)` calculates the probability of getting up to and including 2 heads in 4 coin tosses, which is 0.6875. To find the probability of getting 3 or more heads, one would subtract the cumulative probability of getting 2 or fewer heads (i.e. `pbinom(2, 4, 0.5)`) from 1, giving 0.3125 [2](#page=2).
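These calculations map directly onto R's binomial functions; a minimal sketch reproducing the numbers above:

```r
dbinom(2, size = 4, prob = 0.5)      # exactly 2 heads in 4 tosses: 0.375
pbinom(2, size = 4, prob = 0.5)      # 2 or fewer heads: 0.6875
1 - pbinom(2, size = 4, prob = 0.5)  # 3 or more heads: 0.3125
```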
#### 2.2.3 Discrete vs. continuous data
* **Discrete data** typically follows a binomial distribution and is analyzed using functions like `dbinom()` and `pbinom()` [2](#page=2).
* **Continuous data** is often analyzed using a normal distribution and functions like `pnorm()` [2](#page=2).
### 2.3 Hypothesis testing
Hypothesis testing is a statistical method used to make conclusions or predictions about a population based on a sample of data. It involves formulating hypotheses, analyzing data, and determining if the observed results provide sufficient evidence to reject a default assumption [1](#page=1).
#### 2.3.1 Null and alternative hypotheses
* **Null hypothesis ($H_0$)**: This is the default assumption, stating that there is no effect, no difference, or no bias. It generally represents the status quo or the absence of a phenomenon of interest. For example, $H_0$: a coin is not biased to heads [2](#page=2) [3](#page=3).
* **Alternative hypothesis ($H_A$)**: This hypothesis states that there is an effect, a difference, or a bias in a specific direction. It is mutually exclusive with the null hypothesis. For example, $H_A$: a coin is biased to heads [2](#page=2) [3](#page=3).
The process of hypothesis testing involves assuming the null hypothesis is true and then assessing the probability of observing the data (or more extreme data) under this assumption [3](#page=3).
#### 2.3.2 Significance level (alpha)
The significance level, denoted by $\alpha$, is a threshold set before conducting the test, typically at 0.05 (or 5%). It represents the probability of making a Type I error, which is rejecting the null hypothesis when it is actually true [3](#page=3).
> **Tip:** The sample size has no effect on the probability of a Type I error; it is solely determined by the chosen significance level ($\alpha$) [3](#page=3).
#### 2.3.3 Critical region and threshold value
The **threshold value** separates the region where the null hypothesis is rejected from the region where it is not rejected. The **critical region** encompasses the outcomes that are considered unlikely if the null hypothesis were true, leading to its rejection [2](#page=2).
For instance, when testing whether a coin is fair by tossing it 100 times at a 5% significance level ($\alpha = 0.05$), we might calculate the critical values. Using `qbinom(0.025, 100, 0.5)` gives a lower critical value of 40, and `qbinom(0.975, 100, 0.5)` gives an upper critical value of 60. Therefore, the critical region for the number of heads lies outside the range of 40 to 60. If the observed number of heads falls into this critical region (e.g., 65 heads), the null hypothesis would be rejected, suggesting the coin is not fair [2](#page=2).
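A brief R sketch of this procedure; the observed count of 65 heads is an illustrative choice that falls in the critical region:

```r
alpha <- 0.05

# Two-sided critical values for 100 tosses of a fair coin
qbinom(alpha / 2, size = 100, prob = 0.5)      # lower critical value (40)
qbinom(1 - alpha / 2, size = 100, prob = 0.5)  # upper critical value (60)

# Exact two-sided binomial test for an observed count of heads
binom.test(65, n = 100, p = 0.5)  # p-value well below 0.05, so reject H0
```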
#### 2.3.4 P-value
The **p-value** is the probability of obtaining results as extreme as, or more extreme than, the observed results, assuming that the null hypothesis is true. It is a crucial metric for interpreting the strength of evidence against the null hypothesis [2](#page=2).
* If the p-value is less than the significance level ($\alpha$), the null hypothesis is rejected.
* If the p-value is greater than or equal to the significance level ($\alpha$), the null hypothesis is not rejected.
> **Tip:** The p-value does not measure the effect size or confirm the truth of a hypothesis; it only indicates how compatible the observed data is with the null hypothesis [2](#page=2).
### 2.4 Errors in hypothesis testing
Two types of errors can occur in hypothesis testing:
* **Type I error (False positive)**: Rejecting the null hypothesis ($H_0$) when it is actually true. The probability of this error is equal to the significance level ($\alpha$) [3](#page=3).
* **Type II error (False negative)**: Failing to reject the null hypothesis ($H_0$) when it is actually false. The probability of this error is denoted by $\beta$ [3](#page=3).
### 2.5 Power of a test
The **power** of a statistical test is the probability of correctly rejecting the null hypothesis when it is false, thus avoiding a Type II error. It is calculated as Power = $1 - \beta$. A test with higher power is more reliable as it increases the chance of detecting a real effect [3](#page=3).
Factors that increase power include:
* A larger sample size [3](#page=3).
* A larger effect size [3](#page=3).
* A higher significance level ($\alpha$), though this also increases the risk of Type I error [3](#page=3).
* Using a one-tailed test when justified [3](#page=3).
### 2.6 Effect size
Effect size quantifies the meaningfulness or practical importance of a statistical difference or relationship. While a p-value indicates statistical significance, the effect size tells us if the observed effect is practically relevant in the real world. A large sample size can lead to a statistically significant p-value even with a small effect size, which might not be practically important [2](#page=2) [3](#page=3).
Common measures of effect size include:
* Cohen's $d$: calculated as $d = \frac{\bar{X}_1 - \bar{X}_2}{s_{\text{pooled}}}$ where $\bar{X}_1$ and $\bar{X}_2$ are the means of two groups and $s_{\text{pooled}}$ is the pooled standard deviation [2](#page=2).
* Correlation coefficient ($r$) [2](#page=2).
* Coefficient of determination ($R^2$) [2](#page=2).
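A minimal R sketch of Cohen's $d$, using the equal-group-size pooled SD given later in these notes and made-up measurements for the two groups:

```r
group1 <- c(5.1, 5.8, 6.2, 5.5, 6.0, 5.9)  # illustrative values
group2 <- c(4.2, 4.9, 5.1, 4.6, 4.8, 5.0)

pooled_sd <- sqrt((sd(group1)^2 + sd(group2)^2) / 2)  # pooled SD for equal n
d <- (mean(group1) - mean(group2)) / pooled_sd
d
```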
### 2.7 Interpreting inferential statistics
When interpreting statistical results, several measures provide insights:
* **$R^2$ (Coefficient of Determination)**: This value, the square of the correlation coefficient, indicates the proportion of the variance in the dependent variable that is explained by the independent variable(s). A higher $R^2$ suggests a better fit of the linear model to the data [1](#page=1) [2](#page=2).
* **Adjusted $R^2$**: This is a modified version of $R^2$ that accounts for the number of independent variables in the model. It penalizes the addition of predictors that do not improve the model's fit, making it a more robust measure for comparing models with different numbers of predictors [2](#page=2).
* **Confidence Intervals (CI)**: A 95% CI provides a range within which the true population mean is likely to lie 95% of the time. For the difference between two means, if the 95% CI does not include zero, it suggests a statistically significant difference. Overlapping 95% CIs can sometimes obscure significance, but if they touch or do not overlap at all, the difference is likely significant [1](#page=1) [2](#page=2).
* **Standard Error (SE)**: The standard error of the mean (SEM) estimates how much the sample mean is likely to differ from the population mean. It is calculated as $SEM = \frac{\text{Standard Deviation}}{\sqrt{n}}$. The SE is used to calculate confidence intervals; non-overlapping SE bars may hint at a difference but are not conclusive on their own [2](#page=2).
* **Standard Deviation (SD)**: The standard deviation represents the spread or dispersion of data points around the mean. SD error bars can be large and indicate data variation, but do not directly indicate statistical significance on their own [1](#page=1) [2](#page=2) [3](#page=3).
### 2.8 Visualizing data and error
Error bars are crucial for visualizing uncertainty in data.
* **SD error bars**: Represent the spread of data within a sample. No conclusion about statistical significance can be drawn if SD bars overlap [2](#page=2) [3](#page=3).
* **SE error bars**: Represent the accuracy of the sample mean as an estimate of the population mean. If SE bars do not overlap, it may suggest a significant difference, but this is not as definitive as 95% CI [2](#page=2) [3](#page=3).
* **95% CI error bars**: These are generally the most informative for inferential statements about the population mean. If the 95% CIs of two means do not overlap, it strongly suggests a statistically significant difference ($p < 0.05$) [2](#page=2) [3](#page=3).
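A minimal R sketch of the 95% CI that such error bars represent, using made-up sample values; `t.test()` reports the same interval:

```r
x <- c(10.2, 9.8, 11.1, 10.5, 9.9, 10.7, 10.1, 10.4)  # illustrative sample

n   <- length(x)
sem <- sd(x) / sqrt(n)                            # standard error of the mean
mean(x) + c(-1, 1) * qt(0.975, df = n - 1) * sem  # lower and upper 95% CI

t.test(x)$conf.int  # same 95% CI; with two samples, t.test() also gives
                    # the CI for the difference in means
```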
---
# Statistical errors and study quality
This section explores the fundamental types of statistical errors and the critical elements that define the quality of a research study.
### 3.1 Statistical errors
Statistical errors are inherent risks in hypothesis testing that can lead to incorrect conclusions. The two primary types of errors are Type I and Type II errors, which are closely related to the concepts of significance level and power.
#### 3.1.1 Type I and Type II errors
* **Type I error (False Positive):** This occurs when the null hypothesis ($H_0$) is rejected when it is actually true. In simpler terms, you conclude there is an effect or difference when there isn't one. The probability of making a Type I error is denoted by $\alpha$ (alpha), which is the significance level of the test. A common significance level is $0.05$, meaning there is a $5\%$ chance of incorrectly rejecting a true null hypothesis. The sample size does not affect the probability of a Type I error [6](#page=6) [7](#page=7).
* **Type II error (False Negative):** This occurs when the null hypothesis ($H_0$) is not rejected when it is actually false. This means you fail to detect an effect or difference that actually exists. The probability of making a Type II error is denoted by $\beta$ (beta). A common threshold for beta is $0.20$ (or $20\%$), indicating a $20\%$ chance of failing to reject a false null hypothesis [6](#page=6) [7](#page=7).
#### 3.1.2 Power
**Power** is the probability that a statistical test will correctly reject a false null hypothesis. It is calculated as $1 - \beta$. A study with $80\%$ power, for example, means that if a real effect exists, there is an $80\%$ chance the test will detect it. Higher power is desirable as it reduces the chance of a Type II error and increases the reliability of a study [6](#page=6) [7](#page=7).
**Factors that increase power:**
* **Bigger sample size:** Larger samples generally lead to increased power [7](#page=7).
* **Bigger effect size:** A larger true difference or relationship between variables makes it easier to detect, thus increasing power [7](#page=7).
* **Higher significance level ($\alpha$):** While this increases power, it also increases the probability of a Type I error [7](#page=7).
* **Using a one-tailed test:** This should only be used when the direction of the effect is clearly justified [7](#page=7).
* **Lower variance:** Less variability in the data makes it easier to detect a real effect, leading to higher power [7](#page=7).
> **Tip:** Power analysis is crucial for experimental design to ensure a study has a sufficient chance of detecting a meaningful effect if one exists.
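As a sketch of such a power analysis, base R's `power.t.test()` can solve for the sample size or the power of a two-sample t-test; the effect size and SD below are illustrative:

```r
# Sample size per group needed to detect a difference in means of 0.5
# (within-group SD = 1) at alpha = 0.05 with 80% power
power.t.test(delta = 0.5, sd = 1, sig.level = 0.05, power = 0.80)

# Conversely, the power achieved with n = 20 per group
power.t.test(n = 20, delta = 0.5, sd = 1, sig.level = 0.05)
```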
### 3.2 Study quality
The quality of a study is paramount for ensuring the validity and reliability of its findings. Key aspects include controlling for error, minimizing bias, and avoiding questionable research practices.
#### 3.2.1 Controlling for error
Error in research can be broadly categorized into sampling error and bias.
* **Sampling error:** This arises because a sample is used to represent a larger population, and the sample may not perfectly reflect that population. Sampling error should ideally be normally distributed and can be estimated. Techniques to control sampling error include [6](#page=6) [7](#page=7):
* **Replication:** Repeating measurements or experiments increases the amount of data and improves accuracy [6](#page=6) [7](#page=7).
* **Technical replicates:** Multiple measurements from the same sample to assess precision [6](#page=6).
* **Biological replicates:** Using distinct biological samples treated identically to account for natural variability [6](#page=6).
* **Balance:** Using groups of similar sizes in comparisons helps maintain consistent variance, which is important for power [7](#page=7).
* **Blocking:** Grouping similar experimental units (e.g., by age, gender) and then randomly assigning treatments within each block can help control for systematic variation and reduce sampling error [7](#page=7).
* **Bias:** Bias is a systematic error that leads to distorted results, consistently skewing them in one direction. Factors contributing to bias include [6](#page=6) [7](#page=7):
* **Study design:** For example, only measuring the largest neurons might introduce bias if they have different membrane potentials than smaller ones [7](#page=7).
* **Data collection:** Equipment that consistently reads off a value higher or lower than the true value [6](#page=6) [7](#page=7).
* **Data analysis:** Using a model that systematically underestimates or overestimates values [6](#page=6) [7](#page=7).
* **Publication bias:** The tendency to publish results that align with expectations or are statistically significant [6](#page=6) [7](#page=7).
**Techniques to control bias:**
* **Simultaneous control groups:** Comparing test samples to control groups (negative, positive, or best available therapy) run concurrently ensures valid comparisons [7](#page=7).
* **Blinding:** Keeping patients and/or researchers unaware of group assignments (experimental vs. placebo) prevents expectations from influencing results [6](#page=6) [7](#page=7).
* **Randomization:** Randomly assigning subjects to groups is crucial, though it may be balanced with blocking to manage sampling error. Proper randomization protocols are a hallmark of good experimental design [6](#page=6) [7](#page=7).
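A minimal base-R sketch of simple and blocked randomization; the subject IDs and the blocking factor (sex) are illustrative:

```r
set.seed(1)  # reproducible example
subjects <- paste0("S", 1:20)

# Simple randomization: shuffle 10 treatment and 10 control labels
assignment <- sample(rep(c("treatment", "control"), each = 10))
data.frame(subject = subjects, group = assignment)

# Blocked randomization: randomize separately within each block
block        <- rep(c("female", "male"), each = 10)
assignment_f <- sample(rep(c("treatment", "control"), each = 5))
assignment_m <- sample(rep(c("treatment", "control"), each = 5))
data.frame(subject = subjects, block = block, group = c(assignment_f, assignment_m))
```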
#### 3.2.2 The placebo effect
The placebo effect highlights how a participant's expectations can influence their response to a treatment, even if the treatment is inert. Factors influencing its efficiency include the route of administration, number and color of pills, packaging, and the clinician's beliefs [7](#page=7).
#### 3.2.3 Questionable research practices (QRPs)
QRPs are methods that can inflate the likelihood of obtaining a statistically significant result without necessarily indicating a true effect. They are distinct from outright fabrication or falsification of data but undermine research integrity. Examples include [7](#page=7):
* **Cherry-picking:** Selectively presenting data or analyses that support a desired outcome [6](#page=6) [7](#page=7).
* **P-hacking:** Manipulating data or analysis until a statistically significant p-value is achieved. This can involve [7](#page=7):
* Checking statistical significance before collecting more data [7](#page=7).
* Stopping data collection early once a significant (or non-significant) result is reached [7](#page=7).
* Removing data without clear justification [7](#page=7).
* Rounding p-values to meet significance thresholds (e.g., $0.053$ to $0.05$) [7](#page=7).
* Hiding multiple tests performed and not adjusting p-values accordingly [7](#page=7).
* Adjusting statistical models based on whether a significant result is obtained, without proper justification [7](#page=7).
* **Hypothesizing after results are known (HARKing):** Presenting exploratory findings as if they were the original, pre-defined hypothesis [7](#page=7).
> **Tip:** Always look for transparency in methodology, especially regarding randomization and data analysis. If these details are absent or vague, it can be a red flag.
---
# Advanced statistical tests and their assumptions
This section details specific statistical tests, focusing on their underlying assumptions, how to perform them, and how to interpret their results, while also discussing methods for correcting multiple testing [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 4.1 Hypothesis testing and inferential statistics
Inferential statistics aim to make conclusions or predictions about a population based on a sample. Hypothesis testing is a core component, involving the formulation of a null hypothesis ($H_0$) and an alternative hypothesis ($H_A$). The null hypothesis typically states no effect or no difference, while the alternative hypothesis posits a specific trend or effect. The process assumes $H_0$ is true and evaluates the likelihood of observing the data under this assumption [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Significance level ($\alpha$)**: This is the probability of making a Type I error (false positive), which is the probability of rejecting $H_0$ when it is actually true. A common $\alpha$ is 0.05, meaning there is a 5% chance of a false positive [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **P-value**: The probability of obtaining results as extreme as, or more extreme than, the observed data, assuming the null hypothesis is true. If the p-value is less than $\alpha$, $H_0$ is rejected [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Critical Region**: This is the range of values for the test statistic for which $H_0$ is rejected [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Type II error ($\beta$)**: The probability of failing to reject $H_0$ when it is actually false (false negative) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Power**: The probability of correctly rejecting $H_0$ when it is false, calculated as $1 - \beta$. Higher power increases the likelihood of detecting a real effect [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 4.2 T-tests
T-tests are statistical tests used to evaluate whether there is a statistically significant difference between the mean of a sample and a reference value, or between the means of two samples [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
#### 4.2.1 Assumptions of the t-test
Before conducting a t-test, several assumptions about the data must be met to ensure the validity of the probability calculations [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
1. **Variable types**: The dependent variable must be continuous, and the independent variable must be bivariate (categorical with only two outcomes) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
2. **Normal distribution**: The populations from which the samples are drawn should be approximately normally distributed. This can be assessed using a normal quantile-quantile plot (Q-Q plot) [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
3. **Equal variances**: The two populations should have equal variances. This can be checked by examining the ratio of the larger variance to the smaller variance; if it's less than 4, variances are often considered equal [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
#### 4.2.2 Types of t-tests
* **Two-sample t-test**: Compares the means of two independent groups, such as a drug treatment group versus a control group. R does not assume equal variances by default, but `var.equal = TRUE` can be specified if this assumption is met [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **One-sample t-test**: Compares the mean of a single group to a known or hypothesized population mean [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Paired t-test**: Used when samples are related, analyzing data in pairs, such as measurements before and after an intervention on the same subjects [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
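A minimal sketch of these three variants with R's `t.test()`, using made-up measurements:

```r
control   <- c(5.2, 4.8, 5.5, 5.0, 5.3, 4.9)  # illustrative values
treatment <- c(5.9, 6.1, 5.7, 6.4, 6.0, 5.8)

t.test(treatment, control)                    # two-sample (Welch by default)
t.test(treatment, control, var.equal = TRUE)  # assuming equal variances
t.test(control, mu = 5)                       # one-sample, against mean = 5

before <- c(5.1, 4.9, 5.4, 5.2, 5.0, 5.3)     # paired before/after data
after  <- c(5.6, 5.2, 5.9, 5.5, 5.4, 5.8)
t.test(after, before, paired = TRUE)
```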
### 4.3 Analysis of Variance (ANOVA)
ANOVA is an F-test used to compare the means of three or more samples simultaneously. It works by comparing the variance within the different samples to the variance between the different samples. The outcome determines if all samples likely come from the same population or if at least one group originates from a different population [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
#### 4.3.1 Assumptions of ANOVA
Similar to t-tests, ANOVA has assumptions:
1. **Normal distribution**: The data within each group should be normally distributed [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
2. **Independence**: Observations within each group and between groups must be independent [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
3. **Equal variances (homoscedasticity)**: The groups must have equal variances [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
#### 4.3.2 Performing and interpreting ANOVA
In R, the `aov()` function is used to perform ANOVA, followed by `summary()` to view the output [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Output Interpretation**: The ANOVA output includes sum of squares, degrees of freedom (DF), mean squares, F-statistic, and the p-value [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **F-statistic**: Calculated as the ratio of the mean squares between groups to the mean squares within groups. A large F-statistic indicates that between-group variation is large relative to within-group variation and, with a correspondingly small p-value, leads to the rejection of $H_0$ [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Degrees of freedom**: DF between groups = K (number of groups) - 1; DF within groups = N (total number of observations) - K [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Reporting ANOVA**: Results are typically reported as F(DF between, DF within) = F-value, p = p-value [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
#### 4.3.3 Post-hoc tests
If ANOVA indicates a significant difference between group means, post-hoc tests are used to determine which specific groups differ. The Tukey honest significance test is a common post-hoc test, providing adjusted p-values for pairwise comparisons [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
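A minimal R sketch of this workflow on simulated data; the three group means are chosen arbitrarily for illustration:

```r
set.seed(1)
df <- data.frame(
  group    = factor(rep(c("A", "B", "C"), each = 8)),
  response = c(rnorm(8, mean = 10), rnorm(8, mean = 12), rnorm(8, mean = 12.5))
)

fit <- aov(response ~ group, data = df)
summary(fit)   # DF, sum of squares, mean squares, F-statistic, p-value

# If the ANOVA is significant, Tukey's HSD shows which pairs of groups differ
TukeyHSD(fit)
```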
### 4.4 Correcting for multiple testing
When conducting multiple statistical tests, the probability of a Type I error (false positive) increases. Multiple testing correction methods aim to control this [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Bonferroni correction**: A strict method that controls the family-wise error rate (FWER), ensuring the probability of at least one false positive remains at the chosen alpha level. It involves dividing the original alpha by the number of tests. This method increases the false negative rate and decreases power [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Benjamini-Hochberg (BH) procedure**: Controls the false discovery rate (FDR), which is the expected proportion of false positives among all rejected null hypotheses. This is often preferred for exploratory data analysis with a large number of tests, as it offers increased power compared to Bonferroni [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
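Both corrections are available through R's `p.adjust()`; a minimal sketch with illustrative raw p-values:

```r
p <- c(0.001, 0.008, 0.020, 0.041, 0.300)  # raw p-values from several tests

p.adjust(p, method = "bonferroni")  # controls the family-wise error rate (FWER)
p.adjust(p, method = "BH")          # controls the false discovery rate (FDR)
```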
### 4.5 Error and its implications
Understanding and minimizing error is crucial for reliable statistical inference [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Types of Error**:
* **Sampling error**: Occurs because a sample may not perfectly represent the entire population. Techniques like replication, balance (equal sample sizes), and blocking help reduce this error [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Bias**: Systematic error that distorts results, stemming from study design, data collection, data analysis, or publication practices [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Controlling Bias**: Methods include using simultaneous control groups, blinding (where participants or researchers are unaware of treatment assignments), and randomization [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 4.6 Effect Size and Confidence Intervals
* **Effect Size**: Quantifies the magnitude of a statistical difference or relationship, indicating its practical importance. Common measures include Cohen's d and $R^2$ [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Cohen's d**: Calculated as the difference between two means divided by the pooled standard deviation. The formula for pooled standard deviation is: $$ \text{Pooled SD} = \sqrt{\frac{\text{SD}_1^2 + \text{SD}_2^2}{2}} $$ [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Confidence Intervals (CI)**: Provide a range of plausible values for a population parameter (e.g., the mean). For a 95% CI, this range is expected to contain the true population mean 95% of the time [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* For t-tests, the 95% CI for the difference between means is particularly informative. If this interval does not include 0, it suggests a statistically significant difference [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Error Bars**:
* **Standard Deviation (SD)** error bars represent data spread and do not directly indicate significance [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Standard Error (SE)** error bars represent the accuracy of the sample mean as an estimate of the population mean and are used to calculate CIs [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **95% Confidence Interval (CI)** error bars are used for inferential statements about the population mean. If CI bars do not overlap, it suggests a statistically significant difference (p < 0.05). Even with up to 50% overlap, a difference can still be significant [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
### 4.7 Questionable Research Practices (QRPs)
These are practices that can distort results or lead to false conclusions, often without amounting to outright fabrication or falsification. Examples include [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7):
* **Cherry-picking**: Presenting only data that supports a desired outcome [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **P-hacking**: Manipulating data or analysis until a statistically significant p-value is obtained. This includes stopping data collection early when a significant result is found or removing data without justification [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Hypothesizing after results are known (HARKing)**: Presenting a post-hoc hypothesis as if it were an a priori one [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
* **Publication Bias**: The tendency to publish studies with statistically significant results more often than those with non-significant findings, leading to an overrepresentation of positive findings in the literature [2](#page=2) [3](#page=3) [4](#page=4) [5](#page=5) [6](#page=6) [7](#page=7).
---
## Common mistakes to avoid
- Skipping topics instead of reviewing them all thoroughly before exams
- Glossing over formulas and key definitions
- Not practicing with the examples provided in each section
- Memorizing without understanding the underlying concepts
## Glossary
| Term | Definition |
|------|------------|
| Observational study | A type of study where data is collected by observing subjects without intervention or manipulation. |
| Experimental study | A study where an intervention or manipulation is applied to one or more variables to observe its effect on a dependent variable. |
| Independent variable | The variable that is changed or manipulated by the researcher in an experiment to observe its effect. |
| Dependent variable | The variable that is measured in an experiment to assess the effect of the independent variable. |
| Confounding variable | An extraneous variable that can influence the relationship between the independent and dependent variables, potentially leading to biased results. |
| Technical replicates | Multiple measurements taken from the same sample to assess the precision and reliability of an experimental technique. |
| Biological replicates | Different samples that are biologically distinct but treated identically, used to account for natural biological variability. |
| Negative control | A control group or condition in an experiment where no effect is expected, serving as a baseline for comparison. |
| Positive control | A control group or condition in an experiment where a known effect is expected, used to validate the experimental setup and reagents. |
| Descriptive statistics | Statistical methods used to summarize and describe the main features of a dataset, such as mean, median, and standard deviation. |
| Inferential statistics | Statistical methods used to make conclusions or predictions about a population based on a sample of data. |
| Correlation | A statistical measure that describes the strength and direction of a linear relationship between two variables. |
| Regression | A statistical method used to model the relationship between a dependent variable and one or more independent variables, often used for prediction. |
| R-squared ($R^2$) | A statistical measure that represents the proportion of the variance in the dependent variable that is predictable from the independent variable(s). |
| Residuals | The difference between an actual data point and the value predicted by a statistical model, used to assess the fit of the model. |
| Probability | The likelihood of a specific outcome occurring in a random event, expressed as a number between 0 and 1. |
| Mutually exclusive outcomes | Events that cannot occur at the same time; the probability that either occurs is the sum of their individual probabilities. |
| Independent events | Events where the outcome of one event does not affect the outcome of another event. |
| Probability distribution | A function that describes the likelihood of obtaining the possible values that a random variable can assume. |
| Binomial distribution | A probability distribution that represents the number of successes in a fixed number of independent Bernoulli trials (trials with two possible outcomes). |
| Cumulative probability | The probability of a random variable taking on a value less than or equal to a specific value. |
| Discrete data | Data that can only take on a finite number of values or a countable number of values, often whole numbers. |
| Continuous data | Data that can take on any value within a given range, with an infinite number of possibilities between any two values. |
| Null hypothesis ($H_0$) | A statement that there is no significant difference or effect between groups or variables in a statistical test. |
| Alternative hypothesis ($H_A$) | A statement that there is a significant difference or effect between groups or variables, which contradicts the null hypothesis. |
| Significance level (alpha, $\alpha$) | The probability of rejecting the null hypothesis when it is true (Type I error rate). Commonly set at 0.05. |
| Critical region | The set of values for the test statistic that leads to the rejection of the null hypothesis. |
| P-value | The probability of obtaining test results at least as extreme as the observed results, assuming the null hypothesis is true. |
| Effect size | A measure of the magnitude of a phenomenon or the strength of a relationship between variables, indicating practical significance. |
| Type I error (False positive) | Rejecting the null hypothesis when it is actually true. |
| Type II error (False negative) | Failing to reject the null hypothesis when it is actually false. |
| Power | The probability of correctly rejecting the null hypothesis when it is false (i.e., the probability of detecting a true effect). |
| Standard Deviation (SD) | A measure of the amount of variation or dispersion of a set of values. |
| Standard Error of the Mean (SEM) | A measure of the variability of sample means around the population mean. It is calculated as $SD / \sqrt{n}$. |
| Confidence Interval (CI) | A range of values, derived from sample statistics, that is likely to contain the value of an unknown population parameter. A 95% CI means that if the experiment were repeated many times, 95% of the intervals constructed would contain the true population parameter. |
| T-test | A statistical hypothesis test used to determine if there is a significant difference between the means of two groups. |
| Paired t-test | A statistical test used to compare the means of two related samples, such as measurements taken from the same subjects before and after an intervention. |
| ANOVA (Analysis of Variance) | A statistical test used to compare the means of three or more groups to determine if there are any statistically significant differences between them. |
| Post-hoc test | Statistical tests performed after a significant ANOVA result to determine which specific group means differ from each other. |
| Tukey's honest significance test | A common post-hoc test used to perform all pairwise comparisons between group means. |
| Bonferroni correction | A method used to control the family-wise error rate when performing multiple statistical tests. |
| False Discovery Rate (FDR) | The expected proportion of rejected null hypotheses that are actually true (false discoveries). |
| Sampling error | The error that arises from the fact that a sample is used to represent a population, rather than the entire population. |
| Bias | A systematic error that can lead to distorted or inaccurate results, often due to flaws in study design, data collection, or analysis. |
| Replication | Repeating an experiment or study multiple times to confirm results and increase reliability. |
| Balance | In experimental design, refers to having equal sample sizes in each group being compared, which can improve the power of statistical tests. |
| Blocking | A technique used in experimental design to reduce variability by grouping similar experimental units before random assignment to treatments. |
| Blinding | A technique in research where participants or researchers are unaware of the group assignments (e.g., treatment vs. placebo) to prevent bias. |
| Randomization | The process of randomly assigning subjects to different treatment groups to minimize systematic bias. |
| Placebo effect | A beneficial effect produced by a placebo drug or treatment, which cannot be attributed to the properties of the placebo itself, and must therefore be due to the patient's belief in that treatment. |
| Questionable Research Practices (QRPs) | Practices that violate conventional scientific norms but may not constitute outright misconduct, such as p-hacking or HARKing. |
| P-hacking | The practice of analyzing data in various ways until a statistically significant result is found. |
| HARKing (Hypothesizing After the Results are Known) | Formulating a hypothesis after the data has already been analyzed, presenting it as if it were a prior hypothesis. |
| Fabrication | The invention of data or results. |
| Falsification | The manipulation of research materials, equipment, or processes, or changing or omitting data or results such that the research is not accurately represented in the research record. |
| Biological replicate | In the context of an experiment, refers to independent biological samples that are treated similarly, allowing for assessment of biological variability. For example, using cells from different cultures or different individuals. |