# Types of research and variables
This section outlines the fundamental types of research designs and the classification of variables crucial for scientific inquiry.
## 1. Types of research and variables
### 1.1 Research designs
Medical and epidemiological research can be broadly categorized into observational and experimental studies [1](#page=1).
#### 1.1.1 Observational research
In observational research, researchers observe participants and collect measurements without actively influencing them. The goal is to identify relationships between different measurements [1](#page=1).
* **Case-control study:** This design compares a group with a specific condition or disease (cases) to a similar group without the condition (controls). The comparison focuses on potential causes, making it inherently retrospective as the causes of a condition are typically in the past [1](#page=1).
* **Cohort study:** A cohort refers to a group of patients being observed. Cohort studies can be classified based on their timeframe [1](#page=1):
* **Retrospective:** Looking back in time [1](#page=1).
* **Transversal/Cross-sectional:** Data collected at a single point in time, during the research period itself [2](#page=2).
* **Prospective:** Looking into the future [1](#page=1).
#### 1.1.2 Experimental research
In experimental research, participants are influenced by an intervention, and the aim is to measure the effect of this intervention. These studies are always prospective cohort studies. Typically, participants are divided into two groups: one receiving the intervention and a control group that does not [1](#page=1).
> **Tip:** Applied statistics involves analyzing data to answer a scientific question. Data are observations where variables are measured. Researchers often use a sample from a target population to make inferences about that population [1](#page=1).
### 1.2 Variables
Variables are characteristics of a population that can vary [1](#page=1).
#### 1.2.1 Outcome variable
The outcome variable is the primary variable the researcher aims to make a statement about. It is also known as the dependent variable, and researchers seek to predict or explain it. Outcome variables are usually continuous or dichotomous [2](#page=2).
#### 1.2.2 Independent variables
Independent variables are all other variables in a study, serving as determinants, explanatory variables, predictors, or covariates [2](#page=2).
#### 1.2.3 Categorical variables
Categorical variables have a limited number of distinct outcomes [2](#page=2).
* **Nominal:** Categories are not ordered. Examples include blood type or occupation [2](#page=2).
* **Ordinal:** Categories are ordered based on a degree or rank. An example is asking about the extent of depressive feelings on a scale from "low" to "high" [2](#page=2).
* **Dichotomous:** Variables with only two possible outcomes [2](#page=2).
* **Dummy coding:** A form of dichotomous variable where categories are coded as 1 or 0, for instance, coding gender [2](#page=2).
#### 1.2.4 Numerical variables
Numerical variables are those that can be used in calculations [2](#page=2).
* **Discrete:** These variables take on whole numbers. An example is the number of times a person visited the doctor in a year [2](#page=2).
* **Continuous:** These variables can theoretically take on an infinite number of values within a range. Examples include weight and height [2](#page=2).
* **Interval scale:** Equal differences between values represent equal differences in the measured quantity. Weight in kilograms is an example [2](#page=2).
* **Ratio scale:** These scales have a true, natural zero point, indicating the absence of the measured quantity. For example, 0 kilograms represents the absence of weight, whereas 0 degrees Celsius does not represent the absence of temperature [2](#page=2).
### 1.3 Types of statistics
Statistics are broadly divided into two main types:
* **Descriptive statistics:** This is the initial phase of research, focusing on summarizing data clearly through graphical or numerical representations without exploring relationships between variables. It directly answers the research question [2](#page=2).
* **Explanatory/Inferential statistics:** This second stage involves estimating effects or relationships, testing hypotheses, and assessing the reliability of research findings. It includes formulating research questions and testing hypotheses [2](#page=2).
---
# Descriptive statistics: data representation and summary measures
Descriptive statistics are used to summarize and visualize data through graphical and numerical representations, providing an overview of the dataset without analyzing relationships between variables [2](#page=2) [3](#page=3).
### 2.1 Graphical data representation
Graphical representations help visualize data distribution and patterns [3](#page=3).
* **Bar chart (staafdiagram):** Used for categorical variables [3](#page=3).
* **Clustered/segmented bar chart (geclusterd/gesegmenteerd staafdiagram):** Useful for comparing two categorical variables graphically [3](#page=3).
* **Pie chart (taartdiagram):** Suitable for dichotomous or categorical variables, often used in presentations rather than scientific articles [3](#page=3).
* **Histogram:** Essential for continuous variables, providing insight into the variable's distribution within the dataset. It is the first step when analyzing continuous variables [3](#page=3).
* **Stem-and-leaf plot (tak-en-blad diagram):** Used for continuous variables, particularly with small sample sizes, functioning as a sideways histogram [3](#page=3).
* **Scatter plot (puntenwolk/scatterplot):** Visualizes the relationship between two continuous variables, with one plotted on the x-axis and the other on the y-axis, where each point represents an observation [3](#page=3).
* **Frequency table (frequentietabel):** The initial step for categorical variables, capable of incorporating missing values [3](#page=3).
* **Box-and-whisker plot (box-plot):** Used for continuous variables. It shows a vertical line running from the minimum to the maximum and a box spanning the interquartile range, with a line inside the box marking the median; it combines graphical and numerical representation [3](#page=3).
> **Tip:** Histograms are a crucial first step for understanding the distribution of continuous variables [3](#page=3).
### 2.2 Numerical data representation
Numerical representations summarize data using frequencies and central tendency measures [3](#page=3).
#### 2.2.1 Frequency tables
Frequency tables numerically display research data for dichotomous or categorical variables, showing both counts and percentages. The "valid percentage" excludes missing values [3](#page=3).
> **Tip:** Frequency tables are not informative for continuous variables due to the large number of potential values, each occurring infrequently. For continuous variables, summary measures like the average are preferred [3](#page=3).
#### 2.2.2 Central tendency measures (centrummaten)
These measures indicate the typical value in a dataset [3](#page=3).
* **Mode (modus):** The most frequent value in a dataset; it can also be used for categorical variables but is often not very informative [3](#page=3) [4](#page=4).
* **Arithmetic mean (rekenkundig gemiddelde):** Calculated by summing all values and dividing by the total number of values. It is only a good indicator for normally distributed variables. The formula is [3](#page=3) [4](#page=4):
$$ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} $$
where $\bar{x}$ is the arithmetic mean [3](#page=3) [4](#page=4).
* **Median (mediaan):** The middle value when all observations are ordered, i.e., the 50th percentile point (P50), with 50% of results above it and 50% below it. How far the median lies from the mean depends on the symmetry of the distribution [3](#page=3) [4](#page=4).
* **Geometric mean (geometrisch gemiddelde):** Used for non-normally distributed (right-skewed) variables. The natural logarithm of each value is taken, producing transformed values that no longer carry the original units; the arithmetic mean of these log-values is calculated, and applying the inverse of the logarithm to that mean returns the geometric mean in the original units [4](#page=4).
> **Tip:** After transforming a variable (e.g., with a logarithm) to achieve a normal distribution, always check the histogram to confirm the normality and then derive the central tendency measure from this transformed distribution [4](#page=4).
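As an illustration of these central tendency measures, here is a minimal Python sketch (not part of the original summary) using NumPy with made-up values; the geometric mean is obtained by back-transforming the mean of the log-values, as described above.

```python
import numpy as np

# Hypothetical, right-skewed example values (not from the source)
x = np.array([2.1, 2.5, 3.0, 3.2, 3.8, 4.1, 9.5])

mean = x.mean()                             # arithmetic mean: sum of values / n
median = np.median(x)                       # P50: middle value of the ordered data
geometric_mean = np.exp(np.log(x).mean())   # back-transform the mean of ln(x)

print(mean, median, geometric_mean)         # the mean is pulled up by the outlier 9.5
```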
#### 2.2.3 Dispersion measures (spreidingsmaten)
These measures describe how spread out the results are [3](#page=3).
* **Variance (variantie):** Represents the average of the squared differences from the mean. It is used for normally distributed variables. Values are squared to prevent positive and negative differences from canceling each other out, which would result in a sum of zero. The formula is [3](#page=3) [4](#page=4):
$$ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} $$
where $s^2$ is the variance, $x_i$ are the individual values, $\bar{x}$ is the arithmetic mean, and $n$ is the number of observations [3](#page=3) [4](#page=4).
* **Standard deviation (standaarddeviatie):** The square root of the variance. It represents the average distance of each value from the arithmetic mean and is used for normally distributed variables. The formula is [3](#page=3) [4](#page=4):
$$ s_d = \sqrt{s^2} $$
where $s_d$ is the standard deviation [3](#page=3) [4](#page=4).
* **Range (range):** The difference between the minimum and maximum values in a dataset [3](#page=3).
* **Interquartile range (interkwartiel-range):** Represents the middle 50% of observations, specifically the range between the 25th and 75th percentiles (P25, P75) [3](#page=3).
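The dispersion measures above can be reproduced with a short NumPy sketch (an illustration with made-up values, not from the source):

```python
import numpy as np

# Hypothetical sample (not from the source)
x = np.array([4.2, 5.1, 5.6, 6.0, 6.3, 7.4, 8.8])

variance = x.var(ddof=1)                 # s^2: squared deviations from the mean, divided by n - 1
sd = x.std(ddof=1)                       # standard deviation: square root of the variance
value_range = x.max() - x.min()          # range: maximum minus minimum
p25, p75 = np.percentile(x, [25, 75])
iqr = p75 - p25                          # interquartile range: P75 - P25

print(variance, sd, value_range, iqr)
```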
### 2.3 Normal distribution (normale verdeling)
A normal distribution is characterized by a symmetric spread of variables around the mean, with no outliers, meaning the mean equals the median [4](#page=4).
* Approximately 95% of values in a normal distribution lie within the mean $\pm$ 2 times the standard deviation [4](#page=4).
* To check if a variable is normally distributed, one can verify whether about 95% of the observations fall within the mean $\pm$ 2 standard deviations [4](#page=4).
> **Tip:** A three-step process can help determine if a variable is normally distributed:
> 1. Examine the histogram for symmetry.
> 2. Compare the mean and median (they should be approximately equal).
> 3. Compare the mean and standard deviation: for continuous variables that can only take positive values, the mean should be larger than roughly twice the standard deviation, because otherwise the mean $\pm$ 2 sd range would have to include impossible negative values [4](#page=4).
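A rough version of this three-step check can be scripted; the sketch below (an illustration with simulated data, not from the source) compares the mean and median and counts how many observations fall within the mean ± 2 sd:

```python
import numpy as np

# Simulated, approximately normal measurements (not from the source)
x = np.random.default_rng(1).normal(loc=70, scale=10, size=200)

mean, median, sd = x.mean(), np.median(x), x.std(ddof=1)
print(f"mean = {mean:.1f}, median = {median:.1f}")       # step 2: should be close

within = np.mean((x > mean - 2 * sd) & (x < mean + 2 * sd))
print(f"proportion within mean +/- 2 sd: {within:.2f}")  # should be roughly 0.95
# Step 1 (the histogram) would be inspected separately, e.g. with matplotlib.
```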
---
# Inferential statistics: hypothesis testing and estimation
Inferential statistics allows us to make generalizations about a population based on data from a sample, quantifying uncertainty through hypothesis testing and confidence intervals [5](#page=5).
### 3.1 Principles of inferential statistics
Inferential statistics aims to answer the question of how generalizable a research result is to the entire target population of patients. This involves making inferences from a sample to the population [5](#page=5).
* **Parameters:** These are characteristics of the entire population, which are generally unknown [5](#page=5).
* **Sample statistics:** These are measurements obtained from a sample, used to estimate population parameters [5](#page=5).
* **Point estimation:** This is a single statistic derived from sample results, serving as an estimate of the population situation [5](#page=5).
To test hypotheses, researchers start with research questions pertaining to a target population [5](#page=5).
* **Null hypothesis ($H_0$):** This hypothesis posits no effect in the target population and is the opposite of what researchers aim to demonstrate. It represents the starting assumption that is tested against the data [5](#page=5).
* **Alternative hypothesis ($H_a$):** This hypothesis is considered if $H_0$ is false. It is what researchers aim to show, and it is accepted only when there is sufficient evidence against $H_0$ [5](#page=5).
Data collected from a sample yield a research result, and inferential statistics addresses how generalizable these results are to the population, accounting for **sampling error**, which is a margin of uncertainty. This uncertainty is quantified through testing and estimation [5](#page=5).
### 3.2 Quantifying uncertainty: hypothesis testing and estimation
#### 3.2.1 Hypothesis testing
Hypothesis testing involves probability calculations to determine the likelihood of obtaining a specific result if the null hypothesis were true. This process helps decide if a difference is statistically significant [5](#page=5).
* **P-value (overschrijdingskans):** This is the probability of observing a test statistic as extreme as, or more extreme than, the one calculated from the sample, assuming the null hypothesis is true. A smaller p-value indicates less compatibility with the null hypothesis [5](#page=5) [6](#page=6) [7](#page=7).
The general steps in hypothesis testing are:
1. Define the null hypothesis ($H_0$) and the alternative hypothesis ($H_a$) [6](#page=6).
2. Collect relevant data from the sample [6](#page=6).
3. Calculate the test statistic and compare it to the null hypothesis. The test statistic quantifies how far the test result deviates from $H_0$; a larger value provides more evidence against $H_0$ [6](#page=6).
4. Compare the test statistic with a known probability distribution to derive a p-value [6](#page=6).
5. Interpret the p-value and the results [6](#page=6).
Results pertaining to the entire population are often represented with Greek letters, such as $\mu$ for the population mean and $\sigma$ for the population standard deviation [6](#page=6).
#### 3.2.2 Probability distributions for continuous variables
Probability distributions are used to interpret test statistics. For continuous variables, these distributions graphically represent the relationship between possible test statistic values and their probabilities [6](#page=6) [7](#page=7).
* The x-axis of the distribution graph shows all possible values of the test statistic, with the null hypothesis typically centered [7](#page=7).
* The p-value is the probability of obtaining the observed result or results even further from the null hypothesis, assuming $H_0$ is true. This corresponds to the area under the curve beyond the calculated test statistic [7](#page=7).
##### 3.2.2.1 Z-distribution (standard normal distribution)
The z-distribution is the standard normal probability distribution used for continuous outcome variables when testing sample means [6](#page=6) [7](#page=7).
* **Test statistic z:** This value indicates the evidence against the null hypothesis; a larger absolute value of z provides more evidence against $H_0$ [7](#page=7).
$$z = \frac{O - E}{\sigma / \sqrt{n}} \quad \text{or} \quad z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}}$$
Where:
* $O$ is the observed value in the sample [7](#page=7).
* $E$ is the expected value under the null hypothesis, often 0 if no effect is anticipated [7](#page=7).
* $\sigma / \sqrt{n}$ represents the uncertainty or standard error of the mean [7](#page=7).
* $\bar{x} - \mu_0$ is the difference between the sample mean and the hypothesized population mean [7](#page=7).
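As a numerical illustration of the z formula above (made-up numbers, not from the source), SciPy can convert the test statistic into a two-sided p-value:

```python
import numpy as np
from scipy import stats

# Hypothetical numbers: sample mean 5.4 vs. H0 mean 5.0, known sigma 1.2, n = 50
x_bar, mu0, sigma, n = 5.4, 5.0, 1.2, 50

z = (x_bar - mu0) / (sigma / np.sqrt(n))   # test statistic
p_two_sided = 2 * stats.norm.sf(abs(z))    # area in both tails beyond |z|
print(z, p_two_sided)
```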
Key characteristics of the standard normal distribution include:
* A mean of 0 and a standard deviation of 1 [7](#page=7).
* An x-axis range from negative infinity to positive infinity [7](#page=7).
* A y-axis representing probability density; the probability of an exact value is 0 [7](#page=7).
* A total area under the curve of 1 or 100% [7](#page=7).
**Significance level ($\alpha$)**: Typically set at 5% (0.05). If the p-value is less than $\alpha$, the null hypothesis can be rejected in favor of the alternative hypothesis, indicating a statistically significant result [7](#page=7).
> **Tip:** A low p-value suggests that the observed research result is unlikely under the null hypothesis, making $H_0$ improbable. However, a non-significant result does not necessarily mean there is no effect; the effect might be too small or the sample size too small to detect it reliably [7](#page=7).
**Errors in Hypothesis Testing**:
* **Type I error (fout van de eerste orde):** Rejecting the null hypothesis when it is actually true. The probability of a type I error is equal to the significance level ($\alpha$), typically 5% [7](#page=7).
* **Type II error (fout van de tweede orde):** Failing to reject the null hypothesis when it is false. The probability of a type II error is denoted by $\beta$. Statistical power, which is $1-\beta$, depends on sample size [8](#page=8).
##### 3.2.2.2 One-sided vs. two-sided testing
* **One-sided testing:** Hypotheses are formulated in a specific direction (e.g., testing for a positive effect) [8](#page=8).
* **Two-sided testing:** Hypotheses are not directional (e.g., testing for any effect, positive or negative). In two-sided testing, the p-value is doubled because the rejection region is split between both tails of the distribution. By default, two-sided testing is generally preferred when there is no prior knowledge to suggest a specific direction of effect [8](#page=8).
#### 3.2.3 Estimation with confidence intervals
Estimation involves quantifying the uncertainty around a sample statistic by calculating a **confidence interval (CI)**. A CI provides a range of plausible values for the population parameter [5](#page=5) [8](#page=8).
* **Confidence level:** For a 5% significance level ($\alpha$), a 95% confidence interval is typically used. This means there is a 95% certainty that the true population value lies within the calculated interval [8](#page=8).
* **95% Confidence Interval Formula:**
$$95\% \text{ CI} = \mu \pm 1.96 \times \left(\frac{\sigma}{\sqrt{n}}\right) \quad \text{or} \quad 95\% \text{ CI} = \bar{x} \pm 1.96 \times \left(\frac{s_d}{\sqrt{n}}\right)$$
Where:
* $\mu$ or $\bar{x}$ is the population mean or sample mean, respectively [8](#page=8).
* $1.96$ is the critical z-value corresponding to a 2-sided 5% p-value. The probability of a standard normal variable falling between -1.96 and 1.96 is 95% [8](#page=8).
* $\sigma/\sqrt{n}$ or $s_d/\sqrt{n}$ is the standard error of the mean, representing the uncertainty [8](#page=8).
* **Point estimate:** This is the observed value from the sample, around which the CI is calculated [8](#page=8).
* Confidence intervals can also be calculated for other confidence levels, such as 90% or 99% [8](#page=8).
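The 95% confidence interval formula can be evaluated directly; a minimal sketch with made-up summary statistics (not from the source):

```python
import numpy as np
from scipy import stats

# Hypothetical sample summary (not from the source)
x_bar, sd, n = 5.4, 1.2, 50

sem = sd / np.sqrt(n)               # standard error of the mean
z_crit = stats.norm.ppf(0.975)      # about 1.96 for a two-sided 95% interval
ci = (x_bar - z_crit * sem, x_bar + z_crit * sem)
print(ci)
```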
> **Comparison of Testing and Estimation:** Hypothesis testing is a qualitative approach that determines if a result is significant or not (an "all or nothing" approach requiring critical interpretation). Estimation is a quantitative approach that provides information about the magnitude of an effect. When a result is not significant, the exact p-value should still be reported. "Borderline significance" is often considered for p-values between 0.05 and 0.10 [8](#page=8).
### 3.3 Sampling error and statistical significance
* **Sampling error:** This is the inherent variability that arises because we are using a sample to represent a population. It leads to uncertainty in our estimates and test results [5](#page=5).
* **Standard error of the mean (SEM):** This quantifies the precision or reliability of the research result. It is influenced by sample size ($n$) and the spread (standard deviation, $sd$) of observations in the sample. A smaller sample size or wider spread leads to greater uncertainty and a larger SEM [6](#page=6).
$$SEM = \frac{sd}{\sqrt{n}}$$
* **Statistical significance:** A result is considered statistically significant if it is unlikely to have occurred by chance alone if the null hypothesis were true. This is typically determined by comparing the p-value to a pre-determined significance level ($\alpha$) [5](#page=5) [7](#page=7).
### 3.4 Central Limit Theorem and T-distribution
#### 3.4.1 Central limit theorem
The Central Limit Theorem (CLT) states that for a sufficiently large sample size, the distribution of sample means will approximate a normal distribution, regardless of the original distribution of the variable in the population. This principle underlies the use of the z-distribution for large samples [8](#page=8).
> **Example:** If you repeatedly draw samples and calculate their means, the distribution of these means will tend towards normality as the sample size increases, even if the original data is skewed [8](#page=8).
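The example can be simulated in a few lines; the sketch below (an illustration, not from the source) draws 10,000 samples of size 50 from a right-skewed exponential distribution and shows that the sample means behave approximately normally:

```python
import numpy as np

rng = np.random.default_rng(0)

# Right-skewed population: exponential distribution with mean 2
samples = rng.exponential(scale=2.0, size=(10_000, 50))

sample_means = samples.mean(axis=1)   # one mean per sample of n = 50
print(sample_means.mean())            # close to the population mean of 2
print(sample_means.std(ddof=1))       # close to sigma / sqrt(n) = 2 / sqrt(50)
# A histogram of sample_means looks approximately normal despite the skewed data.
```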
#### 3.4.2 T-distribution vs. Z-distribution
The **t-distribution** is a probability distribution similar to the standard normal (z) distribution but is generally wider and its shape depends on the **degrees of freedom (df)**, which are related to the sample size ($df = n-1$) [8](#page=8).
* **Use of t-distribution:** The t-distribution is used when the population standard deviation ($\sigma$) is unknown, which is common in practice. It is particularly useful for small sample sizes [9](#page=9).
* **Relationship to z-distribution:** As the sample size (and thus degrees of freedom) increases, the t-distribution increasingly approximates the z-distribution [9](#page=9).
$$z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} \quad \text{vs.} \quad t = \frac{\bar{x} - \mu_0}{s_d / \sqrt{n}}$$
Where $s_d$ is the sample standard deviation [9](#page=9).
* **Convention:** It is often recommended to use the t-distribution for hypothesis testing and confidence intervals, regardless of sample size, as it provides a more conservative estimate for smaller samples and becomes virtually identical to the z-distribution for larger samples [9](#page=9).
* **Critical values:** The critical value used in calculations (e.g., 1.96 for a 95% CI with z-distribution) needs to be adjusted for the t-distribution and can be found in t-distribution tables or generated by software. These t-values are typically larger than their z-distribution counterparts because the t-distribution has heavier tails [9](#page=9).
$$95\% \text{ CI} = \bar{x} \pm t_{(1-\alpha/2); (n-1) df} \times \frac{s_d}{\sqrt{n}}$$
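The t-based interval differs from the z-based one only in its critical value; a minimal sketch with made-up data (not from the source):

```python
import numpy as np
from scipy import stats

# Hypothetical small sample (not from the source)
x = np.array([5.1, 4.8, 5.6, 5.9, 4.7, 5.3, 5.0, 5.4])
n = len(x)

t_crit = stats.t.ppf(0.975, df=n - 1)   # replaces 1.96; larger for small samples
sem = x.std(ddof=1) / np.sqrt(n)
ci = (x.mean() - t_crit * sem, x.mean() + t_crit * sem)
print(t_crit, ci)
```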
### 3.5 Analysis of continuous outcome variables
#### 3.5.1 Comparing two measurements of continuous variables in the same persons
This involves comparing two measurements from the same individuals, known as paired observations within one group. The goal is to quantify the uncertainty in generalizing the findings from the sample to the target population. Sample size and the spread of individual results (captured by the standard error of the mean) are crucial for calculating confidence intervals [9](#page=9).
* **Paired t-test:** This test is used to test the mean difference between repeated measurements in paired data. The "pairing" means individual difference scores are the outcome variables, and they are not independent. Paired t-tests are parametric tests with certain assumptions [9](#page=9).
* **Null Hypothesis ($H_0$):** $\mu_{\Delta} = 0$, meaning there is no difference between the measurements, or the average difference is zero [9](#page=9).
* **Assumption:** The outcome variable (the difference) should be approximately normally distributed. The t-distribution is used to derive the test statistic and 95% confidence interval for the mean difference, and critical values are obtained from t-tables or statistical software [9](#page=9).
---
# Analysis of continuous and dichotomous outcome variables
This section details statistical methods for analyzing continuous and dichotomous outcome variables, covering tests for comparing groups, assessing relationships, and building predictive models.
### 4.1 Comparing two measurements of continuous variables in the same individuals
This involves comparing two measurements from the same individuals, essentially analyzing paired observations within a single group. The primary goal is to quantify the uncertainty in generalizing research findings from the sample to the target population, with sample size and the variability of individual results (reflected in the standard error of the mean) being crucial for calculating confidence intervals [9](#page=9).
#### 4.1.1 Paired t-test
The paired t-test is used to test the mean difference between repeated measurements. In this test, individual difference scores serve as the outcome variables, and these are not independent of each other. As a parametric test, it has specific assumptions, including normality of the differences. The test uses the t-distribution for the test statistic and to construct a 95% confidence interval, with a critical t-value taking the place of the 1.96 from the standard normal distribution [9](#page=9).
* **Null Hypothesis ($H_0$):** $\mu_{\Delta} = 0$, indicating no difference between the measurements or an average difference of zero [9](#page=9).
* **Test Statistic Formula:**
$$t = \frac{\bar{x} - \mu_0}{s_d / \sqrt{n}}$$
where $\bar{x}$ is the mean difference, $\mu_0$ is the hypothesized mean difference, $s_d$ is the standard deviation of the differences, and $n$ is the sample size [10](#page=10).
* **Interpretation:** The t-value indicates how the sample mean difference ($\bar{x}$) compares to the null hypothesis ($\mu_0$), considering the uncertainty ($s_d / \sqrt{n}$). The p-value associated with the t-value determines statistical significance (e.g., $p < 0.001$ indicates a highly significant difference). The p-value is typically two-tailed [10](#page=10).
* **Degrees of Freedom ($df$):** $n - 1$ [10](#page=10).
* **Confidence Interval Estimation:**
$$95\% \text{ CI} = \bar{x} \pm t_{(1-\alpha/2);(n-1)\, df} \times \frac{s_d}{\sqrt{n}}$$
This interval estimates the population mean difference with 95% confidence [10](#page=10).
* **Tip:** Testing addresses whether a difference is statistically significant, while estimation quantifies the magnitude of the difference [10](#page=10).
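A paired t-test as described above can be run with SciPy; the sketch below (made-up before/after values, not from the source) also reproduces the t statistic by hand as a one-sample t-test on the differences:

```python
import numpy as np
from scipy import stats

# Hypothetical repeated measurements in the same persons (not from the source)
before = np.array([140, 152, 138, 147, 160, 155, 149, 142])
after  = np.array([135, 150, 136, 140, 158, 150, 146, 139])

t_stat, p_value = stats.ttest_rel(before, after)   # paired t-test

diff = before - after                              # the differences are the outcome
t_by_hand = diff.mean() / (diff.std(ddof=1) / np.sqrt(len(diff)))
print(t_stat, p_value, t_by_hand)                  # t_by_hand equals t_stat
```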
#### 4.1.2 One-sample t-test
This test compares the mean of a single group to a known or theoretical standard value ($\mu_0$). It is also a parametric test with the assumption of normality of the data [10](#page=10).
* **Null Hypothesis ($H_0$):** $\mu = \mu_0$ (the group mean equals the standard value) or $\mu_{\Delta} = 0$ (the mean difference equals zero) [10](#page=10).
* **Test and Estimation:** Similar formulas and interpretations as the paired t-test are used, but $\mu_0$ represents the standard value. A significant difference is indicated if 0 is not within the 95% confidence interval of the mean difference [10](#page=10) [11](#page=11).
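A quick illustration of the one-sample t-test (made-up values and standard value, not from the source):

```python
import numpy as np
from scipy import stats

# Hypothetical measurements compared with a standard value of 120 (not from the source)
x = np.array([118, 125, 122, 130, 117, 124, 128, 121])

t_stat, p_value = stats.ttest_1samp(x, popmean=120)
print(t_stat, p_value)
```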
### 4.2 Continuous variables – two independent groups
This involves comparing the means of two independent groups [11](#page=11).
#### 4.2.1 Independent samples t-test
This test assesses the difference between the means of two independent groups, subject to specific assumptions [11](#page=11).
* **Null Hypothesis ($H_0$):** $\mu_1 = \mu_2$ or $\mu_1 - \mu_2 = 0$ (the means of the two groups are equal) [11](#page=11).
* **Assumptions:** The outcome variable must be approximately normally distributed in both groups, and homoscedasticity (equal variances) is required. Homoscedasticity can be checked using Levene's test or an F-test [11](#page=11).
* **Test Statistic Formula (assuming equal variances):**
$$t = \frac{\bar{x}_1 - \bar{x}_2 - 0}{s_P \times \sqrt{1/n_1 + 1/n_2}}$$
where $s_P$ is the pooled standard deviation calculated as:
$$s_P = \sqrt{\frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{(n_1-1 + n_2-1)}}$$
$s_1$ and $s_2$ are the standard deviations of group 1 and group 2, respectively [11](#page=11).
* **Degrees of Freedom ($df$):** $(n_1 + n_2) - 2$ [11](#page=11).
* **Confidence Interval Estimation:**
$$95\% \text{ CI} = (\bar{x}_1 - \bar{x}_2) \pm t_{(1-\alpha/2);(n_1+n_2-2)\, df} \times s_P \times \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$
A significant difference is indicated if 0 is not within the interval [12](#page=12).
* **Levene's test for equality of variances:** This test checks for equal variances between groups. A non-significant result (p > 0.05) indicates equal variances, so the standard (equal-variances) row of the t-test output is used. A significant result requires Welch's adjustment, which statistical software often applies automatically [12](#page=12).
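The independent samples t-test and Levene's test are both available in SciPy; a minimal sketch with made-up group data (not from the source), where Welch's adjustment is applied when Levene's test is significant:

```python
import numpy as np
from scipy import stats

# Hypothetical outcomes in two independent groups (not from the source)
group1 = np.array([5.1, 6.0, 5.5, 6.2, 5.8, 6.4, 5.9])
group2 = np.array([4.8, 5.2, 4.9, 5.6, 5.0, 5.3, 5.1])

lev_stat, lev_p = stats.levene(group1, group2)     # non-significant p: equal variances

# equal_var=False applies Welch's adjustment when the variances differ
t_stat, p_value = stats.ttest_ind(group1, group2, equal_var=(lev_p > 0.05))
print(lev_p, t_stat, p_value)
```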
#### 4.2.2 ANOVA (Analysis of Variance)
ANOVA is used to compare means when there are three or more independent groups [12](#page=12).
* **Null Hypothesis ($H_0$):** $\mu_1 = \mu_2 = \mu_3 = \dots$ (all group means are equal in the population) [12](#page=12).
* **Test Statistic:** The F-test, following an F-distribution, is used. The F-statistic is the ratio of between-group variance to within-group variance:
$$F = \frac{\text{between-group variance}}{\text{within-group variance}}$$
* **Assumptions:** All groups must be normally distributed, and homoscedasticity is required [12](#page=12).
* **Interpretation:** A larger F-value indicates greater evidence against the null hypothesis [12](#page=12).
* **Post-hoc tests:** If ANOVA indicates a significant difference, post-hoc tests (like pairwise t-tests) are conducted to identify which specific groups differ. These require corrections for multiple testing to control the Type I error rate, leading to higher p-values compared to uncorrected tests [13](#page=13).
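A one-way ANOVA for three groups can be illustrated as follows (made-up data, not from the source); a significant F would then be followed by corrected pairwise comparisons:

```python
from scipy import stats

# Hypothetical outcomes in three independent groups (not from the source)
a = [5.1, 6.0, 5.5, 6.2, 5.8]
b = [4.8, 5.2, 4.9, 5.6, 5.0]
c = [6.3, 6.8, 6.1, 6.9, 6.5]

f_stat, p_value = stats.f_oneway(a, b, c)   # F = between-group / within-group variance
print(f_stat, p_value)
# Post-hoc: pairwise t-tests with a multiple-testing correction,
# e.g. Bonferroni (multiply each p-value by the number of comparisons).
```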
### 4.3 Comparing a skewed continuous variable
When dealing with skewed continuous variables, two approaches are common: transformation or non-parametric tests.
#### 4.3.1 Transformation
For right-skewed data, a natural logarithm transformation can normalize the distribution. Tests (t-tests or ANOVA) are then performed on the transformed data, and the results are back-transformed to the original scale for interpretation. The geometric mean can be calculated from the transformed data [13](#page=13):
$$\text{geometric mean} = e^{\text{mean}(\ln(x_i))}$$
#### 4.3.2 Non-parametric tests
These tests are based on ranks and are less powerful than parametric tests but do not require normality assumptions. They typically provide p-values but not effect size estimates [13](#page=13).
* **Mann-Whitney U test:** Compares two independent groups using ranks [13](#page=13).
* **Null Hypothesis ($H_0$):** $\text{rank sum}_1 = \text{rank sum}_2$ (the distributions are the same) [13](#page=13).
* **Wilcoxon signed-rank test:** Compares paired observations within one group using ranks. It tests if the median difference is zero, implying an equal number of positive and negative changes [13](#page=13).
* **Null Hypothesis ($H_0$):** the sum of the positive ranks equals the sum of the negative ranks (median difference = 0) [13](#page=13).
* **Sign test:** Compares one group to a standard value, assessing if the median is equal to a standard value [13](#page=13).
* **Kruskal-Wallis test:** Compares three or more independent groups using ranks [13](#page=13).
* **Null Hypothesis ($H_0$):** $\text{rank sum}_1 = \text{rank sum}_2 = \text{rank sum}_3 = \dots$ (the distributions are the same) [14](#page=14).
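SciPy implements the rank-based tests listed above (apart from the sign test); a minimal sketch with made-up skewed data (not from the source):

```python
from scipy import stats

# Hypothetical skewed outcomes (not from the source)
group1 = [1.2, 3.4, 2.2, 8.9, 4.1, 2.7]
group2 = [0.9, 1.8, 1.4, 2.1, 1.1, 3.0]
group3 = [2.5, 5.1, 4.4, 6.2, 3.9, 7.0]

u_stat, p_mw = stats.mannwhitneyu(group1, group2)      # two independent groups
h_stat, p_kw = stats.kruskal(group1, group2, group3)   # three or more groups

# Paired observations within one group (before vs. after)
before = [5.2, 6.1, 4.8, 7.3, 5.9, 6.4]
after  = [4.9, 5.8, 4.9, 6.8, 5.5, 6.0]
w_stat, p_w = stats.wilcoxon(before, after)            # Wilcoxon signed-rank test
print(p_mw, p_kw, p_w)
```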
#### 4.3.3 Correlation
Correlation quantifies the linear association between two continuous variables [14](#page=14).
* **Pearson Correlation Coefficient ($r$):** Measures the linear association between two normally distributed continuous variables without outliers [14](#page=14).
$$r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2 \sum_{i=1}^{n} (y_i - \bar{y})^2}}$$
where $n$ is the sample size, $x_i$ and $y_i$ are individual values, and $\bar{x}$ and $\bar{y}$ are the means. The coefficient ranges from -1 to 1, with values closer to 1 or -1 indicating a stronger linear relationship [14](#page=14).
* **Coefficient of Determination ($r^2$):** Represents the proportion of variance in one variable that is explained by the linear relationship with the other variable. Adjusted $r^2$ accounts for potential overestimation in larger models [14](#page=14).
* **Assumptions:** Two continuous variables, approximate normality, absence of outliers, and a linear relationship [14](#page=14).
* **Spearman's rank correlation coefficient ($\rho$):** A non-parametric alternative to Pearson's r, used for ordinal or skewed continuous variables, or when the relationship is non-linear. It calculates the correlation between the ranks of the variables [15](#page=15).
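Both correlation coefficients can be computed with SciPy; a small sketch with made-up paired measurements (the variable names are illustrative only):

```python
import numpy as np
from scipy import stats

# Hypothetical paired continuous measurements (not from the source)
height = np.array([1.6, 1.7, 1.8, 1.7, 1.9, 1.6, 1.8])   # metres
weight = np.array([60,  72,  80,  68,  85,  58,  77])    # kilograms

r, p_pearson = stats.pearsonr(height, weight)      # linear association, assumes normality
rho, p_spearman = stats.spearmanr(height, weight)  # rank-based alternative
r_squared = r ** 2                                 # proportion of explained variance
print(r, r_squared, rho)
```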
#### 4.3.4 Linear regression
Linear regression models the relationship between a dependent continuous outcome variable (Y) and one or more independent predictor variables (X) [15](#page=15).
* **Simple Linear Regression:**
$$Y = b_0 + b_1X$$
where $b_0$ is the intercept (expected value of Y when X=0) and $b_1$ is the slope or regression coefficient (expected change in Y for a one-unit increase in X) [15](#page=15).
* **Calculation of Coefficients:**
$$b_1 = \frac{\sum(x_i - \bar{x})(y_i - \bar{y})}{\sum(x_i - \bar{x})^2}$$
$$b_0 = \bar{y} - b_1\bar{x}$$
* **Null Hypothesis ($H_0$):** $\beta_1 = 0$ (there is no linear relationship between X and Y in the population) [16](#page=16).
* **Standardized Regression Coefficient (Beta, $\beta$):** Expresses X and Y in standard deviation units, allowing for comparison of predictor strengths in multiple regression. It is equivalent to the Pearson correlation coefficient when there is only one predictor [16](#page=16).
* **Coefficient of Determination ($R^2$):** Similar to $r^2$, it indicates the proportion of variance in Y explained by the model. Adjusted $R^2$ is preferred for multiple regression to avoid overestimation [16](#page=16).
* **Dummy Variables:** Used for categorical or dichotomous predictors. For a dichotomous predictor (e.g., male=1, female=0), $b_0$ represents the expected Y for the reference group (female), and $b_1$ represents the difference in expected Y between the groups. For categorical predictors with $k$ categories, $k-1$ dummy variables are created [16](#page=16).
* **Assumptions:** Independent observations, linear relationship between predictors and outcome (checked visually or by categorizing predictors), normality of residuals, and homoscedasticity of residuals [17](#page=17).
* **Multiple Linear Regression:** Extends simple linear regression to include multiple predictors ($X_1, X_2, \dots, X_k$).
$$y = b_0 + b_1x_1 + b_2x_2 + \dots + b_k x_k$$
This model assesses the independent effect of each predictor while holding others constant (covariates). It shares the same assumptions as simple linear regression [17](#page=17).
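The coefficient formulas for simple linear regression can be checked against SciPy's built-in fit; a minimal sketch with made-up data (not from the source):

```python
import numpy as np
from scipy import stats

# Hypothetical predictor and outcome (not from the source)
x = np.array([1.6, 1.7, 1.8, 1.7, 1.9, 1.6, 1.8])
y = np.array([60,  72,  80,  68,  85,  58,  77])

# Coefficients from the formulas above
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

# The same fit via scipy, which also tests H0: beta_1 = 0
fit = stats.linregress(x, y)
print(b0, b1, fit.intercept, fit.slope, fit.pvalue)
```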
#### 4.3.5 Association Models vs. Prediction Models
* **Association Models:** Aim to clarify the relationship with a central determinant by accounting for confounding and effect modification [17](#page=17).
* **Confounding:** A variable that distorts the relationship between the predictor and outcome. A change of approximately 10% in the regression coefficient after adjusting for a potential confounder suggests confounding [18](#page=18).
* **Effect Modification (Interaction):** The effect of the predictor on the outcome differs across levels of another variable. Tested by including an interaction term ($X*C$) in the model. A p-value < 0.10 for the interaction term suggests effect modification. If interaction is present, the main effect needs to be stratified [18](#page=18).
* **Prediction Models:** Aim to predict the outcome variable as accurately as possible using a set of predictors. Procedures like backward or forward selection can be used to build the model [18](#page=18).
### 5. Analysis of dichotomous outcome variables
For dichotomous outcome variables, the distinction between parametric and non-parametric tests is less relevant as these variables are inherently non-parametric [18](#page=18).
#### 5.1 Comparing one group
#### 5.1.1 Comparing two measurements within one group
This involves comparing paired observations, often in cross-over trials with short-term interventions [19](#page=19).
* **McNemar's test:** Tests for a difference in proportions between paired measurements [19](#page=19).
* **Null Hypothesis ($H_0$):** $\pi_{\Delta} = 0$ (the difference in proportions between the two measurements is zero) [19](#page=19).
#### 5.1.2 Comparing a measurement with a standard value
* **Z-test for proportion:** Tests the difference between a sample proportion and a standard population proportion [19](#page=19).
* **Null Hypothesis ($H_0$):** $\pi = \pi_0$ or $\pi_{\Delta} = 0$ [19](#page=19).
* **Test Statistic Formula:**
$$z = \frac{p - \pi_0}{SE(p)_{H_0}}$$
where $p$ is the sample proportion, $\pi_0$ is the standard proportion, and $SE(p)_{H_0} = \sqrt{\frac{\pi_0(1-\pi_0)}{n}}$ [19](#page=19).
* **Assumptions:** For the z-distribution, $np$ and $n(1-p)$ should both be greater than 5 [19](#page=19).
* **Confidence Interval Estimation:**
$$95\% \text{ CI} = p \pm z_{(1 - \alpha/2)} \times SE(p)$$
where $SE(p) = \sqrt{\frac{p(1-p)}{n}}$. The 95% confidence interval for the proportion is calculated, and it is checked whether the standard value ($\pi_0$) lies within it [19](#page=19).
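A worked illustration of the z-test for a proportion and its confidence interval (made-up counts, not from the source):

```python
import numpy as np
from scipy import stats

# Hypothetical data: 36 events in n = 120, compared with a standard proportion of 0.25
n, events, pi0 = 120, 36, 0.25
p = events / n

se_h0 = np.sqrt(pi0 * (1 - pi0) / n)        # SE under the null hypothesis
z = (p - pi0) / se_h0
p_value = 2 * stats.norm.sf(abs(z))

se_p = np.sqrt(p * (1 - p) / n)             # SE around the observed proportion
ci = (p - 1.96 * se_p, p + 1.96 * se_p)     # check whether pi0 = 0.25 lies inside
print(z, p_value, ci)
```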
#### 5.2 Comparing two groups
This involves comparing proportions between two independent groups and assessing the association between two dichotomous variables, typically presented in a 2x2 contingency table [20](#page=20).
* **Chi-squared test ($\chi^2$):** Tests for an association between two dichotomous variables based on observed (O) and expected (E) counts in each cell of the contingency table [20](#page=20).
$$\chi^2 = \sum \frac{(O-E)^2}{E}$$
where $E = (\text{Row Total}) \times (\text{Column Total}) / \text{Grand Total}$ [20](#page=20).
* **Degrees of Freedom ($df$):** $(a-1) \times (b-1)$, where $a$ and $b$ are the number of rows and columns, respectively [20](#page=20).
* **Assumptions:** The expected count (E) in at least 80% of cells should be greater than 5, and all E should be greater than 1. Larger sample sizes improve the approximation [20](#page=20).
* **Fisher's exact test:** Calculates the exact p-value, serving as an alternative to the chi-squared test, especially for small sample sizes [20](#page=20).
* **Continuity correction:** Another alternative for 2x2 tables, improving the approximation of the chi-squared distribution [20](#page=20).
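SciPy provides the chi-squared test (with or without continuity correction) and Fisher's exact test for 2x2 tables; a minimal sketch with a made-up table (not from the source):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: rows = exposed / unexposed, columns = outcome yes / no
table = np.array([[30, 70],
                  [15, 85]])

chi2, p, dof, expected = stats.chi2_contingency(table)            # continuity-corrected for 2x2
chi2_nc, p_nc, _, _ = stats.chi2_contingency(table, correction=False)
odds_ratio, p_fisher = stats.fisher_exact(table)                  # exact alternative
print(chi2, p, dof, p_fisher)
print(expected)                                                    # expected count E per cell
```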
#### 5.2.1 95% Confidence Interval for Risk Difference and Relative Risk
While the chi-squared test indicates overall association, effect measures and their confidence intervals quantify the magnitude of the effect [20](#page=20).
* **Confidence Interval for Difference in Proportions:**
$$95\% \text{ CI} = (P_1 - P_2) \pm z_{(1-\alpha/2)} \times SE(P_1 - P_2)$$
where $SE(P_1 - P_2) = \sqrt{\frac{P_1(1-P_1)}{n_1} + \frac{P_2(1-P_2)}{n_2}}$. Assumptions include $np$ and $n(1-p)$ being greater than 5 in both groups [21](#page=21).
#### 5.3 Comparing more than two groups
This involves comparing proportions across three or more independent groups, typically using an RxK contingency table [21](#page=21).
* **Chi-squared test:** Used to test the overall association between categorical variables in RxK tables. The formula and assumptions are similar to the 2x2 case, but Fisher's exact test and continuity corrections are not applicable for RxK tables [21](#page=21).
* **Trend test:** A linear-by-linear association test can be performed for ordinal variables, with 1 degree of freedom [21](#page=21).
* **Post-hoc analysis:** For an overall significant chi-squared result, categories can be regrouped, split into multiple 2x2 tables, or a logistic regression model can be employed [21](#page=21).
#### 5.4 Odds Ratio as an effect measure in 2x2 tables
The odds ratio (OR) is an effect measure, particularly for case-control or retrospective studies, representing the relative odds of an outcome [21](#page=21).
* **Odds:**
$$Odds = \frac{P(y=1)}{1 - P(y=1)}$$
where $P(y=1)$ is the probability of the outcome [21](#page=21).
* **Odds Ratio Formula:**
$$OR = \frac{a \times d}{b \times c} = \frac{(a/c)}{(b/d)}$$
where $a, b, c, d$ are the cell counts in a 2x2 table. The OR is also commonly used in prospective studies and logistic regression but can overestimate the relative risk [22](#page=22).
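From the same 2x2 cell counts the odds ratio follows directly; the sketch below (made-up counts) also adds a 95% confidence interval on the log scale, a common approach that is not spelled out in the summary itself:

```python
import numpy as np

# Hypothetical 2x2 cell counts (not from the source)
#                outcome yes   outcome no
# exposed             a=30          b=70
# unexposed           c=15          d=85
a, b, c, d = 30, 70, 15, 85

odds_ratio = (a * d) / (b * c)

# 95% CI via the standard error of ln(OR) (an assumption: the usual log-scale method)
se_log_or = np.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
ci = (np.exp(np.log(odds_ratio) - 1.96 * se_log_or),
      np.exp(np.log(odds_ratio) + 1.96 * se_log_or))
print(odds_ratio, ci)
```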
#### 5.5 Analyzing relationships with a dichotomous outcome variable and diverse other variables: Logistic Regression Analysis
Logistic regression is used when the outcome variable is dichotomous, transforming it to allow for a linear regression-like analysis [22](#page=22).
* **Logistic Regression with a Dichotomous Determinant:** The natural logarithm of the odds is used:
$$\ln\left(\frac{P(y_{dichotomous})}{1 - P(y_{dichotomous})}\right) = b_0 + b_1x_1 + \dots$$
* **Interpretation of Regression Coefficient:** $\text{EXP}(b_1)$ represents the odds ratio for the predictor: $\text{EXP}(\beta_1) = \frac{\text{odds}(y=1, \text{exposed})}{\text{odds}(y=1, \text{unexposed})}$. The null hypothesis $\text{EXP}(\beta_1) = 1$ corresponds to no difference in odds between the groups [22](#page=22).
* **Maximum Likelihood:** A method for estimating regression coefficients, aiming to maximize the probability of observing the data given the model parameters. The -2 Log Likelihood statistic is used for model comparison; a lower value indicates a better fit. The Likelihood Ratio Test compares nested models based on the difference in -2 Log Likelihood values, following a chi-squared distribution with degrees of freedom equal to the difference in the number of parameters [22](#page=22) [23](#page=23).
* **Logistic Regression with a Categorical Determinant:** Categorical predictors with more than two categories are typically treated as dummy variables [23](#page=23).
* **Logistic Regression with a Continuous Determinant:** The odds ratio for a one-unit increase in the continuous variable can be exponentiated and adjusted for clinical relevance. The 95% confidence interval for the adjusted OR is calculated by multiplying the standard error by $x$ [23](#page=23).
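A logistic regression of this form can be fitted, for example, with the statsmodels package (a common choice, not mentioned in the summary); the sketch below uses simulated data and shows how EXP(b1) is read off as an odds ratio:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)

# Simulated data (not from the source): dichotomous exposure x and outcome y
x = rng.integers(0, 2, size=200)                   # exposed = 1 / unexposed = 0
log_odds = -1.0 + 0.8 * x                          # assumed true model on the log-odds scale
y = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

X = sm.add_constant(x)                             # adds the intercept b0
model = sm.Logit(y, X).fit(disp=False)             # maximum likelihood estimation

print(model.params)                                # b0 and b1 on the log-odds scale
print(np.exp(model.params[1]))                     # EXP(b1) = odds ratio for exposure
print(np.exp(model.conf_int()[1]))                 # 95% CI for that odds ratio
```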
#### 5.6 Checking for linearity in logistic regression
The assumption of linearity means the odds ratio is constant regardless of the predictor's value. This can be tested by categorizing the continuous predictor and performing a logistic regression with the categorical variable. If a linear trend is observed in the regression coefficients of the categories, the continuous variable can be maintained; otherwise, the categorical analysis is retained [23](#page=23).
#### 5.7 Confounding and Effect Modification in Logistic Regression
These concepts are investigated using stratified analyses or by incorporating interaction terms and covariates into the logistic regression model, similar to linear regression [23](#page=23) [24](#page=24) [25](#page=25).
### 6. Analysis of survival data
Survival analysis focuses on the time until an event occurs, not just whether it occurs. It is typically studied prospectively [23](#page=23).
* **Kaplan-Meier survival curve:** A graphical representation of survival over time, calculating cumulative survival probabilities at different time points [23](#page=23).
* **Log-rank test:** Compares two or more survival curves by comparing observed and expected cases at each time point. The null hypothesis is that the survival curves coincide (no difference between groups). It follows a chi-squared distribution with $df =$ (number of groups - 1) and only provides a p-value [24](#page=24).
* **Cox Regression Analysis:** Relates survival data to determinants. It transforms the outcome to allow for a linear regression-like analysis using the natural logarithm of the hazard [24](#page=24).
$$\ln(\text{hazard}(y)) = \ln[h_{t0}] + b_1x_1 + b_2x_2 + \dots$$
* **Dichotomous Determinant:** $\text{EXP}(B)$ represents the hazard ratio (HR). $H_0: \text{EXP}(B_1) = 1$ indicates no difference in hazard [24](#page=24).
* **Categorical Determinant:** Dummy coding is used to compare groups against a reference category [24](#page=24).
* **Continuous Determinant:** The hazard ratio for a one-unit increase is calculated. Linearity assumption can be tested by categorizing the continuous predictor [24](#page=24).
* **Proportional Hazards Assumption:** The hazard ratio must be constant over time, checked using Kaplan-Meier curves [24](#page=24).
* **Model Comparison:** The -2 Log Likelihood method is used to compare models, with the difference in values following a chi-squared distribution [24](#page=24).
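These survival analyses are available in, for example, the lifelines package (a third-party library, not mentioned in the summary); a minimal sketch with made-up follow-up data:

```python
import pandas as pd
from lifelines import KaplanMeierFitter, CoxPHFitter
from lifelines.statistics import logrank_test

# Hypothetical survival data (not from the source)
df = pd.DataFrame({
    "time":  [5, 8, 12, 3, 9, 15, 7, 11, 2, 14],  # follow-up time until event or censoring
    "event": [1, 0, 1, 1, 0, 1, 1, 0, 1, 0],       # 1 = event occurred, 0 = censored
    "group": [0, 0, 0, 0, 0, 1, 1, 1, 1, 1],       # dichotomous determinant
})

kmf = KaplanMeierFitter()
kmf.fit(df["time"], event_observed=df["event"])    # Kaplan-Meier survival curve

g0, g1 = df[df.group == 0], df[df.group == 1]
lr = logrank_test(g0["time"], g1["time"], g0["event"], g1["event"])

cph = CoxPHFitter()
cph.fit(df, duration_col="time", event_col="event")  # "group" enters as a covariate

print(lr.p_value)              # log-rank test: p-value only, no effect measure
print(cph.hazard_ratios_)      # EXP(B): hazard ratio for the group variable
```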
### 7. Multiple regression analysis
This section reiterates multiple regression, focusing on predicting a dichotomous outcome variable using a set of covariates. The principles of association and prediction models, including confounding and effect modification, are applied, similar to linear regression models. The Hosmer-Lemeshow test is a goodness-of-fit test for logistic regression models, aiming for a non-significant p-value to indicate a good fit [25](#page=25).
---
# Advanced statistical concepts and reliability
This section explores advanced statistical techniques for analyzing relationships between variables, modeling outcomes, and assessing the trustworthiness of measurement tools.
### 5.1 Regression models
Regression analysis models the relationship between a dependent variable and one or more independent variables [17](#page=17).
#### 5.1.1 Assumptions of linear regression
For linear regression, several assumptions must be met for the results to be valid [17](#page=17):
* **Independent observations:** Data points should not be paired or clustered (e.g., within a school or class) [17](#page=17).
* **Linear relationship:** For continuous predictors, there must be a linear association with the outcome variable. This can be visually assessed with a scatterplot or by categorizing the predictor and examining trends [17](#page=17).
* **Normality of residuals:** The errors in the model should be normally distributed, which is often true if the outcome variable itself is normally distributed. A histogram of residuals can check this, and log transformations might be used for skewed distributions [17](#page=17).
* **Homoscedasticity:** The variance of the residuals should be constant across all predicted values of the outcome variable. An extra plot (residual plot) can help assess this [17](#page=17).
#### 5.1.2 Simple linear regression with continuous predictors
When analyzing continuous variables, the presence of a linear relationship between the outcome and the predictor is examined. If no linear relationship is found, the continuous variable may be categorized (e.g., into quartiles) and analyzed using dummy variables within a linear regression model [17](#page=17).
#### 5.1.3 Multiple linear regression
Multiple linear regression examines the relationship between multiple independent (predictor) variables ($X$) and a continuous outcome variable ($Y$). The model is represented as:
$$y = b_0 + b_1x_1 + b_2x_2 + \dots + b_k x_k$$
where $X$ values are covariates. This analysis assesses how covariates relate to the outcome variable in combination. It shares the same assumptions as simple linear regression. When measuring independent effects, confounders are held constant [17](#page=17).
#### 5.1.4 Prediction models
Prediction models aim to forecast the dependent variable as accurately as possible using a set of potential determinants [17](#page=17).
* **Backward selection:** This method begins with a comprehensive model including all potential determinants. Variables contributing least (highest p-value, e.g., >0.10) are iteratively removed [18](#page=18).
* **Forward selection:** This approach starts by identifying the single best predictor (lowest p-value) and then sequentially adds other predictors until no new variables improve the model significantly, typically using a p-value threshold (e.g., 0.10) [18](#page=18).
The quality of a prediction model is evaluated by how well it predicts the outcome variable, often indicated by the proportion of variance explained [18](#page=18).
### 5.2 Association models
Association models aim to isolate and clarify the relationship with a central determinant by accounting for other variables [17](#page=17).
#### 5.2.1 Confounding
Confounding occurs when an observed relationship between a predictor ($X$) and an outcome ($Y$) is partially or fully explained by a third variable that is associated with both $X$ and $Y$. It is investigated by comparing regression coefficients before and after adjusting for or including the potential confounding variable in the model. A change of approximately 10% in the regression coefficient is often considered indicative of confounding [17](#page=17) [18](#page=18).
#### 5.2.2 Effect modification (Interaction)
Effect modification, or interaction, means the effect of a predictor ($X$) on an outcome ($Y$) differs across levels of another variable (the effect modifier). This is tested by including an interaction term ($X \ast C$) in the model alongside the main effects. A p-value for the interaction term below 0.10 is often used as a cut-off to detect potential interaction. If interaction is present, the main effect should be analyzed in stratified subgroups. Continuous variables involved in effect modification should be dichotomized (e.g., by median) [18](#page=18).
#### 5.2.3 Building association models
To refine the estimation of the relationship between a continuous variable and a central determinant, association models adjust for confounders and examine effect modification. These models start with the crude (unadjusted) relationship. When stratifying, sample size considerations for subgroups are important. Confounders can be tested individually, simultaneously, or through stepwise selection. When interpreting effect estimates, other determinants in the model are kept constant [18](#page=18).
### 5.3 Logistic regression analysis
Logistic regression is used for dichotomous outcome variables. The relationship between predictors and the log odds of the outcome is modeled [23](#page=23):
$$\ln\left(\frac{P(y_{\text{dichotomous}})}{1 - P(y_{\text{dichotomous}})}\right) = b_0 + b_1x_1 + \dots + b_k x_k$$
This is a statistical model with covariates and partial regression coefficients, allowing for the testing of independent effects while holding confounders constant [25](#page=25).
#### 5.3.1 Categorical determinants in logistic regression
Categorical variables with more than two categories should be analyzed as dummy variables. A significant chi-square indicates a difference between categories, while linearity should be assessed using dummy variables if the chi-square is not significant [23](#page=23).
#### 5.3.2 Continuous determinants in logistic regression
The odds ratio (OR) for a one-unit increase in a continuous variable can be converted to an OR for a specific number of units ($x$) for better interpretation:
$$OR_{x \text{ units}} = \exp[x \times b_1 \text{ unit}]$$
The 95% confidence interval for this adjusted OR is calculated by multiplying the standard error by $x$ or by using the formula $OR_{x \text{ units}} = (OR_{1 \text{ unit}})^x$ [23](#page=23).
#### 5.3.3 Checking for linearity in logistic regression
The assumption of a linear relationship for continuous predictors in logistic regression is checked by categorizing the predictor (e.g., into tertiles or quartiles) and performing a logistic regression with the categorical variable. If linearity is not observed, the analysis with the categorical variable is retained; otherwise, the continuous variable can be used. Trends in regression coefficients of the categorized variable can indicate linearity [23](#page=23).
#### 5.3.4 Confounding and effect modification in logistic regression
Confounding and effect modification are investigated using stratified analyses or logistic regression models. Confounding is identified by comparing regression coefficients before and after adjustment (a change of about 10% is a common cut-off). Effect modification is tested via an interaction term, with a p-value <0.10 often used to detect it. If interaction is present, stratification is necessary. Stepwise adjustment methods can be used to build models, deciding sequentially whether to include potential confounders. Model comparison using the -2 Log Likelihood statistic, following a chi-square distribution with degrees of freedom equal to the difference in parameters, can determine if removing a variable leads to a significant change [23](#page=23) [25](#page=25).
#### 5.3.5 Prediction models in logistic regression
The quality of a multiple logistic regression model is assessed similarly to the proportion of explained variance in linear regression. The Hosmer-Lemeshow test is a goodness-of-fit test that compares predicted versus observed outcomes. A non-significant p-value (indicating the model fits well) is desired [25](#page=25).
### 5.4 Survival analysis
Survival analysis focuses on the time until an event occurs, rather than just whether it occurs. It is typically investigated through prospective cohort studies. Examples include studies of mortality, morbidity, or recovery [23](#page=23).
#### 5.4.1 Kaplan-Meier survival curve
The Kaplan-Meier curve graphically represents survival over time, calculating the probability of survival at each follow-up interval, conditional on surviving up to that point [23](#page=23).
#### 5.4.2 Log-rank test
The log-rank test compares survival curves between two or more groups by comparing observed cases at each time point with expected cases under the null hypothesis of no difference between curves. It follows a chi-square distribution with df = (number of groups – 1). This test provides a p-value but no effect measure [24](#page=24).
#### 5.4.3 Cox regression analysis
Cox regression models the relationship between survival time and predictors. The natural logarithm of the hazard is transformed to allow for a linear regression-like analysis:
$$\ln(\text{hazard}(y)) = \ln[h_{t0}] + b_1x_1 + b_2x_2 + \dots$$
where $y$ is the dichotomous outcome, $\ln[h_{t0}]$ is the baseline hazard, and $b_1$, $b_2$ are regression coefficients for independent variables $x_1$, $x_2$ [24](#page=24).
* **Dichotomous determinant:** The hazard ratio (HR) is calculated as $EXP(B_1)$. An HR > 1 indicates an increased risk. The intercept is not reported as it's a time-dependent function [24](#page=24).
* **Categorical determinant:** Dummy coding is used to compare hazard ratios for different categories against a reference group [24](#page=24).
* **Continuous determinant:** The hazard ratio represents the risk for a one-unit increase in the determinant. Linearity is assumed and can be tested by categorizing the variable and observing the trend in EXP(B) values. Confounding is identified by a 10% change in the regression coefficient after adding a confounder. Effect modification is assessed by examining if variables like sex alter the effect, requiring stratification [24](#page=24).
A critical assumption in Cox regression is **proportional hazards**, meaning the hazard ratio remains constant over time, which can be checked by plotting survival curves [24](#page=24).
### 5.5 Sample size calculations
Sample size calculations estimate the number of participants needed to detect a specific expected effect with a certain statistical power [25](#page=25).
* **Alpha ($\alpha$):** The significance level, typically set at 0.05 for rejecting the null hypothesis [25](#page=25).
* **Statistical power ($1-\beta$):** The probability of correctly rejecting a false null hypothesis, with a minimum of 80% generally recommended [25](#page=25).
* **Effect size and dispersion:** The magnitude of the expected effect and its variability, often estimated from literature or pilot studies. Sample size is calculated using formulas, software, or online tools [25](#page=25).
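For a two-group comparison of means, such a calculation can be done, for instance, with statsmodels (an illustration; the effect size of 0.5 is a made-up assumption):

```python
from statsmodels.stats.power import TTestIndPower

# Detect a standardized effect size (Cohen's d) of 0.5 with alpha = 0.05
# and 80% power, assuming two equally sized groups
n_per_group = TTestIndPower().solve_power(effect_size=0.5, alpha=0.05,
                                          power=0.80, ratio=1.0)
print(n_per_group)   # roughly 64 participants per group
```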
### 5.6 Advanced considerations in statistical analysis
#### 5.6.1 Assessing normality of continuous variables
Normality of continuous variables can be assessed visually through histograms and QQ-plots, by comparing the mean and median, or by comparing the mean and standard deviation. Formal indicators include skewness and kurtosis (values between -1 and 1 suggest approximate normality). Hypothesis tests like the Kolmogorov-Smirnov and Shapiro-Wilk tests can detect deviations from normality, though they are sensitive to sample size [26](#page=26).
#### 5.6.2 Multicollinearity
Multicollinearity is a problem in regression models where predictor variables are highly correlated, making it difficult to assess their independent effects. It can be checked using Pearson correlations between continuous variables (a cut-off of 0.60 is often used) or chi-square tests for categorical variables. If multicollinearity exists, one of the correlated variables must be removed [26](#page=26).
#### 5.6.3 Other statistical techniques
* **Chi-square test:** Used with $r \times k$ contingency tables to test the overall association between categorical variables [26](#page=26).
* **Two-way ANOVA:** Compares a continuous outcome variable across two categorical variables [26](#page=26).
* **Repeated measures:** Analyzes repeated measurements of a continuous outcome; it extends the paired t-test to situations with more than two measurements [26](#page=26).
* **Multilevel analysis:** Used for analyzing clustered data [26](#page=26).
### 5.7 Reliability of measurement instruments
Reliability assesses the consistency and dependability of a measurement tool [26](#page=26).
#### 5.7.1 Kappa statistic
Kappa ($\kappa$) measures agreement for categorical variables. It is calculated as:
$$ \kappa = \frac{\bar{p} - \hat{p}}{1 - \hat{p}} $$
where $\bar{p}$ is the observed proportion of agreement and $\hat{p}$ is the proportion of agreement expected by chance. A kappa value between 0.4 and 0.7 is considered acceptable, while values above roughly 0.7-0.75 indicate good agreement [26](#page=26) [27](#page=27).
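A small sketch of this calculation on hypothetical ratings from two observers (sklearn's `cohen_kappa_score` should give the same value):

```python
import numpy as np

# Hypothetical judgments of two observers on the same 10 subjects (1 = present, 0 = absent)
rater1 = np.array([1, 1, 0, 1, 0, 0, 1, 1, 0, 1])
rater2 = np.array([1, 0, 0, 1, 0, 1, 1, 1, 0, 1])

p_obs = np.mean(rater1 == rater2)   # observed proportion of agreement (p-bar)

# Expected agreement by chance (p-hat), from each rater's marginal proportions
p_exp = (np.mean(rater1 == 1) * np.mean(rater2 == 1)
         + np.mean(rater1 == 0) * np.mean(rater2 == 0))

kappa = (p_obs - p_exp) / (1 - p_exp)
print(f"observed = {p_obs:.2f}, expected = {p_exp:.2f}, kappa = {kappa:.2f}")  # kappa of about 0.58
```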
#### 5.7.2 Agreement for continuous variables
The Pearson correlation coefficient or the Intraclass Correlation Coefficient (ICC) measures agreement between continuous variables. The ICC is better suited for assessing test-retest or inter-rater reliability as it accounts for systematic deviations, which Pearson correlation may miss. For instance, a consistent difference of 2 units across all measurements would not be detected by Pearson correlation but would be by ICC [27](#page=27).
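The point about systematic deviations can be made concrete with a small sketch: rater B scores exactly 2 units higher than rater A, Pearson's r is still 1.00, but an absolute-agreement ICC (here ICC(2,1) in the Shrout & Fleiss terminology, implemented by hand as an assumption-labelled illustration) is clearly below 1. The data are hypothetical:

```python
import numpy as np
from scipy.stats import pearsonr

rater_a = np.array([10., 12., 14., 16., 18., 20.])
rater_b = rater_a + 2.0                          # systematic +2 shift
ratings = np.column_stack([rater_a, rater_b])    # shape: (subjects, raters)

r, _ = pearsonr(rater_a, rater_b)
print(f"Pearson r = {r:.2f}")                    # 1.00 -- blind to the systematic shift

def icc_2_1(x):
    """ICC(2,1): two-way random effects, absolute agreement, single measure."""
    n, k = x.shape
    grand = x.mean()
    ms_rows = k * ((x.mean(axis=1) - grand) ** 2).sum() / (n - 1)   # subjects
    ms_cols = n * ((x.mean(axis=0) - grand) ** 2).sum() / (k - 1)   # raters
    resid = x - x.mean(axis=1, keepdims=True) - x.mean(axis=0, keepdims=True) + grand
    ms_err = (resid ** 2).sum() / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)

print(f"ICC(2,1) = {icc_2_1(ratings):.2f}")      # 0.88 -- penalises the systematic shift
```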
#### 5.7.3 Types of reliability
* **Validity:** The agreement with a criterion measure [26](#page=26).
* **Test-retest reliability:** Consistency of results when a measurement is repeated over time [26](#page=26).
* **Inter-rater or intra-rater reliability:** Consistency of measurements made by different raters or the same rater on different occasions [26](#page=26).
---
## Common mistakes to avoid
- Skipping topics instead of reviewing all of them before the exam
- Overlooking formulas and key definitions
- Not practicing with the examples provided in each section
- Memorizing material without understanding the underlying concepts
## Glossary
| Term | Definition |
|------|------------|
| Observational research | A type of study where researchers observe subjects and measure variables of interest without assigning treatments or interventions. The relationships between variables are then studied. |
| Experimental research | A type of study where researchers manipulate one or more variables (interventions) and measure their effect on an outcome variable, while controlling for other factors. Participants are actively influenced. |
| Case-control study | A retrospective observational study where individuals with a specific outcome or disease (cases) are compared to individuals without the outcome (controls) to identify potential causes or risk factors. |
| Cohort study | An observational study where a group of individuals (a cohort) is followed over time to observe the incidence of outcomes or diseases, often comparing outcomes between exposed and unexposed groups. |
| Cross-sectional study | An observational study where data are collected at a single point in time from a population or sample, providing a snapshot of prevalence and associations. |
| Outcome variable (dependent variable) | The variable that is measured or observed and is hypothesized to be affected by the independent variable(s). It is the primary focus of the research question. |
| Independent variable (predictor, determinant) | A variable that is manipulated or observed to assess its effect on the outcome variable. It is used to explain or predict changes in the dependent variable. |
| Categorical variable | A variable that can take on a limited, and usually fixed, number of possible values, typically representing distinct categories or groups. |
| Nominal variable | A type of categorical variable where the categories have no intrinsic order or ranking. Examples include blood type or gender. |
| Ordinal variable | A type of categorical variable where the categories have a natural order or ranking, but the intervals between categories are not necessarily equal or quantifiable. |
| Dichotomous variable | A categorical variable with only two possible values, often coded as 0 and 1 (e.g., yes/no, present/absent). |
| Dummy coding | A method of representing a categorical variable with k categories as k-1 binary (0 or 1) dummy variables, allowing their inclusion in regression models. |
| Numerical variable (quantitative variable) | A variable that can be measured and expressed as a number, allowing for arithmetic operations. |
| Discrete variable | A numerical variable that can only take on a countable set of values, typically whole numbers obtained by counting (e.g., number of doctor visits). |
| Continuous variable | A numerical variable that can theoretically take on any value within a given range, often resulting from measurement (e.g., weight, height). |
| Interval scale | A scale of measurement where the intervals between values are equal and meaningful, but there is no true zero point (e.g., Celsius temperature). |
| Ratio scale | A scale of measurement where the intervals between values are equal and meaningful, and there is a true, absolute zero point (e.g., height, weight, age). |
| Descriptive statistics | Methods used to summarize and describe the main features of a dataset, including measures of central tendency, dispersion, and frequency distributions, often presented graphically or numerically. |
| Inferential statistics | Methods used to draw conclusions and make generalizations about a population based on sample data, including hypothesis testing and estimation of population parameters. |
| Frequency table | A table that displays the frequency (count) and often the proportion or percentage of observations falling into each category or value of a variable. |
| Bar chart (bar graph) | A graphical representation of categorical data where rectangular bars of equal width represent each category, and the height of the bar is proportional to the frequency or proportion of data in that category. |
| Histogram | A graphical representation of the distribution of numerical data, where bars represent the frequency of data within specified intervals or bins. It is used to visualize the shape of a distribution. |
| Scatterplot | A graphical representation used to display the relationship between two numerical variables. Each point on the plot represents an observation with its values for both variables. |
| Box-and-whisker plot (box plot) | A graphical method for displaying the distribution of numerical data through quartiles. It shows the median, interquartile range, and potential outliers. |
| Mean ($ \bar{x} $) | The arithmetic average of a set of numbers, calculated by summing all values and dividing by the number of values ($ \bar{x} = \frac{\sum x_i}{n} $). |
| Median | The middle value in a dataset when the data are ordered from least to greatest. If there is an even number of observations, it is the average of the two middle values. |
| Mode | The value that appears most frequently in a dataset. It can be used for both numerical and categorical data. |
| Variance ($ s^2 $) | A measure of the dispersion or spread of a dataset, calculated as the average of the squared differences from the mean ($ s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} $). |
| Standard deviation ($ sd $ or $ s $) | A measure of the dispersion or spread of a dataset around the mean, calculated as the square root of the variance ($ sd = \sqrt{s^2} $). |
| Range | The difference between the maximum and minimum values in a dataset. |
| Interquartile range (IQR) | The difference between the third quartile (75th percentile) and the first quartile (25th percentile) of a dataset, representing the spread of the middle 50% of the data. |
| Normal distribution | A symmetrical, bell-shaped probability distribution characterized by its mean and standard deviation. Many statistical methods assume data follows a normal distribution. |
| Skewed distribution | A distribution where the data are not symmetrical around the mean. A right-skewed distribution has a long tail to the right (mean > median), and a left-skewed distribution has a long tail to the left (mean < median). |
| Null hypothesis ($ H_0 $) | A statement that there is no significant difference or relationship between variables or groups in a population. It is the hypothesis that researchers aim to disprove. |
| Alternative hypothesis ($ H_a $) | A statement that contradicts the null hypothesis, suggesting that there is a significant difference or relationship between variables or groups. |
| P-value (probability value) | The probability of obtaining observed results (or more extreme results) if the null hypothesis were true. A low p-value (typically < 0.05) indicates evidence against the null hypothesis. |
| Statistical significance | A result that is unlikely to have occurred by random chance alone, typically determined by a p-value falling below a predetermined significance level (alpha). |
| Confidence interval (CI) | A range of values that is likely to contain the true population parameter with a certain degree of confidence (e.g., 95% confidence interval). |
| Standard error of the mean (SEM) | A measure of the variability of sample means around the population mean. It quantifies the precision of the sample mean as an estimate of the population mean ($ SEM = \frac{sd}{\sqrt{n}} $). |
| Test statistic | A value calculated from sample data that measures how far the sample result deviates from the null hypothesis. It is used to determine the p-value. |
| Probability distribution | A mathematical function that describes the probabilities of different possible outcomes for a random variable (e.g., normal distribution, t-distribution, chi-squared distribution). |
| Z-distribution (standard normal distribution) | A normal distribution with a mean of 0 and a standard deviation of 1, used for hypothesis testing and confidence intervals when the population standard deviation is known or the sample size is large. |
| T-distribution | A probability distribution similar to the normal distribution but with heavier tails, used for hypothesis testing and confidence intervals when the population standard deviation is unknown and the sample size is small. It depends on degrees of freedom. |
| Degrees of freedom (df) | A parameter in statistical distributions that reflects the number of independent pieces of information available to estimate a parameter. It often relates to sample size. |
| Paired t-test | A statistical test used to compare the means of two related groups or measurements from the same individuals (e.g., before and after an intervention). |
| One-sample t-test | A statistical test used to compare the mean of a single sample to a known or hypothesized population mean. |
| Independent samples t-test | A statistical test used to compare the means of two independent groups. |
| Homoscedasticity | The assumption that the variances of different groups are approximately equal. This is a condition for some statistical tests, like the independent samples t-test and ANOVA. |
| Levene's test | A statistical test used to assess the equality of variances between two or more groups. |
| ANOVA (Analysis of Variance) | A statistical test used to compare the means of three or more independent groups. It partitions the total variance into variance between groups and variance within groups. |
| F-statistic | The test statistic used in ANOVA, calculated as the ratio of between-group variance to within-group variance. |
| Post-hoc tests | Follow-up statistical tests performed after a significant ANOVA result to determine which specific group means differ from each other. |
| Non-parametric tests | Statistical tests that do not assume the data follows a specific distribution (e.g., normal distribution). They are often used with ordinal data or when assumptions of parametric tests are violated. |
| Mann-Whitney U test | A non-parametric test used to compare two independent groups. It is an alternative to the independent samples t-test. |
| Wilcoxon signed-rank test | A non-parametric test used to compare two related samples or paired observations. It is an alternative to the paired t-test. |
| Sign test | A non-parametric test used to compare the median of a single group to a hypothesized value or to compare paired data. |
| Kruskal-Wallis test | A non-parametric test used to compare three or more independent groups. It is an alternative to ANOVA. |
| Correlation | A statistical measure that describes the strength and direction of the linear relationship between two continuous variables. |
| Pearson correlation coefficient (r) | A measure of the linear association between two continuous variables, ranging from -1 (perfect negative correlation) to +1 (perfect positive correlation), with 0 indicating no linear correlation. |
| Coefficient of determination ($ r^2 $) | The proportion of the variance in the dependent variable that is predictable from the independent variable(s). It indicates the strength of the linear relationship. |
| Spearman's rank correlation coefficient | A non-parametric measure of the strength and direction of the monotonic relationship between two ranked variables. |
| Linear regression | A statistical technique used to model the linear relationship between a dependent variable and one or more independent variables. |
| Simple linear regression | A regression model with one independent variable. |
| Multiple linear regression | A regression model with two or more independent variables. |
| Regression coefficient (slope, b) | In linear regression, the coefficient that indicates the expected change in the dependent variable for a one-unit increase in the independent variable. |
| Intercept (constant, a or $ b_0 $) | In linear regression, the value of the dependent variable when all independent variables are zero. |
| Adjusted R-squared | A modified version of R-squared that adjusts for the number of predictors in the model, providing a more accurate measure of model fit, especially in multiple regression. |
| Dummy variable | A binary variable (0 or 1) used to represent categories of a categorical predictor in regression analysis. |
| Logistic regression | A statistical technique used to model the probability of a dichotomous outcome variable as a function of one or more predictor variables. |
| Odds ratio (OR) | A measure of the strength of association between an exposure and an outcome. It is the ratio of the odds of the outcome occurring in one group to the odds of it occurring in another group. |
| Maximum Likelihood Estimation (MLE) | A method of estimating the parameters of a statistical model by finding the parameter values that maximize the likelihood function, which represents the probability of observing the data given the parameters. |
| -2 Log Likelihood | A measure used in logistic regression to assess the goodness of fit of a model. Lower values indicate a better fit. |
| Likelihood ratio test | A statistical test used to compare the fit of two nested models, typically by comparing their -2 Log Likelihood values. |
| Survival analysis | A set of statistical methods for analyzing the time until an event of interest occurs, such as death, disease recurrence, or recovery. |
| Kaplan-Meier curve | A graphical method for estimating and displaying the survival function from lifetime data. It shows the probability of survival over time. |
| Log-rank test | A statistical test used to compare the survival distributions of two or more groups. It tests the null hypothesis that the survival curves are identical. |
| Cox proportional hazards model (Cox regression) | A semi-parametric statistical model used in survival analysis to investigate the effect of predictor variables on the hazard rate of an event occurring. |
| Hazard ratio (HR) | In Cox regression, the exponentiated regression coefficient ($ exp(B) $), representing the relative risk of the event occurring in one group compared to another, assuming other predictors are constant. |
| Confounding | A bias that occurs when an observed association between an exposure and an outcome is distorted by the presence of a third variable (confounder) that is associated with both the exposure and the outcome. |
| Effect modification (interaction) | A situation where the effect of an exposure on an outcome differs across levels of another variable (the effect modifier). The relationship between the exposure and outcome is not uniform. |
| Stepwise selection | A procedure for building regression models by adding or removing predictor variables based on statistical criteria (e.g., p-values) to find the best-fitting model. |
| Hosmer-Lemeshow test | A goodness-of-fit test for logistic regression models, which assesses whether the observed event rates match the predicted event rates across deciles of risk. |
| Statistical power (1-β) | The probability of correctly rejecting a false null hypothesis. A power of 80% means there is an 80% chance of detecting a true effect if it exists. |
| Skewness | A measure of the asymmetry of a probability distribution. Positive skewness indicates a tail extending to the right, while negative skewness indicates a tail extending to the left. |
| Kurtosis | A measure of the "tailedness" or "peakedness" of a probability distribution. High kurtosis means heavier tails and a sharper peak, while low kurtosis means lighter tails and a flatter peak. |
| QQ-plot (Quantile-Quantile plot) | A graphical tool used to assess whether a dataset follows a particular theoretical distribution, such as the normal distribution. |
| Multicollinearity | A phenomenon in multiple regression where two or more predictor variables are highly linearly related to each other, making it difficult to estimate the independent effect of each predictor. |
| Kappa statistic (Cohen's Kappa) | A measure of inter-rater or inter-observer agreement for categorical items. It accounts for the possibility of agreement occurring by chance. |
| Validity | The extent to which a measurement tool measures what it is intended to measure. |
| Test-retest reliability | The consistency of results when a test or measurement is administered to the same individuals on two or more occasions under similar conditions. |
| Inter-rater reliability | The degree of agreement between two or more independent raters or observers who are evaluating the same phenomenon. |
| Intra-class correlation coefficient (ICC) | A statistical measure used to assess the reliability or consistency of measurements, especially when dealing with continuous data and multiple raters or occasions. |