# Population, sample, and sampling techniques
This section outlines the fundamental concepts of population, sample, and the various methods employed to select a sample for research purposes.
### 1.1 Population
A population in research is defined as the entire group of individuals or elements that share at least one common characteristic relevant to the study. It represents the complete set from which data could theoretically be collected and is the larger group to which a researcher aims to generalize their findings [4](#page=4).
#### 1.1.1 Key characteristics of a population
* **Target group:** The population is the complete group that a researcher wants to make inferences about [4](#page=4).
* **Shared traits:** It is defined by specific characteristics that its members (people, objects, etc.) have in common [4](#page=4).
* **Can be diverse:** Populations are not limited to humans; they can include non-human entities like events, animal species, or objects [4](#page=4).
* **Often too large to study directly:** It is typically impractical or impossible to collect data from every member due to constraints like time and cost [4](#page=4).
* **Source of samples:** Researchers select a representative subset, known as a sample, from the population to gather data, with the intention of generalizing the sample's findings to the entire population [4](#page=4).
#### 1.1.2 Example in educational research
| Research Title | Population |
| :------------------------------------------------------------------ | :------------------------------------------------------------------------------------------- |
| Exploring Students' Critical Thinking Skills in Extensive Reading Classes at UIN Jakarta | All students enrolled in Extensive Reading classes at UIN Jakarta [5](#page=5). |
| Teachers' Perceptions of Online Assessment Practices in Indonesian Junior High Schools | All English teachers teaching in Indonesian junior high schools [5](#page=5). |
| The Effect of Project-Based Learning on Vocabulary Mastery among Vocational School Students | All students studying in vocational high schools where English is taught as a subject [5](#page=5). |
| Integrating Local Culture in EFL Textbooks: A Study of Indonesian Secondary School Materials | All English textbooks used in Indonesian secondary schools [5](#page=5). |
| The Use of AI Tools in Academic Writing: A Study of Graduate Students in English Education Programs | All graduate students majoring in English Education programs [5](#page=5). |
> **Tip:** The population defines the scope of a study and determines to whom the findings can be generalized [5](#page=5).
### 1.2 Sample
A sample is a subset of the population that is selected for actual data collection. Researchers study the sample to draw conclusions about the entire population because collecting data from every member is often impractical, too costly, or impossible. The primary goal is to select a sample that accurately reflects the characteristics of the population under investigation [6](#page=6).
#### 1.2.1 Key aspects of a research sample
* **Subset of a population:** A sample is a portion of the larger group being studied, for instance, 100 students from a university of 20,000 [7](#page=7).
* **Representation:** The sample must be representative of the population to enable researchers to derive valid conclusions [7](#page=7).
* **Manageability:** Using a sample makes data collection more feasible in terms of time and resources compared to studying the entire population [7](#page=7).
* **Sampling methods:** Various methods are used to select a sample, broadly categorized into probability and non-probability sampling [7](#page=7).
#### 1.2.2 Examples of population and sample in educational research
| Research Title | Population | Sample |
| :------------------------------------------------------------------ | :------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------- |
| Exploring Students' Critical Thinking Skills in Extensive Reading Classes at UIN Jakarta | All students enrolled in Extensive Reading classes at UIN Jakarta. | 90 students from three Extensive Reading classes in the English Education Department [8](#page=8). |
| Teachers' Perceptions of Online Assessment Practices in Indonesian Junior High Schools | All English teachers teaching in Indonesian junior high schools. | 30 English teachers from five public junior high schools in East Jakarta [8](#page=8). |
| The Effect of Project-Based Learning on Vocabulary Mastery among Vocational School Students | All students studying in vocational high schools where English is taught as a subject. | 60 students from two classes at SMKN 5 Jakarta [8](#page=8). |
| Integrating Local Culture in EFL Textbooks: A Study of Indonesian Secondary School Materials | All English textbooks used in Indonesian secondary schools. | Six English textbooks used in Grades 7 to 9, published by three major Indonesian publishers [8](#page=8). |
| The Use of AI Tools in Academic Writing: A Study of Graduate Students in English Education Programs | All graduate students majoring in English Education programs in Indonesia. | 40 graduate students enrolled in the Master's Program at UIN Jakarta who have used AI tools in writing courses [8](#page=8). |
> **Key Point:** A good sample must be representative of the population so that results reflect broader trends rather than isolated cases [8](#page=8).
### 1.3 Sampling
Sampling is the process of selecting individuals or elements from a population to form a sample. It answers the question, "How do we choose whom or what to study?" A sound sampling method yields a representative subset, so researchers can draw conclusions about the entire group without studying everyone, and it is what makes research findings reliable and generalizable. The two main categories are probability sampling, which involves random selection, and non-probability sampling, which relies on non-random criteria [9](#page=9).
### 1.4 Probability sampling
Probability sampling involves random selection, ensuring that every member of the population has a known, non-zero chance of being included in the sample. The primary purpose of this technique is to create a representative sample that allows for strong statistical inferences about the entire population [10](#page=10).
#### 1.4.1 Probability sampling methods
| No. | Sampling Type | Description | Steps in the Process | Example in Educational Research |
| :-- | :---------------------- | :--------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | Simple Random Sampling | Every member of the population has an equal chance of being selected; selection is purely by chance. | 1. Prepare a complete list of the population. 2. Assign each member a number. 3. Use a random number generator or lottery method to select participants. | Selecting 50 students from a list of 500 English Department students using random numbers generated by Excel [11](#page=11). |
| 2 | Stratified Random Sampling | The population is divided into homogeneous subgroups (strata), and random samples are taken from each group. | 1. Identify relevant strata (e.g., gender, class level). 2. Divide population into these strata. 3. Randomly select proportional samples from each stratum. 4. Combine all selected participants. | Dividing students by gender and randomly selecting 25 males and 25 females to ensure gender balance [11](#page=11). |
| 3 | Cluster Sampling | The population is divided into groups (clusters), and entire clusters are randomly selected instead of individuals. | 1. Divide the population into natural clusters (e.g., schools, classes). 2. Randomly select several clusters. 3. Include all members within those clusters in the sample. | Randomly selecting five schools in Jakarta, then surveying all English teachers in those schools [11](#page=11). |
| 4 | Systematic Sampling | Every $k$th member is chosen from an ordered list after a random start, providing even coverage across the population. | 1. Determine the population size ($N$) and desired sample size ($n$). 2. Calculate $k = N/n$. 3. Randomly choose a starting point between 1 and $k$. 4. Select every $k$th individual thereafter. | From 500 Reading course students, choose every 10th student ($k=10$) starting from the 4th name on the list [11](#page=11). |
| 5 | Area Sampling | A form of cluster sampling where clusters are based on geographical areas such as provinces, cities, or schools. | 1. Divide the target region into areas (e.g., provinces, districts). 2. Randomly select several areas. 3. Collect data from all elements or a subsample within those selected areas. | Selecting three provinces in Indonesia, then choosing two districts from each to collect teacher data [11](#page=11). |
| 6 | Multistage Sampling | A complex version of cluster sampling involving multiple levels of random selection. | 1. Identify large clusters (e.g., provinces). 2. Randomly select some clusters. 3. Within each, identify smaller units (e.g., schools). 4. Randomly select samples within those smaller units. | Stage 1: Select provinces -> Stage 2: Select schools -> Stage 3: Select teachers from those schools [11](#page=11). |
> **Note:** In systematic sampling, the starting point is always chosen randomly between 1 and $k$ (the sampling interval). For example, if $N=500$ and $n=50$, then $k=N/n=10$. If the random start is 4, the sample includes the 4th, 14th, 24th, etc., participants. Area sampling is a form of cluster sampling with geographical clusters, while multistage sampling involves multiple levels of selection from various types of groupings [12](#page=12).
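
The selection steps for simple random, systematic, and stratified sampling can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: the student IDs, the 200/300 strata split, and the function names are all hypothetical, and the stratified version assumes proportional allocation (the same fraction from every stratum).

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n members without replacement; every member is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, seed=None):
    """Pick every k-th member after a random start between 1 and k."""
    rng = random.Random(seed)
    k = len(population) // n      # sampling interval k = N / n
    start = rng.randint(1, k)     # 1-based start; randint includes both ends
    # Convert the 1-based positions start, start + k, ... to 0-based indices.
    return [population[i] for i in range(start - 1, start - 1 + k * n, k)]


def stratified_sample(strata, fraction, seed=None):
    """Randomly draw the same fraction from each stratum, then combine."""
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        size = round(len(members) * fraction)
        sample.extend(rng.sample(members, size))
    return sample


# Hypothetical population list: N = 500 student IDs.
students = [f"S{i:03d}" for i in range(1, 501)]

srs = simple_random_sample(students, 50, seed=1)
sys_sample = systematic_sample(students, 50, seed=1)

# Hypothetical strata: the first 200 IDs are male, the remaining 300 female.
strata = {"male": students[:200], "female": students[200:]}
strat = stratified_sample(strata, 0.10, seed=1)
```

With $N=500$ and $n=50$, `systematic_sample` uses $k=10$, so a random start of 4 would return the 4th, 14th, 24th, ... names, matching the note above. Fixing `seed` only makes the draw repeatable; it does not change the sampling design.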
### 1.5 Non-probability sampling
Non-probability sampling involves the selection of participants based on factors other than random chance, such as convenience, cost, or researcher judgment. While often used for practical reasons when random sampling is not feasible, this method is more susceptible to bias [13](#page=13).
> **Key Takeaway:** Probability sampling is ideal for studies aiming to generalize findings to an entire population, whereas non-probability sampling is more suitable for exploratory studies or understanding specific contexts and experiences [13](#page=13).
#### 1.5.1 Non-probability sampling methods
| No. | Sampling Type | Description | Steps in the Process | Example in Educational Research |
| :-- | :---------------------- | :------------------------------------------------------------------------------------------------------------ | :------------------------------------------------------------------------------------------------------------------------------------------------------- | :----------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| 1 | Convenience Sampling | Participants are chosen based on their easy availability or willingness to participate; it is simple but may reduce representativeness. | 1. Identify individuals who are easy to access. 2. Ask for their voluntary participation. 3. Collect data from those who agree. | Selecting your own students to respond to a classroom survey on reading strategies because they are already accessible [14](#page=14). |
| 2 | Purposive (Judgmental) Sampling | Participants are intentionally chosen based on specific criteria related to the study's objectives. | 1. Define inclusion criteria (e.g., teaching experience, use of technology). 2. Identify individuals who meet those criteria. 3. Invite them to participate. | Selecting English teachers who have implemented digital storytelling in their classrooms [14](#page=14). |
| 3 | Snowball Sampling | Existing participants recruit other participants who meet the criteria; useful for hard-to-reach populations. | 1. Identify one or two initial participants (seeds). 2. Collect data and ask them to recommend others. 3. Continue until the desired sample size is reached. | Interviewing one teacher about inclusive practices, then being referred to colleagues with similar experience [14](#page=14). |
| 4 | Quota Sampling | The researcher ensures specific categories or subgroups are represented in certain proportions, but selection within each group is non-random. | 1. Decide on categories (e.g., male/female, school type). 2. Determine the number (quota) for each. 3. Fill each quota using available participants. | Choosing 20 male and 20 female students from different schools without random selection [14](#page=14). |
| 5 | Volunteer Sampling | Participants self-select by responding to an invitation or call for participation. | 1. Announce or advertise the study. 2. Allow interested participants to sign up. 3. Collect data from those who volunteer. | Posting an online survey about AI use in writing classes and using responses from those who choose to participate [14](#page=14). |
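
To make the contrast with random selection concrete, the quota sampling steps in the table can be sketched as follows. The volunteer list, the quotas, and the function name are hypothetical; the point is that each quota is filled by whoever is available first, with no randomization.

```python
def quota_sample(available, quotas, key):
    """Fill each quota with the first available people in that category."""
    remaining = dict(quotas)
    chosen = []
    for person in available:
        category = key(person)
        if remaining.get(category, 0) > 0:
            chosen.append(person)
            remaining[category] -= 1
        if not any(remaining.values()):   # every quota is full
            break
    return chosen


# Hypothetical participants, listed in the order they became available.
volunteers = [
    {"name": "A", "gender": "female"},
    {"name": "B", "gender": "female"},
    {"name": "C", "gender": "male"},
    {"name": "D", "gender": "female"},
    {"name": "E", "gender": "male"},
    {"name": "F", "gender": "male"},
]

picked = quota_sample(volunteers, {"male": 2, "female": 2},
                      key=lambda p: p["gender"])
print([p["name"] for p in picked])   # ['A', 'B', 'C', 'E']
```

Participant D is skipped only because the female quota was already full, and F is never reached. Who ends up in the sample depends on arrival order, which is one source of the bias noted for non-probability methods.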
### 1.6 Comparison of probability and non-probability sampling
| Aspect | Probability Sampling | Non-Probability Sampling |
| :------------------ | :--------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------------------------------------------------------------------------- |
| Basic Definition | Every individual or element in the population has a known, non-zero chance of being selected (an equal chance in simple random sampling). | Individuals are selected based on availability, judgment, or willingness rather than randomization [15](#page=15). |
| Main Purpose | To achieve representativeness and allow statistical generalization of results to the population. | To obtain in-depth, specific, or practical insights, especially when randomization is not possible [15](#page=15). |
| Selection Basis | Random selection following objective procedures (lottery, random numbers, intervals, stages). | Subjective or convenience-based selection guided by researcher's criteria or participants' willingness [15](#page=15). |
| Typical Techniques | Simple random, stratified, cluster, systematic, area, multistage. | Convenience, purposive, quota, snowball, volunteer [15](#page=15). |
| Example in Research | Selecting 100 students randomly from 1,000 to test reading comprehension. | Interviewing only teachers who have used project-based learning in their classes [15](#page=15). |
| Advantages | Minimizes bias; results can be generalized; allows statistical inference. | Easier and faster to conduct; useful for exploratory or qualitative studies; flexible for specific populations [15](#page=15). |
| Limitations | Time-consuming and requires a full population list; not always feasible in large or dispersed populations. | Higher potential for bias; results may not represent the entire population [15](#page=15). |
| Best Used When | Quantitative research requiring generalizable results; large-scale or national studies. | Qualitative or classroom-based research; exploratory studies or cases with limited access [15](#page=15). |
---
# Levels of measurement and data types
Understanding how data is measured is fundamental in educational research, as it dictates the types of analyses that can be appropriately applied and how findings can be interpreted. Variables are measured in different ways depending on what they represent and how they are quantified. The four levels of measurement, forming a continuum from least to most precise, are nominal, ordinal, interval, and ratio [16](#page=16).
### 2.1 Nominal data
Nominal data is a type of qualitative data used to categorize variables into distinct, unordered groups or labels that have no inherent quantitative value. These categories are mutually exclusive, meaning each item can belong to only one group, and they cannot be ranked or meaningfully ordered [17](#page=17).
**Characteristics of nominal data:**
* **Categorical:** Data is sorted into groups or categories [17](#page=17).
* **Mutually exclusive:** Each item can only fit into one category at a time [17](#page=17).
* **No inherent order:** There is no natural or meaningful order to the categories [17](#page=17).
* **No quantitative value:** The categories cannot be used for mathematical calculations like addition or subtraction [17](#page=17).
* **Often qualitative:** It is a qualitative data type, sometimes called "named" or "labelled" data [17](#page=17).
**Examples of nominal data:**
* Gender: Male, Female, Non-binary [18](#page=18).
* Marital Status: Single, Married, Divorced, Widowed [18](#page=18).
* Blood Type: A, B, AB, O [18](#page=18).
* Nationality: American, Canadian, Indian [18](#page=18).
* Eye Color: Brown, Blue, Green, Hazel [18](#page=18).
* Type of School: Public, Private, Islamic [18](#page=18).
* Teaching Method: Communicative, Grammar-Translation, Task-Based [18](#page=18).
**Key features of nominal data research:**
* Variables are categories only, not ordered or numeric [18](#page=18).
* Analysis focuses on counts, frequencies, and associations [18](#page=18).
* Most appropriate analysis methods include the Chi-Square test, Fisher's Exact test, or descriptive percentages [18](#page=18).
> **Example:** Research investigating "Gender Differences in Students' Preference for Online vs. Offline English Classes" uses Gender (Male/Female) and Learning Mode Preference (Online/Offline) as nominal variables. The purpose is to examine whether preferences differ by gender category, and a Chi-Square test for association between two nominal variables would be suitable [19](#page=19).
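
As a rough illustration of the Chi-Square test for association mentioned in this example, the sketch below computes the Pearson chi-square statistic for a 2x2 table in plain Python. The counts are invented for illustration, no continuity correction is applied, and the helper name is hypothetical; a statistics package would normally also report the p-value.

```python
def chi_square_statistic(observed):
    """Pearson chi-square statistic and degrees of freedom for a table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if gender and preference were independent.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (obs - expected) ** 2 / expected
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df


# Hypothetical counts: rows = Male, Female; columns = Online, Offline.
observed = [[30, 20],
            [15, 35]]

chi2, df = chi_square_statistic(observed)
CRITICAL_05_DF1 = 3.841   # chi-square critical value for df = 1, alpha = 0.05
print(f"chi2 = {chi2:.2f}, df = {df}, significant = {chi2 > CRITICAL_05_DF1}")
```

For these made-up counts the statistic is about 9.09 with 1 degree of freedom, which exceeds 3.841, so the preference would be judged to differ by gender at the 0.05 level.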
### 2.2 Ordinal data
Ordinal data is qualitative data that ranks or orders categories based on a natural hierarchy, but the distances between these categories are not quantifiable or equal. Common examples include Likert scale responses (e.g., "strongly disagree" to "strongly agree") or Olympic medal rankings. The key characteristic is the order, not the exact numerical difference between the ranks [20](#page=20).
**Key characteristics of ordinal data:**
* **Order is important:** Categories are arranged in a specific, meaningful sequence [20](#page=20).
* **Unequal or unknown intervals:** The difference between each category is not uniform or precise. For instance, the difference in satisfaction between "satisfied" and "very satisfied" is not necessarily the same as between "dissatisfied" and "satisfied" [20](#page=20).
* **Categorical:** The data is descriptive and grouped into distinct categories rather than precise numerical values [20](#page=20).
* **Limited mathematical operations:** Meaningful arithmetic operations like addition or averaging cannot be performed on ordinal data because the intervals are not equal [20](#page=20).
**Examples of ordinal data:**
* Customer satisfaction levels: "Very satisfied," "satisfied," "neutral," "dissatisfied," "very dissatisfied" [21](#page=21).
* Educational attainment: "High school diploma," "bachelor's degree," "master's degree," "doctoral degree" [21](#page=21).
* Pain scale: "Mild pain," "moderate pain," "severe pain" [21](#page=21).
* Economic status: "Low," "medium," "high" [21](#page=21).
* Frequency of exercise: "Never," "rarely," "sometimes," "often," "always" [21](#page=21).
**Key features of ordinal data research:**
* Ordinal variables express order or ranking, but intervals between ranks are unequal or unknown [21](#page=21).
* They are commonly collected via Likert scales, ranking questionnaires, or rating checklists [21](#page=21).
* Appropriate analyses are non-parametric, including the Mann-Whitney U test, Kruskal-Wallis H test, Spearman's rho, or the Median test [21](#page=21).
* Descriptive analysis often reports the median, mode, and percentages, rather than the mean or standard deviation [21](#page=21).
> **Example:** In a study on "The Relationship Between Students' Motivation and Their Reading Engagement," motivation (High-Medium-Low) and engagement (High-Medium-Low) are treated as ordinal variables. Spearman's Rank Correlation (rho) is a suitable analysis for finding correlations between such ranked variables [22](#page=22).
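
Spearman's rho for this kind of ranked data can be sketched as the Pearson correlation of the ranks, with tied values sharing the average of their positions. The responses below are hypothetical, and the numeric codes 1 to 3 carry order only, not distance.

```python
from statistics import mean


def average_ranks(values):
    """Rank values 1..n; tied values receive the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1   # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation applied to tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


LEVELS = {"Low": 1, "Medium": 2, "High": 3}   # order-preserving codes only

# Hypothetical responses from eight students.
motivation = ["Low", "Low", "Medium", "Medium", "High", "High", "High", "Medium"]
engagement = ["Low", "Medium", "Medium", "Low", "High", "Medium", "High", "High"]

rho = spearman_rho([LEVELS[m] for m in motivation],
                   [LEVELS[e] for e in engagement])
print(f"rho = {rho:.2f}")
```

With only three categories there are many ties, which is why the sketch averages ranks instead of using the shortcut formula based on squared rank differences.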
### 2.3 Interval data
Interval data is a type of quantitative data where scores are measured on a scale with equal, measurable distances between values, such as standardized test scores or GPAs. Unlike a ratio scale, interval data has an arbitrary zero point, meaning a score of zero does not represent a complete absence of the measured trait. Interval data always takes numerical values where the distance between two points on the scale is standardized and equal [23](#page=23).
**Key characteristics:**
* **Equal intervals:** The distance between any two consecutive numbers on the scale is the same. For example, the difference between a score of 1100 and 1200 on the SAT is the same as the difference between 900 and 1000, which is 100 [23](#page=23).
* **Arbitrary zero:** The zero point on the scale is not absolute. For instance, a GPA of 0.0 does not mean a total lack of academic performance, and a score of 0 on a test does not imply the student has no knowledge whatsoever [23](#page=23).
* **Order and difference:** Interval data provides the order of values and allows for the calculation of the difference between them [23](#page=23).
"Equal intervals" means that the distance between any two adjacent points on the scale is consistent and meaningful. For example, the difference between 10 and 20 is the same as between 20 and 30 [24](#page=24).
> **Example:** In "The Effect of Extensive Reading on Students' TOEFL Reading Scores," TOEFL Reading Scores are treated as interval data. Reading scores, which range from 30-67 (part of a total TOEFL range of 310-677), are treated as having equal intervals, but a score of "0" would not indicate "no reading ability". Parametric tests like the Independent Samples t-Test are appropriate for comparing mean scores between groups [25](#page=25).
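
The Independent Samples t-Test named in this example compares two group means relative to their pooled variability. The sketch below computes Student's t with pooled variance, assuming roughly equal variances in the two groups; the scores and the function name are hypothetical.

```python
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Student's t statistic (pooled variance) and degrees of freedom."""
    n1, n2 = len(group_a), len(group_b)
    # statistics.variance is the sample variance (divides by n - 1).
    pooled_var = ((n1 - 1) * variance(group_a)
                  + (n2 - 1) * variance(group_b)) / (n1 + n2 - 2)
    standard_error = (pooled_var * (1 / n1 + 1 / n2)) ** 0.5
    t = (mean(group_a) - mean(group_b)) / standard_error
    return t, n1 + n2 - 2


# Hypothetical TOEFL reading scores for two classes of six students.
extensive = [52, 55, 58, 60, 61, 63]
control = [48, 50, 53, 54, 56, 57]

t, df = independent_t(extensive, control)
T_CRITICAL_DF10 = 2.228   # two-tailed critical t for df = 10, alpha = 0.05
print(f"t = {t:.2f}, df = {df}, significant = {abs(t) > T_CRITICAL_DF10}")
```

For these made-up scores, t is about 2.37 with 10 degrees of freedom. A real analysis would first check normality and homogeneity of variance, as the parametric assumptions require.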
### 2.4 Ratio data
Ratio data is a type of quantitative data that possesses all the characteristics of nominal, ordinal, and interval scales, plus a true or absolute zero point. A true zero means that the value of zero represents the complete absence of the variable being measured, which allows for all mathematical operations, including multiplication and division [26](#page=26).
**Key Characteristics:**
* **Quantitative:** The data is numerical [26](#page=26).
* **Ordered:** Data can be ranked from low to high [26](#page=26).
* **Equal Intervals:** The difference between any two adjacent points on the scale is consistent and meaningful [26](#page=26).
* **True Zero:** The zero point on the scale means a total absence of the quantity. You cannot have negative values for the measured variable [26](#page=26).
Ratio data offers the most analytical flexibility, allowing for all descriptive statistics (mean, median, mode, standard deviation, range, variance) and a wide range of parametric statistical tests, such as Pearson correlation and linear regression. Ratios can be compared logically (e.g., "twice as long," "half the amount") because of the true zero point [27](#page=27).
**Examples in Educational Research:**
* **Age:** Age has a true zero (the moment of birth, when no time has elapsed), so ratios are meaningful: a 10-year-old student is twice as old as a 5-year-old student [28](#page=28).
* **Years of Education/Experience:** Zero years of education signifies the complete absence of formal schooling. An educator with 20 years of experience has twice the experience of one with 10 years [28](#page=28).
* **Number of Students:** A school can have zero students, indicating the complete absence of students [28](#page=28).
* **Time taken to complete a task:** Zero seconds means that no time has elapsed, so ratios are meaningful: a student who takes 4 minutes takes half the time of one who takes 8 minutes [28](#page=28).
* **Scores on a test with an absolute zero:** While most test scores are interval, a count of the number of incorrect answers (where zero means no incorrect answers) would be ratio data [28](#page=28).
> **Example:** In a study on "The Relationship Between Study Time and TOEFL Scores," study time reported in hours per week (e.g., 0, 3, 5, 8, 12 hours) is ratio data, where "0" indicates no study time. TOEFL scores are usually treated as interval data but can be analyzed alongside ratio variables, for example in regression. Vocabulary pretest/posttest results that record the number of words recalled correctly (e.g., 0-50 words) are also ratio data, where "0" means no words were recalled correctly. Parametric analyses like Pearson correlation or simple linear regression are suitable for such data [29](#page=29).
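
A minimal sketch of the Pearson correlation and simple linear regression mentioned in this example follows. The study hours are the values listed above; the TOEFL totals are invented for illustration, and the function names are hypothetical.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def linear_fit(x, y):
    """Least-squares slope and intercept for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    return slope, my - slope * mx


hours = [0, 3, 5, 8, 12]              # study time per week (ratio data)
scores = [430, 450, 470, 500, 540]    # hypothetical TOEFL totals

r = pearson_r(hours, scores)
slope, intercept = linear_fit(hours, scores)
print(f"r = {r:.3f}; predicted score = {intercept:.1f} + {slope:.2f} * hours")
```

Here the slope reads as the expected change in score per additional weekly study hour, and the intercept is the predicted score at 0 hours. That interpretation is meaningful only because study time has a true zero.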
---
# Research design and statistical analysis
The connection between research design and statistical analysis is fundamental, with the former dictating how data is collected and the latter enabling its interpretation [30](#page=30).
### 3.1 Understanding the research-analysis link
Every research design, whether experimental, correlational, descriptive, or comparative, serves a distinct analytical purpose. The research design itself determines how data is gathered, while statistical analysis provides the means to understand that data. The research question is the initial driver, specifying the type of relationship to be investigated – be it a difference, an association, or a description. This identified relationship then guides the selection of the appropriate statistical test [30](#page=30).
Key takeaways for this relationship include:
* Research questions are paramount and direct the choice of design, which in turn dictates the analysis [30](#page=30).
* When approaching analysis, always consider:
1. The primary research purpose [30](#page=30).
2. The measurement level of the variables (data type) [30](#page=30).
3. Whether the research aims to compare, relate, or describe [30](#page=30).
* The decision between parametric and non-parametric tests hinges on the data type and its distribution [30](#page=30).
### 3.2 Parametric versus non-parametric tests
In quantitative research, the choice of statistical test is predominantly influenced by two critical factors: the type of data (interval/ratio versus nominal/ordinal) and the distribution of the data (normal versus non-normal) [31](#page=31).
**Parametric tests** are employed when the data adheres to specific mathematical assumptions, most notably normal distribution and homogeneity of variance [31](#page=31).
**Non-parametric tests** are utilized when these assumptions are not met, such as when dealing with categorical data, ranked data, or distributions that are skewed [31](#page=31).
The following table outlines the key distinctions between parametric and non-parametric tests:
| Aspect | Parametric Tests | Non-Parametric Tests |
| :-------------------- | :--------------------------------------------------------------- | :---------------------------------------------------------------- |
| **Type of Data** | Interval or Ratio (continuous, numeric) | Nominal or Ordinal (categorical or ranked) |
| **Assumptions** | Data are normally distributed. Homogeneity of variance. Equal intervals between values. | No assumption of normality. Can be used with skewed, ranked, or categorical data. |
| **Main Purpose** | To test means, relationships, or differences using actual numerical values. | To test ranks, frequencies, or medians without relying on distribution shape. |
| **Statistical Power** | Generally higher – more sensitive to detect true effects if assumptions are met. | Slightly lower – less sensitive but more flexible when assumptions are violated. |
| **Examples of Tests** | t-Test (independent, paired), ANOVA, Pearson Correlation, Regression | Mann-Whitney U, Wilcoxon Signed-Rank, Kruskal-Wallis H, Spearman's rho, Chi-Square Test |
| **Descriptive Statistics** | Mean, Standard Deviation | Median, Rank, Frequency |
| **When to Use** | As a rule of thumb, when sample size ≥ 30, data are continuous, and normality is satisfied. | As a rule of thumb, when sample size < 30, data are ordinal or nominal, or distribution is skewed. |
| **Interpretation** | Results focus on mean differences or numeric relationships. | Results focus on rank differences, frequency associations, or median comparisons. |
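
To show what "testing ranks" means in practice, the sketch below computes the Mann-Whitney U statistic for two independent groups. It is an illustration under assumptions: the Likert responses are hypothetical, ties share the average rank, and the result would still need to be compared against a critical-value table or converted to a p-value.

```python
def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U for two independent groups, with tie-averaged ranks."""
    combined = sorted(group_a + group_b)
    # Map each distinct value to the mean of the 1-based positions it fills.
    rank_of = {}
    for value in set(combined):
        first = combined.index(value) + 1
        count = combined.count(value)
        rank_of[value] = first + (count - 1) / 2
    n1, n2 = len(group_a), len(group_b)
    rank_sum_a = sum(rank_of[v] for v in group_a)
    u_a = rank_sum_a - n1 * (n1 + 1) / 2
    u_b = n1 * n2 - u_a
    return min(u_a, u_b)   # the smaller U is the reported statistic


# Hypothetical Likert responses (1 = strongly disagree ... 5 = strongly agree).
public = [2, 3, 3, 4, 2, 3]
private = [4, 5, 3, 4, 5, 4]

u = mann_whitney_u(public, private)
print(f"U = {u}")   # U = 4.0
```

Only the ordering of the responses enters the calculation, so the test does not assume equal distances between "agree" and "strongly agree". A smaller U indicates less overlap between the two groups.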
#### 3.2.1 Illustrative scenarios for choosing tests
The choice between parametric and non-parametric tests can be illustrated with various research scenarios:
* **Scenario:** Comparing mean TOEFL scores between two groups.
* **Data Type:** Interval / Ratio
* **Distribution:** Normal
* **Appropriate Test:** Independent-Samples t-Test (Parametric) [33](#page=33).
* **Scenario:** Comparing motivation levels (High-Medium-Low) between schools.
* **Data Type:** Ordinal
* **Distribution:** Non-normal / Ranked
* **Appropriate Test:** Kruskal-Wallis H Test (Non-Parametric) [33](#page=33).
* **Scenario:** Correlating study time (hours) and GPA.
* **Data Type:** Ratio
* **Distribution:** Normal
* **Appropriate Test:** Pearson Correlation (Parametric) [33](#page=33).
* **Scenario:** Correlating students' ranking in reading and writing.
* **Data Type:** Ordinal
* **Distribution:** Non-normal / Ranked
* **Appropriate Test:** Spearman's rho (Non-Parametric) [33](#page=33).
* **Scenario:** Examining the association between gender and preferred learning platform.
* **Data Type:** Nominal
* **Distribution:** Categorical
* **Appropriate Test:** Chi-Square Test (Non-Parametric) [33](#page=33).
> **Tip:** Always ensure your chosen statistical test aligns with the nature of your data and your research question to ensure valid findings [34](#page=34).
Key takeaways regarding parametric and non-parametric tests:
* Parametric tests necessitate interval or ratio data that exhibit a normal distribution [34](#page=34).
* Non-parametric tests are suitable for ranked, categorical, or non-normally distributed data [34](#page=34).
* Crucially, always verify:
* The data type [34](#page=34).
* Whether the data distribution is normal [34](#page=34).
* The research question (compare, relate, describe) [34](#page=34).
* Selecting the correct statistical test is vital for ensuring research findings are valid, reliable, and interpretable [34](#page=34).
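
The three checks above (data type, distribution, research question) can be sketched as a small lookup function that reproduces the scenarios in section 3.2.1. It is a simplified heuristic with hypothetical names and parameters, not a complete decision procedure; it ignores paired designs, sample size, and variance assumptions.

```python
def choose_test(purpose, level, normal=True, groups=2):
    """Suggest a test from purpose, measurement level, and distribution.

    purpose: "compare", "relate", or "describe"
    level:   "nominal", "ordinal", "interval", or "ratio"
    """
    if purpose == "describe":
        return "Descriptive statistics"
    if level == "nominal":
        return "Chi-Square test"
    # Parametric tests need interval/ratio data and a normal distribution.
    parametric = level in {"interval", "ratio"} and normal
    if purpose == "compare":
        if parametric:
            return "Independent-samples t-test" if groups == 2 else "ANOVA"
        return "Mann-Whitney U" if groups == 2 else "Kruskal-Wallis H"
    if purpose == "relate":
        return "Pearson correlation" if parametric else "Spearman's rho"
    raise ValueError(f"unknown purpose: {purpose!r}")


print(choose_test("compare", "interval", normal=True, groups=2))
print(choose_test("compare", "ordinal", normal=False, groups=3))
print(choose_test("relate", "ratio", normal=True))
print(choose_test("relate", "ordinal", normal=False))
print(choose_test("relate", "nominal"))
```

The five calls correspond, in order, to the TOEFL comparison, the motivation comparison across several schools, the study time and GPA correlation, the reading and writing rank correlation, and the gender and platform association.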
### 3.3 Matching data types and research designs to analysis methods
In research, a universal, one-size-fits-all analytical approach does not exist. The interplay between data type (nominal, ordinal, interval, ratio) and research design (descriptive, comparative, correlational, experimental) is what determines the validity of a particular statistical test. The process of choosing the right analysis involves aligning the research question, the collected data, and the adopted design. This precise matching ensures that the conclusions drawn are statistically sound and academically defensible [35](#page=35).
The following table details how different research designs, purposes, data types, and example variables align with appropriate statistical analyses:
| No. | Research Design | Research Purpose | Type of Data | Examples of Variables | Example Research Title | Appropriate Statistical Analysis | Explanation |
| :-- | :-- | :-- | :-- | :-- | :-- | :-- | :-- |
| 1 | Descriptive | To summarize or describe characteristics of a population. | Nominal / Ordinal | Gender, school type, attitude category | Teachers' Preferences for Online Learning Platforms in Indonesia | Frequencies, Percentages, Mean, Mode, Standard Deviation | Descriptive designs summarize data using counts or averages, without comparisons or relationship testing [36](#page=36). |
| 2 | Comparative (Causal-Comparative) | To compare two or more groups based on a single variable. | Ordinal / Interval / Ratio | Teaching method, motivation score, exam result | Differences in Motivation Levels Between Public and Private School Students | Parametric: t-Test / ANOVA; Non-parametric: Mann-Whitney U / Kruskal-Wallis | [36](#page=36) |
| 3 | Experimental / Quasi-Experimental | To determine cause and effect between independent and dependent variables. | Interval / Ratio | Instructional approach, test scores, performance level | Effect of Mindful Learning on Reading Comprehension Achievement | Parametric: t-Test (paired or independent), ANOVA, ANCOVA | These designs compare means between groups or pre-post scores to test causal effects [36](#page=36). |
| 4 | Correlational | To examine the strength and direction of the relationship between two variables. | Interval / Ratio | Study time, GPA, proficiency, test score | Relationship Between Study Time and TOEFL Scores of English Education Students | Parametric: Pearson Correlation; Non-parametric: Spearman's rho | Correlation shows how two variables move together: positively, negatively, or not at all [36](#page=36). |
| 5 | Associational (Categorical Relationship) | To find association between categorical variables. | Nominal | Gender, learning style, preferred platform | Association Between Gender and Preferred Learning Mode | Chi-Square Test | Compares observed frequencies with expected frequencies to find association patterns [36](#page=36). |
| 6 | Predictive (Regression) | To predict one variable based on another (or several others). | Interval / Ratio | Motivation, proficiency, achievement | Can Motivation Predict Students' English Writing Scores? | Simple Regression or Multiple Regression | Regression builds a model to predict outcomes based on predictor variables [36](#page=36). |
| 7 | Mixed-Methods (Quantitative Side) | To describe and test effects or relationships within one design. | Combination (Ordinal + Ratio) | Attitude scale + performance test | Exploring Students' Attitudes and Writing Improvement Through Peer Review | Descriptive + Inferential (e.g., Mean + t-Test or Correlation) | Mixed methods combine both descriptive summary and inferential testing for broader insight [36](#page=36). |

> **Tip:** The fundamental rule for choosing between parametric and non-parametric tests is to assess if the data is normal and continuous (favoring parametric) or if it is ordinal or skewed (favoring non-parametric) [36](#page=36).

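
The chi-square row in the table above describes comparing observed with expected frequencies. As a worked illustration, the sketch below computes the Pearson chi-square statistic for a contingency table using only the Python standard library. The counts are invented for illustration and are not data from the source. The function name `chi_square_statistic` is hypothetical, and no continuity correction is applied.

```python
def chi_square_statistic(observed):
    """Return (chi-square statistic, degrees of freedom) for a contingency table.

    `observed` is a list of rows of counts, e.g. rows = gender,
    columns = preferred learning mode.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, obs in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(observed) - 1) * (len(observed[0]) - 1)
    return statistic, dof


# Hypothetical counts (assumption, not source data):
# rows = two gender groups, columns = online vs. face-to-face preference.
observed = [
    [30, 20],
    [20, 30],
]
stat, dof = chi_square_statistic(observed)
print(stat, dof)  # 4.0 1
```

With one degree of freedom, the 5% critical value of the chi-square distribution is about 3.84, so a statistic of 4.0 would indicate an association in this invented example. In practice, a statistics package would also report the exact p-value.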
## Glossary
| Term | Definition |
|------|------------|
| Population | The entire group of individuals or elements that share at least one common characteristic relevant to a study, representing the complete set from which data could theoretically be collected. |
| Sample | A subset of the population selected for actual data collection, used by researchers to draw conclusions about the entire population when studying the whole group is impractical. |
| Sampling | The process or technique used to select individuals from the population to form a sample, with the aim that the chosen subset accurately reflects the population. |
| Probability Sampling | A sampling method where every member of the population has a known, non-zero chance of being included in the sample, aiming to create a representative sample for strong statistical inferences. |
| Simple Random Sampling | A probability sampling technique where every member of the population has an equal chance of being selected through purely chance-based methods. |
| Stratified Random Sampling | A probability sampling method where the population is divided into homogeneous subgroups (strata), and random samples are taken proportionally from each group to ensure representation. |
| Cluster Sampling | A probability sampling technique where the population is divided into groups (clusters), and entire clusters are randomly selected for inclusion in the sample, rather than individual members. |
| Systematic Sampling | A probability sampling method where every k-th member is chosen from an ordered list after a randomly selected starting point, providing even coverage across the population. |
| Non-Probability Sampling | A sampling method where the selection of participants is not random and is based on factors like convenience, cost, or judgment, making it more susceptible to bias but practical when random sampling is not feasible. |
| Convenience Sampling | A non-probability sampling method where participants are chosen because they are easily available or willing to participate, making it simple and quick but potentially less representative. |
| Purposive Sampling | A non-probability sampling method where participants are intentionally chosen based on specific criteria relevant to the study's objectives, often used to gather in-depth information from a targeted group. |
| Snowball Sampling | A non-probability sampling technique where existing participants recruit other participants who meet the study criteria, useful for reaching hard-to-reach or specialized populations. |
| Quota Sampling | A non-probability sampling method where researchers ensure specific subgroups are represented in certain proportions, but the selection within each group is non-random. |
| Nominal Data | A type of qualitative data that categorizes variables into distinct, unordered groups or labels without any quantitative value, where categories are mutually exclusive and cannot be ranked. |
| Ordinal Data | A type of qualitative data that ranks or orders categories based on a natural hierarchy, where the order is important but the distances between categories are not quantifiable or equal. |
| Interval Data | A type of quantitative data where scores are measured on a scale with equal, measurable distances between values, but the scale has an arbitrary zero point, meaning zero does not represent a complete absence of the trait. |
| Ratio Data | A type of quantitative data that has equal intervals and a true zero, meaning the value of zero represents a complete absence of the variable being measured, allowing for all mathematical operations including multiplication and division. |
| Parametric Tests | Statistical tests used when data meet certain mathematical assumptions, primarily normal distribution and equal variances, typically applied to interval or ratio data to test means, relationships, or differences. |
| Non-Parametric Tests | Statistical tests used when the assumptions for parametric tests are not met, such as when data are categorical, ranked, or skewed. They are typically applied to nominal or ordinal data, or to non-normally distributed interval or ratio data, to test ranks, frequencies, or medians. |
| Research Design | The overall strategy and structure chosen by a researcher to carry out a study, outlining how data will be collected and analyzed to answer research questions. |
| Statistical Analysis | The process of collecting, organizing, summarizing, and interpreting numerical data to identify patterns, relationships, and make inferences about a population. |