A study with high validity measures exactly what it was intended to measure, and high reliability is necessary to sustain that validity and to produce stable results across repeated measurements.
Original Korean article: Validity/Reliability R Statistics: Criteria for judging a good measurement tool
Validity and reliability are the key criteria for judging a good measurement tool. High reliability does not always mean high validity, so you must check both whether the measurement suits the purpose of the study and whether repeated measurements produce consistent results. This article explains the differences between the two concepts and the criteria to check in actual research.
Ⅰ. Validity
Validity refers to how accurately a measurement tool or method in research actually measures what it is intended to measure.
- Content Validity: Concept: Content validity evaluates whether a measurement tool covers all the important content for the research topic or purpose. Example: For a test of students' math skills, evaluating content validity means checking whether the test contains only addition and subtraction problems or also covers other concepts such as multiplication, division, and geometry.
- Criterion-related Validity: Concept: Criterion-related validity evaluates a measurement tool through its correlation with a specific criterion (an external measure). Types and examples: Concurrent validity compares the tool against a criterion measured at the same time. For example, if a new depression test correlates highly with an existing, validated depression test, it has high concurrent validity. Predictive validity compares the tool against a criterion measured in the future. For example, if college entrance exam scores are a good predictor of job achievement after graduation, the exam has high predictive validity.
- Construct Validity: Concept: Construct validity evaluates whether a measurement tool actually reflects the theoretical construct it targets. Example: Reviewing construct validity means checking whether a questionnaire intended to measure ‘self-esteem’ is composed of questions that actually reflect self-esteem. Statistical techniques such as factor analysis can be used for this purpose.
- Ecological Validity: Concept: Ecological validity is the extent to which research results carry over to the real world. Example: If a memory test performed in a laboratory shows the same memory patterns that appear in everyday life, it has high ecological validity.
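Criterion-related validity, as described in the list above, comes down to a correlation between two sets of scores. The following is a minimal Python sketch of a concurrent validity check, assuming the new test and the established test were given to the same respondents. The score lists are made-up illustration data, and `pearson_r` is a helper written for this sketch, not part of any library.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


# Hypothetical scores for the same 8 respondents (illustration only).
new_test = [12, 18, 25, 9, 30, 22, 15, 27]
established_test = [14, 20, 24, 10, 33, 21, 13, 29]

concurrent_validity = pearson_r(new_test, established_test)
print(f"Concurrent validity coefficient: {concurrent_validity:.2f}")
```

A coefficient close to 1 suggests that the new test ranks respondents much as the established test does. The same function could be applied to predictive validity by replacing the second list with a criterion collected later.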

Ⅱ. Reliability
Reliability refers to whether a measurement tool or method in research produces consistent results. In other words, it is the degree to which similar results are obtained when the measurement is repeated under the same conditions.
- Internal Consistency: Concept: Internal consistency evaluates how well the items in a measurement tool reflect the same concept. Example: If a questionnaire consists of 10 questions that all measure ‘self-esteem,’ internal consistency is high only when the questions correlate highly with one another. Cronbach’s α coefficient is often used to evaluate this.
- Test-Retest Reliability: Concept: Test-retest reliability evaluates how consistent the results are when the same measurement tool is applied to the same subjects again after a set time interval. Example: When a psychological test is administered to the same person twice, two months apart, and the two scores are similar, the test has high test-retest reliability.
- Parallel-Forms Reliability: Concept: Parallel-forms reliability evaluates the consistency between two different forms of a measurement tool designed to measure the same concept. Example: Given a form A and a form B of a test of mathematical ability, if the same students obtain similar scores on both forms, parallel-forms reliability is high.
- Inter-Rater Reliability: Concept: Inter-rater reliability is how consistent the results are when different evaluators independently evaluate the same object. Example: When several psychologists watch a recording of the same patient's counseling session and each rates the level of depression, similar ratings indicate high inter-rater reliability.
- Split-Half Reliability: Concept: Split-half reliability evaluates the consistency of an entire test by dividing the data from a single administration into two halves and correlating the scores of the two halves. Example: In a cognitive ability test with 20 questions, if scores on the first 10 questions correlate highly with scores on the last 10 questions, split-half reliability is high.
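Two of the reliability measures in the list above can be computed directly from a table of item scores. The sketch below, in Python, assumes one row per respondent and one column per item. The response data are invented for illustration, and the function names are hypothetical helpers written for this sketch. The split-half function also reports the Spearman-Brown corrected value, a commonly used adjustment that the text above does not mention and is included here as an assumption about typical practice.

```python
from math import sqrt
from statistics import variance


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / sqrt(ss_x * ss_y)


def cronbach_alpha(rows):
    """Cronbach's alpha for rows of item scores (one row per respondent)."""
    k = len(rows[0])                     # number of items
    item_columns = list(zip(*rows))      # transpose: one tuple per item
    item_variance_sum = sum(variance(col) for col in item_columns)
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


def split_half(rows):
    """Return (half-to-half correlation, Spearman-Brown corrected value)."""
    half = len(rows[0]) // 2
    first_half = [sum(row[:half]) for row in rows]    # first half of the items
    second_half = [sum(row[half:]) for row in rows]   # remaining items
    r_half = pearson_r(first_half, second_half)
    return r_half, 2 * r_half / (1 + r_half)


# Hypothetical 5-point ratings: 6 respondents x 4 items (illustration only).
responses = [
    [4, 5, 4, 5],
    [2, 2, 3, 2],
    [5, 4, 5, 5],
    [3, 3, 2, 3],
    [1, 2, 1, 2],
    [4, 4, 5, 4],
]

alpha = cronbach_alpha(responses)
r_half, r_corrected = split_half(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
print(f"Split-half r: {r_half:.2f}, Spearman-Brown corrected: {r_corrected:.2f}")
```

Test-retest and parallel-forms reliability could be estimated with the same `pearson_r` helper by passing the two administrations, or the two forms, as the two lists.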
Good articles to read together
- 1. What is research? [R Statistics]
- 2. Variables and Measurements [R Statistics]
- 3. Measurement error [R Statistics]
- 5. Research method [R Statistics]
- Importance and usage of pipe operator %>%
Key Checklist
- Is the measurement tool appropriate for the research purpose?
- Do repeated measurements produce similar results?
- Is reliability high while validity is low?
- Has the validity been confirmed through existing research or expert review?
FAQ
What is the difference between validity and reliability?
Validity refers to whether a measurement tool properly measures the concept being studied, and reliability refers to how consistent the results are when measured repeatedly. The two are related but not the same concept.
What are the criteria for judging a good measurement tool?
A good measurement tool must accurately measure the concept that fits the research purpose and produce stable results even when used repeatedly. A validity review and a reliability review must be conducted together.
Does high reliability mean high validity?
Not necessarily. Even if the same results are repeated, if the wrong concept is being measured in the first place, reliability may be high but validity may be low.
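The answer above can be illustrated with a small simulation. This Python sketch assumes a hypothetical scale that always reads 5 kg too heavy but with very little scatter, and compares it with a scale that is correct on average but scatters widely. The true weight, bias, and noise levels are all invented numbers chosen for illustration.

```python
import random
from statistics import mean, stdev

random.seed(42)  # fixed seed so the sketch gives the same numbers each run

TRUE_WEIGHT = 70.0  # hypothetical true value in kg


def measure(bias, noise_sd, n=50):
    """Simulate n readings with a constant bias and random scatter."""
    return [TRUE_WEIGHT + bias + random.gauss(0, noise_sd) for _ in range(n)]


# Consistent but wrong: high reliability, low validity.
biased_scale = measure(bias=5.0, noise_sd=0.1)
# Right on average but inconsistent: low reliability.
noisy_scale = measure(bias=0.0, noise_sd=3.0)

for label, readings in [("biased scale", biased_scale),
                        ("noisy scale", noisy_scale)]:
    error = mean(readings) - TRUE_WEIGHT
    print(f"{label}: mean error = {error:+.2f} kg, "
          f"spread (SD) = {stdev(readings):.2f} kg")
```

The biased scale repeats nearly the same reading every time, so it looks reliable, yet every reading misses the true weight by about the same amount. Repeating a measurement cannot reveal that kind of error, which is why validity has to be checked separately.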
What is this article about?
This article is an English translation and global-reader adaptation of the Korean post “Validity/Reliability R Statistics: Criteria for judging a good measurement tool.” It preserves the original article’s main explanation, examples, and practical context.
Why is it translated into English?
The English version helps global readers access Thinknote articles through English search keywords while keeping the Korean source available as the original reference.
Where can I read the original Korean version?
You can read the original Korean article here: https://www.thinknote.co.kr/validity-reliability-r-statistics/