Diagnostic validity applies to any test, measurement, or decision-making strategy that categorizes people. Also referred to as categorical validity or, more pragmatically, as the 2 × 2 table, diagnostic validity examines the relationship between how a test categorizes a subject and the category the subject actually belongs to.
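The 2 × 2 table mentioned above can be summarized with familiar diagnostic indices. The sketch below uses hypothetical counts (invented for illustration) to show how sensitivity, specificity, and predictive values fall out of the four cells:

```python
# Diagnostic validity as a 2 x 2 table: test result vs. true category.
# All counts below are hypothetical, chosen only for illustration.
tp, fp = 40, 10   # test positive: condition present / condition absent
fn, tn = 5, 45    # test negative: condition present / condition absent

sensitivity = tp / (tp + fn)   # proportion of true cases the test detects
specificity = tn / (tn + fp)   # proportion of non-cases the test clears
ppv = tp / (tp + fp)           # positive predictive value
npv = tn / (tn + fn)           # negative predictive value

print(f"sensitivity={sensitivity:.3f}, specificity={specificity:.3f}")
print(f"PPV={ppv:.3f}, NPV={npv:.3f}")
```

With these counts the test catches about 89% of true cases (sensitivity) while correctly clearing about 82% of non-cases (specificity).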
Reliability is a statistical measure of how reproducible the survey instrument's data are. Validity is the degree to which a scale or instrument measures what it sets out to measure.
Cronbach's Alpha (α) - a frequently used estimate of the reliability and internal consistency of an instrument (expressed as a number between 0 and 1). It is "connected to the inter-relatedness of the items within the test" (Tavakol & Dennick, 2011, p. 53). Although it is a "fundamental element in the evaluation of a measurement instrument," it is not the only measure of reliability (e.g., item-response theory offers alternatives) (p. 53).
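The standard formula, α = (k / (k − 1)) · (1 − Σ item variances / total-score variance), can be computed directly from a respondents-by-items matrix. The data below are hypothetical Likert responses invented for illustration:

```python
from statistics import pvariance

# Hypothetical: 5 respondents answering a 4-item scale (1-5 Likert).
scores = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [5, 5, 5, 4],
    [2, 2, 3, 2],
    [4, 4, 4, 5],
]

k = len(scores[0])                                  # number of items
item_vars = [pvariance(col) for col in zip(*scores)]  # variance of each item
total_var = pvariance([sum(row) for row in scores])   # variance of total scores

alpha = (k / (k - 1)) * (1 - sum(item_vars) / total_var)
print(f"Cronbach's alpha = {alpha:.3f}")
```

When items are highly inter-related, the total-score variance dwarfs the sum of item variances and α approaches 1; here the toy data yield an α above 0.9.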
Krippendorff's α (alpha) is a general statistical measure of agreement among observers, measuring devices, or coders of data, designed to indicate their reliability. As a general measure, it is applicable to data on various levels of measurement (metrics) and includes some known coefficients as special cases. As a statistical measure, it maps samples from a population of data into a single chance-corrected coefficient, a scale, indicating the extent to which the population of data can be relied on or trusted in subsequent analyses. Alpha equates reliability with the reproducibility of the data-generating process, measured by the agreement on what the data in question refer to or mean. Typical applications of α are content analyses where volumes of text need to be read and categorized, interview responses that require scaling or ranking before they can be treated statistically, or estimates of political or economic variables.
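For the simplest special case, nominal data from two coders with no missing values, α = 1 − Do/De, where Do is observed disagreement and De is the disagreement expected by chance, both derived from a coincidence matrix. The codings below are hypothetical, invented for illustration:

```python
from collections import Counter
from itertools import permutations

# Hypothetical nominal codings by two coders over the same 10 units.
coder_a = ["x", "x", "y", "y", "x", "z", "z", "x", "y", "x"]
coder_b = ["x", "x", "y", "x", "x", "z", "y", "x", "y", "x"]

# Coincidence matrix: each unit contributes both ordered pairs of its values.
coincidence = Counter()
for a, b in zip(coder_a, coder_b):
    coincidence[(a, b)] += 1
    coincidence[(b, a)] += 1

n_c = Counter()  # marginal total per category
for (c, _k), count in coincidence.items():
    n_c[c] += count
n = sum(n_c.values())  # total pairable values (2 per unit)

# Nominal metric: disagreement is 1 for any mismatch, 0 for a match.
d_o = sum(count for (c, k), count in coincidence.items() if c != k) / n
d_e = sum(n_c[c] * n_c[k] for c, k in permutations(n_c, 2)) / (n * (n - 1))

alpha = 1 - d_o / d_e
print(f"Krippendorff's alpha (nominal) = {alpha:.3f}")
```

With ordinal, interval, or ratio data the same framework applies, only the disagreement metric changes (e.g., squared differences instead of the 0/1 mismatch used here).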
Cohen's Kappa coefficient (κ) is a statistical measure of the degree of agreement or concordance between two independent raters that takes into account the possibility that agreement could occur by chance alone.
Like other measures of interrater agreement, κ is used to assess the reliability of different raters or measurement methods by quantifying their consistency in placing individuals or items in two or more mutually exclusive categories. For instance, in a study of developmental delay, two pediatricians may independently assess a group of toddlers and classify them with respect to their language development into either “delayed for age” or “not delayed.” One important aspect of the utility of this classification is the presence of good agreement between the two raters. Agreement between two raters could be simply estimated as the percentage of cases in which both raters agreed. However, a certain degree of agreement is expected by chance alone. In other words, two raters could still agree on some occasions even if they were randomly assigning individuals into either category.
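The chance-correction described above is the heart of κ = (po − pe) / (1 − pe), where po is the observed proportion of agreement and pe the proportion expected if both raters classified at random according to their own marginal rates. A sketch with hypothetical counts for the pediatrician example:

```python
# Hypothetical: two pediatricians classify 100 toddlers as delayed / not delayed.
# 2 x 2 agreement table (rater A rows, rater B columns); counts are invented.
a = 20   # both say "delayed"
b = 10   # A says delayed, B says not
c = 5    # A says not, B says delayed
d = 65   # both say "not delayed"
n = a + b + c + d

p_o = (a + d) / n                              # observed agreement
p_both_delayed = ((a + b) / n) * ((a + c) / n)  # chance both say "delayed"
p_both_not = ((c + d) / n) * ((b + d) / n)      # chance both say "not delayed"
p_e = p_both_delayed + p_both_not               # agreement expected by chance

kappa = (p_o - p_e) / (1 - p_e)
print(f"observed={p_o:.2f}, chance={p_e:.2f}, kappa={kappa:.3f}")
```

Here the raw agreement of 85% shrinks to κ = 0.625 once the 60% agreement expected by chance alone is discounted, which is exactly the correction the percentage-agreement figure fails to make.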
Frey, B. (2018). The SAGE encyclopedia of educational research, measurement, and evaluation (Vols. 1-4). Thousand Oaks, CA: SAGE Publications, Inc.
Salkind, N. J. (2010). Encyclopedia of research design (Vols. 1-3). Thousand Oaks, CA: SAGE Publications, Inc. doi: 10.4135/9781412961288