Review Article

Statistical Methods: Reliability Assessment and Method Comparison

The Ewha Medical Journal 2017;40(1):9-16. Published online: January 31, 2017

Clinical Trial Center, Ewha Womans University Mokdong Hospital, Seoul, Korea.

Corresponding author: Kyoung Ae Kong. Clinical Trial Center, Ewha Womans University Medical Center, 1071 Anyangcheon-ro, Yangcheon-gu, Seoul 07985, Korea. Tel: 82-2-2650-2069, Fax: 82-2-2650-6141, kkong@ewha.ac.kr
• Received: December 29, 2016   • Accepted: January 4, 2017

Copyright © 2017. Ewha Womans University School of Medicine

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

The reliability of clinical measurements is critical to medical research and clinical practice. Newly proposed methods are assessed for reliability, which includes repeatability and intra- and interobserver reproducibility. In general, new methods that give repeatable and reproducible results are compared with established methods in clinical use. This paper describes common statistical methods for assessing reliability and agreement between methods: the intraclass correlation coefficient, coefficient of variation, Bland-Altman plot, limits of agreement, percent agreement, and the kappa statistic. These methods are more appropriate for estimating reliability than hypothesis testing or simple correlation. However, some reliability indices, especially unscaled ones, do not express the acceptable level of error in the actual size and units of the measurement. The Bland-Altman plot is more useful for method comparison studies because it shows the relationship between the differences and the magnitude of paired measurements, the bias (as the mean difference), and the degree of agreement (as the limits of agreement) between two methods or conditions (e.g., observers). Caution is needed when handling heteroscedasticity of the differences between two measurements, when using the means of repeated measurements by each method in method comparison studies, and when comparing reliability between different studies. In addition, independence of the measuring processes, the combined use of different forms of estimation, clear descriptions of how the indices were calculated, and clinical acceptability should be emphasized when assessing reliability and in method comparison studies.
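
As a rough illustration of the bias and limits of agreement named in the abstract, the sketch below computes the mean difference and the conventional 95% limits (mean difference ± 1.96 × SD of the differences) for paired readings. The function name `bland_altman` and the readings are hypothetical and are not taken from the article. The sketch also assumes the differences are roughly normal and homoscedastic.

```python
from statistics import mean, stdev


def bland_altman(method_a, method_b):
    """Return (bias, lower limit, upper limit) for paired measurements.

    bias   = mean of the paired differences (A - B)
    limits = bias +/- 1.96 * sample SD of the differences
    """
    if len(method_a) != len(method_b):
        raise ValueError("measurements must be paired")
    diffs = [a - b for a, b in zip(method_a, method_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical paired readings, for illustration only.
method_a = [10.0, 12.0, 14.0, 16.0, 18.0]
method_b = [9.0, 12.5, 13.0, 16.5, 17.0]

bias, lower, upper = bland_altman(method_a, method_b)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

Whether limits of this width are acceptable is a clinical judgment and not a statistical one, which is the point the abstract makes about clinical acceptability.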

Figure & Data

Fig. 1

Intraclass correlation coefficient and Pearson's correlation coefficient as indices for intra- or interobserver reliability. ICC, intraclass correlation coefficient; correlation coefficient, Pearson's correlation coefficient.

emj-40-9-g001.jpg
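
One way to see why the two indices in Fig. 1 can disagree is a constant offset between observers: Pearson's correlation ignores it, while an absolute-agreement ICC is penalized by it. The sketch below is a minimal illustration using the ICC(2,1) form of Shrout and Fleiss. The ICC form, the function names, and the data are all assumptions made for illustration, and the data are not taken from the figure.

```python
from statistics import mean


def pearson_r(x, y):
    """Pearson's correlation coefficient for two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    ratings holds one row per subject and one column per observer.
    """
    n, k = len(ratings), len(ratings[0])
    grand = mean([v for row in ratings for v in row])
    row_means = [mean(row) for row in ratings]
    col_means = [mean([row[j] for row in ratings]) for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols
    bms = ss_rows / (n - 1)               # between-subjects mean square
    jms = ss_cols / (k - 1)               # between-observers mean square
    ems = ss_err / ((n - 1) * (k - 1))    # residual mean square
    return (bms - ems) / (bms + (k - 1) * ems + k * (jms - ems) / n)


# Hypothetical data: observer 2 always reads 4 units higher than observer 1.
observer_1 = [10.0, 12.0, 14.0, 16.0, 18.0]
observer_2 = [v + 4.0 for v in observer_1]

r = pearson_r(observer_1, observer_2)
icc = icc_2_1([list(pair) for pair in zip(observer_1, observer_2)])
print(f"Pearson r = {r:.3f}, ICC(2,1) = {icc:.3f}")
```

In this constructed case the correlation is perfect even though the observers never give the same value, which is why simple correlation is a poor index of agreement.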
Fig. 2

Graphical presentation of agreement. A case where the difference increases with the magnitude of the measurements.

emj-40-9-g002.jpg
Fig. 3

Graphical presentation of agreement. A case where the variability of the differences increases with the magnitude of the measurements.

emj-40-9-g003.jpg
Fig. 4

Bland-Altman plot of pulmonary nodule size measured using two radiological methods.

emj-40-9-g004.jpg
Table 1

Agreement between observers A and B on binary measurements

emj-40-9-i001.jpg
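
For a binary cross-tabulation like Table 1, percent agreement and Cohen's kappa can be sketched as below. The table is available here only as an image, so the counts are hypothetical, and the function name `percent_agreement_and_kappa` is likewise an assumption made for illustration.

```python
def percent_agreement_and_kappa(table):
    """Return (observed agreement, Cohen's kappa) for a square table.

    table[i][j] is the number of subjects placed in category i by
    observer A and in category j by observer B.
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    observed = sum(table[i][i] for i in range(k)) / total
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / total ** 2
    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


# Hypothetical counts: rows are observer A, columns are observer B.
counts = [[40, 10],
          [5, 45]]

observed, kappa = percent_agreement_and_kappa(counts)
print(f"percent agreement = {observed:.0%}, kappa = {kappa:.2f}")
```

Kappa is lower than percent agreement here because half of the observed agreement would be expected by chance given these marginal totals.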
Table 2

Agreement between methods A and B on measurements with four-category results

Number in parentheses indicates the weight used for calculation of the weighted kappa.

emj-40-9-i002.jpg
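
For ordered categories as in Table 2, a weighted kappa gives partial credit to near-misses. The weights shown in parentheses in the table are not recoverable from this page, so the sketch below assumes linear weights, w_ij = 1 − |i − j| / (k − 1), as one common choice. The counts and the function name `weighted_kappa` are also hypothetical.

```python
def weighted_kappa(table, weights=None):
    """Weighted kappa for a square table of counts.

    If no weights are supplied, linear agreement weights are assumed:
    w[i][j] = 1 - |i - j| / (k - 1).
    """
    k = len(table)
    total = sum(sum(row) for row in table)
    if weights is None:
        weights = [[1 - abs(i - j) / (k - 1) for j in range(k)]
                   for i in range(k)]
    row_tot = [sum(row) for row in table]
    col_tot = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Weighted observed and chance-expected agreement.
    po = sum(weights[i][j] * table[i][j]
             for i in range(k) for j in range(k)) / total
    pe = sum(weights[i][j] * row_tot[i] * col_tot[j]
             for i in range(k) for j in range(k)) / total ** 2
    return (po - pe) / (1 - pe)


# Hypothetical four-category counts: rows are method A, columns method B.
counts_4 = [[20, 5, 0, 0],
            [3, 15, 4, 0],
            [0, 4, 18, 3],
            [0, 0, 2, 26]]

kw = weighted_kappa(counts_4)
print(f"weighted kappa (linear weights) = {kw:.3f}")
```

Passing an identity matrix as `weights` reduces this to the unweighted kappa, so the choice of weights should always be reported alongside the result.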

References

  • 1. Korean Society for Preventive Medicine. Preventive medicine and public health; 2nd ed. Seoul: Gyechuk Munwhasa; 2013.
  • 2. Szklo M, Nieto FJ. Epidemiology: beyond the basics; 2nd ed. Sudbury, MA: Jones and Bartlett Publishers; 2007.
  • 3. Atkinson G, Nevill AM. Statistical methods for assessing measurement error (reliability) in variables relevant to sports medicine. Sports Med 1998;26:217-238.
  • 4. Bartlett JW, Frost C. Reliability, repeatability and reproducibility: analysis of measurement errors in continuous variables. Ultrasound Obstet Gynecol 2008;31:466-475.
  • 5. Petrie A, Sabin C. Medical statistics at a glance; 3rd ed. Chichester, UK: John Wiley & Sons; 2009.
  • 6. Bland JM, Altman DG. Measuring agreement in method comparison studies. Stat Methods Med Res 1999;8:135-160.
  • 7. Shrout PE, Fleiss JL. Intraclass correlations: uses in assessing rater reliability. Psychol Bull 1979;86:420-428.
  • 8. Cicchetti DV. Guidelines, criteria, and rules of thumb for evaluating normed and standardized assessment instruments in psychology. Psychol Assess 1994;6:284-290.
  • 9. Rosner B. Fundamentals of biostatistics; 7th ed. Boston, MA: Duxbury Press; 2006.
  • 10. Hirschmann MT, Konala P, Amsler F, Iranpour F, Friederich NF, Cobb JP. The position and orientation of total knee replacement components: a comparison of conventional radiographs, transverse 2D-CT slices and 3D-CT reconstruction. J Bone Joint Surg Br 2011;93:629-633.
  • 11. Kim CH, Chung CK, Hong HS, Kim EH, Kim MJ, Park BJ. Validation of a simple computerized tool for measuring spinal and pelvic parameters. J Neurosurg Spine 2012;16:154-162.
  • 12. Donner A, Zou G. Testing the equality of dependent intraclass correlation coefficients. J R Stat Soc Ser D Stat 2002;51:367-379.
  • 13. Bland JM, Altman DG. Statistical methods for assessing agreement between two methods of clinical measurement. Lancet 1986;1:307-310.
  • 14. Bland JM, Altman DG. Applying the right statistics: analyses of measurement studies. Ultrasound Obstet Gynecol 2003;22:85-93.
  • 15. Johnsson AA, Fagman E, Vikgren J, Fisichella VA, Boijsen M, Flinck A, et al. Pulmonary nodule size evaluation with chest tomosynthesis. Radiology 2012;265:273-282.
  • 16. Bland M. Correction to section “Measuring agreement using repeated measurements” in Bland and Altman (1986) [Internet]. 2009 07 03 [cited 2016 Dec 19]. Available from: https://www.users.york.ac.uk/~mb55/meas/repeated.htm
  • 17. Hanneman SK. Design, analysis, and interpretation of method-comparison studies. AACN Adv Crit Care 2008;19:223-234.
  • 18. Bruton A, Conway JH, Holgate ST. Reliability: what is it, and how is it measured? Physiotherapy 2000;86:94-99.
  • 19. Landis JR, Koch GG. The measurement of observer agreement for categorical data. Biometrics 1977;33:159-174.
  • 20. Fleiss JL. Statistical methods for rates and proportions; 2nd ed. New York, NY: John Wiley and Sons; 1981.
  • 21. Altman DG. Practical statistics for medical research. London, UK: Chapman & Hall/CRC; 1991.
  • 22. Fleiss JL, Levin B, Paik MC. Statistical methods for rates and proportions; 3rd ed. Hoboken, NJ: John Wiley & Sons; 2003.
  • 23. StataCorp. STATA base reference manual (release 13). College Station, TX: Stata Press; 2013.
