In both scenarios the two judges disagree, but the disagreement in scenario 2 is greater than in scenario 1. To handle this, we can define different levels of agreement, where 0 means complete disagreement and 1 means full agreement.

Zaiontz, Charles. Cohen's Kappa. www.real-statistics.com/reliability/interrater-reliability/cohens-kappa/

Kappa value interpretation, Landis & Koch (1977):
< 0: No agreement
0 to 0.20: Slight
0.21 to 0.40: Fair
0.41 to 0.60: Moderate
0.61 to 0.80: Substantial
0.81 to 1.00: Almost perfect

The seminal paper introducing kappa as a new technique was published by Jacob Cohen in 1960 in the journal Educational and Psychological Measurement.

Cohen's kappa statistic measures inter-rater reliability (sometimes called interobserver agreement). Inter-rater reliability is high when your data raters (or collectors) give the same score to the same data item.

Another performance measure, the kappa coefficient [14], has also been proposed to unify various classification problems. Kappa is used to compare BCI systems with different numbers of classes, for which comparison by classification accuracy (CA) is difficult.
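
To make the statistic concrete, here is a minimal sketch of unweighted Cohen's kappa for two raters, using the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The function names cohens_kappa and landis_koch_label and the sample ratings are hypothetical, chosen only for illustration. The sketch does not cover the weighted variant with graded levels of agreement.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    if len(ratings_a) != len(ratings_b) or len(ratings_a) == 0:
        raise ValueError("both raters must score the same non-empty set of items")
    n = len(ratings_a)

    # Observed agreement: fraction of items given the same label by both raters.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: for each label, the product of the two raters'
    # marginal frequencies, summed over labels.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    p_e = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if p_e == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (p_o - p_e) / (1 - p_e)


def landis_koch_label(kappa):
    """Map a kappa value to the Landis & Koch (1977) verbal scale."""
    if kappa < 0:
        return "No agreement"
    for upper, label in [(0.20, "Slight"), (0.40, "Fair"),
                         (0.60, "Moderate"), (0.80, "Substantial")]:
        if kappa <= upper:
            return label
    return "Almost perfect"


# Hypothetical ratings: two judges label ten items as "yes" or "no".
judge_a = ["yes"] * 6 + ["no"] * 4
judge_b = ["yes"] * 5 + ["no"] * 4 + ["yes"]

kappa = cohens_kappa(judge_a, judge_b)
print(round(kappa, 3), landis_koch_label(kappa))  # 0.583 Moderate
```

In this example the judges agree on 8 of 10 items (p_o = 0.8), but because both say "yes" 60% of the time, chance alone would produce p_e = 0.52, so kappa is noticeably lower than the raw agreement.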