Article Open Access

Can Fuzzy Relational Calculus Bring Complex Issues in Selection of Examiners into Focus?

  • Satish S. Salunkhe, Yashwant Joshi and Ashok Deshpande
Published/Copyright: December 17, 2015

Abstract

The examinee and the examiner play pivotal roles in the educational grading system. Evaluation of students’ academic performance by multiple experts involves epistemic uncertainty, which can be modeled using fuzzy set theory. How many evaluators/experts are almost similar in their perceptual subjective evaluation of students’ answer papers? In other words, how many experts are reliable for a particular evaluation task at a defined possibility level? In this paper, the focus is on the object’s features (students’ marks) as a basis in the subjective evaluation process to identify the degree of similarity among the domain experts. The case study reveals that 11 out of 20 evaluators are similar in their decision making on students’ academic performance at a possibility level (α-level cut) of 0.98. The inter-rater reliability (κ-coefficient) among the selected 11 teachers is 0.41, which signifies fair/moderate agreement in the evaluation process. This paper proposes an approach that is useful for the selection of experts having similar perceptions in judgment and demonstrates, through a case study, how it can be useful to educational policy makers in the selection of examiners.

1 Introduction

It is crucial to identify unbiased experts with similar perceptions in their judgment. Several researchers have studied this important aspect from various viewpoints. Einhorn [11] argued that a consensus between experts is a necessary condition for expertise. Hoffmann [19] used a simple model designed by McGrew [31]. He inferred that consensus in the relevant expert community is an indicator of expertise if the following two necessary conditions are fulfilled: reliability and statistical independence. There is no epistemic means to determine the general reliability of genuine moral experts [19]. Due to variability in experts’ judgments, it is essential to scrutinize multiple experts to obtain the cluster having a similarity in perception with increased validity and reliability in evaluation tasks [1, 5, 9–12, 15, 18, 28, 29, 34, 35, 41, 42, 44, 47]. Obtaining a single distribution of elicited information that comprises several experts’ beliefs is desirable [8, 33]. The correlation between the experts’ judgments should be greater, perhaps much greater, when they are judging the same trait than when they are judging different traits [42]. Goldman [17] discussed how laypersons should evaluate the testimony of experts and decide which of two or more rival experts is most credible. Goldman’s definition of identification criteria for experts [16, 17] has been improved by Scholz [40]. Only the expert has epistemic access to the knowledge of the domain D of expertise (Goldman calls it esoteric knowledge). By contrast, the layperson only has access to exoteric knowledge, i.e. knowledge outside the domain D (Ref. [17], p. 94). The crucial question is: how can a layperson (merely with the help of the exoteric knowledge) identify an expert without having the relevant esoteric knowledge and the cognitive abilities? Ashton [1] suggested an approach to measure the validity of a group of experts when a new expert is added to the group.
The studies by Ashton and Ashton [2], Libby and Blashfield [27], Makridakis and Winkler [30], and Winkler and Makridakis [49] follow a factorial experimental design for expert selection using group validity. This experimental design has a higher computational time complexity. Hierarchical clustering can be obtained using fuzzy relational calculus, with a factorial experimental design [24, 25, 43, 48, 51].

Fuzzy relations arose from the simple idea of looking at relations as fuzzy sets in the universe. In a celebrated paper, Zadeh [51] introduced the concept of fuzzy relation, defined the notion of equivalence, and gave the concept of fuzzy ordering. Several researchers have since made seminal contributions and extended the concept of fuzzy relations. Identification of similar experts/examiners in performance evaluation relates mostly to fuzzy classification based on a fuzzy similarity relation. The mathematical concept of fuzzy equivalence relation is the basis for fuzzy classification. Although various approaches for the subjective evaluation of students’ answer scripts using fuzzy sets and logic have been proposed in the literature [3, 4, 6, 7, 20, 22, 26, 32, 38, 45, 46], there are no references on how to evaluate/classify the expertise of experts before they evaluate the students’ answer scripts. How many teachers are almost similar in their subjective judgment while evaluating students’ answer scripts? In other words, how many teachers are reliable for a particular evaluation task at a defined possibility level (α-level cut)? The authors have made an attempt to address this issue using fuzzy sets as a basis (part A) with a focus on fuzzy relational calculus to identify similar experts [39]. The study in part A relates to the agreement of teachers based on fuzzy sets at a defined α-level cut using fuzzy similarity measures. It is essential to confirm the similarity among teachers based on actual evaluation of the object (students) as an additional measure.

The paper is organized as follows: Section 2 covers the mathematical preliminaries, while an approach for the selection of teachers for students’ evaluation, based on fuzzy set theoretic operations and fuzzy relational calculus, is described in Section 3. The case study for selecting examiners/teachers based on fuzzy similarity is covered in Section 4. The results of the case study are discussed in Section 5. The conclusion and future scope for research form Section 6.

2 Preliminaries and Notations

This section briefly describes fuzzy relation/fuzzy relational calculus [52] operations used in the paper.

Definition 2.1: Let U be a universe set. A fuzzy set A of U is defined by a membership function $\mu_A : U \to [0, 1]$, where $\mu_A(x)$, $\forall x \in U$, indicates the degree of membership of x in A [50, 51].

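Definition 2.2: Let R be a fuzzy relation on X×Y, i.e. $R=\{((x, y), f_R(x, y)) \mid (x, y)\in X\times Y\}$; the α-cut matrix $R_\alpha$ is denoted by

$$R_\alpha=\{((x,y), f_{R_\alpha}(x,y)) \mid f_{R_\alpha}(x,y)=1 \text{ if } f_R(x,y)\ge\alpha;\ f_{R_\alpha}(x,y)=0 \text{ if } f_R(x,y)<\alpha,\ (x,y)\in X\times Y,\ \alpha\in[0,1]\}.$$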
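
As a minimal sketch of Definition 2.2, the α-cut can be written as an element-wise threshold. The function name `alpha_cut` and the 3×3 relation below are illustrative assumptions, not values from the case study.

```python
def alpha_cut(relation, alpha):
    """Return the crisp 0/1 matrix of a fuzzy relation at level alpha."""
    return [[1 if value >= alpha else 0 for value in row] for row in relation]


# Hypothetical fuzzy relation, chosen only to illustrate the thresholding.
R = [
    [1.0, 0.8, 0.4],
    [0.8, 1.0, 0.5],
    [0.4, 0.5, 1.0],
]

R_cut = alpha_cut(R, 0.5)
print(R_cut)
```

Entries equal to α are kept, which matches the "≥ α" condition in the definition.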

Definition 2.3: Let $R\subseteq X\times Y$ and $S\subseteq Y\times Z$ be fuzzy relations; the max-min composition $R\circ S$ is defined by

$$R\circ S=\{((x,z), \max_{y}\{\min_{x,z}\{f_R(x,y), f_S(y,z)\}\}) \mid x\in X,\ y\in Y,\ z\in Z\}.$$
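
A small sketch of the max-min composition may help: for every pair (x, z), take the minimum along each intermediate y and keep the largest of those minima. The function name and the two matrices are hypothetical and serve only as an illustration.

```python
def max_min_composition(R, S):
    """Max-min composition of R (|X| x |Y|) with S (|Y| x |Z|)."""
    inner = len(S)       # size of the shared set Y
    cols = len(S[0])     # size of Z
    return [
        [max(min(row[y], S[y][z]) for y in range(inner)) for z in range(cols)]
        for row in R
    ]


# Hypothetical membership grades for illustration only.
R = [[0.7, 0.5],
     [0.8, 0.4]]
S = [[0.9, 0.6, 0.2],
     [0.1, 0.7, 0.5]]

T = max_min_composition(R, S)
print(T)
```

For example, the entry for the first row and last column is max(min(0.7, 0.2), min(0.5, 0.5)) = 0.5.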

Definition 2.4: A fuzzy relation R on X×X is called a fuzzy equivalence relation if the following three conditions hold [37, 50, 51]:

  1. R is reflexive if $f_R(x, x)=1$, $\forall x\in X$.

  2. R is symmetric if $f_R(x, y)=f_R(y, x)$, $\forall x, y\in X$.

  3. R is transitive if $R^{(2)}=(R\circ R)\subset R$, or more explicitly

    $$f_R(x,z)\ge\max_{y}\{\min_{x,z}\{f_R(x,y), f_R(y,z)\}\},\quad \forall x,y,z\in X.$$

Definition 2.5: A fuzzy relation R on X×X is called a fuzzy compatibility, tolerance, or proximity relation if it satisfies the reflexive and symmetric conditions.

Definition 2.6: The transitive closure, $R_T$, of a fuzzy relation R is defined as the relation that is transitive, contains R, and has the smallest possible membership grades.

Definition 2.7: Let R be a fuzzy compatibility relation on a finite universal set X with |X|=n; then the max-min transitive closure of R is the relation $R^{(n-1)}$ [21, 25].

Algorithm A: Find the transitive closure RT of fuzzy compatibility relation [25].

  1. Calculate $R^{(2)}$. If $R^{(2)}\subset R$ or $R^{(2)}=R$, then the transitive closure $R_T=R$; stop. Otherwise, set $k=2$ and go to step 2.

  2. If $2^k\ge n-1$, then $R_T=R^{(n-1)}$; stop. Otherwise, calculate $R^{(2^k)}=R^{(2^{k-1})}\circ R^{(2^{k-1})}$; if $R^{(2^k)}=R^{(2^{k-1})}$, then the transitive closure $R_T=R^{(2^k)}$; stop. Otherwise, go to step 3.

  3. Set $k=k+1$ and go to step 2.
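
The repeated-squaring idea behind Algorithm A can be sketched as follows. This is an illustrative simplification rather than a line-by-line transcription: it assumes R is reflexive and symmetric, so that the powers of R can only grow and any power of at least n−1 already equals the closure. The function names and the 3×3 relation are assumptions made for the example.

```python
def compose(R, S):
    """Max-min composition of two square fuzzy relations."""
    n = len(R)
    return [
        [max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def transitive_closure(R):
    """Max-min transitive closure of a reflexive, symmetric fuzzy relation."""
    current, power = R, 1            # invariant: current == R^(power)
    while power < len(R) - 1:
        squared = compose(current, current)
        if squared == current:       # already transitive, nothing changes
            break
        current, power = squared, power * 2
    return current


# Hypothetical tolerance relation: reflexive and symmetric, not transitive.
R = [
    [1.0, 0.8, 0.2],
    [0.8, 1.0, 0.6],
    [0.2, 0.6, 1.0],
]

closure = transitive_closure(R)
print(closure)
```

In this example the weak link of 0.2 between the first and third elements is raised to 0.6, the strength of the indirect path through the second element.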

Definition 2.8: A partition of S means a family of disjoint subsets, say $\{S_1, S_2, \ldots, S_n\}$, such that the union of these subsets coincides with the entire set S. In other words, $S_1\cup S_2\cup\ldots\cup S_n=S$ and $S_i\cap S_j=\emptyset$, $\forall i\ne j$.

Definition 2.9: The normalized Euclidean distance [23, 26]:

$$d(A,B)=\sqrt[2]{\frac{\sum_{i=1}^{n}\left|m_A(x_i)-m_B(x_i)\right|^{2}}{n}}, \tag{1}$$

where n is the cardinality of the universe of discourse, A and B are fuzzy sets, and d is a distance measure.
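
Read as the root of the mean squared difference of membership grades, Eq. (1) can be sketched as below. The two fuzzy sets are hypothetical membership vectors over a four-element universe, chosen only for illustration.

```python
import math


def normalized_euclidean_distance(a, b):
    """Root of the mean squared difference between two membership vectors."""
    n = len(a)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / n)


# Hypothetical membership grades of two fuzzy sets on the same universe.
A = [0.0, 0.5, 1.0, 0.5]
B = [0.0, 1.0, 0.5, 0.5]

distance = normalized_euclidean_distance(A, B)
print(round(distance, 4))
```

Because membership grades lie in [0, 1], the distance defined this way also stays within [0, 1].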

Definition 2.10: Fleiss κ [13, 14] is a statistical measure of inter-rater reliability. κ can be defined as

$$\kappa=\frac{\bar{P}-\bar{P}_e}{1-\bar{P}_e}. \tag{2}$$

The factor $1-\bar{P}_e$ gives the degree of agreement that is attainable above chance, and $\bar{P}-\bar{P}_e$ gives the degree of agreement actually achieved above chance. If the raters are in complete agreement, then κ=1. If there is no agreement among the raters other than what would be expected by chance, then κ≤0.
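
For readers who want to see how $\bar{P}$ and $\bar{P}_e$ are obtained, the sketch below follows the standard Fleiss formulation, where each row counts how many raters placed one subject in each category. The source does not state how marks were mapped to categories, so the count tables here are purely hypothetical.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa; counts[i][j] = raters assigning subject i to category j."""
    subjects = len(counts)
    raters = sum(counts[0])          # assumed equal for every subject
    categories = len(counts[0])
    # Share of all assignments that fall into each category.
    p_j = [
        sum(row[j] for row in counts) / (subjects * raters)
        for j in range(categories)
    ]
    # Agreement observed on each subject.
    P_i = [
        (sum(c * c for c in row) - raters) / (raters * (raters - 1))
        for row in counts
    ]
    P_bar = sum(P_i) / subjects
    P_e = sum(p * p for p in p_j)
    return (P_bar - P_e) / (1 - P_e)


# Three raters, two categories: complete agreement on every subject.
perfect = fleiss_kappa([[3, 0], [0, 3], [3, 0]])
# Three raters, two categories: raters split on every subject.
split = fleiss_kappa([[2, 1], [1, 2]])
print(perfect, split)
```

The first table gives κ = 1, while the second gives a negative κ, that is, less agreement than chance alone would produce.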

Definition 2.11: The cosine amplitude method operates on a collection of n data samples. The collected data set is represented as $X=\{x_1, x_2, \ldots, x_n\}$. Each of the n samples is represented as a vector of dimension m, $x_i=\{x_{i1}, x_{i2}, \ldots, x_{im}\}$. The position of each datum in space is represented by m feature values. The relation value $r_{ij}$ reflects the similarity between the data $x_i$ and $x_j$. For n data samples, the size of the relation matrix will be n×n. The relation matrix is always reflexive and symmetric, and so it is a tolerance relation. All $r_{ij}$ values lie in the interval [0, 1] in this method, and they are calculated through the following equation:

$$r_{ij}=\frac{\left|\sum_{k=1}^{m}x_{ik}x_{jk}\right|}{\sqrt{\left(\sum_{k=1}^{m}x_{ik}^{2}\right)\left(\sum_{k=1}^{m}x_{jk}^{2}\right)}},\quad i,j=1,2,\ldots,n. \tag{3}$$

If $x_i$ and $x_j$ are very similar to each other, $r_{ij}$ is close to one. Conversely, if they are very dissimilar, $r_{ij}$ is close to zero.
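
Eq. (3) can be sketched as a pairwise computation over all samples. The function name and the three two-dimensional samples are illustrative assumptions, and the sketch further assumes that no sample is the zero vector, since the denominator would otherwise vanish.

```python
import math


def cosine_amplitude(samples):
    """Pairwise cosine amplitude similarity (n x n tolerance relation)."""
    squared_norms = [sum(v * v for v in x) for x in samples]
    return [
        [
            abs(sum(a * b for a, b in zip(x, y))) / math.sqrt(nx * ny)
            for y, ny in zip(samples, squared_norms)
        ]
        for x, nx in zip(samples, squared_norms)
    ]


# Hypothetical samples: the first and third are orthogonal.
samples = [[1, 0], [1, 1], [0, 1]]
relation = cosine_amplitude(samples)
print([[round(r, 4) for r in row] for row in relation])
```

The orthogonal pair receives similarity 0, each sample is fully similar to itself, and the matrix is symmetric, which is why the result is a tolerance relation rather than an equivalence relation.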

3 The Proposed Approach

As a first step in the expert selection process, the authors used fuzzy sets as a basis (Part A) to identify similar experts [39]. The study in Part A relates to agreement among experts at a defined α-cut level in the construction of fuzzy sets. However, it is equally important to identify the pairwise similarity among experts when they actually evaluate the features of the object. This prompted the authors to apply another facet of fuzzy relational calculus, which is explained below. Further screening of experts can be done using “Object’s Features as a Basis (Part B),” which is proposed below.

3.1 Object’s Characteristics (e.g. Students’ Academic Performance) as a Basis (Part B)

The steps to follow using cosine amplitude similarity calculation among n experts for each object Ok are summarized below.

  1. For each object $O_k$, extract a column feature vector $V_{jk}=\{m(Q_{jk1}), m(Q_{jk2}), \ldots, m(Q_{jkn})\}$, which is a collection of all feature values (i.e. attributes) $Q_i$, from each expert $E_j$, where $1\le i\le n$.

  2. Each feature may have a different weight, $w_i$. Normalize each feature value $Q_{jki}\in V_{jk}$ with respect to the weightage of feature $w_i$ to obtain the normalized vector of an object $O_k$, where $1\le i\le n$.

    $$m'(Q_{jki})=m(Q_{jki})/w_i. \tag{4}$$
    $$V'_{jk}=\{m'(Q_{jk1}), m'(Q_{jk2}), \ldots, m'(Q_{jkn})\}. \tag{5}$$
  3. Normalize the column feature vector $V'_{jk}$ to obtain $V''_{jk}$,

    $$m''(Q_{jki})=\frac{m'(Q_{jki})}{\sum_{i=1}^{n}m'(Q_{jki})}, \tag{6}$$
    $$V''_{jk}=\{m''(Q_{jk1}), m''(Q_{jk2}), \ldots, m''(Q_{jkn})\}. \tag{7}$$
  4. Apply the similarity measure technique on $V''_{jk}$, i.e. on $\langle V''_{1k}, V''_{2k}, \ldots, V''_{nk}\rangle$, to get the fuzzy tolerance relation similarity matrix $[S_k]_{n\times n}$. This similarity matrix reflects the pairwise similarity index between n experts in the context of an object $O_k$.

  5. Using max-min composition and transitive closure, transform the fuzzy tolerance relation $[S_k]_{n\times n}$ obtained in step 4 to the fuzzy equivalence relation $[S'_k]_{n\times n}$.

  6. For each α-cut value (i.e. possibility) of the fuzzy equivalence relation $[S'_k]_{n\times n}$, find the clusters (i.e. partitions) of experts.

  7. Select the cluster having the highest cardinality of experts. If more than one cluster has an equal or marginally equal number of experts, then select all those clusters. We presuppose that confidence in the selection of similar experts is strengthened when the group comprising the highest number of experts is selected at a defined α-cut (possibility) level.

  8. Repeat steps 1–7 for each object Ok . Calculate the frequency distribution of each expert based on its occurrence in selected clusters obtained in step 7 for all k objects at a reference α-cut (e.g. α=0.98).

  9. Organizational policies may then decide a certain cutoff limit on the rate of experts’ occurrence in selected clusters. Select top n experts who are above the cutoff limit (e.g. above 80%).
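
Steps 2 to 7 for a single object can be sketched end to end as below. This is a hedged illustration, not the authors' MATLAB implementation: all function names are hypothetical, the marks and weights are invented, and the sketch assumes every expert awards at least one nonzero mark so that the normalizations are well defined.

```python
import math


def normalize(marks, weights):
    """Steps 2-3: divide by feature weight (Eq. 4), then scale to unit sum (Eq. 6)."""
    vectors = []
    for expert_marks in marks:                   # one row per expert
        scaled = [m / w for m, w in zip(expert_marks, weights)]
        total = sum(scaled)
        vectors.append([s / total for s in scaled])
    return vectors


def cosine_amplitude(vectors):
    """Step 4: pairwise similarity (Eq. 3), giving a fuzzy tolerance relation."""
    norms = [sum(v * v for v in x) for x in vectors]
    return [[abs(sum(a * b for a, b in zip(x, y))) / math.sqrt(nx * ny)
             for y, ny in zip(vectors, norms)]
            for x, nx in zip(vectors, norms)]


def compose(R, S):
    """Max-min composition of two square relations."""
    n = len(R)
    return [[max(min(R[i][k], S[k][j]) for k in range(n)) for j in range(n)]
            for i in range(n)]


def transitive_closure(R):
    """Step 5: square R until it stops changing or the power reaches n - 1."""
    power = 1
    while power < len(R) - 1:
        squared = compose(R, R)
        if squared == R:
            break
        R, power = squared, power * 2
    return R


def partition(R, alpha):
    """Step 6: clusters of expert indices induced by the alpha-cut."""
    clusters, seen = [], set()
    for i in range(len(R)):
        if i not in seen:
            members = [j for j in range(len(R)) if R[i][j] >= alpha]
            clusters.append(members)
            seen.update(members)
    return clusters


def largest_clusters(clusters):
    """Step 7: keep every cluster of maximum cardinality."""
    size = max(len(c) for c in clusters)
    return [c for c in clusters if len(c) == size]


# Hypothetical marks of 4 experts on 3 features (not from the case study).
weights = [4, 3, 5]
marks = [[3, 2, 4], [3, 2, 4], [4, 0, 0], [0, 3, 0]]

equivalence = transitive_closure(cosine_amplitude(normalize(marks, weights)))
clusters = partition(equivalence, alpha=0.98)
selected = largest_clusters(clusters)
print(clusters, selected)
```

Experts 0 and 1 award identical marks and therefore end up in the same cluster at α = 0.98, while the other two experts each form a singleton. Steps 8 and 9 would repeat this per object and count how often each expert appears in the selected clusters.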

4 Case Study

The case study concerns the selection of examiners/teachers having a similar evaluation perception for evaluating students’ answer scripts. Sample examination answer scripts in the Marathi language subject were obtained from 237 secondary school students at three different institutions in Mumbai, India, during the academic year 2013–2014. Each student wrote a 10–12-page solution in response to 12 subjective questions. The questions selected in the question paper can be used to assess students’ writing ability. For our study, answers were evaluated using a 10-point rubric from the Secondary School (SSC) Board, Maharashtra. Also, 20 subject matter experts (teachers) from different schools were identified for the evaluation of the answer scripts. All 20 evaluators, belonging to the “Marathi” department of the SSC board, evaluated all 237 answer scripts. This experiment helped us observe the nature of consistency among all experts in the evaluation process. Before any evaluation began, the experts were trained with respect to the assessment rubric scale and the evaluation process. Before the evaluation process began, each evaluator expressed an evaluation judgment, based on their perception, for each linguistic variable. Five satisfaction levels, gi, were selected for awarding marks to the 12 questions: very poor, poor, average, good, and very good. Then, all 20 teachers constructed fuzzy sets for each linguistic variable. The evaluation process for the pilot study took place in May 2014. All 20 evaluators met in the central assessment examination room on the school campus of Marathi Vidyalaya, Mumbai (India), for the evaluation. Each teacher was given an answer script to grade. Each teacher awarded a numeric score to every question based on the weightage of the question, as shown in Table 1. All teachers reported the students’ performance on a separate result sheet. The evaluators made no marking on the answer script.
In the evaluation process, the students’ identities were concealed using masks, and the scripts were identified using unique alphanumeric codes to preserve anonymity. Let

  • (Student)k denote the kth student, where 1≤k≤237;

  • Tj denote the jth teacher, where 1≤j≤20;

  • Qi denote the ith question, where 1≤i≤12;

  • m(Qjkn ) denote the marks awarded to question n, by teacher j, of student k; and

  • Wi denote the weightage of question i, as shown in Table 1.

Table 1

Weightage of Each Question.

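| Question no. | Q1 | Q2 | Q3 | Q4 | Q5 | Q6 | Q7 | Q8 | Q9 | Q10 | Q11 | Q12 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Marks (Wi) | 4 | 3 | 3 | 5 | 4 | 3 | 3 | 4 | 4 | 8 | 10 | 4 |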

The typical computations using the cosine amplitude similarity measure among 20 experts for (Student)12 are summarized below. The evaluation score sheet of student roll number 12, as evaluated by 20 experts, is shown in Table 2; it is normalized as shown in Table 3. A column-wise normalized data sheet is shown in Table 4. The normalized marks in each teacher column form a vector that is used in the pairwise cosine amplitude similarity computations among all teacher vectors. The similarity index value is in the range [0, 1], where 0 indicates absolute dissimilarity and 1 indicates absolute similarity among teachers in the context of evaluation judgment.

Table 2

Data Sheet for Student ID 12 Evaluated by 20 Experts.

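| Question no. | Teacher 01 | Teacher 02 | Teacher 03 | Teacher 04 | Teacher 05 | Teacher 06 | Teacher 07 | Teacher 08 | Teacher 09 | Teacher 10 | Teacher 11 | Teacher 12 | Teacher 13 | Teacher 14 | Teacher 15 | Teacher 16 | Teacher 17 | Teacher 18 | Teacher 19 | Teacher 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Q1 | 3 | 3 | 2.5 | 3 | 3 | 3 | 3 | 3 | 2 | 4 | 2.5 | 3 | 4 | 4 | 4 | 1.5 | 2.5 | 3 | 3 | 2.5 |
| Q2 | 2 | 2.5 | 2 | 3 | 2.5 | 2 | 1.5 | 2.5 | 3 | 3 | 1.5 | 2 | 3 | 3 | 3 | 3 | 2 | 2 | 3 | 2 |
| Q3 | 2 | 2 | 1.5 | 2.5 | 3 | 3 | 2 | 3 | 3 | 3 | 2 | 2 | 3 | 3 | 3 | 1.5 | 1.5 | 2.5 | 3 | 2.5 |
| Q4 | 3 | 3 | 3 | 3 | 3 | 3 | 2 | 4 | 3.5 | 4 | 2.5 | 3 | 4 | 4 | 4 | 3 | 2 | 3.5 | 3 | 3 |
| Q5 | 3 | 2.5 | 2 | 2.5 | 2.5 | 1 | 1 | 3.5 | 3 | 3 | 1.5 | 2 | 4 | 4 | 4 | 2 | 2 | 2 | 2 | 3 |
| Q6 | 2.5 | 2 | 2 | 3 | 2 | 1.5 | 2.5 | 2 | 2 | 2.5 | 2 | 2 | 2 | 2 | 2 | 3 | 1.5 | 2.5 | 2 | 2.5 |
| Q7 | 2.5 | 2.5 | 2 | 2 | 2.5 | 2 | 1.5 | 2 | 2 | 2.5 | 1 | 1.5 | 2 | 2 | 2 | 1 | 1.5 | 3 | 2.5 | 2.5 |
| Q8 | 3.5 | 3 | 1.5 | 2.5 | 3 | 3 | 2 | 2.5 | 2 | 3 | 1.5 | 2 | 3 | 3 | 3 | 2 | 2.5 | 4 | 3 | 2 |
| Q9 | 3.5 | 2.5 | 3 | 3 | 3 | 2 | 2 | 2 | 2 | 3 | 2 | 2.5 | 3 | 3 | 3 | 1.5 | 2 | 2.5 | 2 | 2 |
| Q10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Q11 | 5.5 | 5 | 6 | 7 | 7 | 6 | 6.5 | 4 | 5 | 7 | 4 | 5 | 6 | 6 | 5 | 4 | 4 | 7.5 | 5 | 5 |
| Q12 | 3 | 2.5 | 3 | 3.5 | 3.5 | 4 | 3 | 4 | 3 | 3.5 | 2.5 | 2 | 1.5 | 1 | 1 | 3 | 2 | 3.5 | 2.5 | 1.5 |
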
Table 3

Normalized Data Sheet of Student ID 12.

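| Question no. | Teacher 01 | Teacher 02 | Teacher 03 | Teacher 04 | Teacher 05 | Teacher 06 | Teacher 07 | Teacher 08 | Teacher 09 | Teacher 10 | Teacher 11 | Teacher 12 | Teacher 13 | Teacher 14 | Teacher 15 | Teacher 16 | Teacher 17 | Teacher 18 | Teacher 19 | Teacher 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Q1 | 7.5 | 7.5 | 6.25 | 7.5 | 7.5 | 7.5 | 7.5 | 7.5 | 5 | 10 | 6.25 | 7.5 | 10 | 10 | 10 | 3.75 | 6.25 | 7.5 | 7.5 | 6.25 |
| Q2 | 6.6667 | 8.3333 | 6.6667 | 10 | 8.3333 | 6.6667 | 5 | 8.3333 | 10 | 10 | 5 | 6.6667 | 10 | 10 | 10 | 10 | 6.6667 | 6.6667 | 10 | 6.6667 |
| Q3 | 6.6667 | 6.6667 | 5 | 8.3333 | 10 | 10 | 6.6667 | 10 | 10 | 10 | 6.6667 | 6.6667 | 10 | 10 | 10 | 5 | 5 | 8.3333 | 10 | 8.3333 |
| Q4 | 6 | 6 | 6 | 6 | 6 | 6 | 4 | 8 | 7 | 8 | 5 | 6 | 8 | 8 | 8 | 6 | 4 | 7 | 6 | 6 |
| Q5 | 7.5 | 6.25 | 5 | 6.25 | 6.25 | 2.5 | 2.5 | 8.75 | 7.5 | 7.5 | 3.75 | 5 | 10 | 10 | 10 | 5 | 5 | 5 | 5 | 7.5 |
| Q6 | 8.3333 | 6.6667 | 6.6667 | 10 | 6.6667 | 5 | 8.3333 | 6.6667 | 6.6667 | 8.3333 | 6.6667 | 6.6667 | 6.6667 | 6.6667 | 6.6667 | 10 | 5 | 8.3333 | 6.6667 | 8.3333 |
| Q7 | 8.3333 | 8.3333 | 6.6667 | 6.6667 | 8.3333 | 6.6667 | 5 | 6.6667 | 6.6667 | 8.3333 | 3.3333 | 5 | 6.6667 | 6.6667 | 6.6667 | 3.3333 | 5 | 10 | 8.3333 | 8.3333 |
| Q8 | 8.75 | 7.5 | 3.75 | 6.25 | 7.5 | 7.5 | 5 | 6.25 | 5 | 7.5 | 3.75 | 5 | 7.5 | 7.5 | 7.5 | 5 | 6.25 | 10 | 7.5 | 5 |
| Q9 | 8.75 | 6.25 | 7.5 | 7.5 | 7.5 | 5 | 5 | 5 | 5 | 7.5 | 5 | 6.25 | 7.5 | 7.5 | 7.5 | 3.75 | 5 | 6.25 | 5 | 5 |
| Q10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Q11 | 5.5 | 5 | 6 | 7 | 7 | 6 | 6.5 | 4 | 5 | 7 | 4 | 5 | 6 | 6 | 5 | 4 | 4 | 7.5 | 5 | 5 |
| Q12 | 7.5 | 6.25 | 7.5 | 8.75 | 8.75 | 10 | 7.5 | 10 | 7.5 | 8.75 | 6.25 | 5 | 3.75 | 2.5 | 2.5 | 7.5 | 5 | 8.75 | 6.25 | 3.75 |
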
Table 4

Column-Wise Normalized Data Sheet of Student ID 12.

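| Question no. | Teacher 01 | Teacher 02 | Teacher 03 | Teacher 04 | Teacher 05 | Teacher 06 | Teacher 07 | Teacher 08 | Teacher 09 | Teacher 10 | Teacher 11 | Teacher 12 | Teacher 13 | Teacher 14 | Teacher 15 | Teacher 16 | Teacher 17 | Teacher 18 | Teacher 19 | Teacher 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Q1 | 0.092 | 0.1003 | 0.0933 | 0.089 | 0.0895 | 0.103 | 0.119 | 0.0924 | 0.0664 | 0.1076 | 0.1123 | 0.1158 | 0.1162 | 0.1179 | 0.1193 | 0.0592 | 0.1093 | 0.0879 | 0.0971 | 0.0891 |
| Q2 | 0.0818 | 0.1115 | 0.0995 | 0.1187 | 0.0994 | 0.0915 | 0.0794 | 0.1027 | 0.1327 | 0.1076 | 0.0898 | 0.103 | 0.1162 | 0.1179 | 0.1193 | 0.1579 | 0.1166 | 0.0781 | 0.1294 | 0.095 |
| Q3 | 0.0818 | 0.0892 | 0.0746 | 0.0989 | 0.1193 | 0.1373 | 0.1058 | 0.1232 | 0.1327 | 0.1076 | 0.1198 | 0.103 | 0.1162 | 0.1179 | 0.1193 | 0.0789 | 0.0875 | 0.0977 | 0.1294 | 0.1188 |
| Q4 | 0.0736 | 0.0803 | 0.0896 | 0.0712 | 0.0716 | 0.0824 | 0.0635 | 0.0986 | 0.0929 | 0.0861 | 0.0898 | 0.0927 | 0.0929 | 0.0943 | 0.0954 | 0.0947 | 0.07 | 0.082 | 0.0777 | 0.0855 |
| Q5 | 0.092 | 0.0836 | 0.0746 | 0.0742 | 0.0746 | 0.0343 | 0.0397 | 0.1078 | 0.0996 | 0.0807 | 0.0674 | 0.0772 | 0.1162 | 0.1179 | 0.1193 | 0.0789 | 0.0875 | 0.0586 | 0.0647 | 0.1069 |
| Q6 | 0.1022 | 0.0892 | 0.0995 | 0.1187 | 0.0795 | 0.0686 | 0.1323 | 0.0821 | 0.0885 | 0.0897 | 0.1198 | 0.103 | 0.0774 | 0.0786 | 0.0795 | 0.1579 | 0.0875 | 0.0977 | 0.0863 | 0.1188 |
| Q7 | 0.1022 | 0.1115 | 0.0995 | 0.0791 | 0.0994 | 0.0915 | 0.0794 | 0.0821 | 0.0885 | 0.0897 | 0.0599 | 0.0772 | 0.0774 | 0.0786 | 0.0795 | 0.0526 | 0.0875 | 0.1172 | 0.1079 | 0.1188 |
| Q8 | 0.1074 | 0.1003 | 0.056 | 0.0742 | 0.0895 | 0.103 | 0.0794 | 0.077 | 0.0664 | 0.0807 | 0.0674 | 0.0772 | 0.0871 | 0.0884 | 0.0895 | 0.0789 | 0.1093 | 0.1172 | 0.0971 | 0.0713 |
| Q9 | 0.1074 | 0.0836 | 0.1119 | 0.089 | 0.0895 | 0.0686 | 0.0794 | 0.0616 | 0.0664 | 0.0807 | 0.0898 | 0.0965 | 0.0871 | 0.0884 | 0.0895 | 0.0592 | 0.0875 | 0.0732 | 0.0647 | 0.0713 |
| Q10 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Q11 | 0.0675 | 0.0669 | 0.0896 | 0.0831 | 0.0835 | 0.0824 | 0.1032 | 0.0493 | 0.0664 | 0.0753 | 0.0719 | 0.0772 | 0.0697 | 0.0707 | 0.0596 | 0.0632 | 0.07 | 0.0879 | 0.0647 | 0.0713 |
| Q12 | 0.092 | 0.0836 | 0.1119 | 0.1039 | 0.1044 | 0.1373 | 0.119 | 0.1232 | 0.0996 | 0.0942 | 0.1123 | 0.0772 | 0.0436 | 0.0295 | 0.0298 | 0.1184 | 0.0875 | 0.1025 | 0.0809 | 0.0534 |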
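
The chain from raw marks to a similarity value can be traced for Teachers 01 and 02 of student ID 12 with the sketch below. The marks and weights are taken from Tables 1 and 2; the function names are hypothetical, and the factor of 10 is an assumption inferred from the values printed in Table 3, which appear to be the ratios of Eq. (4) expressed on a 10-point scale.

```python
import math

weights = [4, 3, 3, 5, 4, 3, 3, 4, 4, 8, 10, 4]               # Table 1
teacher_01 = [3, 2, 2, 3, 3, 2.5, 2.5, 3.5, 3.5, 0, 5.5, 3]    # Table 2
teacher_02 = [3, 2.5, 2, 3, 2.5, 2, 2.5, 3, 2.5, 0, 5, 2.5]    # Table 2


def scale_to_ten(marks):
    """Marks divided by question weightage, on an assumed 10-point scale."""
    return [10 * m / w for m, w in zip(marks, weights)]


def unit_sum(vector):
    """Column-wise normalization: divide every entry by the column total."""
    total = sum(vector)
    return [v / total for v in vector]


def cosine(x, y):
    """Cosine amplitude similarity between two vectors (Eq. 3)."""
    dot = sum(a * b for a, b in zip(x, y))
    return abs(dot) / math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))


v01 = unit_sum(scale_to_ten(teacher_01))
v02 = unit_sum(scale_to_ten(teacher_02))
similarity = cosine(v01, v02)
print(round(v01[0], 4), round(similarity, 4))
```

The first printed value should agree with the Q1 entry for Teacher 01 in Table 4 (0.092), and the second with the Teacher 01/Teacher 02 entry of the tolerance relation in Table 5 (0.9889). Since the cosine measure is unaffected by rescaling a vector, the column-wise normalization does not change the similarity value.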

5 Results and Discussion

The procedure detailed in Section 3 was followed, and the results obtained are discussed below:

The relation in Table 5 is reflexive and symmetric but not transitive; it is a fuzzy tolerance relation between the 20 experts for student 12. The fuzzy tolerance relation shown in Table 5 is transformed to a fuzzy equivalence relation, as shown in Table 6.

Table 5

Fuzzy Tolerance Relation among 20 Experts.

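| Student 12 | Teacher 01 | Teacher 02 | Teacher 03 | Teacher 04 | Teacher 05 | Teacher 06 | Teacher 07 | Teacher 08 | Teacher 09 | Teacher 10 | Teacher 11 | Teacher 12 | Teacher 13 | Teacher 14 | Teacher 15 | Teacher 16 | Teacher 17 | Teacher 18 | Teacher 19 | Teacher 20 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Teacher 01 | 1 | 0.9889 | 0.976 | 0.9753 | 0.9806 | 0.9397 | 0.9526 | 0.9571 | 0.9471 | 0.9802 | 0.9618 | 0.9784 | 0.9563 | 0.9474 | 0.9464 | 0.9166 | 0.9869 | 0.9813 | 0.9597 | 0.9654 |
| Teacher 02 | 0.9889 | 1 | 0.9737 | 0.9799 | 0.9869 | 0.9522 | 0.9492 | 0.9683 | 0.9678 | 0.9919 | 0.9605 | 0.9843 | 0.9721 | 0.9646 | 0.9638 | 0.9279 | 0.995 | 0.9826 | 0.9858 | 0.9753 |
| Teacher 03 | 0.976 | 0.9737 | 1 | 0.9837 | 0.9762 | 0.9408 | 0.9625 | 0.9534 | 0.9532 | 0.9801 | 0.9713 | 0.9792 | 0.9376 | 0.9256 | 0.9218 | 0.9298 | 0.968 | 0.9628 | 0.9481 | 0.9504 |
| Teacher 04 | 0.9753 | 0.9799 | 0.9837 | 1 | 0.9841 | 0.9533 | 0.9759 | 0.9667 | 0.9748 | 0.9891 | 0.9852 | 0.9866 | 0.953 | 0.9423 | 0.9395 | 0.9657 | 0.9816 | 0.968 | 0.9733 | 0.9642 |
| Teacher 05 | 0.9806 | 0.9869 | 0.9762 | 0.9841 | 1 | 0.9789 | 0.9652 | 0.9757 | 0.9766 | 0.993 | 0.9744 | 0.9815 | 0.9602 | 0.9494 | 0.9465 | 0.9186 | 0.9842 | 0.9841 | 0.9849 | 0.9653 |
| Teacher 06 | 0.9397 | 0.9522 | 0.9408 | 0.9533 | 0.9789 | 1 | 0.9626 | 0.9574 | 0.945 | 0.9678 | 0.9613 | 0.9494 | 0.9092 | 0.8938 | 0.891 | 0.8895 | 0.9512 | 0.9719 | 0.9665 | 0.9122 |
| Teacher 07 | 0.9526 | 0.9492 | 0.9625 | 0.9759 | 0.9652 | 0.9626 | 1 | 0.9324 | 0.9239 | 0.9667 | 0.9814 | 0.9669 | 0.9071 | 0.8942 | 0.8888 | 0.9202 | 0.9519 | 0.9673 | 0.9449 | 0.9307 |
| Teacher 08 | 0.9571 | 0.9683 | 0.9534 | 0.9667 | 0.9757 | 0.9574 | 0.9324 | 1 | 0.9853 | 0.9822 | 0.9703 | 0.9658 | 0.9562 | 0.9428 | 0.9442 | 0.9325 | 0.966 | 0.9523 | 0.9679 | 0.9563 |
| Teacher 09 | 0.9471 | 0.9678 | 0.9532 | 0.9748 | 0.9766 | 0.945 | 0.9239 | 0.9853 | 1 | 0.9798 | 0.9601 | 0.9649 | 0.961 | 0.9512 | 0.9506 | 0.948 | 0.9619 | 0.9453 | 0.9789 | 0.9675 |
| Teacher 10 | 0.9802 | 0.9919 | 0.9801 | 0.9891 | 0.993 | 0.9678 | 0.9667 | 0.9822 | 0.9798 | 1 | 0.9839 | 0.9943 | 0.9764 | 0.9672 | 0.9655 | 0.9363 | 0.9906 | 0.977 | 0.9868 | 0.9744 |
| Teacher 11 | 0.9618 | 0.9605 | 0.9713 | 0.9852 | 0.9744 | 0.9613 | 0.9814 | 0.9703 | 0.9601 | 0.9839 | 1 | 0.9866 | 0.9463 | 0.9345 | 0.933 | 0.9416 | 0.9644 | 0.9567 | 0.9583 | 0.9509 |
| Teacher 12 | 0.9784 | 0.9843 | 0.9792 | 0.9866 | 0.9815 | 0.9494 | 0.9669 | 0.9658 | 0.9649 | 0.9943 | 0.9866 | 1 | 0.9796 | 0.9731 | 0.9712 | 0.9326 | 0.9855 | 0.9658 | 0.9742 | 0.9726 |
| Teacher 13 | 0.9563 | 0.9721 | 0.9376 | 0.953 | 0.9602 | 0.9092 | 0.9071 | 0.9562 | 0.961 | 0.9764 | 0.9463 | 0.9796 | 1 | 0.9989 | 0.9983 | 0.8951 | 0.9746 | 0.9323 | 0.9657 | 0.9722 |
| Teacher 14 | 0.9474 | 0.9646 | 0.9256 | 0.9423 | 0.9494 | 0.8938 | 0.8942 | 0.9428 | 0.9512 | 0.9672 | 0.9345 | 0.9731 | 0.9989 | 1 | 0.9993 | 0.8829 | 0.9665 | 0.9218 | 0.9588 | 0.9695 |
| Teacher 15 | 0.9464 | 0.9638 | 0.9218 | 0.9395 | 0.9465 | 0.891 | 0.8888 | 0.9442 | 0.9506 | 0.9655 | 0.933 | 0.9712 | 0.9983 | 0.9993 | 1 | 0.8825 | 0.9654 | 0.9182 | 0.9585 | 0.9684 |
| Teacher 16 | 0.9166 | 0.9279 | 0.9298 | 0.9657 | 0.9186 | 0.8895 | 0.9202 | 0.9325 | 0.948 | 0.9363 | 0.9416 | 0.9326 | 0.8951 | 0.8829 | 0.8825 | 1 | 0.9319 | 0.909 | 0.9254 | 0.9131 |
| Teacher 17 | 0.9869 | 0.995 | 0.968 | 0.9816 | 0.9842 | 0.9512 | 0.9519 | 0.966 | 0.9619 | 0.9906 | 0.9644 | 0.9855 | 0.9746 | 0.9665 | 0.9654 | 0.9319 | 1 | 0.9743 | 0.9801 | 0.9615 |
| Teacher 18 | 0.9813 | 0.9826 | 0.9628 | 0.968 | 0.9841 | 0.9719 | 0.9673 | 0.9523 | 0.9453 | 0.977 | 0.9567 | 0.9658 | 0.9323 | 0.9218 | 0.9182 | 0.909 | 0.9743 | 1 | 0.9713 | 0.9563 |
| Teacher 19 | 0.9597 | 0.9858 | 0.9481 | 0.9733 | 0.9849 | 0.9665 | 0.9449 | 0.9679 | 0.9789 | 0.9868 | 0.9583 | 0.9742 | 0.9657 | 0.9588 | 0.9585 | 0.9254 | 0.9801 | 0.9713 | 1 | 0.9695 |
| Teacher 20 | 0.9654 | 0.9753 | 0.9504 | 0.9642 | 0.9653 | 0.9122 | 0.9307 | 0.9563 | 0.9675 | 0.9744 | 0.9509 | 0.9726 | 0.9722 | 0.9695 | 0.9684 | 0.9131 | 0.9615 | 0.9563 | 0.9695 | 1 |
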
Table 6

Fuzzy Equivalence Relation among 20 Experts Using Max-Min Composition.

Student 12Teacher 01Teacher 02Teacher 03Teacher 04Teacher 05Teacher 06Teacher 07Teacher 08Teacher 09Teacher 10Teacher 11Teacher 12Teacher 13Teacher 14Teacher 15Teacher 16Teacher 17Teacher 18Teacher 19Teacher 20
Teacher 0110.98890.98370.98890.98890.97890.98140.98220.98220.98890.98660.98890.97960.97960.97960.96570.98890.98410.98680.9753
Teacher 020.988910.98370.98910.99190.97890.98140.98220.98220.99190.98660.99190.97960.97960.97960.96570.9950.98410.98680.9753
Teacher 030.98370.983710.98370.98370.97890.98140.98220.98220.98370.98370.98370.97960.97960.97960.96570.98370.98370.98370.9753
Teacher 040.98890.98910.983710.98910.97890.98140.98220.98220.98910.98660.98910.97960.97960.97960.96570.98910.98410.98680.9753
Teacher 050.98890.99190.98370.989110.97890.98140.98220.98220.9930.98660.9930.97960.97960.97960.96570.99190.98410.98680.9753
Teacher 060.97890.97890.97890.97890.978910.97890.97890.97890.97890.97890.97890.97890.97890.97890.96570.97890.97890.97890.9753
Teacher 070.98140.98140.98140.98140.98140.978910.98140.98140.98140.98140.98140.97960.97960.97960.96570.98140.98140.98140.9753
Teacher 080.98220.98220.98220.98220.98220.97890.981410.98530.98220.98220.98220.97960.97960.97960.96570.98220.98220.98220.9753
Teacher 090.98220.98220.98220.98220.98220.97890.98140.985310.98220.98220.98220.97960.97960.97960.96570.98220.98220.98220.9753
Teacher 100.98890.99190.98370.98910.9930.97890.98140.98220.982210.98660.99430.97960.97960.97960.96570.99190.98410.98680.9753
Teacher 110.98660.98660.98370.98650.98660.97890.98140.98220.98220.986610.98660.97960.97960.97960.96570.98660.98410.98660.9753
Teacher 120.98890.99190.98370.98910.9930.97890.98140.98220.98220.99430.986610.97960.97960.97960.96570.99190.98410.98680.9753
Teacher 130.97960.97960.97960.97960.97960.97890.97960.97960.97960.97960.97960.979610.99890.99890.96570.97960.97960.97960.9753
Teacher 140.97960.97960.97960.97960.97960.97890.97960.97960.97960.97960.97960.97960.998910.99930.96570.97960.97960.97960.9753
Teacher 150.97960.97960.97960.97960.97960.97890.97960.97960.97960.97960.97960.97960.99890.999310.96570.97960.97960.97960.9753
Teacher 160.96570.96570.96570.96570.96570.96570.96570.96570.96570.96570.96570.96570.96570.96570.965710.96570.96570.96570.9657
Teacher 170.98890.9950.98370.98910.99190.97890.98140.98220.98220.99190.98660.99190.97960.97960.97960.965710.98410.98680.9753
Teacher 180.98410.98410.98370.98410.98410.97890.98140.98220.98220.98410.98410.98410.97960.97960.97960.96570.984110.98410.9753
Teacher 190.98680.98680.98370.98680.98680.97890.98140.98220.98220.98680.98660.98680.97960.97960.97960.96570.98680.984110.9753
Teacher 200.97530.97530.97530.97530.97530.97530.97530.97530.97530.97530.97530.97530.97530.97530.97530.96570.97530.97530.97531

It is necessary to convert the fuzzy equivalence relation presented in Table 6 to a crisp relation, a process known as defuzzification; the result at possibility α-value 0.98 is shown in Table 7. Figure 1 shows the partitioning of the 20 teachers at various intervals of α-cut for student S12. The cluster {1,2,3,4,5,7,8,9,10,11,12,17,18,19}, which has the maximum number of experts at possibility α-value 0.98 for student ID 12 (Table 8), is selected.

Table 7

Fuzzy Equivalence Relation among 20 Experts Using Max-Min Composition.

Student 12Teacher 01Teacher 02Teacher 03Teacher 04Teacher 05Teacher 06Teacher 07Teacher 08Teacher 09Teacher 10Teacher 11Teacher 12Teacher 13Teacher 14Teacher 15Teacher 16Teacher 17Teacher 18Teacher 19Teacher 20
Teacher 0111111011111100001110
Teacher 0211111011111100001110
Teacher 0311111011111100001110
Teacher 0411111011111100001110
Teacher 0511111011111100001110
Teacher 0600000000000000000000
Teacher 0711111011111100001110
Teacher 0811111011111100001110
Teacher 0911111011111100001110
Teacher 1011111011111100001110
Teacher 1111111011111100001110
Teacher 1211111011111100001110
Teacher 1300000000000011100000
Teacher 1400000000000011100000
Teacher 1500000000000011100000
Teacher 1600000000000000010000
Teacher 1711111011111100001110
Teacher 1811111011111100001110
Teacher 1911111011111100001110
Teacher 2000000000000000000001
Figure 1: Dendrogram on Fuzzy Equivalence Relation $[S'_{12}]_{20\times 20}$ for Partitioning of Teachers on Different Intervals of α-Cut Using the Cosine-Amplitude Method.

Table 8

The Partitioning of Teachers on Different Intervals of α-Cut 0.98 for Student ID 12.

| Clustering method | Possibility α-value | No. of clusters | Obtained partitions of all 20 experts {clusters with elements}, different clusters are separated by ‘;’ for Student ID-12 |
|---|---|---|---|
| Cosine amplitude | 0.9814 | 5 | {{1,2,3,4,5,7,8,9,10,11,12,17,18,19}; {6}; {13,14,15}; {16}; {20}} |
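
How the α-cut level controls the partition can be illustrated on a small excerpt of Table 6. The sketch below uses only the rows and columns for Teachers 01, 06, 13, 14, and 16; the function name `partition` is hypothetical, and the excerpt is of course not the full 20×20 relation.

```python
def partition(relation, labels, alpha):
    """Group labels whose pairwise membership in an equivalence relation is >= alpha."""
    clusters, seen = [], set()
    for i, row in enumerate(relation):
        if i not in seen:
            members = [j for j, value in enumerate(row) if value >= alpha]
            clusters.append([labels[j] for j in members])
            seen.update(members)
    return clusters


labels = ["T01", "T06", "T13", "T14", "T16"]
# Entries copied from Table 6 for the five teachers above.
excerpt = [
    [1.0,    0.9789, 0.9796, 0.9796, 0.9657],
    [0.9789, 1.0,    0.9789, 0.9789, 0.9657],
    [0.9796, 0.9789, 1.0,    0.9989, 0.9657],
    [0.9796, 0.9789, 0.9989, 1.0,    0.9657],
    [0.9657, 0.9657, 0.9657, 0.9657, 1.0],
]

at_098 = partition(excerpt, labels, 0.98)
at_0979 = partition(excerpt, labels, 0.979)
print(at_098)
print(at_0979)
```

At α = 0.98, Teachers 13 and 14 stay together while Teachers 01, 06, and 16 are separated, in line with Table 8. Lowering α slightly to 0.979 merges Teacher 01 with Teachers 13 and 14, which is the kind of merging that the dendrogram in Figure 1 depicts.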

The detailed computational procedure given in Section 3 is performed on all 237 students in order to identify similar experts at the defined α-cut level (α=0.98). The frequency distribution of occurrence for each of the 20 experts over all 237 students is shown in Table 9. The final ranking of all 20 experts at α-cut (α=0.98) using the cosine-amplitude method for all 237 students, ranging from highest to lowest, is given below. A histogram of all teachers is shown in Figure 2.

Table 9

Ranking of 20 Experts at α-Cut 0.98 Using the Cosine-Amplitude Method for 237 Students.

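| Teacher ID | No. of times expert’s occurrence in selected clusters out of 237 times | % of expert’s occurrence in selected cluster at α-Cut 0.98 |
|---|---|---|
| T10 | 219 | 92.41 |
| T01 | 218 | 91.98 |
| T02 | 216 | 91.14 |
| T08 | 205 | 86.5 |
| T15 | 197 | 83.12 |
| T14 | 196 | 82.7 |
| T13 | 196 | 82.7 |
| T11 | 191 | 80.59 |
| T18 | 182 | 76.79 |
| T12 | 179 | 75.53 |
| T03 | 168 | 70.89 |
| T17 | 155 | 69.62 |
| T04 | 162 | 68.35 |
| T20 | 161 | 67.93 |
| T05 | 158 | 66.67 |
| T07 | 153 | 64.56 |
| T16 | 141 | 59.49 |
| T19 | 135 | 56.96 |
| T06 | 98 | 41.35 |
| T09 | 75 | 31.65 |
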
Figure 2: Histogram of 20 Experts for Belonging to Partition Having Maximum Cardinality at α-Cut (α=0.98) for All 237 Students.

(T10>T1>T2>T8>T15>T14>T13>T11>T18>T12>T3>T17>T4>T20>T5>T7>T16>T19>T6>T9)

Using a benchmark of 70% on the experts’ occurrence frequency over all 237 students, educational administrators/policy makers select 11 out of 20 experts. The experts

<T10,T1,T2,T8,T15,T14,T13,T11,T18,T12,T3>

are selected for evaluating the students’ academic performance in the Marathi subject in the Maharashtra State Board of Secondary School Certificate Exam, India.
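
The cutoff-based selection of steps 8 and 9 can be sketched from the occurrence counts printed in Table 9. The variable names and the use of a dictionary are illustrative choices; the 70% benchmark is the one stated above.

```python
TOTAL_STUDENTS = 237
CUTOFF_PERCENT = 70.0

# Occurrence counts in the selected clusters, as printed in Table 9.
occurrences = {
    "T10": 219, "T01": 218, "T02": 216, "T08": 205, "T15": 197,
    "T14": 196, "T13": 196, "T11": 191, "T18": 182, "T12": 179,
    "T03": 168, "T17": 155, "T04": 162, "T20": 161, "T05": 158,
    "T07": 153, "T16": 141, "T19": 135, "T06": 98, "T09": 75,
}

# Percentage of the 237 students for which each expert was selected.
share = {
    teacher: 100.0 * count / TOTAL_STUDENTS
    for teacher, count in occurrences.items()
}

# Rank by share (highest first) and keep experts at or above the cutoff.
ranked = sorted(share.items(), key=lambda item: item[1], reverse=True)
selected = [teacher for teacher, percent in ranked if percent >= CUTOFF_PERCENT]
print(len(selected), selected)
```

With these counts, 11 experts clear the 70% benchmark, the last of them being T03 at about 70.89%.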

The inter-rater reliability (Fleiss κ coefficient) among all 20 experts, the selected 11 experts, and the rejected 9 experts is computed for the dataset of 237 students’ evaluation score sheets, as shown in Table 10. The Fleiss κ coefficient for the selected 11 experts is 0.41, indicating moderate agreement among the 11 experts according to Fleiss’ guidelines for interpreting the κ statistic. It can also be inferred that, for all 20 teachers as well as for the rejected 9 teachers, the computed κ coefficient is lower than that of the selected 11 teachers. This additional information will help the decision maker in the final selection of the 11 experts. In our view, the perception of all 11 experts can be aggregated into a multiexpert knowledgebase to obtain a fair evaluation result. Eleven out of 20 experts are selected for future evaluation of students’ performance in the Marathi subject in the Maharashtra State Secondary Examination Board, Mumbai, India.

Table 10

Summary of Fleiss κ Statistics for Agreement between All 20 Experts vs. Selected 11 Experts vs. Rejected Nine Experts.

| Fleiss κ statistics | Data set of all 20 teachers | Data set of selected 11 teachers | Data set of rejected 9 teachers |
|---|---|---|---|
| P_BAR | 0.475211 | 0.4936517 | 0.4823996 |
| Pe | 0.141801 | 0.1413933 | 0.145761 |
| κ Coefficient | 0.3884996 | 0.4102675 | 0.3940801 |
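
As a check on how the κ values in Table 10 follow from Eq. (2), the figures reported for the selected 11 teachers can be substituted directly:

$$\kappa=\frac{\bar{P}-\bar{P}_e}{1-\bar{P}_e}=\frac{0.4936517-0.1413933}{1-0.1413933}=\frac{0.3522584}{0.8586067}\approx 0.4103.$$

The same substitution with the figures for all 20 teachers and for the rejected 9 teachers gives approximately 0.3885 and 0.3941, respectively.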

In the recent past, researchers have used various statistical and fuzzy logic-based methods to compute similarity between experts, with no general agreement. In summary, development of the state of the art, especially in expert selection, is a distant dream. In the absence of a standard universal approach, it was considered appropriate to use the established κ coefficient method to validate similar experts having high inter-rater agreement. The κ coefficient is widely used in medical imaging and other related fields. The expert identification problem, i.e. how lay people can justifiably distinguish between a reliable and an unreliable expert, is highly debated in recent social epistemology [19]. The method presented in this paper for the selection of similar experts by a layperson with only exoteric knowledge can be used in several areas of science and technology in general, and in the selection of similar examiners in the education system in particular. The software implementation of this approach was done in MATLAB R2008a and Microsoft Access 2010.

6 Concluding Remarks

The fair and unbiased evaluation of students’ academic performance depends on the reliability of experts’ judgments. A combination of the epistemic uncertainty in experts’ subjective judgment and the aleatory uncertainty in the object’s features is the basis of the formalism presented in this paper. The authors have outlined the formalism for the selection of experts based on their subjective judgment using fuzzy relational calculus. The identification of similar experts/examiners in performance evaluation relates mostly to fuzzy classification based on a fuzzy similarity relation. The authors propose to extend this concept to large data sets in order to ensure its credibility. The method is somewhat like hierarchical clustering in multivariate data analysis. Are experts similar in their thinking? This is one of the questions that need to be answered while implementing a fuzzy inference system. The authors have looked into this issue and will incorporate the knowledgebase of the identified 11 experts in the fuzzy expert system to infer the performance of students in linguistic terms with a degree of certainty.

Bibliography

[1] R. H. Ashton, Combining the judgments of experts: how many and which ones?, Organ. Behav. Hum. Dec. 38 (1986), 405–414. DOI: 10.1016/0749-5978(86)90009-9

[2] A. H. Ashton and R. H. Ashton, Aggregating subjective forecasts: some empirical results, Manage. Sci. 31 (1985), 1499–1508. DOI: 10.1287/mnsc.31.12.1499

[3] S. M. Bai and S. M. Chen, Automatically constructing grade membership functions for student’s evaluation for fuzzy grading systems, in: Proceeding World Automation Congress (WAC), Budapest, Hungary, 2006. DOI: 10.1109/WAC.2006.376011

[4] R. Biswas, An application of fuzzy sets in student’s evaluation, Fuzzy Set. Syst. 74 (1995), 187–194. DOI: 10.1016/0165-0114(95)00063-Q

[5] J. S. Carroll and J. W. Payne, The psychology of parole decision processes: a joint application of attribution theory and information-processing psychology, in: J. S. Carroll and J. W. Payne, eds., Cognition and Social Psychology, pp. 13–32, Erlbaum, Hillsdale, NJ, 1976.

[6] S. M. Chen and C. H. Lee, New methods for student’s evaluating using fuzzy set, in: Proceedings of International Conference on Artificial Intelligence, 1996.

[7] S.-M. Chen and H.-Y. Wang, Evaluating student’s answer script based on interval valued fuzzy grade sheets, Expert Syst. Appl. 36 (2009), 9839–9846. DOI: 10.1016/j.eswa.2009.02.005

[8] R. T. Clemen and R. L. Winkler, Combining probability distributions from experts in risk analysis, Risk Anal. 19 (1999), 187–203. DOI: 10.1111/j.1539-6924.1999.tb00399.x

[9] A. A. DeSmet, D. G. Fryback and J. R. Thornbury, A second look at the utility of radiographic skull examination for trauma, Am. J. Radiol. 132 (1978), 95–99. DOI: 10.2214/ajr.132.1.95

[10] E. Ebbesen and V. Konecni, Decision making and information integration in the courts: the setting of bail, J. Pers. Soc. Psychol. 32 (1975), 805–821. DOI: 10.1037/0022-3514.32.5.805

[11] H. Einhorn, Expert judgment: some necessary conditions and an example, J. Appl. Psychol. 59 (1974), 562–571. DOI: 10.1037/h0037164

[12] K. A. Ericsson and N. Charness, Expert performance – its structure and acquisition, Am. Psychol. 49 (1994), 725–747. DOI: 10.1037/0003-066X.49.8.725

[13] J. L. Fleiss, Statistical methods for rates and proportions, 2nd ed., John Wiley, New York, pp. 38–46, 1981.

[14] J. L. Fleiss and J. Cohen, The equivalence of weighted κ and the intraclass correlation coefficient as measures of reliability, Educ. Psychol. Meas. 33 (1973), 613–619. DOI: 10.1177/001316447303300309

[15] J. E. Foss, W. R. Wright and R. H. Coles, Testing the accuracy of field textures, Soil Sci. Soc. Am. Pro. 39 (1975), 800–802. DOI: 10.2136/sssaj1975.03615995003900040051x

[16] A. I. Goldman, Knowledge in Social World, Oxford, pp. 276–271, 1999. DOI: 10.1093/0198238207.001.0001

[17] A. I. Goldman, Experts: which ones should you trust?, Philos. Phenomen. Res. 63 (2001), 85–110. DOI: 10.1093/0195138791.003.0007

[18] M. R. Grier, Decision making about patient care, Nurs. Res. 25 (1976), 105–110. DOI: 10.1097/00006199-197603000-00007

[19] M. Hoffmann, How to identify moral experts? An application of Goldman’s criteria for expert identification to domain of morality, Analyse and Kritik 34 (2012), 299–313. DOI: 10.1515/auk-2012-0210

[20] C.-H. Hsies, Evaluating students answer scripts with fuzzy arithmetic, Tamsui Oxford J. Management Sci. 12 (1996), 15–25.

[21] G. J. Klir and B. Yuan, Fuzzy Sets and Fuzzy Logic Theory and Application, Prentice Hall PTR, Upper Saddle River, NJ, 1995.

[22] O. Kosheleva, How to make sure that the grading scheme encourages students to learn all the material: fuzzy-motivated solution and its justification, world conference on soft computing, San Francisco State University, 2011.

[23] B. Kosko, Fuzzy entropy and conditioning, Inform. Sciences 40 (1986), 165–174. DOI: 10.1016/0020-0255(86)90006-X

[24] H. S. Lee, Automatic clustering of business processes in business systems, Eur. J. Oper. Res. 114 (1999), 354–362. DOI: 10.1016/S0377-2217(98)00125-8

[25] H. S. Lee, An optimal algorithm for computing the max-min transitive closure of a fuzzy similarity matrix, Fuzzy Set. Syst. 123 (2001), 129–136. DOI: 10.1016/S0165-0114(00)00062-2

[26] T. K. Li and S.-M. Chen, A new method for students learning achievement evaluation by automatically generating the weights of attributes with fuzzy reasoning capability, in: Proceedings of the Eighth International Conference on Machine Learning and Cybernetics, IEEE, Baoding, 2009.

[27] R. Libby and R. K. Blashfield, Performance of a composite as a function of the number of judges, Organ. Behav. Hum. Perform. 21 (1978), 121–129. DOI: 10.1016/0030-5073(78)90044-2

[28] S. Lichtenstein and B. Fischhoff, Do those who know more also know more about how much they know?, Organ. Behav. Hum. Perf. 20 (1977), 159–183. DOI: 10.1016/0030-5073(77)90001-0

[29] S. Lichtenstein, B. Fischhoff and L. D. Phillips, Calibration of probabilities: the state of the art to 1980, in: D. Kahneman, P. Slovic and A. Tversky, eds., Judgment under Uncertainty: Heuristics and Biases, Cambridge University Press, Cambridge, UK, 1982.

[30] S. Makridakis and R. L. Winkler, Averages of forecasts: Some empirical results, Manage. Sci. 29 (1983), 987–996. DOI: 10.1287/mnsc.29.9.987

[31] T. McGrew, How foundationalists do crossword puzzles, Philos. Stud. 96 (1999), 333–350.

[32] J. R. Nolan, An expert fuzzy classification system for supporting the grading of student writing samples, Expert Syst. Appl. 15 (1998), 59–68. DOI: 10.1016/S0957-4174(98)00011-6

[33] A. O’Hagan, Eliciting expert beliefs in substantial practical applications, The Statistician 47 (1998), 21–35. DOI: 10.1111/1467-9884.00114

[34] S. Oskamp, The relationship of clinical experience and training methods to several criteria of clinical prediction, Psychol. Monogr. 76 (1962), 1–27. DOI: 10.1037/h0093849

[35] S. Oskamp, Overconfidence in case-study judgments, J. Consult. Psychol. 29 (1965), 261–265. DOI: 10.1017/CBO9780511809477.021

[36] S. Raha, N. R. Pal and K. S. Ray, Similarity based approximate reasoning: methodology and application, IEEE T. Syst. Man. Cy. A. 32 (2002), 541–547. DOI: 10.1109/TSMCA.2002.804787

[37] T. J. Ross, Fuzzy Logic with engineering applications, 3rd ed., John Wiley and Sons Ltd., UK, 2010. DOI: 10.1002/9781119994374

[38] I. Saleh and S. Kim, A fuzzy system for evaluating students learning achievement, Expert Syst. Appl. 36 (2009), 6236–6243. DOI: 10.1016/j.eswa.2008.07.088

[39] S. S. Salunkhe, Y. V. Joshi and A. Deshpande, Fuzzy similarity measures as a basis for the selection of examiners, J. Fuzzy Set Valued Anal. 2015 (2015), 1–14. DOI: 10.5899/2015/jfsva-00267

[40] O. R. Scholz, Experts: What they are and how we recognize them – A discussion of Alvin Goldman’s view, Grazer Philosophische Studien 79 (2009), 187–205. DOI: 10.1163/18756735-90000864

[41] J. Shanteau, Competence in experts: the role of task characteristics, Organ. Behav. Hum. Dec. 53 (1992), 252–266. DOI: 10.1016/0749-5978(92)90064-E

[42] M. R. Steenbergen and G. Marks, Evaluating expert judgments, Eur. J. Polit. Res. 46 (2007), 347–366. DOI: 10.1111/j.1475-6765.2006.00694.x

[43] S. Tamura, S. Higuchi and K. Tanaka, Pattern classification based on fuzzy relations, IEEE T. Syst. Man Cyb. SMC-1 (1971), 61–66. DOI: 10.1109/TSMC.1971.5408605

[44] D. Trumbo, C. Adams, M. Milner and L. Schipper, Reliability and accuracy in the inspection of hard red winter wheat, Cereal Sci. Today 7 (1962), 62–71.

[45] H.-Y. Wang and S.-M. Chen, New methods for evaluating the answer scripts of students using fuzzy sets, IEA/AIE 2006, LNAI 4031, 2006. DOI: 10.1007/11779568_48

[46] H.-Y. Wang and S.-M. Chen, Evaluating students answer scripts using vague values, Appl. Intell. 28 (2008), 183–193. DOI: 10.1007/s10489-007-0060-4

[47] P. Williams, The use of confidence factors in forecasting, B. Am. Meteorol. Soc. 32 (1951), 279–281. DOI: 10.1175/1520-0477-32.8.279

[48] R. L. Winkler and R. T. Clemen, Experts vs. multiple methods: combining correlation assessments, Decision Analysis 1 (2004), 167–176. DOI: 10.1287/deca.1030.0008

[49] R. L. Winkler and S. Makridakis, The combination of forecasts, J. R. Stat. Soc., Series A, 146, Pt. 2, (1983) 150–157. DOI: 10.2307/2982011

[50] L. A. Zadeh, The concept of a linguistic variable and its application to approximate reasoning, Inform. Sciences 8 (1965), 199–249. DOI: 10.1007/978-1-4684-2106-4_1

[51] L. A. Zadeh, Similarity relations and fuzzy orderings, Inform. Sciences 3 (1971), 177–200. DOI: 10.1016/S0020-0255(71)80005-1

[52] H. J. Zimmermann, Fuzzy set theory and its application, 4th ed., Springer, USA, 2001. DOI: 10.1007/978-94-010-0646-0

Received: 2015-9-22
Published Online: 2015-12-17
Published in Print: 2016-4-1

©2016 by De Gruyter

This article is distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

Downloaded on 12.10.2025 from https://www.degruyterbrill.com/document/doi/10.1515/jisys-2015-0105/html