AIOU Course Code 6507-2 Solved Assignment Autumn 2021

Course: Classroom Assessment (6407)
Semester: Autumn, 2021
Level: BEd./ADE

Assignment No.2

Q.1 What is the relationship between validity and reliability of test?


Validity and reliability are interrelated aspects of research and testing. If a test is valid, then its results are also reliable. Yet if a test is reliable, that does not mean it is valid. Validity refers to the extent to which a test measures what it claims to measure.

Relation # Reliability of a Test:

  1. Reliability refers to the dependability or consistency or stability of the test scores. It does not go beyond it.
  2. Reliability is concerned with the stability of test scores, that is, the self-correlation of the test.
  3. Every reliable test is not necessarily valid. A test having high correlation with itself may not have equally high correlation with a criterion.
  4. Reliability is a prerequisite of validity. A highly reliable test is always a valid measure of some function. Thus, reliability controls validity.
  5. Reliability may be said as the dependability of measurement.
  6. Maximum reliability is found in case of homogeneous items.
  7. Maximum reliability requires items of equal difficulty and high inter-correlation among test items.
  8. Validity co-efficient does not exceed the square root of reliability coefficient.
  9. The reliability is the proportion of true variance.
  10. We cannot claim that a reliable test is also valid. This may or may not be true. A test measures consistently, but it may not measure what it intends to measure. For example, when a man wrongly reports his date of birth consistently, it may be reliable but not valid.

Relation # Validity of a Test:

  1. Validity is concerned with the extent to which the purpose of the test is being served. It studies how truthfully the test measures what it purports to measure.
  2. On the other hand, validity is the correlation of the test with some outside external criteria.
  3. A test to be valid, has to be reliable. A test which possesses poor reliability is not expected to yield high validity.
  4. To be valid a test must be reliable. Tests with low reliability cannot be highly valid.
  5. Validity may be said as correctness of measurement.
  6. If a test is heterogeneous, it has low reliability and high validity.
  7. On the other hand, maximum validity requires items differing in difficulty and low inter-correlation among items.
  8. The validity of a test may not be higher than the reliability index.
  9. Validity is the proportion of common factor variance.
  10. A valid test is always reliable. A test that truthfully measures what it purports to measure is both valid and reliable.

For example, when a man truly reports his date of birth consistently, it is both valid and reliable. A valid test always ensures the reliability.
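The statement that a validity coefficient does not exceed the square root of the reliability coefficient can be illustrated numerically. The following is a minimal sketch; the function name `max_validity` and the sample reliability values are assumptions chosen for illustration, not part of the original answer.

```python
import math


def max_validity(reliability: float) -> float:
    """Upper bound on a validity coefficient for a given reliability coefficient.

    Hypothetical helper: it only applies the square-root relation stated in the text.
    """
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return math.sqrt(reliability)


# Illustrative reliability coefficients (assumed values).
for r_xx in (0.49, 0.81, 1.0):
    print(f"reliability = {r_xx:.2f}: validity cannot exceed {max_validity(r_xx):.2f}")
```

The sketch shows why low reliability caps validity: a test with a reliability of 0.49 cannot correlate with any criterion more strongly than about 0.70.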


Q.2 Develop a scoring criteria for essay type test items for 8th grade


Teachers are often as concerned with measuring the ability of students to think about and use knowledge as they are with measuring the knowledge their students possess. In these instances, tests are needed that permit students some degree of latitude in their responses. Essay tests are adapted to this purpose. The student writes a response to a question that is several paragraphs to several pages long. Essays can be used for higher learning outcomes such as synthesis or evaluation as well as lower level outcomes. They provide items in which students supply rather than select the appropriate answer; usually the students compose a response in one or more sentences. Essay tests allow students to demonstrate their ability to recall, organize, synthesize, relate, analyze and evaluate ideas.


Scoring Essay Type Items


A rubric or scoring criteria is developed to evaluate/score an essay type item. A rubric is a scoring guide for subjective assessments. It is a set of criteria and standards linked to learning objectives that are used to assess a student's performance on papers, projects, essays, and other assignments. Rubrics allow for standardized evaluation according to specified criteria, making grading simpler and more transparent. A rubric may vary from simple checklists to elaborate combinations of checklist and rating scales. How elaborate your rubric is depends on what you are trying to measure. If your essay item is a restricted-response item simply assessing mastery of factual content, a fairly simple listing of essential points would be sufficient.


Scoring Criteria:

(i) 1 point for each of the factors named, to a maximum of 5 points.

(ii) 1 point for each appropriate description of the factors named, to a maximum of 5 points.

(iii) No penalty for spelling, punctuation, or grammatical errors.

(iv) No extra credit for more than five factors named or described.

(v) Extraneous information will be ignored.
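Criteria (i) to (v) above amount to a simple capped point count, which can be sketched in code. This is only an illustration: the function name `score_restricted_response` and its parameters are hypothetical, and the marker is assumed to have already counted the correctly named factors and the appropriate descriptions.

```python
def score_restricted_response(factors_named: int, factors_described: int,
                              max_per_part: int = 5) -> int:
    """Score a restricted-response essay item out of 10 (hypothetical helper).

    Spelling, punctuation, grammar and extraneous information are not inputs,
    because criteria (iii) and (v) say they do not affect the score.
    """
    # Criterion (i): one point per factor named, capped at 5.
    named_points = min(max(factors_named, 0), max_per_part)
    # Criterion (ii): one point per appropriate description, capped at 5.
    described_points = min(max(factors_described, 0), max_per_part)
    # Criterion (iv): the caps above mean extra factors earn no extra credit.
    return named_points + described_points


print(score_restricted_response(3, 2))   # three factors named, two described
print(score_restricted_response(7, 6))   # more than five of each: capped
```

Fixing the caps in advance is what makes the scoring consistent from one marker to another.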


Scoring Criteria for Essay Type Item for 8th Grade

An essay type item that allows the student to determine the length and complexity of the response is called an extended-response essay item. This type of essay is most useful at the synthesis or evaluation levels of the cognitive domain. Extended-response items are used when we are interested in determining whether students can organize, integrate, express, and evaluate information, ideas, or pieces of knowledge.




Example: Identify as many different ways to generate electricity in Pakistan as you can. Give advantages and disadvantages of each. Your response will be graded on its accuracy, comprehension and practical ability. Your response should be 8-10 pages in length and it will be evaluated according to the RUBRIC (scoring criteria) already provided.


Advantages of Essay Type Items

The main advantages of essay type tests are as follows:

(i) They can measure complex learning outcomes which cannot be measured by other means.

(ii) They emphasize integration and application of thinking and problem solving.

(iii) They can be easily constructed.

(iv) They give examinees freedom to respond within broad limits.

(v) The students cannot guess the answer because they have to supply it rather than select it.

(vi) Practically, it is more economical to use essay type tests if the number of students is small.

(vii) They require less time for typing, duplicating or printing. They can also be written on the blackboard if the number of students is not large.

(viii) They can measure divergent thinking.

(ix) They can be used as a device for measuring and improving the language and expression skills of examinees.

(x) They are more helpful in evaluating the quality of the teaching process.

(xi) Studies have shown that when students know that essay type questions will be asked, they focus on learning broad concepts and articulating relationships, contrasting and comparing.

(xii) They set better standards of professional ethics for teachers because they demand more time in assessing and scoring from the teachers.


Limitations of Essay Type Items

The essay type tests have the following serious limitations as a measuring instrument:

(i) A major problem is the lack of consistency in judgments even among competent examiners.

(ii) They have halo effects. If the examiner is measuring one characteristic, he can be influenced in scoring by another characteristic. For example, a well behaved student may score more marks on account of his good behaviour also.

(iii) They have a question-to-question carry-over effect. If the examinee has answered satisfactorily in the beginning of the question or questions, he is likely to score more than the one who did not do well in the beginning but did well later on.

(iv) They have an examinee-to-examinee carry-over effect. A particular examinee gets marks not only on the basis of what he has written but also on the basis of whether the previous examinee whose answer book was examined by the examiner was good or bad.


Q.3 Write a note on mean, median and mode. Also discuss their importance in interpreting test scores



Mean

There are several kinds of mean in mathematics, especially in statistics.

For a data set, the arithmetic mean, also known as the arithmetic average, is a central value of a finite set of numbers: specifically, the sum of the values divided by the number of values. The arithmetic mean of a set of numbers $x_1, x_2, \ldots, x_n$ is typically denoted by $\bar{x}$. If the data set is based on a series of observations obtained by sampling from a statistical population, the arithmetic mean is called the sample mean (denoted $\bar{x}$) to distinguish it from the mean, or expected value, of the underlying distribution, the population mean (denoted $\mu$ or $\mu_x$).[1]

Outside probability and statistics, a wide range of other notions of mean are often used in geometry and mathematical analysis; examples are given below.

The arithmetic mean (or simply mean) of a list of numbers is the sum of all of the numbers divided by the number of numbers. Similarly, the mean of a sample $x_1, x_2, \ldots, x_n$, usually denoted by $\bar{x}$, is the sum of the sampled values divided by the number of items in the sample:

$$\bar{x} = \frac{1}{n}\left(\sum_{i=1}^{n} x_i\right) = \frac{x_1 + x_2 + \cdots + x_n}{n}$$

For example, the arithmetic mean of five values: 4, 36, 45, 50, 75 is:

$$\frac{4 + 36 + 45 + 50 + 75}{5} = \frac{210}{5} = 42.$$
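The worked example can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are chosen for illustration.

```python
from statistics import mean

values = [4, 36, 45, 50, 75]

# Sum of the values divided by the number of values.
manual_mean = sum(values) / len(values)

# The standard library computes the same quantity.
library_mean = mean(values)

print(manual_mean, library_mean)
```

Both calculations agree with the value of 42 obtained by hand.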

Geometric mean (GM)

The geometric mean is an average that is useful for sets of positive numbers that are interpreted according to their product (as is the case with rates of growth) and not their sum (as is the case with the arithmetic mean):

$$\left(\prod_{i=1}^{n} x_i\right)^{1/n} = \sqrt[n]{x_1 x_2 \cdots x_n}$$
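As an illustration of an average based on the product rather than the sum, the sketch below computes a geometric mean as the nth root of the product of the values and compares it with the standard library's `statistics.geometric_mean` (Python 3.8 or later). The growth factors are made-up values used only for demonstration.

```python
import math
from statistics import geometric_mean

# Hypothetical yearly growth factors (1.10 means 10% growth).
growth_factors = [1.10, 1.20, 1.50]

# nth root of the product of the n values.
n = len(growth_factors)
manual_gm = math.prod(growth_factors) ** (1 / n)

library_gm = geometric_mean(growth_factors)

print(round(manual_gm, 4), round(library_gm, 4))
```

The two results agree up to floating-point rounding, and the value can be read as the constant yearly growth factor that would give the same overall growth.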


Median

In statistics and probability theory, the median is the value separating the higher half from the lower half of a data sample, a population, or a probability distribution. For a data set, it may be thought of as “the middle” value. The basic feature of the median in describing data compared to the mean (often simply described as the “average”) is that it is not skewed by a small proportion of extremely large or small values, and therefore provides a better representation of a “typical” value. Median income, for example, may be a better way to suggest what a “typical” income is, because income distribution can be very skewed. The median is of central importance in robust statistics, as it is the most resistant statistic, having a breakdown point of 50%: so long as no more than half the data are contaminated, the median is not an arbitrarily large or small result.

The median of a finite list of numbers is the “middle” number, when those numbers are listed in order from smallest to greatest.

If the data set has an odd number of observations, the middle one is selected. For example, the following list of seven numbers,

1, 3, 3, 6, 7, 8, 9

has the median of 6, which is the fourth value.

If the data set has an even number of observations, there is no distinct middle value and the median is usually defined to be the arithmetic mean of the two middle values.[1][2] For example, this data set of 8 numbers

1, 2, 3, 4, 5, 6, 8, 9

has a median value of 4.5, that is $(4+5)/2$. (In more technical terms, this interprets the median as the fully trimmed mid-range.)

In general, with this convention, the median can be defined as follows: For a data set $x$ of $n$ elements, ordered from smallest to greatest,

if $n$ is odd, $\mathrm{median}(x) = x_{(n+1)/2}$

if $n$ is even, $\mathrm{median}(x) = \frac{x_{(n/2)} + x_{(n/2)+1}}{2}$
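The odd and even cases of the definition translate directly into code. The following is a minimal sketch; the function name `median` is chosen for illustration (the standard library offers `statistics.median` for real use), and the last list is an invented variation of the first example to show resistance to an extreme value.

```python
def median(values):
    """Median following the odd/even rule given in the text."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value, x_{(n+1)/2} in 1-based indexing.
        return ordered[mid]
    # Even n: the mean of the two middle values, x_{n/2} and x_{n/2+1}.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([1, 3, 3, 6, 7, 8, 9]))        # seven values from the text
print(median([1, 2, 3, 4, 5, 6, 8, 9]))     # eight values from the text
print(median([1, 3, 3, 6, 7, 8, 9000]))     # same list with one extreme value
```

Replacing 9 with 9000 leaves the median unchanged, which is the robustness property described earlier.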


Mode

The mode is the value that appears most often in a set of data values.[1] If X is a discrete random variable, the mode is the value x (i.e., X = x) at which the probability mass function takes its maximum value. In other words, it is the value that is most likely to be sampled.

Like the statistical mean and median, the mode is a way of expressing, in a (usually) single number, important information about a random variable or a population. The numerical value of the mode is the same as that of the mean and median in a normal distribution, and it may be very different in highly skewed distributions.

The mode is not necessarily unique to a given discrete distribution, since the probability mass function may take the same maximum value at several points $x_1$, $x_2$, etc. The most extreme case occurs in uniform distributions, where all values occur equally frequently.
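The possibility of more than one mode can be seen with `statistics.multimode` from the Python standard library (Python 3.8 or later), which returns every value that attains the highest frequency. The score lists below are invented for illustration.

```python
from statistics import multimode

one_mode = [55, 60, 60, 75]           # 60 occurs most often
two_modes = [1, 2, 2, 3, 3, 4]        # 2 and 3 are tied
all_equal = [70, 80, 90]              # every value occurs once

print(multimode(one_mode))
print(multimode(two_modes))
print(multimode(all_equal))
```

In the last list every value is returned, which mirrors the extreme case where all values occur equally frequently.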

When the probability density function of a continuous distribution has multiple local maxima it is common to refer to all of the local maxima as modes of the distribution. Such a continuous distribution is called multimodal (as opposed to unimodal). A mode of a continuous probability distribution is often considered to be any value x at which its probability density function has a locally maximum value, so any peak is a mode.[2]

In symmetric unimodal distributions, such as the normal distribution, the mean (if defined), median and mode all coincide. For samples, if it is known that they are drawn from a symmetric unimodal distribution, the sample mean can be used as an estimate of the population mode.


Q.4 Write the procedure of assigning letter grades to test scores


The end-of-course grades assigned by instructors are intended to convey the level of achievement of each student in the class. These grades are used by students, other faculty, university administrators, and prospective employers to make a multitude of different decisions. Unless instructors use generally accepted policies and practices in assigning grades, these grades are apt to convey misinformation and lead the decision-maker astray. When grading policies and practices are carefully formulated and reviewed periodically, they can serve well the many purposes for which they are used.

What might a faculty member consider to establish sound grading policies and practices? The issues which contribute to making grading a controversial topic are primarily philosophical in nature. There are no research studies that can answer questions like: What should an “A” grade mean? What percent of the students in my class should receive a “C?” Should spelling and grammar be judged in assigning a grade to a paper? What should a course grade represent? These “should” questions require value judgments rather than an interpretation of research data; the answer to each will vary from instructor to instructor. But all instructors must ask similar questions and find acceptable answers to them in establishing their own grading policies. It is not sufficient to have some method of assigning grades–the method used must be defensible by the user in terms of his or her beliefs about the goals of an American college education and tempered by the realities of the setting in which grades are given. An instructor’s view of the role of a university education consciously or unwittingly affects grading plans. The instructor who believes that the end product of a university education should be a “prestigious” group which has survived four or more years of culling and sorting has different grading policies from the instructor who believes that most college-aged youths should be able to earn a college degree in four or more years.

An instructor’s beliefs are influenced by many factors. As any of these factors change there may be a corresponding change in belief. The type of instructional strategy used in teaching dictates, to some extent, the type of grading procedures to use. For example, a mastery learning approach [1] to teaching is incongruent with a grading approach which is based on competition for an arbitrarily set number of “A” or “B” grades. Grading policies of the department, college, or campus may limit the procedures which can be used and force a basic grading plan on each instructor in that administrative unit. The recent response to grade inflation has caused some faculty, individually and collectively, to alter their philosophies and procedures. Pressure from colleagues to give lower or higher grades often causes some faculty members to operate in conflict with their own views. Student grade expectations and the need for positive student evaluations of instruction probably both contribute to the shaping or altering of the grading philosophies of some faculty. The dissonance created by institutional restraints probably contributes to the widespread feeling that end-of-course grading is one of the least pleasant tasks facing a college instructor.

With careful thought and periodic review, most instructors can develop satisfactory, defensible grading policies and procedures. To this end, several of the key issues associated with grading are identified in the sections which follow. In each case, alternative viewpoints are described and advantages and disadvantages noted. Regulations pertaining to grading at the University of Illinois are presented in Appendix A.

[1] Block, J. H. (ed.) Mastery learning: Theory and practice. New York: Holt, Rinehart and Winston, 1971.


Some kind of comparison is being made when grades are assigned. For example, an instructor may compare a student’s performance to that of his or her classmates, to standards of excellence (i.e., pre-determined objectives, contracts, professional standards) or to combinations of each. Four common comparisons used to determine college and university grades and the major advantages and disadvantages of each are discussed in the following section.

Comparisons with Other Students

By comparing a student’s overall course performance with that of some relevant group of students, the instructor assigns a grade to show the student’s level of achievement or standing within that group. An “A” might not represent excellence in attainment of knowledge and skill if the reference group as a whole is somewhat inept. All students enrolled in a course during a given semester or all students enrolled in a course since its inception are examples of possible comparison groups. The nature of the reference group used is the key to interpreting grades based on comparisons with other students.

Some Advantages of Grading Based on Comparison With Other Students

  1. Individuals whose academic performance is outstanding in comparison to their peers are rewarded.
  2. The system is a common one that many faculty members are familiar with. Given additional information about the students, instructor, or college department, grades from the system can be interpreted easily.

Some Disadvantages of Grading Based on Comparison With Other Students

  1. No matter how outstanding the reference group of students is, some will receive low grades; no matter how low the overall achievement in the reference group, some students will receive high grades. Grades are difficult to interpret without additional information about the overall quality of the group.
  2. Grading standards in a course tend to fluctuate with the quality of each class of students. Standards are raised by the performance of a bright class and lowered by the performance of a less able group of students. Often a student’s grade depends on who was in the class.
  3. There is usually a need to develop course “norms” which account for more than a single class performance. Students of an instructor who is new to the course may be at a particular disadvantage since the reference group will necessarily be small and very possibly atypical compared with future classes.
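Grading by comparison with other students can be sketched as ranking the class and handing out letters by fixed quotas. This is a minimal illustration only: the function name `grade_on_curve`, the quota shares (20% A, 30% B, 50% C) and the two classes are all assumptions, and tied scores are not treated specially.

```python
def grade_on_curve(scores, quotas=(("A", 0.2), ("B", 0.3), ("C", 0.5))):
    """Assign letters by rank within the class (hypothetical quotas)."""
    n = len(scores)
    # Indices of students from highest to lowest score.
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    grades = [quotas[-1][0]] * n          # default to the lowest letter
    start, cumulative = 0, 0.0
    for letter, share in quotas:
        cumulative += share
        end = min(n, round(cumulative * n))
        for i in order[start:end]:
            grades[i] = letter
        start = end
    return grades


strong_class = [99, 98, 97, 96, 95, 94, 93, 92, 91, 90]
weak_class = [59, 58, 57, 56, 55, 54, 53, 52, 51, 50]

print(grade_on_curve(strong_class))
print(grade_on_curve(weak_class))
```

Both classes receive the same spread of letters even though every student in the first class outscored every student in the second, which is the first disadvantage listed above.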

Comparisons with Established Standards

Grades may be obtained by comparing a student’s performance with specified absolute standards rather than with such relative standards as the work of other students. In this grading method, the instructor is interested in indicating how much of a set of tasks or ideas a student knows, rather than how many other students have mastered more or less of that domain. A “C” in an introductory statistics class might indicate that the student has minimal knowledge of descriptive and inferential statistics. A much higher achievement level would be required for an “A.” Note that students’ grades depend on their level of content mastery; thus the levels of performance of their classmates have no bearing on the final course grade. There are no quotas in each grade category. It is possible in a given class that all students could receive an “A” or a “B.”

Some Advantages of Grading Based on Comparison to Absolute Standards

  1. Course goals and standards must necessarily be defined clearly and communicated to the students.
  2. Most students, if they work hard enough and receive adequate instruction, can obtain high grades. The focus is on achieving course goals, not on competing for a grade.
  3. Final course grades reflect achievement of course goals. The grade indicates “what” a student knows rather than how well he or she has performed relative to the reference group.
  4. Students do not jeopardize their own grade if they help another student with course work.

Some Disadvantages of Grading Based on Comparison to Absolute Standards

  1. It is difficult and time consuming to determine what course standards should be for each possible course grade issued.
  2. The instructor has to decide on reasonable expectations of students and necessary prerequisite knowledge for subsequent courses. Inexperienced instructors may be at a disadvantage in making these assessments.
  3. A complete interpretation of the meaning of a course grade cannot be made unless the major course goals are also available.

Comparisons Based on Learning Relative to Improvement and Ability

The following two comparisons–with improvement and ability–are sometimes used by instructors in grading students. There are such serious philosophical and methodological problems related to these comparisons that their use is highly questionable for most educational situations.

Relative to Improvement . . .

Students’ grades may be based on the knowledge and skill they possess at the end of a course compared to their level of achievement at the beginning of the course. Large gains are assigned high grades and small gains are represented by low grades. Students who enter a course with some pre-course knowledge are obviously penalized; they have less to gain from a course than does a relatively naive student. The posttest-pretest gain score is more error-laden, from a measurement perspective, than either of the scores from which it is derived. Though growth is certainly important when assessing the impact of instruction, it is less useful as a basis for determining course grades than end-of-course competence. The value of grades which reflect growth in a college-level course is probably minimal.

Relative to Ability . . .

Course grades might represent the amount students learned in a course relative to how much they could be expected to learn as predicted from their measured academic ability. Students with high ability scores (e.g., scores on the Scholastic Aptitude Test or American College Test) would be expected to achieve higher final examination scores than those with lower ability scores. When grades are based on comparisons with predicted ability, an “overachiever” and an “underachiever” may receive the same grade in a particular course, yet their levels of competence with respect to the course content may be vastly different. The first student may not be prepared to take a more advanced course, but the second student may be. A course grade may, in part, reflect the amount of effort the instructor believes a student has put into a course. The high ability students who can satisfy course requirements with minimal effort are penalized for their apparent “lack” of effort. Since the letter grade alone does not communicate such information, the value of ability-based grading does not warrant its use.

A single course grade should represent only one of the several grading comparisons noted above. To expect a course grade to reflect more than one of these comparisons is too much of a communication burden. Instructors who wish to communicate more than relative group standing, or subject matter competence or level of effort, must find additional ways to provide such information to each student. Suggestions for doing so are noted near the end of Section V of this booklet.


  1. Grades Should Conform To The Practice in The Department and The Institution in Which The Grading Occurs.

    Grading policies of the department, college, or campus may limit the grading procedures which can be used and force a basic grading philosophy on each instructor in that administrative unit. Departments often have written statements which specify a method of assigning grades and meanings of grades. If such grading policies are not explicitly stated or written for faculty use, the percentages of A’s, B’s, C’s, D’s, and E’s given by departments and colleges in their 100-level, 200-level, 300-level and graduate courses may be indicative of implicitly stated grading policies. [Grade distribution information is available from all departmental offices or from Measurement and Evaluation (M&E) of the Center for Innovation in Teaching and Learning (CITL), 333-3490.]

    The University regulations encourage a uniform grading policy so that a grade of A, B, C, D, or E will have the same meaning independent of the college or department awarding the grade. In practice grade distributions vary by department, by college and over time within each of these units. The grading standards of a department or college are usually known by other campus units. For example, a “B” in a required course given by Department X might indicate that the student probably is not a qualified candidate for graduate school in that or a related field. Or, a “B” in a required course given by Department Y might indicate that the student’s knowledge is probably adequate for the next course. Grades in certain “key” courses may also be interpreted as a sign of a student’s ability to continue work in the field. The faculty member who is uninformed about the grading grapevine may unwittingly make misleading statements about a student and also misinterpret information received. If an instructor’s grading pattern differs markedly from others in the department or college and the grading is not being done in special classes (e.g., honors, remedial), the instructor should reexamine his or her grading practices to see that they are rational and defensible. Sometimes an individual faculty member’s grading policy will differ markedly from that of the department and/or college and yet be defensible. For example, the department and instructor may be using different grading standards, course structure may seem to require a grading plan which differs from departmental guidelines, or the instructor and department may hold different ideas about the function of grading. Usually in such cases, a satisfactory grading plan can be worked out. Faculty new to the University can consult with the department head for advice about grade assignment procedures in particular courses.  Measurement and Evaluation will consult with faculty on grading problems and procedures.


 #   Score      Grade
 1   70.62035   C
 2   74.88496   C
 3   82.91413   B
 4   96.32831   A
 5   68.06728   C
 6   95.93559   A
 7   97.78701   A
 8   86.43191   B+
 9   85.16456   B+
10   62.47145   C
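The score table above is consistent with grading against fixed cut-off scores. The sketch below reproduces the same letters using assumed cut-offs (B from 80, B+ from 85, A from 90, otherwise C). These cut-offs are not stated in the source; they are one set of values that happens to match the ten rows shown.

```python
import bisect

# Hypothetical lower bounds for B, B+ and A; anything below 80 is treated as C.
CUTOFFS = [80, 85, 90]
LETTERS = ["C", "B", "B+", "A"]


def letter_grade(score: float) -> str:
    """Map a numeric score to a letter using the assumed cut-offs."""
    # bisect_right counts how many cut-offs the score has reached.
    return LETTERS[bisect.bisect_right(CUTOFFS, score)]


scores = [70.62035, 74.88496, 82.91413, 96.32831, 68.06728,
          95.93559, 97.78701, 86.43191, 85.16456, 62.47145]

for number, score in enumerate(scores, start=1):
    print(number, score, letter_grade(score))
```

Because each grade depends only on the score and the cut-offs, every student could in principle earn an A, as noted for grading against absolute standards.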


Q.5 Discuss the difference between measures of central tendency and measure of variability.


While the measures of central tendency convey information about the commonalities of measured properties, the measures of variability quantify the degree to which they differ. If not all values of data are the same, they differ and variability exists.

In statistics, a central tendency (or measure of central tendency) is a central or typical value for a probability distribution.[1] It may also be called a center or location of the distribution. Colloquially, measures of central tendency are often called averages. The term central tendency dates from the late 1920s.[2]

The most common measures of central tendency are the arithmetic mean, the median, and the mode. A middle tendency can be calculated for either a finite set of values or for a theoretical distribution, such as the normal distribution. Occasionally authors use central tendency to denote “the tendency of quantitative data to cluster around some central value.”[2][3]

The central tendency of a distribution is typically contrasted with its dispersion or variability; dispersion and central tendency are the often characterized properties of distributions. Analysis may judge whether data has a strong or a weak central tendency based on its dispersion.

The following may be applied to one-dimensional data. Depending on the circumstances, it may be appropriate to transform the data before calculating a central tendency. Examples are squaring the values or taking logarithms. Whether a transformation is appropriate and what it should be, depend heavily on the data being analyzed.

Arithmetic mean or simply, mean

the sum of all measurements divided by the number of observations in the data set.


Median

the middle value that separates the higher half from the lower half of the data set. The median and the mode are the only measures of central tendency that can be used for ordinal data, in which values are ranked relative to each other but are not measured absolutely.


Mode

the most frequent value in the data set. This is the only central tendency measure that can be used with nominal data, which have purely qualitative category assignments.

Geometric mean

the nth root of the product of the data values, where there are n of these. This measure is valid only for data that are measured absolutely on a strictly positive scale.

Harmonic mean

the reciprocal of the arithmetic mean of the reciprocals of the data values. This measure too is valid only for data that are measured absolutely on a strictly positive scale.

Weighted arithmetic mean

an arithmetic mean that incorporates weighting to certain data elements.

Truncated mean or trimmed mean

the arithmetic mean of data values after a certain number or proportion of the highest and lowest data values have been discarded.

Interquartile mean

a truncated mean based on data within the interquartile range.


Midrange

the arithmetic mean of the maximum and minimum values of a data set.


Midhinge

the arithmetic mean of the first and third quartiles.


Trimean

the weighted arithmetic mean of the median and two quartiles.

Winsorized mean

an arithmetic mean in which extreme values are replaced by values closer to the median.
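Several of the one-dimensional measures defined above can be sketched in a few lines of Python. The helper names `midrange`, `trimmed_mean` and `winsorized_mean`, and the choice of discarding or replacing `k` values at each end, are illustrative assumptions; `harmonic_mean` comes from the standard library.

```python
from statistics import harmonic_mean, mean


def midrange(values):
    """Arithmetic mean of the maximum and minimum values."""
    return (min(values) + max(values)) / 2


def trimmed_mean(values, k):
    """Mean after discarding the k lowest and k highest values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this data set")
    return mean(ordered[k:len(ordered) - k])


def winsorized_mean(values, k):
    """Mean after replacing the k lowest and k highest values
    with the nearest remaining values."""
    ordered = sorted(values)
    if 2 * k >= len(ordered):
        raise ValueError("k is too large for this data set")
    low, high = ordered[k], ordered[-k - 1]
    return mean(min(max(v, low), high) for v in ordered)


data = [4, 36, 45, 50, 75]
print(midrange(data))
print(trimmed_mean(data, 1))
print(winsorized_mean(data, 1))
print(harmonic_mean([40, 60]))
```

For these five values the trimmed and Winsorized means both move away from the plain mean of 42, because the low value 4 no longer pulls the result down as strongly.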

Any of the above may be applied to each dimension of multi-dimensional data, but the results may not be invariant to rotations of the multi-dimensional space. In addition, there are the

Geometric median

which minimizes the sum of distances to the data points. This is the same as the median when applied to one-dimensional data, but it is not the same as taking the median of each dimension independently. It is not invariant to different rescaling of the different dimensions.

Quadratic mean (often known as the root mean square)

useful in engineering, but not often used in statistics. This is because it is not a good indicator of the center of the distribution when the distribution includes negative values.

Simplicial depth

the probability that a randomly chosen simplex with vertices from the given distribution will contain the given center

Tukey median

a point with the property that every halfspace containing it also contains many sample points

A measure of variability is a summary statistic that represents the amount of dispersion in a dataset. How spread out are the values? While a measure of central tendency describes the typical value, measures of variability define how far away the data points tend to fall from the center. We talk about variability in the context of a distribution of values. A low dispersion indicates that the data points tend to be clustered tightly around the center. High dispersion signifies that they tend to fall further away.

In statistics, variability, dispersion, and spread are synonyms that denote the width of the distribution. Just as there are multiple measures of central tendency, there are several measures of variability. The most common are the range, interquartile range, variance, and standard deviation.

Two distributions can share the same mean yet differ in dispersion: one may be tightly clustered around the average while the other is more spread out.
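The contrast between a tightly clustered data set and a more spread-out one with the same mean can be checked numerically. The sketch below uses two invented sets of scores and the standard library; the helper name `describe_spread` is hypothetical, population formulas (`pvariance`, `pstdev`) are used, and the quartiles follow the default method of `statistics.quantiles`, so other quartile conventions may give slightly different interquartile ranges.

```python
from statistics import mean, pstdev, pvariance, quantiles


def describe_spread(values):
    """Return common measures of variability for a list of numbers."""
    q1, _, q3 = quantiles(values, n=4)   # first, second and third quartiles
    return {
        "range": max(values) - min(values),
        "iqr": q3 - q1,
        "variance": pvariance(values),
        "std_dev": pstdev(values),
    }


tight = [48, 49, 50, 51, 52]     # clustered around the mean
spread = [30, 40, 50, 60, 70]    # same mean, wider dispersion

print(mean(tight), describe_spread(tight))
print(mean(spread), describe_spread(spread))
```

Both sets have a mean of 50, so a measure of central tendency alone cannot tell them apart; every measure of variability is larger for the second set.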

