Course: Classroom Assessment(6407) Semester: Autumn, 2021 Level: BEd./ADE
Q.1 What is Formative and Summative Assessment? Distinguish between them with the help of relevant examples.
The goal of formative assessment is to monitor student learning to provide ongoing feedback that can be used by instructors to improve their teaching and by students to improve their learning. More specifically, formative assessments:
- help students identify their strengths and weaknesses and target areas that need work
- help faculty recognize where students are struggling and address problems immediately
Formative assessments are generally low stakes, which means that they have low or no point value. Examples of formative assessments include asking students to:
- draw a concept map in class to represent their understanding of a topic
- submit one or two sentences identifying the main point of a lecture
- turn in a research proposal for early feedback
The goal of summative assessment is to evaluate student learning at the end of an instructional unit by comparing it against some standard or benchmark.
Summative assessments are often high stakes, which means that they have a high point value. Examples of summative assessments include:
- a midterm exam
- a final project
- a paper
- a senior recital
Information from summative assessments can be used formatively when students or faculty use it to guide their efforts and activities in subsequent courses.
Summative assessment, summative evaluation, or assessment of learning is the assessment of participants in which the focus is on the outcome of a program. This contrasts with formative assessment, which monitors the participants' development while the program is still in progress; it is summative assessment that summarizes attainment at a particular point in time. Summative assessment is widely taught in educational programs in the United States. Michael Scriven claims that while all assessment techniques can be summative, only some are formative.
The goal of summative assessment is to evaluate student learning at the end of an instructional unit by comparing it against a standard or benchmark. Note that 'the end' does not necessarily mean the end of an entire course or module of study. Summative assessments may be distributed throughout a course, after a particular unit (or collection of topics) has been taught, and there are advantages to doing so. In many disciplines in the UK Higher Education sector, there has been a move away from 100% end-of-course assessments to a model where summative assessments are distributed across a course, which helps to scaffold students' learning. Summative assessment usually involves students receiving a grade that indicates their level of performance, be it a percentage, pass/fail, or some other scale grade. Summative assessments are usually weighted more heavily than formative assessments. For example: half-yearly examinations in schools, or semester examinations held every six months in a B.Ed programme.
Summative assessments are often high stakes, which means that they have a high point value. Examples of summative assessments include: a midterm exam, a final project, a paper, or a senior recital.
Formative assessment, formative evaluation, formative feedback, or assessment for learning, including diagnostic testing, is a range of formal and informal assessment procedures conducted by teachers during the learning process in order to modify teaching and learning activities to improve student attainment. The goal of a formative assessment is to monitor student learning to provide ongoing feedback that can help students identify their strengths and weaknesses and target areas that need work. It also helps faculty recognize where students are struggling and address problems immediately.  It typically involves qualitative feedback (rather than scores) for both student and teacher that focuses on the details of content and performance. It is commonly contrasted with summative assessment, which seeks to monitor educational outcomes, often for purposes of external accountability.
Q.2 How is a table of specifications prepared? What are the different ways of developing a table of specifications?
Steps in Preparing the Table of Specification:
1. List down the topics covered for inclusion in the test.
2. Determine the objectives to be assessed by the test.
3. Specify the number of days/hours spent teaching each topic.
4. Determine the percentage allocation of the test items for each of the topics covered.
A table of specifications (TOS) is a test map that guides the teacher in constructing a test. The TOS ensures that there is a balance between items that test lower-level thinking skills and those that test higher-order thinking skills (or, alternatively, a balance between easy and difficult items) in the test. The simplest TOS consists of four (4) columns: (a) the level of objective to be tested, (b) the statement of the objective, (c) the item numbers where that objective is tested, and (d) the number of items and the percentage out of the total for that particular objective. A prototype table is shown below:
Table of Specifications Prototype
In the table of specifications we see that there are five items that deal with knowledge and these items are items 1,3,5,7,9. Similarly, from the same table we see that five items represent synthesis, namely: 12, 14, 16, 18, 20. The first four levels of Bloom’s taxonomy are equally represented in the test while application (tested through essay) is weighted equivalent to ten (10) points or double the weight given to any of the first four levels. The table of specifications guides the teacher in formulating the test. As we can see, the TOS also ensures that each of the objectives in the hierarchy of educational objectives is well represented in the test. As such, the resulting test that will be constructed by the teacher will be more or less comprehensive. Without the table of specifications, the tendency for the test maker is to focus too much on facts and concepts at the knowledge level.
Three steps are involved in creating a Table of Specifications: 1) choosing the measurement goals and domain to be covered, 2) breaking the domain into key or fairly independent parts- concepts, terms, procedures, applications, and 3) constructing the table.
A Table of Specifications provides the teacher with evidence that a test has content validity, that is, that it covers what should be covered. Tables of Specification are typically designed based on the list of course objectives, the topics covered in class, and the amount of time spent on those topics.
The table of specifications (TOS) is a tool used to ensure that a test or assessment measures the content and thinking skills that the test intends to measure. Thus, when used appropriately, it can provide response content and construct (i.e., response process) validity evidence. A TOS may be used for large-scale test construction, classroom-level assessments by teachers, and psychometric scale development. It is a foundational tool in designing tests or measures for research and educational purposes.

The primary purpose of a TOS is to ensure alignment between the items or elements of an assessment and the content, skills, or constructs that the assessment intends to assess. That is, a TOS helps test constructors to focus on issues of response content, ensuring that the test or assessment measures what it intends to measure. For example, if a teacher is interested in assessing the students' understanding of lunar phases, then it would be appropriate to have a test item asking them to draw the phases of the moon. However, a test item asking them to identify the first person to walk on the moon would not have the same content validity for assessing students' knowledge of lunar phases.

In addition, a TOS can also be used to provide response process validity evidence for test constructors. Response process refers to the kind of thinking that is expected of the test taker in completing the assessment. For the lunar phases, for example, a teacher may expect students to memorize the phases of the moon, and therefore a knowledge-level question (relying on recognition or memory) would be appropriate. Alternatively, if the teacher taught the lessons such that students tracked the moon for a month, developed lunar journals, and discussed the reasons for the different phases, then the assessment should target higher-level thinking such as analysis, evaluation, and synthesis.
As such, asking students to draw a model of the lunar phases with annotated explanations would be better aligned to the kind of thinking that students experienced during instruction. The TOS is typically constructed as a table that includes key information to help teachers align the learning objectives, which represent the content and cognitive levels intended for students to achieve, with class time spent and the number of test items. Table 1 provides an example of a TOS for a chapter test on "New Ideas for a New Century," from Molefi Kete Asante's (1995) African American History: A Journey of Liberation. The chapter explored the roles of prominent African American leaders from 1895 to 1919. Before constructing the TOS, the teacher decided the total number of items to include (i.e., 10) and the quantity and type of those items (i.e., five multiple-choice and five short answer); the decision was based on the time allocated for students to complete the test and students' general test-taking abilities. Next, the teacher referred to the lesson plans and notes to determine the content in columns A-C (i.e., day, learning objectives, time spent on objective). To calculate the percentage of class time for each objective (column D), the teacher divided the minutes spent teaching each objective (column C) by the total minutes for the unit and multiplied by 100. Determining the percentage of time spent in class on each objective is one approach to identifying how many items on the test should address any particular objective, and it enhances test content validity evidence.
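The time-percentage calculation described above can be sketched in code. This is a minimal illustration with hypothetical objectives and minute counts, not the actual data from the chapter test discussed:

```python
# Allocate test items to objectives in proportion to class time spent.
# The objective names and minutes below are made-up examples.
objectives = {
    "Identify key leaders of 1895-1919": 90,   # minutes of class time
    "Explain major policy debates": 60,
    "Evaluate leaders' strategies": 30,
}

total_minutes = sum(objectives.values())
total_items = 10  # total items the teacher decided to include

for name, minutes in objectives.items():
    pct = minutes / total_minutes * 100      # column D in the TOS
    items = round(total_items * pct / 100)   # items allocated to this objective
    print(f"{name}: {pct:.0f}% of class time -> {items} items")
```

With these numbers, 90 of 180 minutes (50%) yields 5 of the 10 items, 60 minutes yields 3 items, and 30 minutes yields 2 items, so the item count mirrors the instructional emphasis.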
Q.3 Define criterion- and norm-referenced testing. Make a comparison between them.
A norm-referenced test assesses the test taker's ability and performance against other test takers. A criterion-referenced test assesses the test taker's understanding of a set curriculum.
Criterion-referenced vs. norm-referenced
To see why this distinction matters, we need to understand the difference between criterion-referenced tests and norm-referenced tests.
The first thing to understand is that even an assessment expert couldn't tell the difference between a criterion-referenced test and a norm-referenced test just by looking at them. The difference is actually in the scores, and some tests can provide both criterion-referenced results and norm-referenced results!
Criterion-referenced tests compare a person’s knowledge or skills against a predetermined standard, learning goal, performance level, or other criterion. With criterion-referenced tests, each person’s performance is compared directly to the standard, without considering how other students perform on the test. Criterion-referenced tests often use “cut scores” to place students into categories such as “basic,” “proficient,” and “advanced.”
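The cut-score categorization described above can be sketched as a simple lookup. The thresholds and category labels below are hypothetical, chosen only to illustrate that the category depends on the score alone:

```python
def categorize(score, cuts=((70, "advanced"), (50, "proficient"), (0, "basic"))):
    """Place a score into a performance category using fixed cut scores.
    The cut scores here are hypothetical, not from any real test."""
    for cut, label in cuts:
        if score >= cut:
            return label
    return "below basic"

print(categorize(82))  # advanced
print(categorize(55))  # proficient
```

Note that no other student's score appears anywhere in the function: the category is determined entirely by the individual's score and the fixed criteria.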
If you’ve ever been to a carnival or amusement park, think about the signs that read “You must be this tall to ride this ride!” with an arrow pointing to a specific line on a height chart. The line indicated by the arrow functions as the criterion; the ride operator compares each person’s height against it before allowing them to get on the ride.
Note that it doesn’t matter how many other people are in line or how tall or short they are; whether or not you’re allowed to get on the ride is determined solely by your height. Even if you’re the tallest person in line, if the top of your head doesn’t reach the line on the height chart, you can’t ride.
Criterion-referenced assessments work similarly: an individual's score, and how that score is categorized, is not affected by the performance of other students. A student's score and performance category (e.g., "below proficient") do not change regardless of whether that student is top-performing, in the middle, or low-performing relative to their peers.
This means that knowing a student's score on a criterion-referenced test will only tell you how that specific student performed in relation to the criterion, not whether they performed below average, above average, or average compared with their peers.
Norm-referenced measures compare a person’s knowledge or skills to the knowledge or skills of the norm group. The composition of the norm group depends on the assessment. For student assessments, the norm group is often a nationally representative sample of several thousand students in the same grade (and sometimes, at the same point in the school year). Norm groups may also be further narrowed by age, English Language Learner (ELL) status, socioeconomic level, race/ethnicity, or many other characteristics.
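By contrast, a norm-referenced result such as a percentile rank depends entirely on the norm group. A minimal sketch (the norm-group scores below are made up, and real norming samples are of course far larger):

```python
def percentile_rank(score, norm_group):
    """Percent of the norm group scoring below the given score."""
    below = sum(1 for s in norm_group if s < score)
    return 100 * below / len(norm_group)

# Hypothetical norm group of eight students
norm = [55, 60, 62, 70, 75, 80, 85, 90]
print(percentile_rank(75, norm))  # 50.0
```

The same raw score of 75 would yield a different percentile rank against a different norm group, which is exactly the property that distinguishes norm-referenced from criterion-referenced interpretation.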
Q.4 What are the types of selection-type test items? What are the advantages of multiple choice questions?
Types of Selection Methods: Selection methods or screening devices include application blanks, employment interviews, aptitude tests, and personality tests.
The employee tests administered in the selection process may be classified in different ways. These tests range from short pencil-and-paper tests to elaborate combinations of projective tests. They are designed to measure aptitude (general mental intelligence and special aptitudes), interest, creativity, judgment, temperament, and personality.
Some of the types of employee tests are:
1. Aptitude Tests
2. Achievement Tests
3. Situational Tests
4. Interest Tests
5. Personality Tests
6. Intelligence or Mental Alertness Tests
7. Mechanical Ability Tests
8. Clerical and Stenographic Skills Tests
9. Temperament Tests
10. Judgment Tests
11. Abilities Tests
12. Skills Tests
13. Honesty Tests
Types of Tests
Tests are classified into five types.
(i) Aptitude tests;
(ii) Achievement tests;
(iii) Situational tests;
(iv) Interest tests; and
(v) Personality tests.
(i) Aptitude Tests:
These tests measure whether an individual has the capacity or latent ability to learn a given job if given adequate training. Aptitudes can be divided into general and mental ability or intelligence and specific aptitudes such as mechanical, clerical, manipulative capacity etc.
(a) Intelligence Tests:
These tests in general measure the intelligence quotient of a candidate. In detail, these tests measure capacity for comprehension, reasoning, word fluency, verbal comprehension, numbers, memory and space. Other factors measured include digit spans (both forward and backward), general information, comprehension, vocabulary, picture arrangement and object assembly.
Though these tests are accepted as useful, they are criticised as being unfair to deprived sections of the community. It is also criticised that these tests may prove too dull as a selection device. Intelligence tests include sample learning ability tests, adaptability tests, etc.
(b) Mechanical Aptitude Tests:
These tests measure the capacities of spatial visualisation, perceptual speed and knowledge of mechanical matters. These tests are useful for selecting apprentices, skilled mechanical employees, technicians, etc.
(c) Psychomotor Tests:
These tests measure abilities like manual dexterity, motor ability and eye-hand co-ordination of candidates. These tests are useful to select semi-skilled workers and workers for repetitive operations like packing and watch assembly.
(d) Clerical Aptitude Tests:
These tests measure specific capacities involved in office work. Items of this test include spelling, computation, comprehension, copying, word meaning, etc.
(ii) Achievement Tests:
These tests are conducted when applicants claim to know something, since they are concerned with what one has already accomplished. They are most useful for measuring specific achievement when an organisation wishes to employ experienced candidates.
These tests are classified into:
(a) Job knowledge test, and
(b) Work sample test.
(a) Job Knowledge Test:
Under this test a candidate is tested in the knowledge of a particular job. For example, if a junior lecturer applies for the job of a senior lecturer in commerce, he may be tested in job knowledge where he is asked questions about Accountancy Principles, Banking, Law, Business Management, etc.
(b) Work Sample Test:
Under this test a portion of the actual work is given to the candidate as a test and the candidate is asked to do it. If a candidate applies for a post of lecturer in Management he may be asked to deliver a lecture on Management Information System as work sample test.
Thus, the candidate’s achievement in his career is tested regarding his knowledge about the job and actual work experience.
(iii) Situational Test:
This test evaluates a candidate in a simulated real-life situation. In this test the candidate is asked either to cope with the situation or to solve critical situations of the job.
Q.5 Which factors affect the reliability of a test?
Several factors affect the reliability of test scores, and they must be taken into account both at the stage of constructing a test and at the stage of administering it. One such factor is the length of the test: the length of the test affects the true scores and the variances of the observed scores.
Some intrinsic and some extrinsic factors have been identified to affect the reliability of test scores.
(A) Intrinsic Factors:
The principal intrinsic factors (i.e. those factors which lie within the test itself) which affect the reliability are:
(i) Length of the Test:
Reliability has a definite relation to the length of the test. The more items the test contains, the greater its reliability will be, and vice-versa: logically, the larger the sample of items we take from a given area of knowledge or skill, the more reliable the test will be.
However, a test cannot be lengthened indefinitely; its length should not give rise to fatigue effects in the testees. Within such practical limits, it is advisable to use longer tests rather than shorter tests, since shorter tests are less reliable.
The number of times a test should be lengthened to get a desirable level of reliability is given by the Spearman-Brown prophecy formula:

n = r_desired (1 - r_obtained) / [r_obtained (1 - r_desired)]

When a test has a reliability of 0.80 and a reliability of 0.95 is desired, the lengthening factor is estimated in the following way:

n = 0.95 × (1 - 0.80) / [0.80 × (1 - 0.95)] = 0.19 / 0.04 = 4.75

Hence the test is to be lengthened 4.75 times. However, while lengthening the test one should see that the items added to increase the length of the test satisfy conditions such as an equal range of difficulty, the desired discrimination power, and comparability with the other test items.
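As a quick check, the lengthening calculation above can be sketched in code (a minimal illustration of the same formula; the function name is ours):

```python
def lengthening_factor(r_obtained, r_desired):
    """Spearman-Brown: how many times a test must be lengthened
    to raise its reliability from r_obtained to r_desired."""
    return (r_desired * (1 - r_obtained)) / (r_obtained * (1 - r_desired))

# Reproducing the worked example: reliability 0.80 raised to 0.95
print(lengthening_factor(0.80, 0.95))  # approximately 4.75
```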
(ii) Homogeneity of Items:
Homogeneity of items has two aspects: item reliability and the homogeneity of traits measured from one item to another. If the items measure different functions and the inter-correlations of items are ‘zero’ or near to it, then the reliability is ‘zero’ or very low and vice-versa.
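The idea that near-zero inter-correlations among items drive reliability toward zero can be quantified with Cronbach's alpha, a standard index of internal consistency (alpha is not named in the text above; it is introduced here only as an illustration). A minimal sketch with made-up 0/1 item scores:

```python
def cronbach_alpha(scores):
    """Cronbach's alpha for rows of item scores, one row per examinee."""
    n_items = len(scores[0])

    def variance(xs):
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    item_vars = [variance([row[i] for row in scores]) for i in range(n_items)]
    total_var = variance([sum(row) for row in scores])
    return (n_items / (n_items - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical right/wrong (1/0) scores for four examinees on three items
data = [[1, 1, 1], [1, 1, 0], [0, 1, 0], [0, 0, 0]]
print(round(cronbach_alpha(data), 2))  # 0.75
```

When items inter-correlate strongly, alpha approaches 1; when their inter-correlations are near zero, alpha falls toward zero, matching the statement above.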
(iii) Difficulty Value of Items:
The difficulty level and clarity of expression of a test item also affect the reliability of test scores. If the test items are too easy or too difficult for the group, the test will tend to produce scores of low reliability, because in either case the spread of scores is restricted.
(iv) Discriminative Value:
When items discriminate well between superior and inferior testees, the item-total correlation is high, and the reliability is also likely to be high, and vice-versa.
(v) Test instructions:
Clear and concise instructions increase reliability. Complicated and ambiguous directions give rise to difficulties in understanding the questions and the nature of the response expected from the testee, ultimately leading to low reliability.
(vi) Item selection:
If there are too many interdependent items in a test, the reliability is found to be low.
(vii) Reliability of the scorer:
The reliability of the scorer also influences the reliability of the test. If the scorer is moody or inconsistent, the scores will vary from one occasion to another; mistakes in scoring lead to mistakes in the scores and thus to low reliability.
(B) Extrinsic Factors:
The important extrinsic factors (i.e. the factors which remain outside the test itself) influencing the reliability are:
(i) Group variability:
When the group of pupils being tested is homogeneous in ability, the reliability of the test scores is likely to be lowered and vice-versa.
(ii) Guessing and chance errors:
Guessing in a test gives rise to increased error variance and as such reduces reliability. For example, with two-alternative response options there is a 50% chance of answering an item correctly by guessing alone.
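The effect of guessing can be illustrated with the standard correction-for-guessing formula, score = R - W/(k - 1), where R is the number right, W the number wrong, and k the number of options per item. (This formula is a standard one from the testing literature, not given in the text above.) A minimal sketch:

```python
def corrected_score(rights, wrongs, options):
    """Standard correction-for-guessing: subtract the wrongs a pure
    guesser would be expected to accumulate per lucky right answer."""
    return rights - wrongs / (options - 1)

# A pure guesser on 100 two-alternative (true/false) items expects
# 50 right and 50 wrong; after correction the expected score is zero:
print(corrected_score(50, 50, 2))  # 0.0
```

With more options per item the correction shrinks, reflecting the lower chance of a lucky guess, which is one reason multiple-choice items with several distractors are less distorted by guessing than true/false items.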
(iii) Environmental conditions:
As far as practicable, testing environment should be uniform. Arrangement should be such that light, sound, and other comforts should be equal to all testees, otherwise it will affect the reliability of the test scores.
(iv) Momentary fluctuations:
Momentary fluctuations may raise or lower the reliability of the test scores. A broken pencil, momentary distraction by the sudden sound of a train passing outside, anxiety about uncompleted homework, or a mistake in marking an answer with no way to change it are factors that may affect the reliability of test scores.