The bell curve is to education what the rack is to interrogation.
Both produce unreliable measures of truth.
Jason Norris wrote me yesterday in response to my post “The Dark Side of Education: Testing”. Of immediate interest was my comment on the (abusive) use of a Bell curve in grading. Specifically he asked me to elaborate on this statement: “Teaching, like counseling, is a direct intervention. The best result, in teaching or counseling, finds everyone excelling.”
Here’s the excerpt he cited:
A student averages a 96% on four course exams, but earns a D in the course, because many students in the course scored better than he did. The grading scale for that semester, given at the end of the semester, was 99% (A), 98% (B), 97% (C), 96% (D) and less than 96% (F). Simple fairness dictated that every student should have received an A. Were the exams too easy, or did the students excel? If the former, the professor should have awarded As to all students and re-written the exams for the next semester. If the latter, the students deserved As. Adherence to an arbitrary distribution of grades (A-F) is sinful, especially when the grade point equivalents are determined after the grades are earned! 
 The faulty reasoning goes like this: there should be few but equal numbers of As and Fs, more but equal numbers of Bs and Ds, and the most falling in the middle with Cs. This is considered “fair.” Let’s apply this evaluation standard to, say, counseling. There should be few but equal numbers of clients who overcome their problems and those who commit suicide, more but equal numbers of clients who get better, and get worse, and finally most clients who fall in the middle: they stay the same. Of course this is nonsense.
Teaching, like counseling, is a direct intervention. The best result, in teaching or counseling, finds everyone excelling. While I have never had a class in which everyone earned As, it is not because of built-in traps and minefields to trip up students. In my most difficult class (research design and statistical analysis), about half earn As (having worked very hard all semester), another third earn Bs, and a few earn Cs. Only a dozen or so students have failed the course over the last 30 years.
So Jason, I understand you to be asking for a little more on the Bell curve, on why it should never be used to establish grades, and perhaps a positive alternative to the practice.
The Bell Curve
The so-called bell curve or bell-shaped curve is a common designation for what statisticians call the Normal Curve. It is a theoretical distribution of random scores, and looks like the diagrams below:
The bell shape of the Normal Curve is obvious. The numbers along the bottom of the curve (just a quick jump into technical details here) are standard (“z”) scores, computed by subtracting the group mean from a raw score (“X”) and dividing the difference by the standard deviation of the group of scores.
The “0” point on the numbered scale reflects the mean, or average, of a group of scores, and divides the group’s scores into two equal halves (50% each). The theoretical relationship between “area under the curve” and “z” is eternal in the universe. In the top diagram above we see that the area under the curve between 0 and -1 is 34% (0.34). The same area is reflected in the positive (+1, 34%). The area between -1 and +1 standard scores is 68%. The bottom diagram shows the area between +2 and -2 standard scores (z) is about 95% (0 to -2, just under 48%; 0 to +2, just under 48%). Not shown here is the area between +3 and -3, which is 99.7%.
You will notice that the left and right halves of a bell curve are symmetrical, which means they mirror each other. Give me any positive standard score, which cuts off a given (and eternally stable) area under the theoretical curve, and I’ll cut off the exact same area on the other end, simply by applying the same standard score in the negative direction. That is, +1.77 cuts off the same area in the positive half as -1.77 cuts off in the negative half. Symmetrical.
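For readers who want to check these areas themselves, here is a minimal sketch in Python. It assumes only the standard library; the function names `normal_cdf` and `area_between` are my own labels for illustration, not part of any statistics package.

```python
import math

def normal_cdf(z: float) -> float:
    """Proportion of the standard Normal Curve lying below the standard score z."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def area_between(z_low: float, z_high: float) -> float:
    """Area under the Normal Curve between two standard scores."""
    return normal_cdf(z_high) - normal_cdf(z_low)

# Areas within 1, 2, and 3 standard scores of the mean.
for z in (1, 2, 3):
    print(f"-{z} to +{z}: {area_between(-z, z):.1%}")

# Symmetry: the tail above +1.77 matches the tail below -1.77.
upper_tail = 1.0 - normal_cdf(1.77)
lower_tail = normal_cdf(-1.77)
print(f"tail above +1.77: {upper_tail:.4f}")
print(f"tail below -1.77: {lower_tail:.4f}")
```

The two tail figures print as the same number, which is the symmetry described above.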
(Skip this paragraph if you have an aversion to statistics-speak.) Let’s say I have a class of 40 students who take an examination. By adding together the scores (ΣX) and dividing by 40 (N), I produce the average score in the class (mean). Using the mean, I can also compute the difference between each score and the group mean (deviation, x). If I add together all the deviations (Σx), I will always get zero. But if I square each deviation and add them together, I produce the sum of squared deviations, or sum of squares. If I divide by N and take the square root, I produce the standard deviation (the degree of variability) in the group of 40 scores. Now I can compute the standard score (z) from any raw score (X) by subtracting the mean from the score (X – mean) and dividing by the standard deviation. These z-scores are used to compute areas under the Normal curve.
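The same steps can be sketched in a few lines of Python. The five scores below are made up for illustration (a real class would have its own), and the function name `z_scores` is hypothetical. Following the paragraph above, the sum of squares is divided by N, not N - 1.

```python
import math

def z_scores(scores):
    """Return (mean, standard deviation, list of z-scores) for a group of raw scores."""
    n = len(scores)
    mean = sum(scores) / n                           # sum of X divided by N
    deviations = [x - mean for x in scores]          # x = X - mean; these sum to zero
    sum_of_squares = sum(d * d for d in deviations)  # sum of squared deviations
    sd = math.sqrt(sum_of_squares / n)               # divide by N, then take the square root
    if sd == 0:
        raise ValueError("all scores are identical, so z-scores are undefined")
    return mean, sd, [d / sd for d in deviations]

# Hypothetical exam scores, for illustration only.
scores = [60, 70, 80, 90, 100]
mean, sd, zs = z_scores(scores)
print(f"mean = {mean}, standard deviation = {sd:.2f}")
for raw, z in zip(scores, zs):
    print(f"raw score {raw} -> z = {z:+.2f}")
```

A score equal to the mean gets z = 0, and scores the same distance above and below the mean get z-scores of the same size with opposite signs.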
Why the Normal Curve Should Never Be Used To Establish Grades
If we were to convert the Normal curve into a grading template, it would look like this:
You may see the grades reversed (A on the left, F on the right) in such diagrams, but this is just one more distortion of the Normal Curve, where scores increase from left to right (reflecting the grade template shown above).
Regardless, this system has a lot to commend it, at least at first glance. The highest segment of scores receives “A”s, the lowest segment receives “F”s, and the largest group of scores in the middle (the “average” scores) receives “C”s. Such a “scientific” basis for assigning grades must be superior to the unscientific and biased schemes for assigning grades, including any number of types of favoritism. As far as it goes, it is better. But as I briefly stated before, the bell curve has a fatal flaw, a flaw which has plagued students ever since it was first used as a basis for grading.
That fatal flaw is found in the requirement that any area in the positive half of the curve must be duplicated in the negative. A true bell-curve grading scheme requires that instructors produce equal numbers of “A”s and “F”s, equal numbers of “D”s and “B”s, with the largest area reserved for the middle “C”s. Lest you think I am creating an artificial straw man — “no one assigns grades this way!” — let me assure you I experienced just such a “gracious” grading scheme, and on more than one occasion in several different schools. Professors using such a bell-driven system did not establish the grading scheme until the end of the semester, after all “grades” had been earned. Then the break-down of post-hoc grades was determined, forcing students into “bell-curve” categories. The story of the 96% D is up close and personal — coming from a faculty colleague’s experience — and is absolutely true.
This arbitrary scheme, based on a gross misunderstanding of teaching, learning, and statistical theory, is sinful because it forces students into an impossible situation. No grading system is established at the beginning of the course to provide guidance for study. The arbitrary system used to determine course grades is created at the end in order to “properly place” students according to their scores and bell curve categories. This practice places great pressure on students to beat the system, to take short-cuts, to cheat and then lie about it. Jesus does not take kindly to the powerful taking advantage of the weak, provoking them to sin.
“But if anyone causes one of these little ones who believe in me to sin,
it would be better for him to have a large millstone hung around his neck
and to be drowned in the depths of the sea.”
So why would professors use such a system? There are several common reasons. They know no better. They imitate their own professors, who used the same flawed system. They want to appear sophisticated in their grading decisions. They fear ridicule from colleagues for being “too easy” on students (i.e., “too many As”). One should never “force” student grades into bell curve categories.
A Positive Alternative to Bell Curve Thinking
When I joined my faculty 30 years ago, I found myself assigned to a small academic committee. One of the other members of that committee was a highly respected Old-Head, who had himself served on the faculty for the preceding 30 years. In our conversations, he mentioned that he assigned grades using the Bell Curve. I asked him why. His response surprised me because it was filled with humility and grace. “Who am I, as a mere professor, to determine what constitutes an ‘A’ in my course? I could set it at 90, or 93, or 95. But I have taught this material for decades. How do I know how it finds my students? And so I teach and test, and then I let my students’ scores set the curve. They determine who gets an ‘A’ and who gets an ‘F.’”
As gracious as this sounds, it dies under the weight of the fatal flaw. No matter how hard the “bottom students” study, how much they study, they will earn “F”s in the humble prof’s class, even if they score in the 90s on their exams. Every class has, by definition, students at the bottom. Even if the entire class earns grades above 90%, there will be high, medium, and low 90s. Some few, regardless of scores, are doomed before they crack their brand new textbooks, before they add their first note to their pristine notebooks, before they take their first exams. Faulty weights. Sin in the name of theological education. A 95% “F.”
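To see the fatal flaw in action, consider a minimal sketch of bell-driven grading in Python. Everything here is an illustration under stated assumptions: the ten scores are invented, the function name `bell_curve_grades` is hypothetical, and the quota of 1 A, 2 Bs, 4 Cs, 2 Ds, and 1 F is simply one symmetrical split for a class of ten. Letters are handed out by rank alone.

```python
def bell_curve_grades(scores, quotas):
    """Assign letter grades purely by rank.

    quotas is a list of (letter, count) pairs running from the top grade down.
    Tied scores are split by list order, a simplification for this sketch.
    """
    if sum(count for _, count in quotas) != len(scores):
        raise ValueError("quota counts must add up to the class size")
    # Indexes of students ordered from highest score to lowest.
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    grades = [None] * len(scores)
    position = 0
    for letter, count in quotas:
        for i in ranked[position:position + count]:
            grades[i] = letter
        position += count
    return grades

# Hypothetical class in which every student scores 95% or better.
scores = [100, 99, 99, 98, 98, 97, 97, 96, 96, 95]
quotas = [("A", 1), ("B", 2), ("C", 4), ("D", 2), ("F", 1)]  # assumed symmetrical split

for score, grade in zip(scores, bell_curve_grades(scores, quotas)):
    print(f"{score}% -> {grade}")
```

The function never looks at how high a score is, only at where it ranks, so the student at 95% receives the F.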
There is a better way. Many better ways. But all begin with determining what students should learn in a given course. By “learn,” I mean more than “be exposed to.” What facts should be ‘recall-able?’ What concepts should be ‘explain-able?’ What values should be ‘embrace-able?’ What skills should be ‘master-able?’ Exams tend to measure factual recall and conceptual explanation best. (Priorities and attitudes, in and out of class, measure student values; practical projects demonstrate student skills.) Since grades are often based on written exam scores, we’ll focus there.
1. List key facts, concepts, and principles addressed in the given course.
2. Write specific (i.e., measurable) instructional objectives targeting key facts, concepts, and principles in the course. What should students be able to do with these elements? How will you know that they can?
3. Write test items that measure the levels of learning required in the objectives.
4. Create a course sequence that provides information, repetition, explanations, and analyses that prepare students to properly engage exam questions. Answer questions. Explain misunderstandings.
5. Assign grades earned from exam scores without regard to the artificial bell-curve categories. That is, anyone earning 90% or higher earns an A, even if that includes every student in the class. If the course exams prove to be “too easy,” raise the bar by creating more challenging exams. Use of the Discrimination Index assures “fairness” in these more challenging exams. (See chapter 15 in Created to Learn 2 for the DI procedure).
6. Revise outcomes, objectives, and test items as needed, based on student performances on exams and other graded elements. Over the years, incoming student needs will change. Revise accordingly. Continue until you retire.
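Step 5 is simple enough to sketch in Python. The 90% cutoff for an A comes from the step itself; the cutoffs for B, C, and D are my assumption of a conventional ten-point scale, and the scores and the name `criterion_grade` are hypothetical. The point is that each grade depends only on the student's own score, never on anyone else's.

```python
# Cutoffs announced at the start of the course. Only the 90% line for an A
# is taken from step 5; the rest are an assumed conventional scale.
CUTOFFS = [(90, "A"), (80, "B"), (70, "C"), (60, "D")]

def criterion_grade(percent, cutoffs=CUTOFFS):
    """Return the letter grade earned by a score, independent of classmates' scores."""
    for minimum, letter in cutoffs:
        if percent >= minimum:
            return letter
    return "F"

# Hypothetical class in which every student scores 95% or better.
scores = [100, 99, 99, 98, 98, 97, 97, 96, 96, 95]
grades = [criterion_grade(s) for s in scores]
print(grades)  # every student earns an A
```

If a whole class of As suggests the exams were too easy, the remedy is a more challenging exam next semester, not a lower grade for this semester's students.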
“Grading on the Curve”
There is a more popular view of “grading on the curve” that has nothing to do with the bell-shaped curve. I have included one version of this view below, quoted from an article on another website, just to clarify the distinction between bell-driven grade assignments and grading on the curve.
Grading on a curve
The standard grading scale is: 90-100% =A, 80-89% =B, 70-79% =C, 60-69% =D, and 59% -below =F. Grading on a curve moves that scale to account for any flaw in instruction. The most common method is to take the highest score made on a test and use that as the maximum total score. For example you have a 100 question test, with each question worth 1 pt. You have 10 students that took the test. The scores were as follows:
Student 1: 91
Student 2: 85
Student 3: 82
Student 4: 81
Student 5: 74
Student 6: 74
Student 7: 68
Student 8: 64
Student 9: 57
Student 10: 50
(Since it was a 100 question test the number correct equals the percent correct)
On a normal grading scale student 1 would get an A, students 2,3, and 4 would get B’s, students 5 and 6 would get C’s, students 7 & 8, would get D’s, and students 9 & 10 would get F’s.
If you grade on a curve (using the most common method of taking the highest score and making that the maximum score) you would get the following:
Student 1: scored 91 out of 91 for 100%
Student 2: scored 85 out of 91 for 93%
Student 3: scored 82 out of 91 for 90%
Student 4: scored 81 out of 91 for 89%
Student 5: scored 74 out of 91 for 81%
Student 6: scored 74 out of 91 for 81%
Student 7: scored 68 out of 91 for 75%
Student 8: scored 64 out of 91 for 70%
Student 9: scored 57 out of 91 for 63%
Student 10: scored 50 out of 91 for 55%
So, using the grading curve students 1, 2, and 3 would get A’s, students 4, 5, and 6 would get B’s, students 7 and 8 would get C’s, student 9 would get a D, and student 10 would get a F. Some teachers that have that one brilliant student that always gets the maximum score on the test… and thus blowing the curve, take the second or third highest score and use that as the new maximum. This is the simplest way to grade on a curve. There are other methods that use standard deviation, but I don’t think that is what you are looking for at this time.
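The arithmetic in this example can be reproduced with a short Python sketch. The ten scores and the standard scale are taken from the example above; the function names `curve_to_top_score` and `letter_grade` are hypothetical, and rounding each percentage to the nearest whole number is an assumption that matches the figures shown.

```python
def curve_to_top_score(scores, top=None):
    """Rescale scores so the chosen top score counts as 100%.

    By default the highest score is the new maximum. Passing a lower value
    (such as the second-highest score) mimics the variant described above;
    results are capped at 100.
    """
    maximum = max(scores) if top is None else top
    return [min(100, round(100 * s / maximum)) for s in scores]

def letter_grade(percent):
    """Standard scale: 90+ A, 80-89 B, 70-79 C, 60-69 D, below 60 F."""
    for minimum, letter in [(90, "A"), (80, "B"), (70, "C"), (60, "D")]:
        if percent >= minimum:
            return letter
    return "F"

scores = [91, 85, 82, 81, 74, 74, 68, 64, 57, 50]
curved = curve_to_top_score(scores)

for number, (raw, pct) in enumerate(zip(scores, curved), start=1):
    print(f"Student {number}: {raw} out of {max(scores)} for {pct}% ({letter_grade(pct)})")
```

Every score is multiplied by the same factor, so the order of students and the relative gaps between them are unchanged; only the scale moves.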
It should be clear that “grading on the curve” as described above has nothing to do with forcing a bell-curve distribution on grades. The distribution of scores above is 3 As, 3 Bs, 2 Cs, 1 D, and 1 F (3 3 2 1 1). Under a bell-driven system, the 10 grades would need to be (1 2 4 2 1) — 1 A, 2 Bs, 4 Cs, 2 Ds, and 1 F. “Grading on the curve” merely inflates the entire distribution so that “top score” is taken as “maximum” score. And yet, the confusion between the two is not uncommon.
For those of you who cannot get enough of this riveting material, check out the following website where a professor explains to his students how he computes his “grade curve.” While his approach is closer to the theoretical Normal Curve (I enjoyed his discussion of mean and median), it presents us with another serious problem. A student in his class can make a course average as low as 44% (see “Grade Cutoffs”) and receive a passing grade. Doesn’t seem to be much of a standard of excellence — do you want the pilot of your plane or the surgeon preparing to remove your brain tumor to be a 44%-er? Still, this is another case of a non-bell-curve “grading on the curve.” We do well not to confuse the two.
Scripture and the Bell Curve
For those seeking a Scriptural rule on the matter, the following references call for honest dealings in business. The warnings are given to dishonest merchants who use two sets of weights, one lighter, the other heavier, to defraud suppliers and customers. When buying grain, they would use the heavier weights in the hand-held balance to get more grain than honest weights would allow. When they sold the grain, they would use the lighter weights, providing less grain than honest weights would allow. These merchants gained more profit than was due them, and Scripture declares that God hates such dishonesty.
Use honest scales and honest weights, an honest ephah (dry measure, bushel) and an honest hin (liquid measure, 5 liters).
I am the Lord your God, who brought you out of Egypt.
Do not have two differing weights in your bag — one heavy, one light.
The Lord abhors dishonest scales, but accurate weights are his delight.
Honest scales and balances are from the Lord; all the weights in the bag are of his making.
Differing weights and differing measures — the Lord detests them both.
The Lord detests differing weights, and dishonest scales do not please him.
Shall I acquit a man with dishonest scales, with a bag of false weights?
In the context of the Christian academy, these verses call for grades to honestly reflect the learning students achieve, not some ill-applied statistical template.
The Nature of the Bell Curve Invalidates It for Grade Assignment
The very nature of the bell curve invalidates it for grade assignment duties, since it describes random variation. Naturally occurring measurements tend to fall into normal (bell-shaped) distributions, especially when the number of measurements is large. But learning in an academic setting is not “natural.” It is an intentional intervention. When professors teach, the expectation is that learners will learn. While there are differences among achievements of students, these differences do not occur randomly. To force skewed achievement distributions into bell-shaped categories is simply wrong-headed.
In short, let grades fall where they will. Base grades on performance, without applying an inappropriate theoretical model of “fairness.” The better you organize your courses, the better you teach, the more effectively you intervene in student learning, the LESS the grade distribution in your courses will fit a Normal Curve. It will form a curve with high grades clustered to the right and a long tail stretching to the left (technically called a “negative skew,” shown below). To return to the counseling analogy, the negative skew is our desired goal in extending help, where many overcome their problems and few wallow in personal failure.
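Skew can be measured as well as pictured. The sketch below, in Python, uses one common coefficient of skewness (the third moment about the mean divided by the cube of the standard deviation); other definitions exist, and this choice is mine. The scores are invented for illustration, and the function name `skewness` is hypothetical.

```python
def skewness(scores):
    """Moment coefficient of skewness: negative when the long tail stretches left."""
    n = len(scores)
    mean = sum(scores) / n
    second_moment = sum((x - mean) ** 2 for x in scores) / n  # variance, dividing by N
    third_moment = sum((x - mean) ** 3 for x in scores) / n
    if second_moment == 0:
        raise ValueError("all scores are identical, so skewness is undefined")
    return third_moment / second_moment ** 1.5

# Hypothetical course averages: most students clustered high, a few trailing low.
well_taught = [98, 97, 96, 95, 95, 94, 92, 90, 85, 70]
# Hypothetical symmetrical set of scores, for comparison.
symmetrical = [60, 70, 80, 90, 100]

print(f"clustered high: skew = {skewness(well_taught):+.2f}")
print(f"symmetrical:    skew = {skewness(symmetrical):+.2f}")
```

The clustered-high class produces a negative coefficient, while the symmetrical set produces zero.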
To force highly motivated, achievement-oriented students (“called of God to prepare for a lifetime of ministry”) into lower grade categories — for no other reason than to make them fit The Bell — destroys their motivation to learn. A bell-driven system converts success-seekers into failure-avoiders. Learning is replaced by “time-served.” Joy is turned to cynicism. Learners so abused grow up to be teachers who abuse others. Natural influence exchanged for unnatural power.
My goal is to create challenging courses, enable outstanding learning, and produce grade distributions with high negative skews.
Set high standards, help your students achieve them, and celebrate their victories. Reject the theoretical bell curve for the unfortunate, inappropriate, and misguided standard it demands.
Jason Norris is a PhD student majoring in Foundations of Education at Southwestern Seminary, Fort Worth, Texas.