A critique of the NSF-funded study:
Standards-Based Middle Grade Mathematics Curricula: Impact on student achievement, by Richard T. Lapan, Barbara J. Reys, David E. Barnes, and Robert E. Reys: University of Missouri-Columbia, Columbia, MO  65211

Reviewer: Prof. Wayne Bishop, Math & Comp Sci, Cal State LA (213)343-2159

[No date is given, but several bibliographical references are from 1998 and the study was based on results from the 1996-97 school year.  I have no idea whether this paper has been submitted, let alone accepted, for publication, but districts considering CMP are using it as proof that charges about the impact of reform, standards-based mathematics curricula on student performance, such as those of Hung-Hsi Wu, whose December 1997 article in the MAA Monthly is referenced, are being addressed satisfactorily.]

Summary: This study should never have been released.  It is flawed by design, by implementation, and by conclusion.

Let me start with their conclusions:

"No significant differences were found between the groups with respect to traditional mathematics achievement. However, students in the two standards-based curricula significantly outperformed the control group in mathematics problem solving. No differences in mathematics problem solving were found between students in the two standards-based curricula.  No gender differences in traditional mathematics achievement or mathematics problem solving were found.  Mathematics problem solving scores for African American students in the two standards-based curricula were significantly higher than scores for African American students in the Control group."

The article purports to study two of the NSF-funded, NCTM Standards-based mathematics curricula: STEM, Sixth Through Eighth Grade Mathematics (Billstein & Williamson), and CMP, Connected Mathematics Program (Lappan, Fey, Fitzgerald, Friel, and Phillips).  Page 6 does acknowledge that it is "not a large scale investigation" but claims that it "begins to address the impact of implementing standards-based mathematics curriculum for middle grade students".  In fact, it does no such thing.  Even with the limitations described below, it is a comparison of three varieties of apples, not of apples and oranges.  The document's own description of the "control" makes that clear.  It is not an old Addison Wesley, HBJ, or Scott Foresman curriculum, let alone a viable alternative reform such as SRA CMC, Saxon, or Excel:

"The Control district is a K12 school district of about 14,000 students in an upper-middle class mid-size university town. It prides itself in having a strong academic program, as do the STEM and CMP districts. The Control district has three newly built middle schools housing grades 6 and 7. The official mathematics curriculum used at grade 6 is the 6th grade book from the elementary series used at grades 3-6. This curriculum has a 1993 copyright date and the publisher states the curriculum is 'based on the NCTM Standards.' Indeed, it has more hands-on activities than a pre-standards textbook. The 6th grade book is organized into 8 modules, all included in a hard bound student text. The first module deals with organizing and summarizing data. This is followed by modules on factors and multiples, fractions, area and perimeter, ratios and percent, geometric shapes, and probability. Self-reports by teachers in the district indicate that the textbook is used to some extent by some teachers and not at all by other teachers. Most teachers consistently supplement the textbook with activities from other resources, including the previously used 6th grade text (a 1987 copyrighted traditional basal text). They reported using materials from a variety of sources to provide opportunities for more skill practice, investigations, and problem solving."

So the curriculum already in place is one to which reasonable teachers all across the country are reacting reasonably: don't get rid of your old books, even if you have to hide them from your principal and the district mathematics curriculum "expert".

Putting aside the fact that the "Control" is neither pre-Standards nor an alternative reform, how good a study is this?  Well, of the 14,000 students in this district, one would assume over 1000 are in Grade 6.  This study involves 46 of them.  Of these 46, 8 were African American, so the conclusion that scores for "African American students in the two standards-based curricula were significantly higher" rests on a comparison with the performance of 8 students who were themselves already in a Standards-based curriculum.
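To give a rough sense of how little a group of 8 can support, the following is a minimal sketch of the standard error of a group mean at the group sizes mentioned in this review. The standard deviation of 21 points is an assumption, chosen only because it is close to the 21.8 and 21.2 figures quoted for the two tests; it is not taken from the study's subgroup data. The function names are hypothetical, and the 1.96 multiplier is a normal approximation that understates the uncertainty for very small groups.

```python
import math

# ASSUMPTION: a score standard deviation of about 21 points, borrowed from
# the 21.8 and 21.2 figures quoted for the two tests. The study's actual
# subgroup standard deviations are not used here.
ASSUMED_SD = 21.0


def standard_error(sd: float, n: int) -> float:
    """Standard error of a sample mean for n independent scores."""
    return sd / math.sqrt(n)


def approx_95_halfwidth(sd: float, n: int) -> float:
    """Half-width of an approximate 95% interval (normal approximation)."""
    return 1.96 * standard_error(sd, n)


# Group sizes as reported in this review.
groups = [
    ("Control (all)", 46),
    ("Control (African American)", 8),
    ("CMP", 94),
    ("STEM", 115),
]

if __name__ == "__main__":
    for label, n in groups:
        se = standard_error(ASSUMED_SD, n)
        hw = approx_95_halfwidth(ASSUMED_SD, n)
        print(f"{label:28s} n={n:4d}  SE={se:5.2f}  95% half-width={hw:5.2f}")
```

Under that assumed standard deviation, the mean of 8 scores carries a standard error of about 7.4 points, so an approximate 95% interval is roughly 15 points on either side of the mean, and a small-sample t interval would be wider still.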

In fact, small but well-balanced groups of students (there were 94 in the CMP group and 115 in the STEM group) might give some preliminary information about students' mathematical growth that would be worth reporting, if they were very carefully studied.  Was that the case here?  Not at all.  We get no information on how the 46 were chosen, no correlating verbal data, and no "before" data on the students, only "after".  Thus, it is impossible to tell from the data how much, if any, of the final results can be attributed to the programs being studied.  Amazingly, the Control didn't even take the same principal test.  From page 13, "The SAT [Stanford Achievement Tests version 9] has been described as one of the most psychometrically sound test batteries for assessing student achievement," so it is all the more interesting that the SAT line for the Control in the resulting data tables is based on the CAT, California Achievement Tests version 5, with its values "linearly transformed to equate with SAT", which subtracts approximately 4 points from each CAT score!  The authors cite a 1997 study by the Psychological Corporation in justification for this adjustment.  No address nor even city is given for this corporation, so the idea is a bit suspect.  With means of over 1000 students hovering close to 50 and with comparable standard deviations (21.8 and 21.2), I assume that the reported scores were national percentile ranks (NPR) for each test rather than raw scores in obvious need of common scaling.  If that is the case, the linear scaling that subtracts approximately 4 from each CAT score becomes only a "fudge factor" to get a better result.  If the 4 points are added back to each of the Control's scores, the Control has a mean score more than 10 points above the STEM score, though still lagging CMP by 5.3.  For the (8) African American students, the Control then exceeds CMP by 3.6 and STEM by over 20 points.

This latter point is interesting independent of any fudge factors.  The African American scores in the Control lagged the Control average by about 10 points.  In CMP, they lagged by almost 20, and in the STEM program by even more than 20.  That is, even on this small data set with inconsistent testing, if one takes an impression from the actual data rather than from the authors' summary, and recalls that this covers only one short year, it would appear that the Control is not doing as much damage to this sometimes vulnerable community as either of the two NSF-funded projects.

Once again, however, concluding anything at all from this study, either from the raw data or from the chauvinistic summary, would be a mistake.  There is one conclusion that can safely be drawn.  The NSF should never again fund a study that involves these authors in any substantial way.  The goal of education research should not be to offer "proof" of preconceptions but to assess and to evaluate, letting the results fall where they may.  Clearly that was not the case here.

Respectfully submitted,
Wayne Bishop, Ph.D., Department of Mathematics & Comp Sci
(323)343-2159