The National Academies: Advisers to the Nation on Science, Engineering, and Medicine
NATIONAL ACADEMY OF SCIENCES NATIONAL ACADEMY OF ENGINEERING INSTITUTE OF MEDICINE NATIONAL RESEARCH COUNCIL
Workshop on Understanding and Promoting Knowledge Accumulation in Education: Tools and Strategies for Education Research

Day 1 – June 30, 2003

Remarks by Dr. Michael Nettles

DR. MICHAEL NETTLES: Thank you very much. One of the areas where we have a great deal of fragmentation and questionable stability is the way we measure achievement. There is no question about the value of measuring achievement in our professions and in society; it is perhaps the strongest piece of evidence we ever present to the broader public. Arguably the most valuable of all our measures of achievement is the National Assessment of Educational Progress (NAEP). This is a long-standing (I shouldn’t say longitudinal) assessment of student achievement in the United States. It began in the late ‘60s, when the Congress asked Francis Keppel, then Commissioner of Education, about the status and condition of education in the country, and the question couldn’t be answered.

So that was the origin of the National Assessment of Educational Progress. It began with a contract to the Education Commission of the States and was developed and administered the way most other tests are, with each individual taking the whole test.

In the late ‘70s or, really, the early ‘80s, the Educational Testing Service won the contract for the National Assessment and redesigned it around a matrix sample.

Lay people call it “matrix sampling.” The experts call it “BIB spiraling,” a balanced incomplete block design with spiral administration. The easiest way to describe how it is administered is that it is given to whole classrooms of students, no student takes the whole test, and the sets of items taken by different participants overlap.
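The matrix-sampling idea described above, in which no student takes the whole test but the item sets overlap, can be sketched with a balanced incomplete block (BIB) design. This is a minimal illustration under stated assumptions: it uses the textbook (7, 3, 1) design with 7 hypothetical item blocks and 3 blocks per booklet, which is not claimed to be NAEP’s actual booklet design, and the function names and the 10-student roster are hypothetical. Booklets are built by cyclically shifting the block set {0, 1, 3} modulo 7, which guarantees that every pair of blocks appears together in exactly one booklet; “spiraling” is modeled as handing booklets out in rotation down a classroom roster.

```python
from itertools import combinations, cycle

# Illustrative only: a textbook (7, 3, 1) balanced incomplete block design,
# not NAEP's actual booklet design. The item pool is assumed to be split
# into 7 hypothetical blocks labeled 0..6.
NUM_BLOCKS = 7
BASE = (0, 1, 3)  # perfect difference set mod 7: every nonzero difference occurs once


def bib_booklets(n=NUM_BLOCKS, base=BASE):
    """Build booklets by cyclically shifting the base set of block labels."""
    return [tuple(sorted((i + d) % n for d in base)) for i in range(n)]


def pair_counts(booklets):
    """Count how many booklets contain each pair of blocks."""
    counts = {}
    for booklet in booklets:
        for pair in combinations(booklet, 2):
            counts[pair] = counts.get(pair, 0) + 1
    return counts


def spiral_assign(students, booklets):
    """Hand out booklets in rotation (the 'spiral') down a classroom roster."""
    return dict(zip(students, cycle(booklets)))


if __name__ == "__main__":
    booklets = bib_booklets()
    roster = [f"S{i:02d}" for i in range(1, 11)]  # hypothetical classroom of 10
    for student, booklet in spiral_assign(roster, booklets).items():
        print(student, booklet)
    # Every pair of blocks shares exactly one booklet, so all blocks are linked.
    print(set(pair_counts(booklets).values()))
```

Each student sees only 3 of the 7 blocks, yet because every pair of blocks co-occurs in some booklet, results on all blocks can be linked across the sample.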

Now, one limitation of this assessment is that it is a sample: no individual student gets results back, so there are always questions about students’ motivation to do well. There are also questions about how well we represent people in different places, depending on their exposure to assessment.

Now, we could argue that perhaps the gap is filled by all the other tests we administer, and, for the first time, partly as a consequence of No Child Left Behind, we are able to explore this more. Most states, like Michigan, where I am a resident at the moment, administer a test at the state level to individual students. In Michigan it is the Michigan Educational Assessment Program (MEAP), given at grades similar to those covered by the National Assessment. Presumably, we can look at the direction of student performance on MEAP over time, compare it with student achievement on the state-level NAEP over time, and see whether the two are moving in the same direction and producing much the same results. We haven’t done that yet, although the data are now in to allow us to do it. In fact, the National Assessment Governing Board may be called upon by the Secretary of Education to confirm state results. Otherwise, there are all sorts of questions about what happens in 2013, when various requirements of this policy are supposed to take effect; that is, when every student is supposed to reach some level of achievement.
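The comparison proposed above, checking whether a state test and the state-level NAEP move in the same direction over time, can be sketched as follows. This is a hypothetical illustration, not a method the speaker describes in detail: the function names are invented, and the series are made-up numbers, not real MEAP or NAEP results. Because the two tests report on different metrics (for example, percent meeting a state standard versus a NAEP scale score), the sketch compares only the direction of change between years both tests share.

```python
# Illustrative sketch: the series below are hypothetical, not real MEAP or NAEP results.
# The two tests report on different metrics (e.g. percent meeting a state standard
# versus a NAEP scale score), so only the direction of change is compared.


def sign(x):
    """Return 1, 0, or -1 according to the sign of x."""
    return (x > 0) - (x < 0)


def trend_agreement(state, naep):
    """Share of intervals between shared years in which both series move the same way."""
    years = sorted(set(state) & set(naep))
    intervals = list(zip(years, years[1:]))
    if not intervals:
        return None  # fewer than two shared years: no trend to compare
    agree = sum(
        sign(state[y1] - state[y0]) == sign(naep[y1] - naep[y0])
        for y0, y1 in intervals
    )
    return agree / len(intervals)


if __name__ == "__main__":
    state_pct = {1998: 58.0, 2000: 61.0, 2002: 66.0, 2003: 65.0}  # hypothetical
    naep_scale = {1998: 215.0, 2000: 217.0, 2002: 219.0, 2003: 220.0}  # hypothetical
    print(trend_agreement(state_pct, naep_scale))
```

With these made-up numbers the series agree on two of the three intervals; a real confirmation exercise would also need to account for sampling error in each change.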

Now, there are other challenges this panel faces when it comes to measuring achievement and trying to get a sense of the accumulation of knowledge. One is the people who are involved. In every one of these assessments, there is a variety of players. There are educators, who can never be left out, whether to represent the profession or to serve as content experts.

Then there are methodological experts, who have to tell the professional people and the content people how to structure the assessments, whether on the test-development side of this enterprise or the psychometric side; that is, what the data are supposed to mean, and how well they represent what is being measured and who is being measured. And then there are policy experts, whom we have come to appreciate as scholars over the past two decades much more than we probably could have imagined in the past. I think Francis Keppel may have been a forecast of what was to happen.

One of the greatest clashes occurred in the early ‘90s, when the National Assessment Governing Board decided that the public had little tolerance for, or appreciation of, the scale scores that psychometricians have been telling us throughout the history of testing are very important. NAEP, in fact, has a 500-point scale. So the National Assessment Governing Board said, “Let’s communicate those scales in three achievement levels,” and the achievement levels are basic, proficient, and advanced.

These started off as developmental levels, but now Congress has accepted them as basically the best way to communicate student performance in public policy. In fact, regardless of how state assessments are administered, if state results are to be confirmed, or NAEP is used as a confirming exercise, we have to figure out how to express whatever metrics are used at the state level in terms of the National Assessment achievement levels.

Now, one of the challenges we have found with the achievement levels is that a sizeable percentage, in fact the majority, of the nation’s fourth and eighth graders are not performing at the levels those achievement levels establish. Herein lies another clash between experts and public policymakers, one where the policymakers are prevailing. It is customary for the experts to construct score metrics or scales so that the population is distributed across whatever the scale is. Public policymakers say, “Well, you tell us what people ought to know and be able to do, and if they don’t reach it, we just count them as below the lowest level.” Ergo, we have over half of the population of fourth and eighth graders not reaching the basic level of achievement on the National Assessment of Educational Progress. The governors have been reporting the status and condition of education based on those achievement levels, and the country has set as its goal, as the acceptable standard of performance, the proficient level, at which less than a third of the nation’s fourth and eighth graders are performing.
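The policymakers’ approach described above, fixing cut scores for what students ought to know and counting everyone under the lowest cut as “below basic,” can be sketched as a simple mapping from scale scores to levels. This is a minimal sketch under loudly stated assumptions: the cut scores 210, 240, and 270 on a 0 to 500 scale are hypothetical and are not NAEP’s published cut scores for any subject or grade, the student scores are invented, and the function names are illustrative. The point it shows is that the share of students in each level, including below basic, is determined by where the cuts are set rather than by how the scale spreads the population.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical cut scores on a 0-500 scale, for illustration only;
# they are not NAEP's published cut scores for any subject or grade.
CUTS = (210, 240, 270)
LEVELS = ("below basic", "basic", "proficient", "advanced")


def achievement_level(score, cuts=CUTS, levels=LEVELS):
    """Map a scale score to a level; a score equal to a cut counts as reaching it."""
    return levels[bisect_right(cuts, score)]


def level_shares(scores, levels=LEVELS):
    """Fraction of students at each level, including those below the lowest cut."""
    counts = Counter(achievement_level(s) for s in scores)
    return {level: counts[level] / len(scores) for level in levels}


if __name__ == "__main__":
    scores = [195, 205, 215, 230, 245, 275]  # hypothetical students
    for level, share in level_shares(scores).items():
        print(f"{level:12s} {share:.2f}")
```

Raising every cut by the same amount leaves the scores untouched but moves students from higher levels into lower ones, which is why the location of the cuts, not the scale itself, drives the reported percentages.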

So policymakers, beyond wanting to communicate and report in a language we think the public can appreciate, have other ideas in mind as well, namely measurement against world-class standards, or what might be considered world-class standards, however they are conceived, whether on TIMSS or in some other way.

The purposes also represent a big challenge: we tend to try to accomplish every purpose with a single instrument, regardless of what it is.

For example, the National Assessment of Educational Progress was originally conceived as an instrument that would tell us how the nation is performing in general; that is, at the fourth-, eighth-, and twelfth-grade levels.

It is a natural tendency to want to know whether the fourth graders measured at the start have improved by the time they reach eighth grade, and again by twelfth grade, even though we use different samples at those points in time. So this represents a real challenge for accumulating knowledge.

People would also like to use the National Assessment of Educational Progress, although they are not pushing this, as a diagnostic instrument. There is a tendency to want to know what it is really telling us about what students need, as a treatment, as a consequence of what we learn, and it is not designed to do that at all.

Other instruments, like classroom-based assessments, do a much better job at that, and yet those are not even related to the National Assessment frameworks or construction.

In addition to measuring student progress, we also want to use the assessments we have for rewards and sanctions. One could argue that, in No Child Left Behind, the confirming language was removed at the midnight hour, but that is all it would take for us to begin thinking of the National Assessment as an instrument through which states could receive, say, block grants or other money or support, whether for improvement or as punishment. Accountability is another purpose as well, as are international standards.

Now, a huge challenge in any of these assessments, and the National Assessment is no exception, is equity. We always have to ask how fair and equitable the assessments are, particularly by race; perhaps to a lesser extent, or in combination with race, by social class; and also, particularly in mathematics and the way we measure it, by gender.

The race issue, though, seems to be prominent in every assessment we administer. The National Assessment is no exception.

For example, in the case of the National Assessment of Educational Progress, we talk about core measures, and one of the core measures is reading. At fourth grade, NAEP reveals repeatedly that close to 80 percent of African American youngsters are reading below basic. These achievement levels are, no question, subjective, but they are a consensus developed by teachers, school educators, school administrators, and the lay public. So an ultimate question becomes, “Do these achievement levels, or the content of the assessments, create a real issue of racial equity?” Most people think the results reflect differences in the quality of schools rather than differences in the content or construct of the assessments, but this is a persistent question, and an issue that has stayed with us over the past two decades.

Now, one of the challenges the National Assessment will face is the changing definition of race and the variation in populations across states when we select our samples. Differences in other background characteristics also matter in the way we include and exclude people, since states apply different rules and regulations for exclusion, and so on.

Now, the final thing I’ll say is that we also face a challenge regarding the major focus of this session, which is the commonality of our core of measures. At the National Assessment of Educational Progress, we seem to have agreed that it is really important to measure reading, writing and science at the national and state levels for use in public policy, but we have also established geography, history, civics, the arts, economics, science and, as a matter of fact, foreign languages as important subjects to measure.

Now, the extent to which the content standards of these subjects, or the domains measured within these subject areas, are common with those of the states is a question we always raise, and one that has not really been clearly addressed.

With that, I’ll take my cue and stop.
