DR. GREENO: I will introduce both Greg and George and then they can pass the baton seamlessly. Greg is an economist. He works at Northwestern, where he is a chair and professor in the School of Education and Social Policy, and he is going to tell us about a couple of projects that he is affiliated with, the New Hope Project and Moving to Opportunity, both of which are designed to ameliorate intensive poverty.
George is a sociologist. He works at AIR in Palo Alto. He is a senior vice president of AIR and he, over the years, has managed several major projects on evaluating and analyzing educational programs.
I worked for him in one of these when the National Academy of Education was conducting a series of studies on NAEP and he has also worked on evaluating the California class size reduction.
So, please begin, Greg?
DR. DUNCAN: It is a pleasure to be here and I would like to thank many people who are interested in talking about mixed methods research.
The studies that I want to talk about -- I am very much concerned with achievement-oriented outcomes but there are different contexts in which educational achievement happens. Certainly schools are the most important but families matter also and neighborhoods matter as well.
So, the examples that I am going to choose are intervention projects not within schools but within families: New Hope which is my primary focus and also a neighborhood oriented intervention, Moving to Opportunity. I have chosen to focus on these because these are the interventions I have been involved with. So, that is my intimate knowledge.
The paper I think was circulated to people. It is going to be published. Steve Gibson is a co-author. Let me just give an overview of the talk.
My bottom line is a very positive one with regard to combining the experimental mode and a certain kind of qualitative mode of understanding.
I want to talk in the context of the examples that I know about key design features for the qualitative part of the work that was most important.
I want to provide some illustrations about how mixed methods help to understand program effects, how they fed into future waves of quantitative data collection and then I also want to talk about the limits of mixed methods in the context again of experimental research.
New Hope is a really interesting community-based project in Milwaukee. It was developed by a group of very progressive community organizers who, long before welfare reform, saw work as really a key for supporting low-income families. They developed this and were talked into, much against their initial inclination, adopting a random assignment evaluation strategy.
It ran between 1994 and 1995. Its enrollment ran then, and it enrolled about 1400 families altogether, assigning half to the experimental group, half to the control group.
Treatment itself is very simple. The offer to families was that if you work 30 hours a week then you are entitled to a whole series of benefits: a wage supplement that brings your family income above the poverty line, health insurance, a child care subsidy that is providing very high quality care.
A community service job was available to people who needed it if they couldn't come up with 30 hours from their own job search efforts and the whole system was administered by a very supportive set of case workers.
So, by the program designers' standards these are the kinds of benefits that ought to be an ongoing set of supports that are available to all families all the time but they wanted to show what would happen over and above the control conditions which is business as usual in Milwaukee, Wisconsin, what the incremental benefits would be to this kind of intervention relative to the control conditions.
To say business as usual in 1995 in Wisconsin is not quite the right way to characterize it because Wisconsin under Tommy Thompson had one of the most radical welfare reforms that was just being put into place.
So, the control group was being subjected to, I am not sure what the right word is, this set of ongoing reforms that was taking place. They were not denied any benefits. New Hope families could choose that if they wanted to but they could also choose the benefits provided they could work 30 hours. So, that is the basic treatment.
The larger kind of policy context with this research and other research that I have done with a number of co-authors is to try to shift the welfare reform discussion from its obsession on mothers' work and mothers' welfare receipt to thinking about child well-being and the family process that contributes to child well-being.
I am actually spending the year at the Russell Sage Foundation writing a book about New Hope with the idea that it is a possible model for what the next stage of low income work support ought to be.
We have dramatically reduced welfare rolls and have dramatically increased employment but we haven't dramatically changed family economic circumstances and so New Hope is a model for that.
The evaluation team contained many people: Bob Granger, who was at MDRC before he became president of the W.T. Grant Foundation; a group of graduate students at Northwestern; Aletha Huston, who was really the primary person behind the child evaluation component. Vonnie McLoyd helped out. The head of the qualitative part was Tom Weisner, an anthropologist who is at UCLA.
It grew out of our collaboration. We talked about how do you get people to get together from different disciplines and talk about multiple methods studies. Ours came from the MacArthur childhood network. Jackie Eccles was chair of that and we also got some money from NICHD not because of the qualitative part at all. The reviewers didn't like that so much. So, we can talk about that part of it a little bit later if you want to.
There was a very extensive quantitative part to the survey which is a program evaluation, a baseline survey. We conducted a series of child family surveys, teacher surveys. We did a Woodcock-Johnson test. We had information about program take up and then we tracked administrative data on earnings and welfare receipt. So, we brought together information from a number of sources and the quantitative analysis impacts are published in Child Development and we hope the 5-year follow-up is accepted in Developmental Psychology.
What about the qualitative? So, quantitative was very conventional. The qualitative component ended up selecting 45 families at random from the larger set of New Hope families. So, half came from the control group. Half came from the experimental group and it was a random selection within those two groups.
It was a fairly directed qualitative component. Field workers made six visits per year for years 2, 3 and 4 and then we are just finishing up a follow-up after 8 years. The interviews were really semi-structured conversations covering a set of predefined topics about family life, about work, about school and it was very much driven by Tom Weisner's theoretical orientation which is developed among the Navajo. It is a very cross-cultural kind of orientation where he sees as key the idea of the sustainability of family and he thought that if New Hope would be able to help kids it would be because it would enable families to establish and sustain their routines in a way to accomplish the culturally defined goals that they have.
There is a very theoretical orientation to what information was gathered but it was gathered in a very open-ended way. It is amazing to see really skilled qualitative interviewers because in the course of what seems to be a very, very casual conversation over an hour or an hour and one-half they are touching on all sorts of topics that are the ones that are on the template that we were trying to pull out and the fact that there were multiple visits meant that if you didn't get all the information on a particular topic on a particular visit you could monitor how the field workers were doing and go back to those topics.
So, what are the key lessons from the multiple method effort? One is about design decisions. This idea of randomly sampling your qualitative cases is very unusual, I think, in qualitative work but it was very, very important. We had endless conversations about if we were going to select 50 or so families how should we do it; should it be only the experimentals because they are really the interesting part?
Should we try to pick exemplar cases that were doing particularly well? We started this a little more than two years into the project. We wished we had been there at the very beginning but we didn't have the funding together for it. So, we were a couple of years in. We knew a lot about how families were doing in terms of work and so forth and we could see some families had gone from not working to working and seemed to be exemplars of how New Hope might be having its effect. We thought we knew a lot of other aspects of these families that we might want to feature in the cases that we selected, but it turns out — this is before we actually had the quantitative data available to us to know the experimental effects — we knew very, very little about what was really going on in the program and so randomly sampling the cases provides an extremely valuable kind of insurance against the hubris of thinking you know what you are doing when you start out on a project selecting cases in this kind of way.
So, it ensures that you are not assigning a zero selection probability to any particular kind of family which turns out to be very, very important and control cases are interesting for a number of reasons. The experimental cases are certainly interesting as well.
Indeed, the data themselves have been used to examine a host of other topics that we hadn't begun to anticipate when we first started the research.
The other key design lesson that I talked a little bit about when I made my comment in the previous session is the importance of having the same people trained in the multiple methods, having the same people be the ethnographers and be the quantitative data analysts. In our program at Northwestern the graduate students receive qualitative training. They receive quantitative training. Some receive quite a bit of quantitative training but it was just fascinating to see these very capable graduate students go out, have these extended conversations with the five or six families that they were assigned to and then come back and be doing the impact analysis and trying to reconcile the two.
It is a bit like watching a child grow up in a bilingual household where they kind of know one of the languages and they kind of learn the other language and they get it confused for a little while and then all of a sudden it emerges as a very unified kind of knowledge.
You could just see them trying to understand what at the qualitative level the quantitative results were saying and vice versa and in some cases this played out in papers and dissertations.
Let me give you some examples. One is in trying to understand the basic nature of program effects. It turns out that New Hope's impacts on achievement and behavior were very much more positive for boys than girls.
So, the graph that I am about to show shows scores on the academic subscale of the teacher-report Social Skills Rating System. This is the academic subscale, one of the ways that we evaluated children's achievement, by this teacher rating. I haven't figured out a way to get the blue off of there yet. I will show you the boys' part in a second.
So, there were no differences for girls. The control boys were far behind the girls. That is a standard kind of gender difference that we get. These are all very poor families living in the two most distressed neighborhoods in Milwaukee and the New Hope effect was, for girls, insignificant; for boys, it was very large.
Look at effect sizes. Here I changed the Y axis to show effect size, right? That difference for boys that we saw in the previous graph translates into an impact of a third of a standard deviation. That is the difference between the experimental and control.
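For readers who want the arithmetic behind an effect size of this kind, here is a minimal sketch. It assumes the impact is the experimental-minus-control difference in mean scores divided by the control group's standard deviation; the talk does not say which standard deviation the study used, and the scores and the function name `standardized_effect` are hypothetical, made up only for illustration.

```python
import statistics

def standardized_effect(experimental, control):
    """Difference in group means, in units of the control group's SD.

    Assumption: the control-group sample SD is the scaling unit. A pooled
    SD is another common choice and would give a slightly different number.
    """
    diff = statistics.mean(experimental) - statistics.mean(control)
    return diff / statistics.stdev(control)

# Hypothetical teacher ratings, NOT data from the New Hope study.
control_scores = [2.0, 3.0, 4.0, 3.0, 3.0]
experimental_scores = [2.5, 3.5, 4.0, 3.5, 3.0]

effect = standardized_effect(experimental_scores, control_scores)
print(f"Impact in standard deviation units: {effect:.2f}")
```

Read this way, an impact of "a third of a standard deviation" means the experimental mean sits about one-third of a control-group standard deviation above the control mean.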
PARTICIPANT: Greg, can you tell us how old the kids were?
DR. DUNCAN: Thank you. The full set of kids that we tracked were age 0 to 10 at baseline. So, they were 2 to 12 in the follow-up and these results are essentially for all the school-age kids who were a subset, the 5 to 6 to 12 year olds at the 2-year follow-up.
PARTICIPANT: And the Woodcock-Johnson was --
DR. DUNCAN: We didn't do the Woodcock-Johnson at 2 years. That was at 5 years. We actually didn't think there would be achievement effects at 2 years. So, we didn't put the money into Woodcock-Johnson and then we did the teacher survey and we got these big effects which persist by the way at 5 years which is very interesting but very large effects on teacher rating of academic achievement, even bigger effects on the social skills rating system component of boys' positive behavior.
If you asked the boys themselves about whether they expected to complete college, what sort of jobs they expected to get you get positive impacts there. For parents you get positive impacts on their assessments of how their kids are doing, so, across reporters and then year 5 in the Woodcock-Johnson on some of the subscales you get this very consistent finding that the boys are doing very well in the experimental group relative to the control group but there is no impact for the girls.
We had quite a number of process-type variables that we gathered in the survey. I didn't talk about the model but it was a very reasonably well-defined model of how this treatment was played through first with income and work in the family process and ultimately through child well-being.
So, we measured a number of those kind of family process variables and there were some things that showed up like for formal child care kind of situations there tended to be more impacts, experimental impacts for boys than girls on the extent to which families were using after school programs and child care.
So, the families seemed to be putting their boys more into those programs but the qualitative data really provided a kind of understanding of why this was true.
Here is an African-American mother of four. "Not all places have gangs but my neighborhood is infested with gangs and violence and drugs. My son I am worried about and he may be veering in the wrong direction. It is different for girls. For boys it is dangerous. Gangs are full of older men who want these young boys to do their dirty work, and they will buy them things."
So, these are basically elementary schoolchildren and in elementary school at least gangs are perceived to be much more of a threat for the boys than for the girls and the graduate student who did this who is now an assistant professor at the University of Washington interestingly then came back and investigated this topic on a totally different data set, the National Longitudinal Survey of Youth to try to see to what extent it seemed to be the case that mothers, parents were treating their boys and girls differently, and it turned out that on average that wasn't the case but if you looked at the subset of families living in bad neighborhoods there was a clear bias in favor of boys in terms of the kind of activities that they were getting into.
So, it is experimental. It is an interesting hypothesis revealed by the qualitative data that was further confirmed by a quantitative analysis on a different data set.
Yes?
PARTICIPANT: Could you just go back one because we are having a little trouble back here. It looks to me like it says something like achieving positive behavior and expectations and we are seeing a dark gold bar and a yellow bar. Could you explain just a little bit for the girls?
DR. DUNCAN: Okay, these are estimates of impacts expressed in terms of fractions of standard deviations for experimental girls versus control girls on this teacher report of achievement. It is an insignificant tenth of the standard deviation. For the teacher report of positive behavior it is an insignificant 0.05 of the standard deviation.
For expectations for girls it is essentially no difference between experimental and controls. In contrast, for boys across those things the differences between experimental and control boys are on the order of one-third to one-half a standard deviation.
PARTICIPANT: So, those with the dark gold bars it is the experimental versus --
DR. DUNCAN: The height of these bars is the difference between the experimental and controls.
PARTICIPANT: What is the difference between the light yellow and the darker gold?
PARTICIPANT: There is a line inside the bar.
DR. DUNCAN: Right. It is my lack of skill with color tones.
(Laughter.)
DR. DUNCAN: I tried to be fancy with this different kind of shading and I obviously failed miserably.
All right, so, that was one nice example where the qualitative and quantitative worked together well. Another doesn't focus on achievement but rather it focuses on the impacts of the program on employment and so that is another important outcome, in particular earnings year by year and overall the employment impacts of this program were fairly modest. These were not all welfare recipients who came in from a situation where none of them worked. Anyone in the community, these were zip code defined areas that had a low enough income could come in and be randomized for the New Hope treatment.
So, some families came in who were already working full time. They were interested in the earnings supplement because that would have added to their income and if anything they cut their hours back in part for family reasons which is a perfectly reasonable kind of reaction, but it does dilute the overall kind of employment impacts that we could get from this and reminds you that employment isn't the be all and end all of what these programs should be about. But we tried to find subgroups, defined on the basis of baseline characteristics, that distinguished groups where there was an impact or not and it turned out that what did so was a classification of families according to the number of barriers they faced: things like not having a high school education, things like having a criminal record, things like having two or more young children in the household, not having a driver's license. It was a diverse set of barriers and an index of those barriers distinguished the group that would have made it anyway. All right, these again are differences between experimental group and control group mean earnings year 1, 2, 3, 4 after baseline, no significant impacts for the people who would have made it anyway.
For the people who had multiple barriers, the program wasn't intensive enough to really help them. It didn't help drug problems. It didn't help mental health problems. It was just this offer. But this group of about maybe 25 to 30 percent of the families that were just one barrier away from success had remarkable impacts. These are thousands of dollars per year and actually if you run this out now we have 8 years of data and these differences persist all the way out to 8 years.
This kind of characterization came from one of the field workers — our initial classification was employed at baseline versus not and that kind of got this difference; people who were already employed didn't have this big impact — she kept worrying about the interviews that she was taking with the multiple barrier families where she didn't seem to be seeing the kind of improvement that you might think would be the case. So, she really dug into this barrier classification and came up with this first kind of conceptually about how people might be one barrier away and then she did it in a very rigorous way quantitatively, trying to nail down the subgroup for which the employment impacts were biggest.
Interestingly the child achievement impacts were consistent across these three groups and so it wasn't just employment gains or the earnings gains that led to kids' improving their achievement.
PARTICIPANT: The negative numbers were people who actually lost money or were in debt or --
DR. DUNCAN: This is the difference again between experimental and controls.
Now, part of what is going on, you always have to be aware of what is happening in the control group, right, and if you look at earnings, if you look at employment, the control group people were just taking off as well because this is welfare reform in Wisconsin, a super-heated economy, right? So, they were going great guns. I think employment rates among the controls went up from about 50 percent to 80 percent. So, earnings went up also and this is showing how much better or worse the experimentals did relative to those people.
Okay, and then here is an example of the qualitative account of someone who is one barrier away. She, Maria, very much attributes her success to New Hope and it was all the subsidized child care. For other people it was a community service job, right? You could see how this kind of cafeteria of benefits played out differently for different people's lives because they were in different circumstances.
Another thing that turns out to be useful from qualitative data that is made possible by the fact that this is a random sample of qualitative cases is that even if you have only 44 observations you can come up with some useful confidence intervals. If you are, which is often the case, trying to measure something on the qualitative cases through these multiple long conversations, to the extent that you can try to quantify sustainability, let us say — this very difficult concept that was guiding Tom Weisner in designing the sustainability of family routines — you could come up with a measure of that based on the field notes. And then one could use 44 observations with a very broad confidence interval to try to estimate what the level of sustainability was. But the best example really is the qualitative work classifying families according to whether they had children with problems that interfered with the family's functioning, the mother's work — they could come in a whole diverse set of forms, but it is clear in the field notes when children had particular problems that were genuinely causing problems for those families. If you look at the kind of standard survey questions about activities of daily living you get estimates of maybe 15 or 20 percent, but Tom and Cindy Bernheimer estimated more like 65 percent of the families in the combined control group and experimental group had these kind of child-related problems.
So, the confidence interval around 65 percent is very broad but you can at least say we’re quite confident that it is not 10 percent or 15 percent. The same way with drug problems — you could really sense after a while whether a family had drug problems and one belief is that most families in this kind of classification have drug problems but it was showing up in maybe only 5 percent of the families. So, you can't say that it is 5 percent versus 15 percent but you could certainly say that it is not characteristic of most of the families in the study.
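To make the width of such an interval concrete, here is a minimal sketch of a 95 percent confidence interval for a proportion from 44 randomly sampled cases. It uses the Wilson score interval, which is a choice made for this illustration and not a method named in the talk. The counts are hypothetical back-calculations: 29 of 44 is roughly 65 percent and 2 of 44 is roughly 5 percent.

```python
import math

def wilson_interval(successes, n, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_hat = successes / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / n + z**2 / (4 * n**2)
    ) / denom
    return center - half_width, center + half_width

# Hypothetical counts chosen to match the rough percentages in the talk.
child_low, child_high = wilson_interval(29, 44)  # about 65 percent
drug_low, drug_high = wilson_interval(2, 44)     # about 5 percent

print(f"Child-related problems: {child_low:.2f} to {child_high:.2f}")
print(f"Drug problems:          {drug_low:.2f} to {drug_high:.2f}")
```

Under these assumptions the interval around 65 percent runs from roughly one-half to a little under four-fifths, which is broad but clearly excludes 10 or 15 percent. The interval around 5 percent reaches up to about 15 percent, so it cannot separate 5 from 15 percent, but it sits far below one-half.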
So, only if you randomly sample from the larger group can you make that kind of claim. There were also benefits from the qualitative work that generated survey measures. We had a 2-year follow-up at the time of the qualitative work, got money for a 5-year follow-up and some of the kind of measures that we put in the 5-year follow-up came from the qualitative work that happened in between. We also have a nice — actually one of the students did a dissertation about the take up, and there was a lot of very nice back and forth between what the qualitative interviews were saying about why certain families took up the program.
I think the program designers thought that this wonderful program — everyone would take it up all the time and that wasn't the case at all.
Yes?
PARTICIPANT: Did you have any problems with the families that you had randomly sampled choosing not to participate in the qualitative part of the study or was it pretty much you were able to pick those 45 families and they all said, "Great"?
DR. DUNCAN: Eighty-six percent of them said that they would do it, for both the treatment and controls. So, what we feared that the control group, and this happened also in the MTO study that I won't have much time at all to talk about, but given the full court press you know with informed consent and everything, but people are often desperate to have someone with a sympathetic ear to talk with about the kind of things that you talk about in these kinds of situations.
So, response rates have been very, very high all the way through.
PARTICIPANT: I have found in work in the past that to be the case but to gain that initial entree often is quite difficult.
DR. DUNCAN: Right, but once you get it, it can be very popular.
Let me finish and then we will have time for discussion. It is very tempting to try to identify program effects from the qualitative data but the paper talks about this. If you think about the fact that you have got an overlapping distribution on almost all of the variables of interest in experimental and controls you don't know whether your qualitative experimental case is on the left hand side of the distribution or the right hand side and 44 is enough to estimate a pretty broad confidence interval around a point estimate, but once you start dividing into experimentals and controls we really can't do much.
The Tom Cook question is an excellent one. I wish I had an answer to it. How do you allocate scarce research dollars between the quantitative and qualitative, and is 44 way too few; is 44 too much? Every dollar you spend on that is one less dollar for other things you might be doing as part of evaluation.
PARTICIPANT: Can you estimate as he did sort of what fraction of the total study was qualitative in cost versus quantitative?
DR. DUNCAN: That is a good question. My impression is that it was on the order of 20 to 25 percent.
PARTICIPANT: Qualitative?
DR. DUNCAN: Qualitative, right. I am guessing. I shouldn't do that. It certainly wasn't most of the budget but it wasn't as intensive as what Tom did.
I don't want to say that the mixed methods are what we need all the time, but in this particular case where you are marrying the random assignment experiment with qualitative methods they certainly help a lot in the ways that I talked about.
What I don't have time to talk about is this companion residential mobility study where we are also doing random sampling. It is a larger number of qualitative cases. These were families that were recruited into a program that enabled them to move to very low poverty neighborhoods and we are doing one round of qualitative research doing conversations with the mothers, with the adolescents, very long conversations, very rich conversations with adolescents, spending time in the schools but not as much time as I am sure most of you would want us to spend in schools, to try to understand one of the key program findings from that. It is a gender difference but it goes in the opposite direction, that it is the girls in the Moving to Opportunity program that seem to benefit more than the boys, but that is for another presentation.
Thank you very much.