If this week’s blogging has taught us nothing else, it is to be wary of what the headlines of research tell us and to read the whole thing. (Unless it is research into what meat is in the burgers you ate last night, in which case best not to read it.)
This is research published in the International Journal of Conflict and Violence, which sounds like bedside reading for a ruler of a Middle Eastern country, but I assume is broadly anti-conflict and violence, rather than photoshoots of girls with Kalashnikovs. I daresay that they still have a hard time booking a venue for a Christmas party though.
But this is some research into whether Triple P – which has a lot of Government goodwill behind it, takes a lot of public money, and about which I spend hours per week hearing people complain that they need a referral – actually makes any difference, and it suggests (at least in the headlines) that it does not. An important word in research terms, randomised, comes up, which is promising. You need the study to be randomised so that the researchers didn’t come in with an agenda and pick the hundred best clients Triple P had, or the hundred worst.
[Of course, as these were studies in Birmingham, perhaps it doesn’t tell us much more than one of (a) the parents who use Triple P in Birmingham aren’t responsive to it, (b) the social problems in Birmingham don’t respond all that well to Triple P – and as I recall substance misuse and crack cocaine were pretty frequent issues in cases there, or (c) the Triple P deliverers in the Birmingham area are not doing it quite right.]
But let’s read more.
The Impact of Three Evidence-Based Programmes Delivered in Public Systems in Birmingham, UK
The Birmingham Brighter Futures strategy was informed by epidemiological data on child well-being and evidence on “what works,” and included the implementation and evaluation of three evidence-based programmes in regular children’s services systems, as well as an integrated prospective cost-effectiveness analysis (reported elsewhere). A randomised controlled trial (RCT) of the Incredible Years BASIC parenting programme involved 161 children aged three and four at risk of a social-emotional or behavioural disorder. An RCT of the universal PATHS social-emotional learning curriculum involved children aged four–six years in 56 primary schools. An RCT of the Level 4 Group Triple-P parenting programme involved parents of 146 children aged four–nine years with potential social-emotional or behavioural disorders. All three studies used validated standardised measures. Both parenting programme trials used parent-completed measures of child and parenting behaviour. The school-based trial used teacher reports of children’s behaviour, emotions, and social competence. Incredible Years yielded reductions in negative parenting behaviours among parents, reductions in child behaviour problems, and improvements in children’s relationships. In the PATHS trial, modest improvements in emotional health and behavioural development after one year disappeared by the end of year two. There were no effects for Triple-P. Much can be learned from the strengths and limitations of the Birmingham experience.
I’m not familiar with the Incredible Years model, but that seems to be pretty good, PATHS not too bad, and Triple P negligible. Now, if that turns out to be a reputable and replicable study (by which I mean someone else using those methods anywhere else in the country would get similar results) that’s a big deal. Firstly, as I said, Triple P gets a lot of State funding, and is a regular go-to resource. Secondly, parents pretty much only get one shot at an intervention to improve their parenting, so if what we’re sending them to isn’t as good at making a difference as it should be, that’s pretty important.
All this with the caveat that I am not a qualified interpreter of research – I am feeling more and more that we need a Ben Goldacre type to tackle the research that’s going around in family justice to tell us whether the conclusions are robust and fair.
As we know, Birmingham is a very large local authority, the biggest in the country, and it has a range of social problems, so it doesn’t seem like a bad pick of area. It’s not like it has been skewed by picking Saffron Walden, say, and claiming that this is representative of the country at large.
Another telling thing is that this research is published, which means it has been peer-reviewed and checked over, rather than just being something these guys have written and sent out a press release about. That makes me feel more reassured – the first four things I look for are:-
1. Is it peer-reviewed?
2. Is it randomised?
3. Did the initial sample get skewed?
4. Was it funded by someone with a vested interest in the outcome?
And none of those alarm bells are set off.
My next concern is whether you can objectively measure a change in parenting or behaviour, or whether that is by its nature subjective, and if the latter, who is measuring it?
Let’s see what they say:-
All of the evaluations applied the “intention to treat” principle, meaning that results include those children, parents, or schools that dropped out of the study. The findings therefore reflect what happens in real-world situations, with many intervention recipients either not starting or not completing an intervention paid for by the local authority. Each of the trials used a “waiting list” design, meaning that children or schools not receiving the intervention were given priority to receive it in future if the results of the evaluation were positive. Children in the control conditions received “services as usual”, which in some cases involved substantial support – for example, the SEAL (Social and Emotional Aspects of Learning) programme in the case of the PATHS trial. Participants in the programme groups could also continue to receive services as usual – that is, no services were withdrawn – although it is acknowledged that logistically this may have been difficult (for example, if PATHS lessons used curriculum time previously allocated to SEAL).
Typically, experimental evaluation is expensive. In order to reduce costs, the Social Research Unit sought only to replicate the findings established in other trials, thereby collecting considerably less data than is usually the case. The experimental approach was taken, randomly allocating units to control and intervention groups. Sample sizes reflect a calculation of the statistical power needed for any programme effect identified by the evaluations to be greater than chance. Robust measurement was also required. These elements are typical of a good RCT. The focus on replicating findings from other trials offers a different angle, however. Specifically, the data collection was restricted to the factors in the logic model underpinning the evidence-based programme, including the risks targeted, the fidelity of implementation of core elements of the intervention, and the outcomes sought. Other hypothesised moderators and other contextual information are excluded. The net result is a high-quality evaluation with less data and therefore less cost.
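A quick aside on that “statistical power” calculation – it is the bit of arithmetic that decides how many families a trial needs before a real effect can be told apart from chance. The report doesn’t show its working, so the following is only my own back-of-envelope sketch using the standard normal approximation and the conventional 5% significance level and 80% power (none of these figures come from the study):

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Approximate sample size per arm for a two-sided, two-sample
    comparison of means, using the standard normal approximation.
    effect_size is a standardised difference (Cohen's d)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for two-sided test
    z_beta = z.inv_cdf(power)           # value giving the desired power
    return ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(n_per_arm(0.5))  # a 'medium' effect: 63 families per arm
print(n_per_arm(0.2))  # a 'small' effect: 393 families per arm
```

On those conventional assumptions, the Triple-P trial’s 73-per-arm split would be roughly the right size to detect an effect a little below “medium”; a small effect would have needed several hundred families per arm.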
Well, the important thing here is that they didn’t discount those who dropped out – a common trick with research is not to include anyone who doesn’t finish the course of treatment, which of course filters out those people who didn’t feel it was working or had unpleasant experiences. I don’t know enough to know that this is bulletproof, but it is not yelling out at me that there are massive holes in it.
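That trick is easy to show with toy numbers. These figures are entirely invented by me (nothing from the study): suppose some of the intervention group drop out, and suppose the dropouts are the ones who weren’t improving. A “per-protocol” analysis that only counts completers flatters the programme; intention to treat keeps everyone in:

```python
import statistics

# Invented outcome figures (improvement in a behaviour score) for a toy trial.
# None = dropped out before follow-up. Intention to treat keeps them (here,
# crudely, as zero improvement); per-protocol quietly discards them.
intervention = [4, 3, None, 5, None, 2, 4, None]
control = [3, 2, 4, 1, 3, 2, 3, 2]

def per_protocol(scores):
    """Average only over completers - the 'common trick'."""
    return statistics.mean(s for s in scores if s is not None)

def intention_to_treat(scores):
    """Keep every randomised participant, counting dropouts as no change."""
    return statistics.mean(0 if s is None else s for s in scores)

print(per_protocol(intervention))        # 3.6 - only the finishers count
print(intention_to_treat(intervention))  # 2.25 - dropouts drag it down
print(intention_to_treat(control))       # 2.5 - the comparison group
```

With these made-up numbers, per-protocol makes the intervention look better than the control group (3.6 versus 2.5), while intention to treat shows it doing slightly worse (2.25 versus 2.5) – which is exactly why dropping the dropouts is a trick.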
The evaluation of Triple-P was a parallel randomised controlled trial, with pre-post test design. It involved 146 children aged four–nine years whose symptoms indicated a potential social-emotional or behavioural disorder, determined using the “high need” threshold on the SDQ “total difficulties” score (17 or above out of 40). The sample comprised 105 boys and 41 girls. The mean age was 82 months (SD = 21). The sample also comprised a high proportion of low-income families: 62 percent of children were entitled to free school meals compared to 33 percent for Birmingham as a whole.

The parent(s) of half (73) of these children were randomly assigned to attend Triple-P parenting groups, with the remaining half placed on a waiting list and receiving services as usual. Researchers performed the randomisation for each eligible child using an online programme, designed by NWORTH. Children were randomised on a 1:1 ratio, using a dynamic allocation method, stratified by age and sex.

Baseline (Wave 1) data was collected on all children. Follow-up (Wave 2) occurred six months after baseline and included 137 children. The programme was delivered to intervention group parents at some point during those six months. The missing nine cases (three control, six intervention) were made up of two formal withdrawals from the study and seven that could not be contacted. The primary outcome instruments were the SDQ and ECBI. Parenting behaviour was measured using the Arnold and O’Leary Parenting Scale (APS). Estimated mean differences were used to calculate the impact of Triple-P. ANCOVA tests controlled for children’s start scores on respective measures, the age and sex of the child, and the area from which families were recruited.
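For the curious, “1:1 dynamic allocation, stratified by age and sex” just means the tool keeps the two arms level within each age-and-sex pocket as children arrive, rather than tossing one big coin 146 times. NWORTH’s actual algorithm isn’t described in the paper, so the following is only a generic sketch of the idea – the two-year age bands and the coin-flip tie-break are my own inventions:

```python
import random
from collections import defaultdict

def stratified_allocate(children, seed=0):
    """Assign each child to 'intervention' or 'control' on a 1:1 ratio
    within strata defined by (age band, sex) - a simplified stand-in for
    the dynamic allocation the trial attributes to NWORTH's online tool.
    children is a list of (child_id, age_in_months, sex) tuples."""
    rng = random.Random(seed)
    counts = defaultdict(lambda: {"intervention": 0, "control": 0})
    allocation = {}
    for child_id, age_months, sex in children:
        stratum = (age_months // 24, sex)  # crude two-year age bands
        c = counts[stratum]
        if c["intervention"] < c["control"]:
            arm = "intervention"  # rebalance this stratum
        elif c["control"] < c["intervention"]:
            arm = "control"
        else:
            arm = rng.choice(["intervention", "control"])  # tied: coin flip
        c[arm] += 1
        allocation[child_id] = arm
    return allocation

# 146 hypothetical children aged four-nine (48-107 months), alternating sex
alloc = stratified_allocate([(i, 48 + i % 60, i % 2) for i in range(146)])
```

The point of the balancing step is that each stratum can never drift more than one child out of balance, so a chance run of, say, older boys all landing in the control arm cannot happen.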
I won’t put the figures in, because they’re a bit complex to follow without reading the whole report for yourselves.
But this is the telling paragraph in relation to Triple P:-
4.3. Standard Level-4 Triple-P
As Table 3 illustrates, the results for this programme are not promising. Children of parents attending Triple-P sessions improved their behaviour and were happier six months after the course concluded, but at roughly the same rate as children in the control group receiving services as normal. These results are not consistent with most other Triple-P trials around the world. However, as far as we are aware, only four randomised trials (including this one) have been undertaken independent of the programme originator (see also Gallart and Matthey 2005; Hahlweg et al. 2010; Malti, Ribeaud, and Eisner 2011). When these four studies are viewed together, the evidence of impact on child development is equivocal.
What that is obliquely saying is that although there might be a great deal of research showing that Mars bars are really good for you, if you strip out all the research commissioned by Mars, it turns out the research shows that they’re not that good for you after all.
Their results were that the improvements and changes were no better in the Triple P group than in the group who didn’t have it – a PPPlacebo effect, perhaps? Food for thought at least.
Just a few brief points, since it is a very long time since I used statistics.
In evidence based assessments of research there is a hierarchy of types of studies on which to base judgement.
The best trials are double-blind randomised controlled ones.
Peer review is not quite so ‘objective’ a system as people who do not do regular research think.
One must be very critical and knowledgeable to judge a research paper well.
I have not read the one you have written about so cannot comment – except that small sample sizes and judgements of families’/children’s behaviours on rating scales always strike me as somewhat subjective. Validation, again, is something that needs to be looked at in several contexts: e.g. is the behaviour cultural? The validation tools may reflect validation on specific groups only.
You can see I have a high degree of scepticism about social research (from experience).
Thank you. I would freely acknowledge that my knowledge base on research is strictly amateurish. I do think this tranche of research is just revealing that we ought to be commissioning some proper robust research of all the many things that we are throwing resources at to see if they objectively work or could be done better.
If you haven’t already seen this review on the Triple P evidence, this may be of interest: http://www.biomedcentral.com/1741-7015/10/130
Oh thank you very much Louise, that’s an interesting piece. I like the suggestion that, given how much money is being spent on such intervention programmes, perhaps more rigorous testing of efficacy should be used. (I note their suggestion that it be more in keeping with that applied to pharmaceuticals.)