Measurement & Evaluation

What the First Social Impact Bond Won’t Tell Us

Two views on evaluating the pilot SIB in Peterborough, UK.

Social impact bonds (SIBs) are a high-profile innovation in funding public services. The pilot SIB in Peterborough, UK, which aims to reduce recidivism, has been widely watched and—despite not yet producing results—already widely emulated.

Given the international interest in SIBs and similar pay-for-success schemes, it’s important to determine whether the Peterborough SIB works. The Ministry of Justice describes the program’s evaluation method as “the Rolls Royce of evaluation.” However, Professor Sheila Bird of Cambridge University and the UK Medical Research Council says: “[It] might well be a brilliant success; it might achieve little. But we aren’t going to know either way.”

This article sets out how we can determine whether the SIB works or does not work in three respects.

The first is straightforward: whether the investors should be repaid. Determining this will be easy, because it depends solely on the re-offending rate and the contractual terms—both of which will be clear.

Second, we need to consider whether the intervention itself works to reduce re-offending—a central question. Determining this will be more difficult, because this first SIB uses a variety of interventions: only some of them have been rigorously evaluated individually, and the combination has never been evaluated at all.

The issue is attribution: figuring out whether the re-offending rate amongst the Peterborough prisoners has anything to do with the charities’ work. Both sides agree that the way to see what the charities have achieved is to compare:

  1. The one-year re-offending rates of men with whom the charities work.
  2. The one-year re-offending rates of a group of similar men with whom the charities haven’t worked. This “control group” screens out effects of, say, changes in society, the law, or sentencing procedures.

It’s essential that the “treatment group” and control group are effectively identical beforehand; if they are, the sole difference between them is the program, which alone must account for any differences in re-offending rates between the groups. Bird would have liked the treatment group and control group to be selected at random to ensure that they were effectively identical. But this isn’t what is happening. Social Finance, the nonprofit company that invented social impact bonds and is running the Peterborough pilot, says it was impossible: Within the prison, the program is advertised and open to anybody whose sentence is a year or less. Prisoners are used to—and exasperated by—being apparently arbitrarily excluded from things, and neither Social Finance nor the prison governor wanted this program to generate ill-will in that way. Social Finance says that its “investors wouldn’t tolerate excluding some people.” Bird’s view is that random selection inside prisons (as outside them) is not only possible, but also pretty common.
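The statistical point behind Bird’s preference can be shown with a toy simulation: random assignment balances the two groups not only on recorded characteristics but also on traits nobody measures. Everything below is invented for illustration—the 1,000 “prisoners” and the unobservable “resilience” trait have nothing to do with the actual Peterborough data.

```python
# Toy illustration of why random assignment creates comparable groups.
# All data here are invented; nothing comes from the Peterborough pilot.
import random
import statistics

random.seed(0)

# An unobservable trait that matching on official records could never
# capture, but that randomisation balances between groups in expectation.
prisoners = [{"resilience": random.gauss(0.0, 1.0)} for _ in range(1000)]

# Random assignment: shuffle, then split into two arms of equal size.
random.shuffle(prisoners)
treatment, control = prisoners[:500], prisoners[500:]

t_mean = statistics.mean(p["resilience"] for p in treatment)
c_mean = statistics.mean(p["resilience"] for p in control)
print(f"mean resilience: treatment {t_mean:+.3f}, control {c_mean:+.3f}")
```

With 500 prisoners in each arm, the two group means land close together even though "resilience" was never observed or matched on, which is exactly why a later difference in re-offending rates could be credited to the program rather than to pre-existing differences.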

If randomising prisoners wasn’t possible, the next best option would have been randomising prisons: In other words, several randomly selected prisons would get the program while others wouldn’t, and the re-offending rates of their populations would be compared. Social Finance says that this wasn’t possible either, because the Ministry of Justice would never have allowed a pilot in several prisons at once.

Interestingly, Peterborough prison wasn’t chosen at random, but rather because the prison governor was willing to engage. As Bird remarks, that may indicate an unusual trait in the governor, which itself may influence the results. It’s not impossible that a prison governor willing to take on this innovative project is unusually progressive in other respects too: Perhaps Peterborough prison offers other unique programs that could skew the results.

To construct a control group, the bond evaluation uses propensity score matching (PSM), a statistical technique often used when samples can’t be randomised. With PSM, you start by identifying which observable indicators have historically correlated with being eligible for the treatment; each person’s “propensity score” summarises those indicators. In this case, prisoners at institutions other than Peterborough who have the same propensity scores as the treatment group serve as the control group. Social Finance is doing an unusually elaborate PSM, matching about ten “control” prisoners to each “treatment” prisoner.
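The mechanics of PSM can be sketched in a few lines. Everything below is hypothetical: the covariates (age, prior convictions), the logistic coefficients, and the records are invented for illustration and bear no relation to the actual Peterborough evaluation, which draws on Police National Computer data.

```python
# A toy sketch of propensity score matching (PSM). All covariates,
# coefficients, and records are hypothetical illustrations; none of
# this comes from the Peterborough evaluation.
import math

def propensity(age, priors):
    """Hypothetical logistic model: estimated probability of being in
    the treated group, given the observable covariates."""
    z = -2.0 + 0.03 * age + 0.4 * priors
    return 1.0 / (1.0 + math.exp(-z))

def match_controls(treated, pool, k):
    """For each treated prisoner, take the k pool members with the
    closest propensity scores (nearest-neighbour matching, with
    replacement)."""
    matched = []
    for t in treated:
        nearest = sorted(pool, key=lambda c: abs(c["score"] - t["score"]))
        matched.extend(nearest[:k])
    return matched

def reoffending_rate(group):
    return sum(p["reoffended"] for p in group) / len(group)

# Hypothetical records: age, prior convictions, re-offended within a year?
treated = [
    {"age": 24, "priors": 3, "reoffended": 0},
    {"age": 31, "priors": 5, "reoffended": 1},
]
pool = [
    {"age": a, "priors": p, "reoffended": r}
    for a, p, r in [(23, 3, 1), (25, 2, 0), (30, 5, 1), (32, 6, 1),
                    (40, 1, 0), (22, 4, 1), (29, 5, 0), (35, 2, 0)]
]
for person in treated + pool:
    person["score"] = propensity(person["age"], person["priors"])

controls = match_controls(treated, pool, k=3)
difference = reoffending_rate(treated) - reoffending_rate(controls)
print(f"treated rate: {reoffending_rate(treated):.2f}")
print(f"matched-control rate: {reoffending_rate(controls):.2f}")
```

Note that the matching can only use what is recorded: a covariate that never appears in the records cannot enter the propensity model at all, which is the root of the objections that follow.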

Nonetheless, there are major objections to PSM as a basis for attribution. One is that PSM can only ever match on indicators that are observable, such as age, background, and criminal history. Yet it’s often unobservable factors—such as attitude or resilience—that drive behavior.

Another problem is that the only data available for the PSM are what’s stored in the Police National Computer, and it is surprisingly basic. For instance, it can’t distinguish whether somebody has mental health problems or a history of heroin use, which obviously would influence their behavior and the care they need.

Astonishingly, even the Ministry of Justice explicitly acknowledges that the control group may be pointless (see page 7 of this Ministry of Justice document about the evaluation).

The third respect is whether the bond structure itself works. Social Finance says that just the existence of this first bond proves that it is possible. It has defined performance criteria against which a public body agreed to repay and found private donors willing to provide funding based on those criteria.

But when we eventually see the re-offending rates of the treatment and control groups, we won’t know whether to attribute any differences to:

  • Social Finance’s particular mix of interventions
  • The money (The SIB brings in about £1,667, about $2,540, per prisoner. Bird thinks any prison governor could use that amount to dramatically reduce re-offending. It’s possible that prison governors could out-perform Social Finance’s program.)
  • The new financing mechanism itself (We won’t know whether it produces better outcomes than if that money had been put into that intervention through, say, a grant program.)

The core problem might be that Social Finance is delivering on a contract: It isn’t doing social science research, which is central to distinguishing between possible causes. But does the difficulty of seeing the effect of the financing mechanism itself matter? Well, not for Social Finance or its donors in this first instance. Their proximate issue is delivering the contractual obligations such that they get paid. But surely it would have been helpful to Social Finance’s future work to show the effect of the SIB mechanism itself.

It certainly matters to the Ministry of Justice, which 1) may end up paying for a service that didn’t achieve anything beyond what that particular prison governor would have achieved without that money, and 2) won’t therefore know what service they should roll out to other prisons if the Peterborough service does apparently succeed.

It matters even more to UK taxpayers who are funding all of this—and hoping that they’re not burgled or mugged. Yet they’re unlikely to object because the intricacies of randomisation and PSM for determining attribution are a shade too complex.

“All these problems could have been averted,” says Bird. She says, for example, that this first SIB could have been tested against a known intervention with a conventional funding mechanism.

And yet, we should not let the best be the enemy of the good. Clearly, we are likely to get better public services when the interests of the provider and purchaser are better aligned, and SIBs are a step in the right direction. Despite the Peterborough SIB’s curious design choices, it has taught us many things—and will teach us many more.



  • BY Bernadette Wright

    ON April 3, 2013 01:53 PM

    No research design is the “Rolls Royce of evaluation”—different types of interventions and different evaluation questions need different types of research designs. Any single design has limitations. For interventions that work via multiple mechanisms and processes, the best approach is usually to use a combination of multiple methods and data sources to assess what effects the mix of interventions had and how results were achieved.

  • BY Louise Bennett

    ON April 4, 2013 01:21 AM

    Don’t see the point of this analysis. You say - we won’t know whether it produces better outcomes than if that money had been put into that intervention through, say, a grant program. The point of SIBs is that the UK government clearly want to give out less grant funding and SIBS offer a way for under-capitalised charities which can’t go into payment-by-result contracts to get support to do this.

  • BY EducationState

    ON April 4, 2013 01:25 PM

    It’s telling that “this article sets out how we can determine whether the SIB works or does not work in three respects” and puts the repayment of investors FIRST.

    Says it all.

  • BY Hugo Chu

    ON April 4, 2013 02:20 PM

    Whether the scheme works is not the issue. But the point is that the SIB itself shows an innovative paradigm shift in social financing, i.e. from charity to investment, and shows the value of social partnership. In this respect, it works already.

  • BY Caroline Fiennes

    ON April 5, 2013 02:06 AM

    Caroline Fiennes (the article’s author) here.

    Education State: the sequencing of the article is journalism, not philosophy. The repayment of investors can be dealt with in a single sentence, whereas the other two respects are so complex as to require ~400 words each. Basic rule of journalism is to avoid drowning the reader with complexity very early on.

    Hugo Chu: What do you mean ‘it works’? How do you know that? For all we know, the interventions / mix of interventions being used is *harmful* (we don’t know because they’ve not been evaluated) and/or the taxpayer will pay £1700 for an outcome which a prison governor could have achieved alone for £5. Those situations don’t sound like ‘working’ to me.

    Louise Bennett: I don’t understand your point. You say ‘SIBS offer a way for under-capitalised charities which can’t go into payment-by-result contracts to get support to do this’, but the charities involved in the SIB are in payment-by-results contracts, by definition.

  • BY Mike Belinsky

    ON April 5, 2013 09:56 AM

    Caroline, I offer some thoughts on your article in a response here:


  • BY Vineet

    ON April 5, 2013 10:47 AM

    Hi Caroline,

    Thanks for an interesting piece.  Not being an expert on these things (either SIBs or recidivism or RCTs) some observations did nevertheless come to mind (in a personal capacity):

    1.  The point is made a few times that the SIB is using a range of interventions, making it difficult to attribute anything to a specific intervention.  I think a key point of the SIB is in fact to give the NGOs involved the autonomy to be able to use, adapt and course-correct whatever techniques are needed to reduce reoffending.  Reducing reoffending seems to best be a user-centred process, with a spectrum of activities (some inside the prison, some at the prison-gate, and some with subsequent beyond-prison support around housing, jobs, financial inclusion etc).  These appear to require deep local knowledge, an emphasis on the person at the centre, and the ability of the delivery organisation to be flexible.  So, the crux of the SIB would seem to me to be that it’s not a specific “intervention”, more a holistic program.  It would be interesting to hear what the underlying NGOs say the SIB offers them in contrast to traditional programmatic contracts or grants?

    2.  Is randomisation inside a prison really that easy?  Given that some of the services involve in-prison contact, how would strict treatment-control separation be maintained (there would appear to be plenty of opportunities for “contamination” or whatever the RCT word is).  Cluster-randomisation would appear more possible, but would have needed a much greater systematic commitment - at odds with a pilot, and perhaps also interfering with prison-level governor autonomy?

    3.  Selection bias with Peterborough having opted in to this is indeed a possibility.  But if the SIB services provided by the NGOs include a lot of post-prison support, then (if the SIB treatment cohort does show reduced reoffending) the alternative explanation would be that it was something unique about Peterborough that was the driving factor in spite of such post-prison support.  That is theoretically possible, but would also appear to be empirically determinable through qualitative analyses: did any of the core prison services change at Peterborough at the same time as the SIB was launched?  If so, then yes any subsequent performance could be influenced by this.  But if not, then isn’t it more likely that the subsequent performance (good or bad) is a reflection of the NGOs executing the SIB?

    4.  The limitations of the items used in the PSM analysis would not appear to be a fault of the SIB, more of the underlying data gathered by police records?  If the omitted items (whether unobservable or missing observable such as addiction or mental health) are randomly distributed across the imprisoned population of the country, then the risk of skewing is much lower.  Unless of course Peterborough is argued to have something distinctive with lower heroin or mental health, or more resilient folks in those shires.  The latter is again theoretically possible, but seems to me to lead to questions of external validity - which even the best RCTs are still exposed to?

    5.  The argument of whether it is the additionality of the funding that drives performance gains is an interesting one.  A fair comparison would indeed be to (a) give the prison service additional per-capita funding (as you point out), and (b) give NGOs similar core multi-year grants (rather than an incentivised, actively managed SIB) and then see how these perform.  The difficulty with (a) is that much of the work is beyond the prison-gate, so how does a prison governor manage such outreach - is this additional work that they are not best placed for, compared to say local NGOs.  (It would also appear to raise interesting questions of whether released prisoners would be socially signalled by being seen to get post-release visits from prison workers - and therefore their rehabilitation compromised - but I’m not an expert on these matters by any means.)  The challenge with (b) is that if such core multi-year grant funding that gives such operational autonomy to NGOs was readily available, it would already be being supplied by grant-givers?

    Thank you for an interesting article - apologies for such a long comment!

  • BY Julien Lake

    ON April 7, 2013 02:15 AM

    I think Louise Bennett makes a couple of important points.

    There is currently no chance whatsoever of the UK government spending grant money on an experiment to reduce reoffending. Budgets are under huge pressure and for this government to grant money to an experiment on cutting reoffending while taking an axe to police budgets is impossible. However to transfer the financial risk of that experiment to third parties is very attractive and ultimately enabling. This investment simply would not have been made through a traditional grant funding mechanism, or through growth to the prison budget, and so evaluating it in the context of what contribution either of those routes might have made is of limited value.

    It is also important to understand that this is not payment by results in the context of what is happening in the UK at present. The service provider (the NGO) is currently meeting its operating costs with monies from the investors and it is they who carry the bulk of the financial risks.

    Elsewhere (e.g. the Work Programme) providers are carrying financial risks themselves and are forced to meet operating costs and cash flow needs from other sources at considerable risk. While there is undoubtedly a payment on results within the bond structure it is very different from the payment-by-results model being used to support many of the UK government’s other social projects.

    As to the question: does it work? Clearly in this instance it does, something that would have otherwise been impossible is taking place. The value of the actual service interventions is harder to understand but the bond itself works.

  • BY Mike Venables

    ON April 15, 2013 07:10 AM

    Good piece, thank you.

    What is also relevant there is the recent statement by HM Treasury that they will be looking very hard at whether PbR is as good value for money as directly funded interventions and the implication that if PbR does not offer better vfm they will not support such programmes.  Given the cost of private investment, absent third party income, it is hard to see how SIBs can deliver equal vfm.
