Chasing the Holy Grail of Outcomes

A Boys & Girls Clubs of the Peninsula volunteer supports a middle school student with her math homework. (Credit: BGCP)

I love talking about my work at the Boys & Girls Clubs of the Peninsula (BGCP), and answering questions about our vision, mission, and programs. That is, until someone asks, “So you’ve been doing this for 15 years. What is your impact?” I wish I had a crisp, punchline response.

When I joined the nonprofit sector 15 years ago, I was confident I would have a succinct answer. I understand the importance of measuring outcomes. I majored in mathematical economics in college, got an MBA, and worked for McKinsey & Company. I love analysis. I studied philanthropy with the Philanthropy Workshop West, Legacy Venture, and SV2. I bought into the gospel of strategic philanthropy.

But the task is more challenging than I expected. On the one hand, based on personal observation, I strongly believe we are providing a valuable service to our community and improving kids’ lives. I can articulate how we are having a positive impact. But despite investing in program monitoring, we still lack a succinct measure of impact. I wonder how many resources we should allocate toward assessing impact and what evaluation approaches will actually help us increase our effectiveness. Clearly we need to do something, but we don’t want to chase an unattainable Holy Grail. Is there a satisfying middle ground?

I want to measure impact for these three reasons:

To improve program design. We want to spend our partners’ resources as effectively as possible, and we’d like a scorecard to guide us and enhance accountability. One of the hardest parts about managing a nonprofit with a broad mission like BGCP is the lack of simple metrics. Without metrics, how can we know which staff and programs are the most effective, and where we should allocate scarce resources?
To increase fundraising. If we could prove our impact, we could raise more money, expand our budget, and serve more students.
To enhance employee morale. Few people acknowledge this, but it’s a big one. When staff—who work crazy hours, and dedicate their hearts and souls to a mission—can see the impact they are having, they are less likely to burn out. My team is hungry for feedback and would respond ambitiously to a real-time scorecard. Even if the results were poor, the challenge to improve and clarify their goals would motivate them.

As we have invested in measuring impact, we have kept running into three seemingly intractable obstacles:

The subjectivity of defining success. BGCP is about raising kids and providing opportunities. While working at BGCP, I’ve been raising three of my own kids. How do I measure my success as a parent? By my kids’ grades and the colleges that admit them? By how well-behaved they are? By how many friends they have? In truth, what BGCP does is comparable to coaching my daughter’s soccer teams. What would I say if a parent asked for the outcomes? I think I did a decent job as coach; the kids had fun, wanted to keep playing, bonded as a team, and learned some life lessons. But that’s my subjective assessment. Another example: Many of us pay thousands of dollars for our own kids to attend summer camps. How do we measure the value of that experience? Is it realistic to expect BGCP to provide this kind of information?
Social service organizations like BGCP address long-term problems. Our ultimate goal is for our students to graduate from high school ready for college or career, and we won’t know if we’re successful with our second graders for at least 10 years. What do we do about the student who comes to us every day for four years, from second to fifth grade, but then stops coming? Very few youth remain with us from age 6 through 18. The students we serve often have little stability in their lives. Many families move out of financial necessity, kids have access to different programs as they change grades, and many high schoolers must work to help their families pay rent. We can measure intermediate successes like avoiding summer learning loss. But that’s not the ultimate goal—it’s a means to an end.
The challenge of distinguishing between causation and correlation. To claim causation, we would need a control group and possibly random assignment, which is beyond our capacity and that of most nonprofits. Did BGCP’s programs make the difference, or was it a teacher at school?

I have reviewed results from countless organizations to find approaches we could replicate, and let me offer this caveat emptor to philanthropists: When reading a nonprofit’s annual reports or other documents, take a look behind the numbers. When you see percentages, understand the numerator and denominator before drawing any conclusions. I’ve seen organizations report that 95 percent of their youth graduate from high school, but they only measure students who are still active at graduation time. Those who drop out of school almost certainly drop out of the program and are therefore not included in the denominator. I recently saw a college access program report that 90 percent of its participants enroll in college. On closer review, I realized that the figure reflects the proportion of the program’s high school graduates who enroll in college. It excludes students who joined the program as high school sophomores and dropped out during high school, never making it to senior year.
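To make the denominator point concrete, here is a minimal sketch in Python. Every number in it is hypothetical, invented only for illustration, and does not describe BGCP or any real program. It shows how the same count of graduates produces very different percentages depending on who is counted in the denominator.

```python
def rate(numerator: int, denominator: int) -> float:
    """Return numerator / denominator as a percentage."""
    if denominator <= 0:
        raise ValueError("denominator must be positive")
    return 100.0 * numerator / denominator


# Hypothetical cohort: all figures below are invented for illustration.
enrolled_at_start = 100           # students who ever joined the program
still_active_at_graduation = 60   # students still in the program senior year
graduates_among_active = 57       # still-active students who graduated

# Rate as often reported: only still-active students are in the denominator.
reported = rate(graduates_among_active, still_active_at_graduation)

# Rate over everyone who enrolled. This treats every student who left the
# program as a non-graduate, so it is a lower bound, not the true rate.
full_cohort = rate(graduates_among_active, enrolled_at_start)

print(f"Rate among still-active students: {reported:.0f}%")
print(f"Rate among everyone who enrolled: {full_cohort:.0f}%")
```

The true figure lies somewhere between the two, depending on how many of the students who left went on to graduate. That is exactly the information a reader needs to ask for before drawing conclusions from a headline percentage.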

I do not mean to imply that nonprofits are intentionally deceiving donors. Rather, they are under pressure to have succinct and compelling outcomes, and they report what they can. Philanthropists should acknowledge the challenges nonprofits face and avoid celebrating simplistic claims.

Also, be aware that selection bias is the norm; most programs with results select whom they serve. Their constituents may be similar to others in race and income, but they are usually above average in terms of motivation, resilience, or other character skills. My favorite example of this is my alma mater, Harvard Business School (HBS), which reports that its alumni have higher salaries than alumni from other business schools. But is it HBS’s value-add (classroom learning, networking) that results in high salaries? Or is it that its admissions team correctly identifies people who are most likely to make the most money? If HBS has such impact, why doesn’t it have a random lottery to admit students?

I have heard people say nonprofits should be run “more like businesses” and be accountable in the same way for-profits are. But for-profits report income, not outcomes. Every nonprofit leader knows exactly how much money he or she raised and spent. That’s easy. Which companies report outcomes? Does Microsoft report how productivity increased with its software? Does 24 Hour Fitness report on how much healthier its customers are? McKinsey on how much better its clients perform?

We also know exactly how many “customers” we have. This is a reasonable proxy for value creation at for-profits, because customers pay for their own services. But nonprofits have two customers: recipients and funders. Our recipients don’t pay for their services, so demand alone doesn’t prove value creation.

Despite these challenges, at BGCP we continue our quest to become a more data-informed organization through these actions:

Establishing a learning culture that hungers for results. We hire staff who aspire to continuously learn and who crave impact data. Our stars ask the best questions, welcome being challenged, and are constantly seeking ways to increase impact.
Testing a theory of change based on leading research to guide our program design and implementation. This is our roadmap for resource allocation, and it highlights what we should measure. While long-term outcomes are far away, the theory of change identifies measurable intermediate outcomes that research has shown to drive the outcomes we want.
Focusing on execution. As a baseline, we’re clear about which activities we’re committing to do and holding ourselves accountable. This is not a proxy for outcomes, but at least it shows we are running effectively.
Showing impact through stories. Stories don’t replace data, but we use them to test our theory of change and provide valid proof points. Having our students tell their stories in their own words inspires staff, other students, partners, and donors. The stories make our work real.
Surveying all stakeholders, including youth, staff, parents, donors, and partners, and then reviewing that data to identify areas for improvement. Stakeholder satisfaction is an indicator of an effective program.
Committing to complete transparency. We share all of our measures and data equally with all stakeholders. We highlight our weaknesses, where we have failed, and what questions we haven’t yet answered. Anything we have discussed internally, we are willing to share with external stakeholders.
Investing in an impact and evaluation team that operates at the intersection of program strategy and organizational learning. We need a team free from day-to-day execution challenges to steadily beat the evaluation drum. While the team is strong at data collection and analysis, its greatest value-add is creating space for staff to review, question, reflect, and discuss data to drive program improvements.

Today, 15 years since I joined BGCP, I still struggle with the question of how far to push our evaluation work. What are we trying to prove? That we are changing lives? That we are well managed? I struggle with how many resources to deploy on evaluation, because every dollar we spend there is a dollar less we spend on delivering programs. We don’t want to become a research organization. But at least we know we are executing our plan, asking the right questions, and striving to improve. We will likely never capture the Holy Grail of outcomes, but we are confident we can still do good well.

Read more stories by Peter Fortenbaugh.
