Why Workshops, Playbooks, and Good Intentions Will Never Scale Your Experimentation Programme

The industry experts keep prescribing the same remedies. The remedies keep failing.

If you lead an experimentation programme, a research function, or the executive team accountable for either, you have almost certainly tried some version of this sequence:

You hired capable people. You invested in testing tools. You wrote a playbook. You ran workshops. You created Slack channels, hosted demo days, built onboarding materials, and maybe even flew everyone to the same city for a two-day alignment session.

And then you waited for it to work.

This is what it looks like when an organisation tries to scale evidence-based decision making through culture alone. The intentions are sound. The approach is structurally incapable of delivering what you need.

The pattern that keeps repeating

Across experimentation teams in different organisations, one pattern appears with remarkable consistency: quality variance between teams within the same company. One team runs rigorous experiments with clear hypotheses, proper sample sizes, and honest analysis. Another team in the same organisation, with access to the same playbook, the same tools, and the same training, produces work that would not survive five minutes of scrutiny.

The centre of excellence sees this. They know exactly where the gaps are. But they cannot close them. They can advise. They can coach. They can send reminder emails and host refresher sessions. What they cannot do is enforce a standard, because they have been given responsibility without authority.

This is the structural trap that most experimentation programmes fall into. The people closest to the quality problem have no mechanism to solve it. The people with the authority to solve it often do not see it, because the rituals and reporting designed to create visibility are showing them a curated version of reality.

The problem is not effort.

Workshops, playbooks, training sessions, and rituals all belong to the same category: they are passive instruments. They describe what good looks like. They do not enforce it. They do not verify it. They do not connect it to anything upstream or downstream.

A playbook can tell a team how to write a hypothesis. It cannot ensure the hypothesis actually gets written before the test is configured. A workshop can teach a product manager how to interpret statistical significance. It cannot prevent that same product manager from shipping a variant based on a directional result three days later. A demo day can celebrate a well-run experiment. It cannot surface the twelve poorly run ones that nobody submitted for review.

The gap between describing good practice and operationalising it is not a communication problem. It is an infrastructure problem. And no amount of better communication closes an infrastructure gap.

Three specific ways passive scaling fails

1. Playbooks cannot enforce themselves

This is the most fundamental and most frequently ignored limitation. A playbook is a set of guidelines. Guidelines are, by definition, optional. They rely on individuals choosing to follow them, remembering to follow them, interpreting them correctly, and applying them consistently under time pressure and competing priorities.

In organisations where experimentation teams have spoken candidly about this, the same observations surface repeatedly. Process documentation collects digital dust. Teams skip steps when deadlines close in. Quality standards become suggestions rather than requirements. The people who are most diligent about following the playbook are typically the people who needed it least.

The centre of excellence or programme lead who wrote the playbook has no mechanism to know whether it is being followed. They find out when something goes wrong, if they find out at all.

2. Workshops create a moment, not a system

A well-run workshop can shift understanding. It can create alignment in a room. It can give people language and frameworks they did not have before. These are genuinely valuable outcomes, and dismissing them would be unfair.

But a workshop is a point-in-time event. The understanding it creates begins to decay the moment participants walk out the door. New team members join who were not in the room. Priorities shift. The urgency of day-to-day delivery displaces the principles that felt so clear during the session.

Organisations that rely on workshops to scale experimentation governance are essentially running a system where quality depends on collective memory. Memory fades. Turnover erases it. Pressure distorts it. The workshop becomes something people vaguely remember attending rather than something that structurally shapes how work gets done.

3. Rituals become theatre

Demo days, experiment review meetings, insight-sharing ceremonies: these are the rituals that experimentation programmes adopt to create visibility and accountability. The logic is reasonable. If teams know their work will be reviewed publicly, quality should improve.

In practice, something else tends to happen. The experiments that get presented are the ones that look good. Inconclusive results, flawed designs, and abandoned tests quietly disappear from view. Teams learn which stories play well in the room and optimise for presentation rather than rigour. Programme leads lose sight of the full portfolio because they only ever see the curated version.

The ritual creates a feeling of oversight without the substance of it. This is arguably worse than no ritual at all, because it gives leadership false confidence that the programme is healthy when the underlying evidence quality may be deteriorating.

The deeper issue: the Evidence Gap

All three of these failures are symptoms of a structural disconnect that sits at the heart of most experimentation and research programmes. Organisations generate evidence through experiments, user research, and data analysis. They also make strategic decisions about products, features, markets, and investments. But the connection between those two activities is almost always informal, incomplete, and unreliable.

This is the Evidence Gap. It operates across three layers:

The first layer is evidence integrity. Are the experiments and research studies being conducted to a standard that makes their conclusions trustworthy? Playbooks attempt to address this layer, but without enforcement they cannot guarantee it.

The second layer is evidence synthesis. When multiple experiments and research studies touch the same question, can the organisation bring those findings together to form a coherent picture? Workshops and knowledge-sharing rituals attempt to address this layer, but without infrastructure they produce anecdotes rather than synthesis.

The third layer is decision influence. When reliable, synthesised evidence exists, does it actually reach the people making the decisions it should inform? And does it reach them in time, in a form they can act on? No passive scaling method even attempts to address this layer.

Governance is the connective tissue that runs through all three layers. Not governance as bureaucracy, but governance as the system that ensures evidence is trustworthy, connected, and influential. Without it, the organisation is running on hope: hope that teams follow the playbook, hope that insights get shared, hope that the right person sees the right evidence before the decision is already made.

None of this means playbooks and workshops are worthless

It would be easy to read everything above and conclude that playbooks, workshops, and training are a waste of time. They are not. They are essential. A team that has never been taught how to write a hypothesis, how to interpret results, or how to design a valid experiment cannot be governed into competence by infrastructure alone. You have to build the knowledge first.

The problem is not that organisations invest in these things. The problem is that they stop there. The playbook gets written and the job feels done. The workshop gets delivered and the box is ticked. The starting point gets mistaken for the destination.

Playbooks, workshops, and training build the foundation: shared language, baseline competence, awareness of what good looks like. But a foundation is not a building. Without what comes next, the foundation sits exposed to every organisational pressure that erodes standards over time: turnover, deadline pressure, competing priorities, and the simple human tendency to take shortcuts when nobody is watching.

What comes next is not more training or a better playbook. It is two things that turn the foundation into something that holds.

Two things change this. Only two.

Both are necessary. Neither is sufficient alone.

1. Executive Mandate

Infrastructure without executive mandate is shelfware.

You can build the most sophisticated governance system imaginable, and if nobody with organisational authority has said “this is how we work,” teams will route around it the moment it creates friction.

Mandate is the organisational decision that evidence quality and decision integrity are not optional. It is not a request from the centre of excellence. It is not an invitation to adopt best practices. It is a structural commitment from someone with the authority to make it stick across teams, business units, and reporting lines.

This is precisely why the centre of excellence model struggles. A CoE typically holds expertise but not authority. It can recommend standards but not require them. It can identify quality gaps but not close them. When a product team decides that speed matters more than rigour this quarter, the CoE has no lever to pull. Executive mandate gives it one.

Mandate does not mean heavy-handed control or bureaucratic gatekeeping. It means clarity. This is the standard. This is how evidence quality is assessed. This is how experiments connect to decisions. When that clarity comes from leadership with genuine organisational authority, the playbook stops being a suggestion and starts being the way things work.

2. Infrastructure

Mandate without infrastructure is a speech. You can declare that evidence quality matters, and without systems to enforce, verify, and connect that evidence, the declaration decays into another set of good intentions within weeks.

Infrastructure is what makes the mandate operational. It is the difference between saying “every experiment must have a valid hypothesis” and structurally requiring one before a test can be configured. Between saying “insights should be shared across teams” and building a system of record where findings are connected, searchable, and cumulative. Between saying “evidence should inform decisions” and routing evidence to the people who need it, connected to the decisions it should inform, with a clear audit trail showing whether it was used.

For evidence integrity, infrastructure means quality gates embedded in the workflow. A hypothesis is evaluated before a test can proceed. Sample size calculations are required, not recommended. Implementation verification happens as a step in the process, not as a best practice someone may or may not remember.
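To make the idea of a quality gate concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not a description of any particular product: ExperimentPlan, gate_failures, and configure_test are hypothetical names. The hypothesis check only tests for presence, and the sample size check uses the standard normal-approximation formula for a two-sided comparison of two conversion rates.

```python
from dataclasses import dataclass
from math import ceil
from statistics import NormalDist
from typing import List, Optional


class GateError(Exception):
    """Raised when a plan fails a quality gate and the test may not proceed."""


@dataclass
class ExperimentPlan:
    # Hypothetical record; the field names are illustrative assumptions.
    name: str
    hypothesis: Optional[str] = None
    baseline_rate: Optional[float] = None          # e.g. 0.10 for 10% conversion
    min_detectable_effect: Optional[float] = None  # absolute lift, e.g. 0.02
    planned_sample_per_variant: Optional[int] = None


def required_sample_size(baseline: float, mde: float,
                         alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate per-variant sample size for a two-sided two-proportion z-test."""
    p1, p2 = baseline, baseline + mde
    if mde == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie in (0, 1) and the effect must be non-zero")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def gate_failures(plan: ExperimentPlan) -> List[str]:
    """Return every reason the plan is not ready; an empty list means it passes."""
    failures = []
    # Presence check only; a real gate would also assess hypothesis quality.
    if not (plan.hypothesis and plan.hypothesis.strip()):
        failures.append("hypothesis is missing")
    if plan.baseline_rate is None or plan.min_detectable_effect is None:
        failures.append("sample size inputs are missing")
    else:
        needed = required_sample_size(plan.baseline_rate, plan.min_detectable_effect)
        planned = plan.planned_sample_per_variant or 0
        if planned < needed:
            failures.append(f"planned sample {planned} is below required {needed}")
    return failures


def configure_test(plan: ExperimentPlan) -> str:
    """The only route to a configured test runs through the gates."""
    failures = gate_failures(plan)
    if failures:
        raise GateError("; ".join(failures))
    return f"configured:{plan.name}"
```

The design choice that matters is that configure_test is the only way to start a test, so a missing hypothesis blocks the work rather than prompting a reminder.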

For evidence synthesis, infrastructure means a living system that builds organisational knowledge automatically as work is completed. Not a Confluence page that someone updates when they have time. Not a quarterly insights deck that is outdated before it is finished.
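One way to picture a system of record that builds knowledge as work is completed is the sketch below. It assumes a hypothetical EvidenceRegistry that the workflow calls whenever an experiment closes; the class, its methods, and the three outcome labels are illustrative assumptions, not an existing API. Every result is recorded, including the inconclusive ones.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed outcome vocabulary for this sketch.
OUTCOMES = {"win", "loss", "inconclusive"}


@dataclass(frozen=True)
class Finding:
    experiment: str
    topics: Tuple[str, ...]
    outcome: str
    summary: str


class EvidenceRegistry:
    """Hypothetical system of record, indexed by topic as experiments close."""

    def __init__(self) -> None:
        self._all: List[Finding] = []
        self._by_topic: Dict[str, List[Finding]] = defaultdict(list)

    def complete_experiment(self, finding: Finding) -> None:
        """Called by the workflow on close, so recording does not rely on memory."""
        if finding.outcome not in OUTCOMES:
            raise ValueError(f"unknown outcome: {finding.outcome}")
        self._all.append(finding)
        for topic in finding.topics:
            self._by_topic[topic.lower()].append(finding)

    def findings_for(self, topic: str) -> List[Finding]:
        """Everything learned about a topic so far, in completion order."""
        return list(self._by_topic.get(topic.lower(), []))

    def outcome_counts(self) -> Counter:
        """Portfolio view: how many wins, losses, and inconclusive results exist."""
        return Counter(f.outcome for f in self._all)
```

Because recording happens inside complete_experiment rather than in a separate write-up step, the portfolio view includes the tests nobody would have chosen to present.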

For decision influence, infrastructure means evidence is surfaced to the people who need it, connected to the decisions it should inform. Leadership does not have to ask for updates. The system delivers what matters.
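A minimal sketch of what routing evidence to decisions with an audit trail could look like follows. It is hypothetical throughout: Decision, route_evidence, mark_reviewed, and record_decision are assumed names, and matching evidence to decisions by a single topic string is a deliberate simplification.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Tuple


@dataclass
class Decision:
    # Hypothetical model of a pending decision and its evidence trail.
    title: str
    owner: str
    topic: str
    evidence: List[str] = field(default_factory=list)
    reviewed: List[str] = field(default_factory=list)
    audit: List[Tuple[datetime, str]] = field(default_factory=list)
    decided: bool = False

    def log(self, event: str) -> None:
        """Append a timestamped entry to the audit trail."""
        self.audit.append((datetime.now(timezone.utc), event))


def route_evidence(decisions: List[Decision], topic: str, evidence_id: str) -> List[str]:
    """Attach evidence to every open decision on the topic; return owners to notify."""
    owners = []
    for decision in decisions:
        if decision.topic == topic and not decision.decided:
            decision.evidence.append(evidence_id)
            decision.log(f"evidence {evidence_id} surfaced to {decision.owner}")
            owners.append(decision.owner)
    return owners


def mark_reviewed(decision: Decision, evidence_id: str) -> None:
    """Record that the owner has actually looked at a piece of evidence."""
    if evidence_id not in decision.evidence:
        raise ValueError(f"{evidence_id} was never routed to this decision")
    if evidence_id not in decision.reviewed:
        decision.reviewed.append(evidence_id)
        decision.log(f"evidence {evidence_id} reviewed by {decision.owner}")


def record_decision(decision: Decision, outcome: str) -> List[str]:
    """Close the decision and return any evidence that was surfaced but not reviewed."""
    unreviewed = [e for e in decision.evidence if e not in decision.reviewed]
    decision.decided = True
    decision.log(f"decided '{outcome}'; unreviewed evidence: {unreviewed or 'none'}")
    return unreviewed
```

The return value of record_decision is the part that answers whether evidence was used: a decision can still be made without reading it, but the audit trail shows that it was.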

Why both

Mandate without infrastructure produces frustration. Leadership says quality matters, but practitioners have no system to deliver it consistently. Standards exist on paper. Compliance depends on individual effort. The quality variance between teams persists because there is nothing structural to close it.

Infrastructure without mandate produces adoption failure. A well-designed system sits unused because teams were never told it was required. The CoE champions it. A few enthusiastic teams adopt it voluntarily. The rest carry on as before, and the quality variance between teams persists for exactly the same reason.

Together, they close the gap. Mandate creates the organisational commitment. Infrastructure makes that commitment operationally real. The playbook stops being a document people might read and becomes the enforced standard of how evidence-based work gets done.

Before you buy anything new: audit what you already have

Most organisations already have tools. The question is whether those tools are providing infrastructure or just digitising the same passive instruments that were not working on paper.

Run through these questions honestly. They map to the three layers of the Evidence Gap, and the answers will tell you whether your current setup is genuinely enforcing standards or simply making it more convenient to ignore them.

The uncomfortable pattern

If most of your answers point to manual processes, optional fields, and “someone would need to check,” your tools are not providing infrastructure. They are providing a workspace. A workspace where good practice is possible but not enforced, where synthesis depends on individual initiative, and where evidence reaches decisions only when someone remembers to carry it there.

That is not infrastructure. That is a more expensive version of the playbook.

The question for leadership

If you are accountable for the outcomes of your experimentation or research programme, the question is not whether your teams are talented or your playbook is comprehensive. The question is whether you have both of the things that actually change how evidence connects to decisions.

Do you have mandate: a clear, authoritative commitment that evidence quality and decision integrity are not optional?

Do you have infrastructure: systems that enforce, verify, and connect evidence across the organisation without depending on individual memory or motivation?

If either is missing, you are running on hope. Hope is not a governance model.

The organisations that will close the Evidence Gap are not the ones with the best workshops or the most detailed playbooks. They are the ones where leadership provides the mandate, and infrastructure makes it the default.

Manuel da Costa

Founder of Efestra, the Experimentation Governance System that enables organisations to make better decisions through experimentation
