Product & Engineering · How-To
10 min read
Updated 3/16/2026

How to Run Your First A/B Test

Design, implement, and analyze your first A/B test to make data-driven product decisions. Learn the fundamentals of experiment design, statistical significance, and when to call a test.

Before You Start

  1. Product analytics already set up with event tracking
  2. Enough traffic to reach statistical significance (at least 1,000 visitors per week)
  3. A specific hypothesis you want to test

Step-by-Step Guide

1. Formulate a clear hypothesis with a measurable outcome

Write your hypothesis in this format: 'If we [change], then [metric] will [increase/decrease] by [amount] because [reason].' For example: 'If we change the CTA button from blue to green on the pricing page, then click-through rate will increase by 10% because green creates a stronger visual contrast.' Pick one primary metric to measure. Having multiple primary metrics dilutes your results.

Focus your first test on a high-traffic page with a clear conversion goal. Testing on low-traffic pages takes months to reach significance.

2. Calculate sample size and test duration

Use an A/B test calculator (Evan Miller's is the standard) to determine how many visitors you need per variation. Input your current conversion rate, the minimum detectable effect you care about (typically a 10-20% relative improvement), your desired statistical power (80%), and your significance level (5%, i.e. 95% confidence). Divide the required sample size by your daily traffic to get the test duration. Most tests need 2-4 weeks.
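The calculator's arithmetic can be sketched with the standard normal-approximation formula for a two-proportion test (stdlib only; results may differ slightly from online calculators, which sometimes use an exact method):

```python
import math

def sample_size_per_variant(baseline_rate, relative_lift,
                            z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed per variant for a two-proportion test.

    baseline_rate: current conversion rate (e.g. 0.05 for 5%)
    relative_lift: minimum detectable effect, relative (e.g. 0.15 for +15%)
    z_alpha=1.96 -> 5% significance (two-sided); z_beta=0.84 -> 80% power.
    """
    p1 = baseline_rate
    p2 = baseline_rate * (1 + relative_lift)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = ((z_alpha + z_beta) ** 2) * variance / (p1 - p2) ** 2
    return math.ceil(n)

# 5% baseline conversion, hoping to detect a 15% relative improvement:
n = sample_size_per_variant(0.05, 0.15)
print(n)                       # visitors required per variant
print(math.ceil(2 * n / 500))  # days to run at 500 visitors/day total
```

Note how quickly the requirement grows as the baseline rate or the detectable effect shrinks; this is why low-traffic pages take months to test.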

Never peek at results early and stop a test because one variant looks like a winner. This leads to false positives. Commit to your calculated duration.

3. Set up the experiment in your testing platform

In PostHog, create a new experiment and define your feature flag variants (control and treatment). In Optimizely, use the visual editor or code-based experiment builder. Assign users randomly and persistently: once a user sees variant B, they should always see variant B. Set your traffic allocation (start with 50/50 split). Target the specific page or user segment defined in your hypothesis.

Use feature flags rather than deploying separate code paths. This lets you turn experiments on and off instantly without a code deploy.
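PostHog and Optimizely both handle random, persistent assignment for you. If you ever need a minimal in-house fallback, a deterministic hash of the user ID gives sticky, roughly even bucketing (a sketch; the function and experiment names are illustrative):

```python
import hashlib

def assign_variant(user_id: str, experiment: str, split: float = 0.5) -> str:
    """Deterministically bucket a user: the same user_id always gets the
    same variant, with roughly `split` of users in the control group."""
    # Hash user + experiment name so the same user can land in different
    # buckets across different experiments.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    bucket = int(digest[:8], 16) / 0xFFFFFFFF  # roughly uniform in [0, 1]
    return "control" if bucket < split else "treatment"

print(assign_variant("user_123", "green-cta"))  # stable across calls
```

Because assignment depends only on the user ID and experiment name, no database lookup is needed and the split stays consistent across devices that share a login.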

4. Implement the variant and QA thoroughly

Build the treatment variant (the change you are testing). Test it in staging to ensure both variants render correctly, tracking events fire properly, and there are no broken layouts or errors. Verify the experiment assigns users correctly by clearing cookies and checking variant allocation. Test on mobile, desktop, and the top 3 browsers your audience uses.

Ask a teammate to review the variant without telling them what changed. If they cannot spot the difference, your change might be too subtle to move the needle.

5. Launch, monitor, and analyze results

Start the experiment and monitor for the first 24 hours to catch any technical issues (error spikes, broken layouts). Do not look at conversion data during this period. After your pre-calculated duration, check results. Look for: statistical significance (p-value below 0.05), practical significance (is the lift meaningful for your business), and consistency across segments (mobile vs desktop, new vs returning users).
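The significance check your platform runs can be sketched with a standard two-proportion z-test (stdlib only; the conversion counts below are made up for illustration):

```python
from math import sqrt, erf

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided z-test comparing conversion rates of two variants.

    conv_*: number of conversions; n_*: number of visitors.
    Returns (relative lift of B over A, p-value). Uses the normal
    approximation, which is fine at properly powered sample sizes.
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return (p_b - p_a) / p_a, p_value

lift, p = two_proportion_z_test(700, 14000, 810, 14000)
print(f"lift: {lift:+.1%}, p = {p:.4f}")
```

A p-value below 0.05 only answers the statistical question; whether the lift clears your minimum detectable effect is the practical question, and both have to pass.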

If the test is inconclusive, that is still a result. It means the change does not matter enough, and you should test something bolder. Most first tests fail, and that is expected.

6. Document findings and decide next steps

Record your hypothesis, the result, the winning variant, the observed lift, and the confidence level. Share findings with your team. If the treatment won, roll it out to 100% of users. If the control won, revert the change. Either way, use what you learned to inform your next test. Build a testing backlog prioritized by potential impact and ease of implementation.

Create a shared experiment log (Notion or a spreadsheet) where all test results live. This institutional knowledge compounds over time and prevents re-testing the same ideas.
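A minimal sketch of one experiment-log entry (the field names are a suggestion, not a standard), serialized so it can be pasted into Notion or a spreadsheet import:

```python
from dataclasses import dataclass, asdict
import json

@dataclass
class ExperimentRecord:
    """One row in a shared experiment log."""
    name: str
    hypothesis: str
    primary_metric: str
    winner: str           # "control", "treatment", or "inconclusive"
    observed_lift: float  # relative lift, e.g. 0.12 for +12%
    p_value: float
    decision: str

record = ExperimentRecord(
    name="pricing-cta-color",
    hypothesis="Green CTA raises click-through 10% via stronger contrast",
    primary_metric="pricing_cta_click_rate",
    winner="treatment",
    observed_lift=0.12,
    p_value=0.03,
    decision="Roll out to 100%; next: test CTA copy",
)
print(json.dumps(asdict(record), indent=2))
```

Recording inconclusive and losing tests in the same schema is what makes the log useful later; only logging winners hides most of what the team learned.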
