From feature flags to canary releases, discover effective production testing strategies

Testing in Production (TIP): What Is Production Testing?

Testing in production enables you to see how your updated product or features will work in the real world. Learn what it is, why it matters, and how to use it.

                How does testing in production work?

                Testing in production typically involves running tests on the actual system your users interact with. You’ll usually select a small group of users or servers to test your changes on—this means that if any problems crop up, not all your customers or infrastructure are affected.

                Additional options include only testing specific changes (feature flagging) or purposely causing product errors (chaos engineering). We’ll go over these in more detail.

                As the test progresses, you’ll monitor your production environment to see how your product performs. You might analyze load times, error rates, user engagement, and other metrics using real user monitoring (RUM) tools. This will give you good insight into how usable and successful the updated product is.

                Once you’re happy with the results and have made any necessary tweaks, you can gradually roll out the changes to more users and servers. You can launch to this broader base more confidently as you know how your product will behave in real life.

                Production testing strategies

                You can implement a production test in many ways. TIP encompasses several techniques and strategies. The method (or methods) you choose might depend on the nature of your product, what you’re testing, and your team’s resources.

                Picking a tried-and-tested technique helps ensure your experiments have the most impact. It provides a solid framework for getting actionable results, meaning you’re not just testing for the sake of it.

                Here are some of the most effective approaches to help balance testing new functionalities with mitigating risks.

                Incremental rollouts

                You can take an incremental rollout approach rather than releasing your update to all your users at once. Start by exposing the change to a small percentage of your user base (e.g., 5%-10%). If everything looks good, you can gradually ramp up the exposure over time until it’s enabled for everyone.
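One common way to implement the percentage ramp described above is to hash each user into a stable bucket, so the same user stays in (or out of) the rollout as the percentage grows. The sketch below is only an illustration: the function names, the `new_dashboard` feature name, and the choice of SHA-256 are assumptions, not something the approach prescribes.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> int:
    """Map a user to a stable bucket from 0 to 99 for one feature."""
    key = f"{feature}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % 100


def is_exposed(user_id: str, feature: str, percent: int) -> bool:
    """True when the user's bucket falls inside the current rollout percentage."""
    return rollout_bucket(user_id, feature) < percent


if __name__ == "__main__":
    # Hypothetical user IDs and feature name, for illustration only.
    users = [f"user-{i}" for i in range(1000)]
    for percent in (5, 25, 100):
        exposed = sum(is_exposed(u, "new_dashboard", percent) for u in users)
        print(f"{percent}% rollout exposes {exposed} of {len(users)} users")
```

Because a user's bucket never changes, anyone exposed at 5% is still exposed at 25%, which keeps their experience consistent as you ramp up.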

                Feature flags

                Feature flags let you wrap new elements behind toggles in your product’s code. This enables you to selectively turn on or off the functionality for specific user segments or environments. Based on the group’s response, you can iterate and fix any issues before turning on the new feature for all your users.
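As a rough sketch of what a toggle looks like in code, the example below keeps flags in an in-memory dictionary and checks the user's segment before choosing a code path. The `FLAGS` store, the `new_checkout` flag, and the segment names are all hypothetical. A real setup would typically read flags from a flag management service or configuration system instead.

```python
from dataclasses import dataclass, field


@dataclass
class Flag:
    enabled: bool = False
    # Segments allowed to see the feature; an empty set means every segment.
    segments: set = field(default_factory=set)


# Hypothetical in-memory flag store, for illustration only.
FLAGS = {"new_checkout": Flag(enabled=True, segments={"beta_testers"})}


def flag_is_on(name: str, user_segment: str) -> bool:
    """Unknown or disabled flags are always off."""
    flag = FLAGS.get(name)
    if flag is None or not flag.enabled:
        return False
    return not flag.segments or user_segment in flag.segments


def checkout(user_segment: str) -> str:
    """Route the user to the new or existing flow based on the flag."""
    if flag_is_on("new_checkout", user_segment):
        return "new checkout flow"
    return "existing checkout flow"
```

Treating unknown flags as off is a deliberate choice here: a typo in a flag name falls back to the existing behavior rather than exposing unfinished code.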

                Dark launches

                With dark launches, you can deploy a change to the production environment but initially keep it hidden (or “dark”) from all users. This enables you to run the new code against real user traffic to check for issues without impacting the user experience. Once verified, you can “launch” it for everyone at once or enable it gradually using feature flags.
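One way to run new code against real traffic without users seeing it is to execute it alongside the existing path, return only the existing result, and record any differences. The sketch below assumes that pattern. `run_dark`, `legacy_total`, and `new_total` are hypothetical names, and the pricing logic is invented for illustration.

```python
import logging
from typing import Any, Callable, List, Tuple

logger = logging.getLogger("dark_launch")


def run_dark(
    live_fn: Callable[[Any], Any],
    dark_fn: Callable[[Any], Any],
    request: Any,
    mismatches: List[Tuple[Any, Any, Any]],
) -> Any:
    """Serve the live result; run the dark path on the side and record differences."""
    live_result = live_fn(request)
    try:
        dark_result = dark_fn(request)
        if dark_result != live_result:
            mismatches.append((request, live_result, dark_result))
    except Exception:
        # A failure in the dark path must never reach the user.
        logger.exception("dark path failed for request %r", request)
    return live_result


def legacy_total(cart):
    """Hypothetical existing code path."""
    return sum(cart)


def new_total(cart):
    """Hypothetical new code path: clamps negative totals to zero."""
    return max(sum(cart), 0)
```

Reviewing the recorded mismatches tells you where the new path behaves differently before any user depends on it.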

                A/B testing

                A/B testing works by showing different variants or versions of your product to different groups of users in production. It helps you determine which option performs better so you can make data-led decisions about which one to roll out fully.
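To make the mechanics concrete, here is a minimal sketch of the two halves of an A/B test: assigning each user to a variant consistently, and comparing conversion rates per variant. The function names and the hash-based assignment are assumptions for illustration. The sketch also skips statistical significance testing, which you would want before acting on the numbers.

```python
import hashlib
from collections import Counter
from typing import Dict, Iterable, Sequence, Tuple


def assign_variant(
    user_id: str, experiment: str, variants: Sequence[str] = ("A", "B")
) -> str:
    """Give each user a stable variant for a given experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    index = int(hashlib.sha256(key).hexdigest(), 16) % len(variants)
    return variants[index]


def conversion_rates(events: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """events holds (variant, converted) pairs, one per exposed user."""
    exposures: Counter = Counter()
    conversions: Counter = Counter()
    for variant, converted in events:
        exposures[variant] += 1
        if converted:
            conversions[variant] += 1
    return {v: conversions[v] / exposures[v] for v in exposures}
```

Hashing on both the experiment name and the user ID means a user can land in different variants across experiments while staying in the same variant within one.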

                Canary releases

                A canary release involves deploying a new version of your product to a small percentage of your users first. These users are your “canaries.” The canary group is essentially an “early warning system” for possible bugs or issues, enabling you to make quick adjustments if needed. You monitor their experience before deciding when it’s safe to roll out your update more broadly.
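The "monitor, then decide" step can be sketched as a simple gate that compares the canary group's error rate with the baseline. The thresholds below (a tolerance of one percentage point and a minimum of 100 canary requests) are arbitrary assumptions, and `canary_decision` is a hypothetical name. Tune both to your own traffic and risk tolerance.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    tolerance: float = 0.01,
    min_requests: int = 100,
) -> str:
    """Return "wait", "promote", or "rollback" for the canary version."""
    # Too little traffic to judge either group: keep observing.
    if canary_requests < min_requests or baseline_requests == 0:
        return "wait"
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    # Roll back when the canary is clearly worse than the baseline.
    if canary_rate > baseline_rate + tolerance:
        return "rollback"
    return "promote"
```

For example, `canary_decision(10, 10000, 25, 500)` returns `"rollback"` because the canary's 5% error rate is well above the baseline's 0.1% plus the tolerance.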

                Chaos engineering

                In this practice, you intentionally introduce failures into your production system to test and ensure its resilience. For instance, you might forcefully switch off your servers to test your failover mechanisms. Chaos engineering helps you find weaknesses before they cause serious issues that impact users and helps you build more robust systems.
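The server shutdown example can be modeled in miniature. The sketch below uses a toy `Server` class and a `handle_with_failover` function (both hypothetical) to show the shape of a chaos experiment: inject a failure, then check that requests are still served. Real chaos experiments target actual infrastructure and should start with a small blast radius.

```python
from typing import List


class Server:
    """Toy stand-in for a production server."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.up = True

    def handle(self, request: str) -> str:
        if not self.up:
            raise ConnectionError(f"{self.name} is down")
        return f"{self.name} handled {request}"


def handle_with_failover(servers: List[Server], request: str) -> str:
    """Try each server in order and return the first successful response."""
    for server in servers:
        try:
            return server.handle(request)
        except ConnectionError:
            continue
    raise RuntimeError("all servers are down")


if __name__ == "__main__":
    primary, backup = Server("primary"), Server("backup")
    primary.up = False  # Inject the failure on purpose.
    print(handle_with_failover([primary, backup], "GET /health"))
```

If the request fails while the backup is still up, the experiment has found a weakness in your failover path before a real outage did.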

                Automated testing

                Using automated frameworks and scripts helps streamline production testing. You run tests automatically, saving time and ensuring consistency across different environments. These are especially useful if you want to test lots of new features or have more complex software.

                Pros and cons of testing in production

                Testing isn’t perfect. While TIP is generally considered a less risky way to launch a product or release an update, it’s not without its downsides.

                When considering a production test, weighing the pros and cons is crucial to help you decide if it’s the correct tactic.

                You’ll first need to consider:

                • Your project goals
                • The amount of time and resources you have
                • If you can afford to risk sharing test features with real users
                • How stable or “ready” your product is
                • Whether you can roll things back if things start to fail

                Once you’ve addressed those questions, analyze them alongside TIP’s advantages and disadvantages. Here are some of the most significant ones.

                Pros

                • Real-world validation: Testing in the wild (i.e., an actual production environment) accurately represents your product's performance under real-world conditions. This ensures any problems specific to production are found and fixed.
                • Early issue detection: By testing features and changes in production in small amounts or with limited visibility, your team can spot issues early on. You can resolve the errors before they escalate or affect more customers.
                • Faster time-to-market: Testing in production means you can get your product to market more quickly. You eliminate the need for heavy testing in separate environments, accelerating the delivery of new features and updates to users.
                • Better user experience: Testing your product in the live environment helps your team gather real-time user feedback and make iterative improvements. You can listen to your customers and carry out swift changes to meet their needs.
                • Cost-effective: A production test reduces the need for many separate testing environments. Your product doesn’t have to go through these different test stages, helping you keep costs down and preserve your resources.

                Cons

                • Risk of impacting users: Production testing carries the risk of introducing bugs or disruptions that may affect your users’ experience. This might lead to dissatisfied customers or a lack of trust in your product.
                • Lack of control: Unlike more controlled, pre-production testing environments, production environments are dynamic and may involve factors that are difficult to manage or predict. It can be more challenging to replicate specific scenarios or pinpoint any issues.
                • Security concerns: Without proper precautions, conducting tests in the live environment may expose sensitive data or unfinished systems to potential security risks. This threatens the integrity and confidentiality of your product’s software.
                • Compliance and regulation challenges: Testing in production comes with compliance and regulatory concerns, particularly in industries with strict guidelines or requirements surrounding data privacy and security.
                • Reliance on rollback mechanisms: Effective production testing depends heavily on robust monitoring and rollback plans. You must be able to detect errors and revert changes if needed—this might mean investing in more tools.

                Measuring the success of production tests

                Measuring the success of your production test is incredibly important. An overall “success rate” gives your team a solid understanding of how your changes have performed—you can use these insights to highlight areas for improvement, ensuring you provide a positive user experience.

                The things you track and measure will depend on your product and your business's central aims. However, having a consolidated, objective view of the production environment is crucial when testing live updates.

                Some standard metrics to monitor and analyze include:

                • Conversion rates: Look at how your changes affect conversions. After the update, are more users signing up, subscribing, or taking another desired action?
                • Error rates: Watch for increased error rates, crashes, or failures after you release an update. Sudden spikes could indicate that your changes introduced new issues.
                • User engagement: Metrics like session lengths, number of actions, and click-through rates can reveal if the updates are helping or hindering overall user engagement.
                • Response times: Alterations can impact performance. Monitor response times, load times, and latency measurements to ensure the speed of your product hasn’t degraded.
                • User feedback: Gather direct user feedback from users in the testing group through surveys, reviews, and customer support interactions. This qualitative input is invaluable for indicating the success of the product changes and where you need to improve.
                • System performance metrics: Track the CPU usage, memory consumption, network traffic, and other system metrics for signs of inefficiency or scalability issues.
                • Business metrics: These are your organization’s bottom line. Link your testing results to your main business KPIs, such as revenue, operational costs, and customer retention.
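Two of the metrics above, error rate and response time, are straightforward to compute from raw request records. The sketch below assumes each record is an (HTTP status, latency in milliseconds) pair and treats any status of 500 or above as an error. The `summarize` function and the nearest-rank percentile method are illustrative choices, not the only way to define these numbers.

```python
import math
from typing import Dict, List, Tuple


def percentile(values: List[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def summarize(requests: List[Tuple[int, float]]) -> Dict[str, float]:
    """requests holds (http_status, latency_ms) pairs from one test group."""
    if not requests:
        raise ValueError("no requests to summarize")
    errors = sum(1 for status, _ in requests if status >= 500)
    latencies = [latency for _, latency in requests]
    return {
        "error_rate": errors / len(requests),
        "p50_ms": percentile(latencies, 50),
        "p95_ms": percentile(latencies, 95),
    }
```

Running `summarize` separately on the test group and the control group gives you a like-for-like comparison. Percentiles are used instead of averages because a few very slow requests can hide behind a healthy mean.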

                Factors to consider when testing in production

                Testing in production offers the chance to get real-world data and feedback—but it also comes with risks and complexities. To properly implement production testing, you must consider several factors and conditions.

                Align with your business goals

                Before beginning your production testing, ensure it aligns with your overall business strategy and priorities. Your efforts should relate to your primary goals, such as improving user satisfaction, increasing revenue, or reducing technical costs.

                Think about the user impact

                While TIP provides real-world data, you still expose some users to potential issues. Let your selected customers know they’re using an experimental feature or update, and be on hand to help if needed.

                Set up monitoring systems

                Robust monitoring, logging, and observability tools are non-negotiable for safe production testing. You need to be able to quickly detect, diagnose, and respond to any issues that arise.

                Establish rollout strategies

                Determine your rollout approach. This might be a canary release, dark launch, incremental rollout, or a combination of methods. Define how you’ll progressively share the change with larger user segments.

                Outline clear rollback procedures

                If your testing uncovers problems, you need reliable mechanisms to deactivate the new code immediately and revert to a stable state with minimal disruption.
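A rollback mechanism can be as simple as remembering the last stable version and reverting when a health signal crosses a threshold. The sketch below is one possible shape under that assumption. `ReleaseManager`, `enforce_error_budget`, and the 2% error threshold are hypothetical, and a real rollback would also need to handle data migrations and in-flight requests.

```python
class ReleaseManager:
    """Tracks deployed versions so the previous one can be restored."""

    def __init__(self, stable_version: str) -> None:
        self._history = [stable_version]

    @property
    def current(self) -> str:
        return self._history[-1]

    def deploy(self, version: str) -> None:
        self._history.append(version)

    def rollback(self) -> str:
        """Revert to the previous version; the original stable version is never removed."""
        if len(self._history) > 1:
            self._history.pop()
        return self.current


def enforce_error_budget(
    manager: ReleaseManager, error_rate: float, threshold: float = 0.02
) -> bool:
    """Roll back and return True when the observed error rate exceeds the threshold."""
    if error_rate > threshold:
        manager.rollback()
        return True
    return False
```

Wiring a check like this into your monitoring means the revert does not depend on someone noticing a dashboard at the right moment.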

                Have clear communication

                Cross-team communication and transparent processes are vital. Establish lines of communication, define who is responsible for which feature, and document your testing protocols.

                Consider your resources and skills

                Assess if your team has the experience, resources, and infrastructure needed to test in production safely and effectively. Don’t overextend beyond your capabilities.

                Encourage continuous improvement

                Analyze the successes and failures from each production testing cycle. Iterate on your methods, tooling, and strategies for continuous improvement over time.

                How can Amplitude support testing in production?

                When testing new features or updates in production environments, having the right tools at your fingertips is essential.

                Amplitude provides valuable capabilities to support secure and robust production testing. You can use the platform to:

                • Run quick A/B tests to dynamically experiment with different user experiences without engineering support. Track precise results to help you understand which option to roll out.
                • Define and track any product event or user action of interest. Monitor how your users use the feature, the flows, conversions, and retention with precise and granular metrics.
                • Analyze your experiment’s results across multiple segments using funnel analysis, behavioral reports, and more. See how different cohorts are reacting to your live experiments.

                By instrumenting Amplitude’s A/B testing platform from the get-go, you gain unparalleled visibility into how your users interact with your product during live production tests.

                This real-world insight enables you to validate changes, quickly spot problems, and ultimately deliver better experiences—your true business goal.

                Obtain genuine data from your actual users during real-life tests. Get started with Amplitude now.

                Testing in production FAQs

                Why test in production and not staging?

                Although staging environments are helpful, they often cannot fully replicate real-world production conditions with live user traffic, data volumes, and unpredictable scenarios. Testing in production enables you to get genuine feedback on how your system performs and how real users experience and use the changes before a general rollout.

                What can go wrong with testing in production?

                There are a few key risks to be aware of:

                • Negative user impact: If not properly controlled, you could expose your users to bugs, downtime, or poor experiences that frustrate them.
                • Data loss or corruption: New code could corrupt or delete production data if problems exist.
                • System outages or performance issues: Faulty updates can cause system downtime or degrade performance for users.
                • Compliance violations: Regulations or policies in some industries may restrict live testing on production users.

                This is why solid processes, monitoring, team controls, and rollback methods are essential for safe production testing. With proper procedures, these risks can be managed and minimized.