Friday Deployments
When discussing Friday deployments, you will often find two camps fiercely arguing for or against them. Interestingly, many opinions don’t directly address risk but instead use proxies for it. Let’s take a closer look at how we can think about Friday deployments differently.
Why people do not want to deploy on Fridays
The number one complaint about Friday deployments is that if there’s a problem, you will need to stay late to fix it. Maybe the fix or rollback will take a day or two and ruin your weekend too.
Others say that waiting until Monday gives more time to test the code and thus reduces the risk of deploying bugs. Really? If you think the code is not ready, then do not deploy it! Just because you can release on a Friday does not mean you have to.
I believe these arguments miss the point, because the number one question is: how does the deployment, or the occasional error, affect the users? The ultimate goal is managing risk for users, not for developers. This does not mean that I do not care about developers. But we engineers work at the company, so it is much easier to communicate about an issue or even reach a compromise. If you need to stay late, you can take the next day off. As with on-call, there can be monetary compensation for your extra time. Or you might simply decide that if this happens twice a year, it is part of the job; this is how I think about it.
I also think that, in general, getting into the habit of postponing deployments has detrimental effects:
- Many organizations start the week on Monday with planning and organizing for the week or sprint ahead, which means code held back on Friday reaches production Monday afternoon or even Tuesday morning.
- If risk is a good enough reason not to deploy on Friday afternoon, then what about other afternoons? Maybe we should not deploy on any day after 2pm?
- While people do not work on weekends, systems do not sleep: data collection and measurements are running during the weekend as well. If you are running an A/B test and deploy on Friday, by Monday you might have statistically significant results.
- Your team/company loses whatever benefits the deployment would bring (better signup rates, fewer errors, improved renewals, a bug fix). From Friday afternoon to Monday afternoon is 3 days, almost 1% of the year, lost for every change that could have been deployed on Friday.
- It robs you of the feeling of accomplishment at the end of the week, which is a natural stopping point for us humans.
Of course, there are exceptions where you simply cannot avoid deployment freezes, and not necessarily on Fridays. A few examples:
- A special law or contract in your area does not let you change systems whenever you want.
- Your customer wants manual approval for the change, and they refuse to give approvals on Friday.
- A special event is coming up for the company, such as the launch of a new product line, the Super Bowl (NFL), or the Champions League Final (soccer).
- Trading companies do not risk changes during market hours. If the system’s throughput is lower, it affects their revenue directly, so they might deploy only on Friday afternoon or during the weekend.
As with almost any topic, let’s not be dogmatic; it’s better to look at data and measurements.
How to become confident in your deployments
To see how you compare with the rest of the industry on deployments, the DORA metrics are useful once again. Focus on two opposing groups of metrics here: deployment frequency and lead time for changes on one side, and change failure rate on the other. They measure how often you release, how long a change takes to get from commit to production, and what share of changes result in a failure in production.
Notice that these metrics keep each other in check: if you want to deploy quickly and frequently, chances are there will be more errors. This is a great example of why you should never concentrate on a single metric, as you will not achieve the desired results. Once you begin measuring these metrics, it becomes much easier to discuss the risks and eliminate any unfounded fears that might stop you from deploying on a given Friday.
The numbers could also show that the company is not performing at the level you hoped for. These metrics can provide you with high-level goals (e.g., reducing lead time from 6 hours to 2 hours) and help you identify what is keeping you from reaching them. If a change typically reaches production within 2 hours, a Friday deployment is not so daunting anymore.
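To make these metrics less abstract, here is a minimal Python sketch of how you might compute them from your own deployment history. It assumes a hypothetical record per deployment (`Deployment`, holding the commit time of its change, the time it reached production, and whether it caused a failure) and a hypothetical `dora_summary` helper. Real data would come from your CI/CD system and incident tracker, and details such as using the median rather than the mean lead time are choices you should make deliberately.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median


@dataclass
class Deployment:
    """Hypothetical record of one production deployment."""
    commit_time: datetime  # when the change was committed
    deploy_time: datetime  # when the change was running in production
    caused_failure: bool   # did it lead to an incident, rollback or hotfix?


def dora_summary(deployments: list[Deployment], period_days: int) -> dict:
    """Summarise deployment frequency, lead time and change failure rate."""
    if not deployments or period_days <= 0:
        raise ValueError("need at least one deployment and a positive period")
    lead_times = [d.deploy_time - d.commit_time for d in deployments]
    failures = sum(d.caused_failure for d in deployments)
    return {
        # Deployments per week over the reporting period.
        "deploys_per_week": len(deployments) * 7 / period_days,
        # The median is less sensitive than the mean to one slow outlier.
        "median_lead_time": median(lead_times),
        # Share of deployments that caused a failure in production.
        "change_failure_rate": failures / len(deployments),
    }


# Example: four deployments over a two-week period, one of which failed.
start = datetime(2024, 3, 1, 9, 0)
history = [
    Deployment(start, start + timedelta(hours=2), False),
    Deployment(start, start + timedelta(hours=6), True),
    Deployment(start, start + timedelta(hours=3), False),
    Deployment(start, start + timedelta(hours=1), False),
]
summary = dora_summary(history, period_days=14)
print(summary["deploys_per_week"])     # 2.0
print(summary["median_lead_time"])     # 2:30:00
print(summary["change_failure_rate"])  # 0.25
```

Tracked over a few months, numbers like these make the Friday conversation concrete: a rising change failure rate is a reason to hold off, while a short lead time with few failures is evidence that a Friday deployment carries little extra risk.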
There are technical and organizational improvements you can implement. If some are missing, it makes sense to avoid risky deployments until they are addressed. Usual suspects include:
- Testing: Are you confident the release is bug-free and thoroughly tested? Automated tests help a great deal here, though don’t forget manual exploratory testing, which complements automation very well.
- Staging / pre-production environment: Many problems arise only with production traffic or production-size data. Simulate that in an environment as similar to production as possible.
- Automated deployment: If you can put code live with a single click, or at least without manual steps, you avoid manual errors and speed things up. Rollbacks can be automated as well, which reduces the Mean Time to Resolve issues.
- Observability: Are you confident you will notice if there is a bug in production? Invest in your monitoring system, observed metrics, and SLO/SLA definitions to understand how deployments affect your systems and users.
- Complex changes: If you deploy a large amount of code and other changes at once, finding which change is causing problems can be tough. Try to break changes into smaller pieces, use feature flags to turn parts of the code on or off, and have emergency switches to isolate errors. Shipping in smaller increments is partly a non-technical change too, one you can promote within the product organization.
- Technical architecture: If every change is complex and affects all features, as is often the case with a monolithic application, think about how you could decouple at least parts of the system.
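Feature flags and emergency switches can be surprisingly simple. The minimal Python sketch below assumes a hypothetical in-memory `FeatureFlags` store and a hypothetical `checkout_total_cents` function; production setups usually read flags from a configuration service or a dedicated flag provider so they can be flipped without a redeploy. It illustrates the idea that new code can ship switched off, be turned on separately, and be killed instantly if it misbehaves, without rolling back the whole release.

```python
from dataclasses import dataclass, field


@dataclass
class FeatureFlags:
    """Hypothetical in-memory flag store (real systems load this from config)."""
    enabled: set[str] = field(default_factory=set)
    kill_switches: set[str] = field(default_factory=set)

    def is_on(self, name: str) -> bool:
        # An emergency (kill) switch always overrides the normal flag state.
        if name in self.kill_switches:
            return False
        return name in self.enabled

    def kill(self, name: str) -> None:
        # Flip the emergency switch, e.g. from an admin tool during an incident.
        self.kill_switches.add(name)


def checkout_total_cents(prices_cents: list[int], flags: FeatureFlags) -> int:
    total = sum(prices_cents)
    # The new 10% discount ships with the release but only runs when its flag is on.
    if flags.is_on("new_discount"):
        total = total * 90 // 100
    return total


flags = FeatureFlags(enabled={"new_discount"})
with_discount = checkout_total_cents([1000, 2000], flags)  # 2700
flags.kill("new_discount")  # something looks wrong: switch it off without redeploying
after_kill = checkout_total_cents([1000, 2000], flags)     # 3000
print(with_discount, after_kill)
```

Keeping the kill switch separate from the rollout flag makes the emergency path a single, obvious action, and it still works if someone turns the flag back on by mistake.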
Overall, I believe if you set a goal to deploy confidently on a Friday (maybe even in the afternoon 😃), it will make your organization more effective. These changes are long-lasting, and you reap the benefits with every deployment.
To show that some people deploy on Fridays for unexpected reasons, here is a short story:
Story time (from Hacker News)
I realized that quite a few people on the team had a strange habit of deploying around 5-6PM on Fridays! What I found out was interesting. They were young, single, and didn’t really seem to be the type to go out to bars. By deploying late and staying overtime, the company’s rule of “company-sponsored dinners” after office hours kicks in. They get a free dinner and an air-conditioned room, which are better than their homes or their rooms in shared flats. They end up watching videos/movies anyway, have dinner, and go back home late. It is not uncommon, in India, for offices to be much better than people’s homes.
A great example of how incentives can work in mysterious ways: the world doesn’t always work the way you expect.
Good luck next Friday!