The Importance of Service Repositories
When an engineering organization grows beyond the size where you know everyone and what they are working on, it becomes non-trivial to look up information about the infrastructure, services, and APIs the organization offers to everyone within the company. The same thing happens with time too: the company does not need to be big, but it may have accumulated a lot of old, proprietary systems. Similar issues arise if personnel change a lot, or if you need to onboard many people in a short period of time.
Here are some of the questions I have encountered in such situations:
- How do I add a new user in our system? How do I make a payment / create a subscription? What do I do to open a new document for editing?
- Who owns any of the above-mentioned APIs? In which service do I find the API?
- Who owns a given service? Where can I find its tests / deployment / CI jobs?
- The heck … how many services do we have?
- How many services do we still have on python2 / Ubuntu precise / a solution we deprecated 5 years ago?
Enter the service repository [1]
At Prezi, we first wanted to experiment with the idea, and the first problem we wanted to solve was building a tech debt repository organized by service. This made sense in a microservices environment, but you can think about any other kind of software component as well - as we will see later, we also extended the types of components covered. Maintaining the list of services manually did not seem scalable, so we wrote information collector agents (simple scripts) that contributed to a large blob of JSON, and then tried to correlate all the information gathered from GitHub, CI (Jenkins), AWS and so on. This surfaced some issues with inconsistent naming and tagging: for example, authservice, auth-service and authentication were all used to denote the same service or its dependencies (such as the DBs and caches it used).
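To make the correlation step more concrete, here is a minimal sketch of merging collector output under one canonical service name. It is not the code we ran; the record shape, the `ALIASES` table and the function names are assumptions made up for illustration.

```python
import re
from collections import defaultdict

# Hypothetical alias table: maps a normalized name variant to one canonical name.
ALIASES = {"authentication": "authservice"}


def canonical_name(raw: str) -> str:
    """Lower-case the name, drop separators, then apply the alias table."""
    key = re.sub(r"[^a-z0-9]", "", raw.lower())
    return ALIASES.get(key, key)


def merge_records(records):
    """Merge per-source records into one entry per canonical service name."""
    catalog = defaultdict(dict)
    for record in records:
        name = canonical_name(record["name"])
        catalog[name][record["source"]] = record["data"]
    return dict(catalog)


# Illustrative collector output (assumed shape: source, name, data).
records = [
    {"source": "github", "name": "auth-service", "data": {"repo": "org/auth-service"}},
    {"source": "jenkins", "name": "authservice", "data": {"job": "authservice-build"}},
    {"source": "aws", "name": "Authentication", "data": {"tag": "authentication"}},
]
catalog = merge_records(records)
```

With these sample records, all three name variants end up under a single `authservice` entry. Separator and case differences can be normalized mechanically, but true synonyms such as authentication still need a hand-maintained alias table.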
In the end, we could automatically compile a list of services and their information every day, clean up the data about our systems, and improve our processes and service-generator scripts. We had a solution that was used in all kinds of different ways:
- knowing where you are with a migration, tracking changes
- helping to plan and estimate the cost of a migration
- being able to nudge the teams owning deprecated infra/services/APIs to make an effort to sunset them
- knowing the owners in one central place
- collecting information about how far behind a given service is from our ideal state (the “golden image”) - basically a tech debt score
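The tech debt score idea can be sketched in a few lines. This is only an illustration under assumed names: the attributes in `GOLDEN_IMAGE`, the sample services and the one-point-per-deviation scoring are all hypothetical, not the scoring we actually used.

```python
# Hypothetical "golden image": the ideal state every service is compared against.
GOLDEN_IMAGE = {"python": "3", "os": "ubuntu-22.04", "ci": "jenkins-pipeline"}


def tech_debt_score(service: dict, golden: dict = GOLDEN_IMAGE) -> int:
    """Count attributes where the service deviates from the golden image.

    A missing attribute counts as a deviation too.
    """
    return sum(1 for key, ideal in golden.items() if service.get(key) != ideal)


# Illustrative catalog entries (assumed attribute names and values).
services = {
    "authservice": {"python": "2", "os": "ubuntu-12.04", "ci": "jenkins-pipeline"},
    "billing": {"python": "3", "os": "ubuntu-22.04", "ci": "jenkins-pipeline"},
}
scores = {name: tech_debt_score(attrs) for name, attrs in services.items()}
worst_first = sorted(scores, key=scores.get, reverse=True)
```

Sorting by score gives a simple "who to nudge first" list. A real score would probably weight deviations differently, since an unsupported OS is likely more urgent than an outdated CI template.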
We did not anticipate all of the above use cases, but once we finally had a trustworthy source of information, we began to use it in creative ways, which was great to see. And because we wanted to trust the data, we also made an effort to check and clean it (missing or uncategorized services were the two most common problems).
Hindsight is 20/20, but here is what we got right from the start:
- Visualization helps a lot: communicating about issues becomes much easier
- A central place to search for information, with URLs to send around the company
- Having declarative descriptors (YAML files) in all service repositories, which the whole platform can build on
- Automatically collecting information if possible ➡️ auto-updates make it feel less out-of-date, new services show up automatically
- Having a way to export data (just simple CSV, but being able to put that into a spreadsheet helped many times)
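The CSV export from the list above needs very little code. Here is a minimal sketch using Python's standard `csv` module; the column names and sample rows are assumptions for illustration only.

```python
import csv
import io


def export_csv(catalog: list[dict], columns: list[str]) -> str:
    """Render catalog rows as CSV text, leaving missing fields empty."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in catalog:
        # Keep only the requested columns; absent values become empty cells.
        writer.writerow({col: row.get(col, "") for col in columns})
    return buffer.getvalue()


# Illustrative rows (assumed field names).
rows = [
    {"name": "authservice", "owner": "identity-team", "runtime": "python2"},
    {"name": "billing", "runtime": "python3"},
]
csv_text = export_csv(rows, ["name", "owner", "runtime"])
```

Leaving missing fields empty rather than failing is a deliberate choice here: an empty owner cell in a spreadsheet is exactly the kind of gap you want people to notice and fix.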
What we did not get right (or simply did not see at the beginning):
- As a first step, a proprietary system is great, and you get results quickly, but over time it becomes a product itself. With a team of limited size, you will not have the bandwidth to actively develop it.
- We focused on one aspect only. Because so many things revolve around these software components, extensibility is key. APIs, documentation, monitoring, deployment, frontend components and their needs are also very valid aspects and should have a repository.
- I did not lobby enough for shared contributions from other teams and more buy-in from developers. While the original goal was not necessarily company-wide usage, it became clear over time that such a system should be useful enough for everyone to be motivated to contribute.
- The user interface design was the usual "by developers, for developers": kinda crude, good for a prototype, but not enough later.
Once you have a repository for services, generalizing it to the next level is only one more step: any other entity might be just as important for you. The list of libraries you are using, and whether they are up to date. All the frontend components you are maintaining, or the design system elements you have in place. All the nodes / on-premise computers / software installations.
With all the other areas to cover, we got to the idea of Internal Developer Portals (though we did not use the term this way).
Can we do better than a simple repository?
While rolling your own is an option, we had already had an internal system before, so we thought it would be great to use something a little more standard that is also developed outside the company. We decided to use Backstage, which had been around for a year at that time and seemed to meet our needs. Also take a look at Roadie or Port, as the market for Internal Developer Portals (IDP) is evolving rapidly.
An IDP gives you the chance to tie your different single-purpose systems or portals together - you have a wide range of component types and a list of components to be tracked, and it is very easy to add links to CI jobs, monitoring (like Grafana charts), alerts, the Slack room of the owners, and the list of supported APIs with links to try them out. There are virtually infinite ways to use such a portal, so design it for your most common use cases first and reap the most benefits with the least amount of work.
Apart from this, keep in mind: your infra and services are not a fixed world. Services are created and removed all the time. They also change ownership. New systems are introduced for building, monitoring and deploying services. Being able to follow these changes is just as important as creating the repository in the first place. Having outdated or outright bad information can cause more trouble than not having information.
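One simple way to follow such changes is to compare two catalog snapshots, for example yesterday's and today's. The sketch below assumes each snapshot is a dictionary keyed by service name with an `owner` field; both the shape and the names are hypothetical.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Report services that were added, removed, or changed owner."""
    added = sorted(current.keys() - previous.keys())
    removed = sorted(previous.keys() - current.keys())
    owner_changed = sorted(
        name
        for name in previous.keys() & current.keys()
        if previous[name].get("owner") != current[name].get("owner")
    )
    return {"added": added, "removed": removed, "owner_changed": owner_changed}


# Illustrative snapshots (assumed shape: service name -> attributes).
yesterday = {
    "authservice": {"owner": "identity-team"},
    "legacy-export": {"owner": "platform-team"},
}
today = {
    "authservice": {"owner": "platform-team"},
    "billing": {"owner": "payments-team"},
}
changes = diff_snapshots(yesterday, today)
```

A report like this could feed a notification to the teams involved, so ownership changes and removed services get confirmed by a human instead of silently going stale.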
It is a company-wide effort to keep all the data and information clean, in a ready-to-use state. Do not be afraid to become a marketer for your newly developed portal. Continually and consistently repeat the message (David Marquet: Turn the Ship Around!) - it is not enough to state the goal of having an up-to-date developer portal once. You, as the owner of the service, will think about it all the time, but no one else will think about it as much as you do. They need constant reminding, and it is your responsibility to do that. It can take the form of, for example, a Slack notification, a team newsletter or a regular bug bash session for the portal itself. Find something that works for you and your teams.
Overall, such a portal helps information retrieval, which can be surprisingly difficult or slow even with 50 people. Better information means faster and most probably better decisions, and the return on investment from those better decisions will pay for the cost of creating and maintaining such a system many times over.
While writing this article, I found this post an interesting read:
Good luck with building your own Internal Developer Portal!
Footnotes
[1] I use service repository as a name for a solution, a kind of product within the company. It is not the same as the service-repository or simply repository design pattern, which is designed to abstract the data layer away for the service while also centralizing the access of domain objects.