Suppose you’re working on a software system and need to implement user logins using a userID and password. After a bit of research, you narrow it down to two possible options. Your first option is to implement it yourself – hash/salt the password, store it in a database that you manage, and use it to authenticate users when they log in. Your second option is to use an external SaaS solution (eg, Firebase Auth) – integrate this third-party solution into your software system, and allow it to handle things like password storage and verification.
For the purposes of this example, let’s assume that your team already knows how to hash/salt passwords, manage them securely, and can build a custom login quickly and simply. Onboarding onto Firebase, by contrast, will require greater upfront cost and complexity to integrate into your system. On the flip side, if you decide to enable social media logins or phone logins in future, that would be trivial with a SaaS solution like Firebase, and a lot more work with a custom-built solution. Which option should you choose?
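To make the first option concrete, here is a minimal sketch of the custom approach using only Python’s standard library. The function names and the iteration count are illustrative assumptions, not a production recommendation:

```python
import hashlib
import secrets
from typing import Optional

def hash_password(password: str, salt: Optional[bytes] = None) -> tuple[bytes, bytes]:
    # Generate a random 16-byte salt when one isn't supplied
    if salt is None:
        salt = secrets.token_bytes(16)
    # PBKDF2-HMAC-SHA256; the iteration count is illustrative (tune for your hardware)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 600_000)
    return salt, digest

def verify_password(password: str, salt: bytes, expected_digest: bytes) -> bool:
    _, candidate = hash_password(password, salt)
    # Constant-time comparison guards against timing attacks
    return secrets.compare_digest(candidate, expected_digest)
```

Simple enough – which is exactly why the build-it-yourself option looks attractive at first glance.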
I realize it’s easy to dive into the weeds here and bikeshed on the best way to handle user logins. But let’s step back and look at the bigger picture. The above is just one example of a more generalizable dilemma. As developers, we often find ourselves having to choose between multiple design alternatives – some that are simpler and easier to implement now, whereas others make it simpler and easier to address specific future needs that may or may not arise. What is the best way to resolve such dilemmas?
Catchphrases and Cargo Cults
This is a very common dilemma, and one that I’ve seen numerous guidelines and heuristics attempt to address.
The most popular of which is YAGNI – “You ain’t gonna need it.” Do not “introduce extra complexity now that you won’t take advantage of until later.” Ie, don’t design around future needs.
Conversely, I’ve also come across numerous best practices that recommend planning upfront for future needs. Best practices such as:
- “You should design for extensibility, because the hallmark of a good design is how easily it can accommodate future changes”
- “Your API should not expose implementation details, because that will limit your ability to evolve your implementation in future”
- “You should decouple your business logic from your storage accessors, so that you can easily modify storage implementation in future”
- “You should replace auto-incrementing integer IDs with UUIDs, because it will enable you to scale horizontally in future”
- “Your backend servers should be stateless, so that you can scale-out your fleet easily in future”
- “You should never embed/assume physical-location information (eg, shard ID) from resource identifiers, because that may change in future”
- “Never store booleans – store timestamps instead”
When looked at in isolation, each of the above sounds great. It would be great to build designs today that will simplify future changes. But how do we square this with the earlier YAGNI recommendation? When should we build something slightly more complex today, so as to potentially make our lives easier tomorrow? Given the conflicting advice, how are we to know when to use which guideline?
Is there a way to get past shiny catchphrases, and build a more rigorous framework that we can rely on?
In an attempt to move past heuristics, here is a methodology that addresses the above problem more rigorously.
Given a feature F1,
Implemented using design Dx,
The cost of implementing F1 using Dx is given as C1x.
Here we are using the term “cost” to include man-hours, dollars, risks, time-discounting, and any other form of “cost” you care about.
Given a second feature F2,
which has probability P2 of being needed in future,
whose incremental cost of implementing with design Dx is given as C2x,
The combined cost of choosing Dx is given by:

Net-Cost(Dx) = C1x + (P2 × C2x)
Ie, the cost of building the first feature, along with the incremental cost of building the second feature, multiplied by the probability that you will actually need to do so. This is essentially the same formula used to compute expected values.
Generalizing further to a complete feature set F2, F3, F4…
Each needed with probability P2, P3, P4….
And incremental costs C2x, C3x, C4x…
The net cost of choosing Dx is given by:

Net-Cost(Dx) = C1x + (P2 × C2x) + (P3 × C3x) + (P4 × C4x) + …
Whereas for a competing design Dy,
Where the cost of implementing F1 is C1y, and the incremental costs of implementing the same future features are given by C2y, C3y…
The net cost of using Dy is given by:

Net-Cost(Dy) = C1y + (P2 × C2y) + (P3 × C3y) + (P4 × C4y) + …
Given the above formulation, the ideal design is now clear – you should pick the one with the lowest net cost. This is the law alluded to in the title.
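As a sketch, the formula translates directly into code. All the numbers below are invented purely for illustration:

```python
def net_cost(base_cost: float, future_features: list[tuple[float, float]]) -> float:
    """Cost of building F1 with this design, plus the probability-weighted
    incremental cost of each future feature F2, F3, ..."""
    return base_cost + sum(p * c for p, c in future_features)

# Made-up estimates: Dx is cheap now but expensive to extend,
# Dy costs more upfront but makes future features cheap.
cost_dx = net_cost(2.0, [(0.5, 8.0), (0.25, 10.0)])   # 2 + 4 + 2.5 = 8.5
cost_dy = net_cost(5.0, [(0.5, 1.0), (0.25, 1.0)])    # 5 + 0.5 + 0.25 = 5.75
# Under these estimates, Dy wins despite its higher upfront cost
```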
Putting It Into Practice
Despite the mathematical formulation, there is obviously no way to compute the precise probability and cost associated with each feature. It is meaningless to debate whether the future cost of a feature is “3.1” as opposed to “3.2”. The real value of the above framework is in giving us a better basis for discussion, and a mental model to use when making design decisions.
Rather than throwing out catchphrases or arbitrary cargo-culted best-practices, we can instead have a more productive discussion around:
- The various different features that need to be considered
- The associated likelihood of needing to implement each feature
- The ease of implementing each feature, using a given design
As mentioned earlier, the goal is not to come up with an exact numerical value for each of the above, but rather to produce ballpark “T-shirt size” estimates, which can then be used to compare the relative costs of competing designs.
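For instance, a hypothetical T-shirt scale can map sizes onto rough relative effort, and those ordinal estimates can then be fed into the same expected-value comparison. Every size and probability below is an assumption made up for illustration:

```python
# Hypothetical mapping from T-shirt sizes to rough relative effort
SIZE = {"S": 1.0, "M": 3.0, "L": 8.0, "XL": 20.0}

def ballpark_cost(base_size: str, futures: list[tuple[float, str]]) -> float:
    # Same expected-value formula, fed with ordinal estimates instead of exact numbers
    return SIZE[base_size] + sum(p * SIZE[size] for p, size in futures)

# Illustrative comparison: custom login vs a SaaS auth provider,
# with a roughly 50% chance of needing social logins later
custom_login = ballpark_cost("S", [(0.5, "L")])   # 1 + 4 = 5.0
saas_auth = ballpark_cost("M", [(0.5, "S")])      # 3 + 0.5 = 3.5
```

The point is not the exact numbers, but that rough sizes are enough to see which design is cheaper in expectation.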
In order to better draw up a list of potential future features to consider, and the likelihood associated with each, it helps immensely to have domain knowledge over the problem your product is attempting to solve, and the technologies you’re using to solve it.
The very first step would be to look at the documented product roadmap. Roadmaps are notorious for overpromising, but it is still reasonable to assign a significant likelihood to each item on your short-term roadmap. It also helps to consider the candidates that didn’t quite make the cut – these should be given a lower, but non-zero, likelihood.
The next step would be to consider similar products, the features they support, and the technical changes they have made to support scaling and performance requirements. For example, if you’re building a social media platform, you can review the most popular features on Facebook or Reddit, as inspiration for features that you may find yourself having to build in future.
Looking beyond product features, it also helps to consider technical challenges that commonly arise in other technology companies, and the work that is needed to address them. For example, social-media platforms and e-commerce platforms are completely different from a product perspective – but they share a great deal in common when it comes to service-oriented-architectures, and technical challenges with scaling, performance and monitoring – as well as the solutions needed to address them. By learning from the problems that other software developers have run into, you can gain insights into the future problems and improvements that might be relevant to your own project as well.
A quick word here on mature software projects as compared to those that are still finding a “product market fit.” If you’re working at a startup or on a greenfield project, there is a very significant chance that your company or project will be shuttered permanently in the near future. Similarly, if you’re working on an experimental feature for an established product – there is a significant chance that the feature and all downstream work will be abandoned, due to poor user reviews or A/B testing results. These should most definitely be factored into your probability estimates. Ie, if you’re a US-focused seed-stage startup, the probability of needing to implement GDPR compliance is far lower than it is for a mature product with a large international user base.
Once you’ve identified the set of “features” to consider, estimating costs for various designs becomes a more technical matter. This is certainly very domain-specific and design-specific, so it is hard to give universal advice here. But there are numerous lessons that people have painfully learnt in the past decades, which can serve us well here.
Certain changes are painfully hard to do, and incur significant costs. For example:
- Rewriting your system from one language to another (eg, C++ -> Rust)
- Migrating your system from one framework to another (eg, Angular -> React)
- Removing or modifying the behavior of APIs that are used by a large number of external users (eg, Python2 -> 3)
If there is a significant probability of you needing to do the above, you’re better off picking an alternative design that avoids it entirely. Ie, starting off with React instead of migrating to it later.
There are also certain types of changes that are messy and hard to do. For example:
- Migrating databases (eg, Postgres -> DynamoDB)
- Changing the schema of the data in your database (eg, auto-incremented Integer -> UUID)
- Changing key design patterns and assumptions used widely in your application (eg, migrating from sticky-stateful services to stateless services)
These changes are more doable, but they are certainly not fun. If there is a significant chance of needing to implement them in future, it is generally better to do so upfront, though it does vary from case to case.
And finally, there are certain types of changes that are actually easy to implement. For example:
- Changing the code implementation within a reasonably sized class, without changing its API
- Extending an API by expanding access, adding new endpoints, or adding new optional parameters
- Creating a new table in your database
- Adding new optional columns to an existing table in your database
It’s still context dependent, but as a general rule, the costs associated with the above changes are much lower. Using a heuristic like YAGNI is far more reasonable when discussing such changes, where the associated costs aren’t a major concern.
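As a concrete illustration of the last point, here is a minimal sqlite sketch (the table and column names are hypothetical) showing why adding an optional column is a low-cost change: existing rows and existing queries keep working unchanged.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT NOT NULL)")
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")

# Adding an optional column is backwards-compatible:
# existing rows default to NULL, and queries against old columns still work
conn.execute("ALTER TABLE users ADD COLUMN phone TEXT")
conn.execute("INSERT INTO users (email, phone) VALUES ('b@example.com', '555-0100')")

rows = conn.execute("SELECT email, phone FROM users ORDER BY id").fetchall()
# rows == [('a@example.com', None), ('b@example.com', '555-0100')]
```

Compare this single statement with the multi-step backfill-and-migrate dance required to change an existing column’s type, and the cost difference between the two categories becomes obvious.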
A quick word here about startups and “prototype projects” – costs should be heavily time-discounted for projects that are expected to grow rapidly in size. If you’re currently working at a seed-stage startup, and the startup later grows into a decacorn like Lyft, you’ll need to make numerous changes that are extremely costly, in order to handle the required load and functionality. However, the “startup” will also have many orders of magnitude more engineers. Hence, the actual cost to the organization is a lot more bearable.
Whereas if you attempted to implement those same changes upfront in a seed-stage startup, even if the absolute cost ends up being lower, it can still overwhelm and bog down a small engineering team. Hence why it is important to keep in mind current capacity constraints, and future capacity expansions, when evaluating costs.
No Substitute for Judgement
It is very tempting to cling to catchphrases or cargo-culted best practices. They promise a simple and universal answer, without the need for messy judgement calls.
To their credit, aphorisms like YAGNI are very useful as a way to curb our tendency to overengineer things. And popular best practices from larger companies help make us aware of the problems we may run into in future, and how we can solve them. But when it comes time to make an actual decision for your project, these general suggestions will need to be adapted to fit your specific circumstances.
There will arise occasions when you will have to choose between planning ahead, or doing the simplest possible thing today. When such dilemmas arise, there is ample role for judgement and experience in foreseeing future use-cases, and estimating their likelihood and cost. Use these as inputs to guide your design decisions.