I'm starting a series of blog posts with summaries of this interesting book: Designing Data-Intensive Applications.
What is a data-intensive application?
It's an application where raw CPU power is rarely a limiting factor; the problems are the amount of data, the complexity of data, and the speed at which it changes. It is built from standard building blocks that provide commonly needed functionality.
In this chapter, we see the fundamentals of what we are trying to achieve.
Reliability
The things that can go wrong are called faults, and systems that anticipate faults and can cope with them are called fault-tolerant or resilient.
Reliability means making systems work correctly, even when faults occur. Faults can occur in hardware, in software, and through human error. Fault-tolerance techniques can hide certain types of faults from the end user.
The system should continue to work correctly even in the face of adversity: hardware or software faults, and even human error!
A fault is usually defined as one component of the system deviating from its spec, whereas a failure is when the system as a whole stops providing the required service to the user. It is impossible to reduce the probability of a fault to zero; therefore it is usually best to design fault-tolerance mechanisms that prevent faults from causing failures.
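To make this concrete, here is a minimal Python sketch (my own example, not from the book) of one common fault-tolerance technique: retrying an operation with backoff, so that a transient fault in one component doesn't turn into a user-visible failure.

```python
import random
import time

def call_with_retries(operation, max_attempts=3, base_delay=0.1):
    """Retry a flaky zero-argument callable so transient faults stay
    hidden from the caller instead of becoming failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except IOError:
            if attempt == max_attempts:
                raise  # the fault finally surfaces as a failure
            # exponential backoff with jitter, so we don't hammer a struggling component
            time.sleep(base_delay * (2 ** attempt) * random.random())
```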
Scalability
Scalability means having strategies for keeping performance good, even when load increases.
In a scalable system, you can add processing capacity in order to remain reliable under high load.
Scalability is the term we use to describe a system’s ability to cope with increased load.
Load can be described with a few numbers which we call load parameters. The best choice of parameters depends on the architecture of your system: it may be requests per second to a web server, the ratio of reads to writes in a database, the number of simultaneously active users in a chat room, the hit rate on a cache, or something else. Perhaps the average case is what matters for you, or perhaps your bottleneck is dominated by a small number of extreme cases.
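For illustration, here's a tiny sketch (my own, using a made-up in-memory request log) of how two such load parameters could be derived:

```python
from collections import Counter

# Hypothetical request log: (timestamp in seconds, operation) pairs.
requests = [(0.0, "read"), (0.4, "write"), (0.9, "read"),
            (1.3, "read"), (1.8, "write"), (2.0, "read")]

window_seconds = requests[-1][0] - requests[0][0]
ops = Counter(op for _, op in requests)

requests_per_second = len(requests) / window_seconds
read_write_ratio = ops["read"] / max(ops["write"], 1)

print(f"{requests_per_second:.1f} req/s, read:write ratio {read_write_ratio:.1f}")
```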
Latency and response time are not the same thing. The response time is what the client sees: besides the actual time to process the request (the service time), it includes network delays and queueing delays. Latency is the duration that a request is waiting to be handled, during which it is latent, awaiting service.
It’s common to see the average response time of a service reported. However, the mean is not a very good metric if you want to know your “typical” response time, because it doesn’t tell you how many users actually experienced that delay. Usually it is better to use percentiles.
This makes the median (p50) a good metric if you want to know how long users typically have to wait: half of user requests are served in less than the median response time, and the other half take longer.
High percentiles of response times, also known as tail latencies, are important as they directly affect users’ experience of the service. Customers with the slowest requests are often those who have the most data on their accounts because they have made many purchases. On the other hand, optimizing the p99.99 can be too expensive and may not yield enough benefit.
It only takes a small number of slow requests to hold up the processing of subsequent requests, an effect sometimes known as head-of-line blocking.
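As a quick illustration of why percentiles beat the mean, here is a small Python sketch (with made-up numbers) where a single slow outlier drags the mean far above what a typical user actually experiences:

```python
import statistics

# Simulated response times in milliseconds: mostly fast, one slow outlier.
response_times = [12, 15, 14, 13, 16, 15, 14, 13, 12, 950]

def percentile(values, p):
    """Nearest-rank percentile, good enough for illustration."""
    ordered = sorted(values)
    index = max(0, int(round(p / 100 * len(ordered))) - 1)
    return ordered[index]

print("mean:", statistics.mean(response_times))  # 107.4 ms, dragged up by the outlier
print("p50 :", percentile(response_times, 50))   # 14 ms, what a typical user sees
print("p99 :", percentile(response_times, 99))   # 950 ms, the tail a few users hit
```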
There are two main approaches: scaling up (vertical scaling, moving to a more powerful machine) and scaling out (horizontal scaling, distributing the load across multiple smaller machines).
Distributing load across multiple machines is also known as a shared-nothing architecture. A system that can run on a single machine is often simpler, but high-end machines can become very expensive, so very intensive workloads often can’t avoid scaling out. In reality, good architectures usually involve a pragmatic mixture of approaches: for example, using several fairly powerful machines can still be simpler and cheaper than a large number of small virtual machines.
Some systems are elastic, meaning that they can automatically add computing resources when they detect a load increase, whereas other systems are scaled manually (a human analyzes the capacity and decides to add more machines to the system).
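Here is a toy sketch of the kind of rule an elastic system might apply; the function name and thresholds are invented for illustration, and real autoscalers are considerably more careful:

```python
def desired_instances(current_instances, requests_per_second,
                      target_rps_per_instance=500, max_instances=20):
    """Toy elastic-scaling rule: keep enough instances so that each one
    handles roughly target_rps_per_instance. All numbers are made up."""
    needed = -(-requests_per_second // target_rps_per_instance)  # ceiling division
    return int(min(max(needed, 1), max_instances))

# e.g. a spike to 3,200 req/s asks for 7 instances instead of the current 3
print(desired_instances(current_instances=3, requests_per_second=3200))
```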
While distributing stateless services across multiple machines is fairly straightforward, taking stateful data systems from a single node to a distributed setup can introduce a lot of additional complexity. For this reason, common wisdom until recently was to keep your database on a single node (scale up) until scaling cost or high-availability requirements forced you to make it distributed.
An architecture that scales well for a particular application is built around assumptions of which operations will be common and which will be rare.
Maintainability
Maintainability is making life better for the engineering and operations teams who need to work with the system. Good abstractions can help reduce complexity and make the system easier to modify and adapt for new use cases. Good operability means having good visibility into the system’s health, and having effective ways of managing it.
It is well known that the majority of the cost of software is not in its initial development, but in its ongoing maintenance.
See you in the next chapters' summaries 🙂