
Data Consistency in Microservices Architecture

代码人生 http://www.she9.com 2018-09-18 11:29 Source: web Editor: @技术狂热粉

In microservices, one logically atomic operation can frequently span multiple microservices. Even a monolithic system might use multiple databases or messaging solutions. With several independent data storage solutions, we risk inconsistent data if one of the distributed process participants fails: for example, charging a customer without placing their order, or not notifying the customer that the order succeeded.


A distributed process failure:



In this article, I'd like to share some of the techniques I have learned for keeping data between microservices eventually consistent.

Why is achieving this goal so challenging? As long as we have multiple places where data is stored (rather than a single database), consistency is not solved automatically, and engineers need to take care of it while designing the system. For now, in my opinion, the industry doesn't yet have a widely known solution for automatically updating data across multiple different data sources, and we probably shouldn't wait for one to become available any time soon.


One attempt to solve this problem in an automated, hassle-free way is the two-phase commit (2PC) pattern, natively modeled by the XA standard. But in modern high-scale applications (especially in cloud environments), 2PC doesn't seem to perform well. To eliminate the drawbacks of 2PC, we have to cover consistency in different ways depending on the requirements.


The Saga Pattern

The most well-known approach to handling consistency across multiple microservices is the Saga pattern. You can think of sagas as application-level distributed coordination of multiple transactions. Depending on the use case and requirements, you can optimize your own saga implementation; the XA protocol, by contrast, tries to cover all scenarios. The Saga pattern is not new, either: it was known and used in ESB and SOA architectures in the past, and it has successfully transitioned to the microservices world. Each atomic business operation spanning multiple services may consist of multiple transactions at the technical level. The key idea of the Saga pattern is the ability to roll back one of the individual transactions. As we know, rolling back an already committed individual transaction is not possible out of the box. Instead, this is achieved by invoking a compensating operation: introducing a "cancel" operation.

Compensating operations:


In addition to cancellation, you should consider making your services idempotent, so that certain operations can be retried or restarted in case of a failure. Failures should be monitored, and the reaction to failures should be proactive.
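The saga flow with compensating operations can be sketched as follows. This is a minimal in-memory illustration with hypothetical charge/reserve steps; a real coordinator would persist its state between steps:

```python
# Minimal sketch of a saga: each step pairs an action with a compensating
# "cancel" operation; on failure, completed steps are undone in reverse order.
# All step names here are hypothetical.

class SagaFailed(Exception):
    pass

def run_saga(steps):
    """steps is a list of (action, compensation) callables."""
    completed = []
    for action, compensate in steps:
        try:
            action()
        except Exception as exc:
            # Roll back already-committed local transactions via compensations.
            for undo in reversed(completed):
                undo()
            raise SagaFailed(str(exc)) from exc
        completed.append(compensate)

# Hypothetical order flow: charging succeeds, stock reservation fails.
log = []

def charge_customer():
    log.append("charged")

def refund_customer():
    log.append("refunded")

def reserve_stock():
    raise RuntimeError("out of stock")

def release_stock():
    log.append("released")

try:
    run_saga([(charge_customer, refund_customer),
              (reserve_stock, release_stock)])
except SagaFailed:
    pass

print(log)  # ['charged', 'refunded']
```

Note that each compensation must itself be idempotent, since the coordinator may retry it after a crash.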

Reconciliation

What if, in the middle of the process, the system responsible for invoking the compensating operation crashes or restarts? In that case, the user may receive an error message and the compensating logic should be triggered, or, when processing asynchronous user requests, the execution logic should be resumed.


Main flow failure:




To find crashed transactions and resume operations or apply compensations, we need to reconcile data from multiple services. Reconciliation is a technique familiar to engineers who have worked in the financial domain. Have you ever wondered how a bank ensures your money transfer is not lost, or how a transfer between two different banks happens? The quick answer is reconciliation.


In accountancy, reconciliation is the process of ensuring that two sets of records (usually the balances of two accounts) are in agreement. Reconciliation is used to ensure that the money leaving an account matches the money actually spent. This is done by making sure the balances match at the end of a particular accounting period. (Jean Scheid, "Understanding Balance Sheet Account Reconciliation," Bright Hub, 8 April 2011)

Returning to microservices: using the same principle, we can reconcile data from multiple services on some action trigger. The action can run on a schedule or be triggered by a monitoring system when a failure is detected. The simplest approach is a record-by-record comparison. The process can be optimized by comparing aggregate values; in that case, one of the systems becomes the source of truth for each record.
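A record-by-record comparison with an aggregate pre-check might look like the sketch below. The service names and records are invented; here the billing service is treated as the source of truth:

```python
# Sketch of reconciliation between two services' views of the same records.
# The aggregate comparison is a fast path: if totals and key sets match,
# the records are assumed consistent for the period being reconciled.

def reconcile(source_of_truth: dict, replica: dict) -> list:
    """Return record IDs that disagree with or are missing from the replica."""
    if (source_of_truth.keys() == replica.keys()
            and sum(source_of_truth.values()) == sum(replica.values())):
        return []  # aggregates match; skip the per-record scan
    # Fall back to record-by-record comparison.
    return sorted(
        rid for rid in source_of_truth
        if replica.get(rid) != source_of_truth[rid]
    )

billing = {"order-1": 100, "order-2": 250, "order-3": 40}   # source of truth
shipping = {"order-1": 100, "order-3": 35}                  # order-2 lost, order-3 wrong

print(reconcile(billing, shipping))       # ['order-2', 'order-3']
print(reconcile(billing, dict(billing)))  # []
```

Each mismatching ID would then be fed into recovery: resuming the operation or applying a compensation.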

Event Log

Imagine a multi-step transaction. How can we determine, during reconciliation, which transactions may have failed, and at which step? One solution is to check the status of every transaction. In some cases, that functionality isn't available (imagine a stateless mail service that sends emails or produces other kinds of messages). In other cases, you might want immediate visibility into the transaction state, especially in complex scenarios with many steps, such as a multi-step order that books a flight and a hotel and makes a money transfer.


Complex distributed process:




In these situations, an event log can help. Logging is a simple yet powerful technique, and many distributed systems rely on logs. Write-ahead logging is how databases achieve transactional behavior and maintain consistency between replicas internally. The same technique applies to microservice design: before making the actual data change, the service writes a log entry about its intent to make the change. In practice, the event log can be a table or a collection in the database owned by the coordinating service.
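The intent-logging step described above can be sketched as follows. The in-memory `event_log` list stands in for a table in the coordinating service's database, and all names are hypothetical:

```python
# Sketch of an event log used as a write-ahead record of intent: the intent
# is recorded BEFORE the data change, and marked done afterwards, so a
# reconciliation job can find transactions that crashed in between.

import uuid

event_log = []             # would be a DB table: (tx_id, step, status)
accounts = {"alice": 100}  # hypothetical service state

def transfer_out(tx_id, account, amount):
    # 1. Record the intent before touching the data.
    event_log.append({"tx": tx_id, "step": "debit", "status": "STARTED"})
    # 2. Apply the actual change.
    accounts[account] -= amount
    # 3. Mark the step as finished so recovery can skip it.
    event_log.append({"tx": tx_id, "step": "debit", "status": "DONE"})

tx = str(uuid.uuid4())
transfer_out(tx, "alice", 30)

# Recovery: any (tx, step) with STARTED but no DONE entry crashed mid-flight.
started = {(e["tx"], e["step"]) for e in event_log if e["status"] == "STARTED"}
done = {(e["tx"], e["step"]) for e in event_log if e["status"] == "DONE"}
print(started - done)  # set() — nothing to recover in this run
```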


Sample event log:




The event log could be used not only to resume transaction processing but also to provide visibility to system users, customers, or the support team. However, in simple scenarios a service log might be redundant, and status endpoints or status fields may be enough.

Orchestration vs. Choreography

By this point, you might think sagas are only a part of orchestration scenarios. But sagas can be used in choreography as well, where each microservice knows only a part of the process. Sagas include the knowledge on handling both positive and negative flows of distributed transaction. In choreography, each of the distributed transaction participants has this kind of knowledge.

Single-Write With Events

The consistency solutions described so far are not easy; they are indeed complex. But there is a simpler way: modifying a single data source at a time. Instead of changing the state of the service and emitting an event in one process, we could separate those two steps.

Change-First

In the main business operation, we modify the service's own state, while a separate process reliably captures the change and produces the event. This technique is known as Change Data Capture (CDC). Some of the technologies implementing this approach are Kafka Connect and Debezium.


Change Data Capture with Debezium and Kafka Connect:




However, sometimes no specific framework is required. Some databases offer a friendly way to tail their operations log, such as MongoDB Oplog. If there is no such functionality in the database, changes can be polled by timestamp or queried with the last processed ID for immutable records. The key to avoiding inconsistency is making the data change notification a separate process. The database record is, in this case, the single source of truth. A change is only captured if it happened in the first place.
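Polling immutable records past the last processed ID, as described above, can be sketched like this. The table, field names, and sample data are invented for illustration:

```python
# Sketch of change data capture without special tooling: a separate process
# polls an append-only table for records newer than the last processed ID
# and publishes them as events. The DB record remains the source of truth.

records = [  # stands in for an append-only database table
    {"id": 1, "event": "order_created"},
    {"id": 2, "event": "order_paid"},
    {"id": 3, "event": "order_shipped"},
]

last_processed_id = 1  # persisted by the capture process between runs

def poll_changes(table, after_id):
    """Return unpublished records strictly newer than after_id, in order."""
    return [r for r in table if r["id"] > after_id]

published = []
for change in poll_changes(records, last_processed_id):
    published.append(change["event"])   # e.g. send to a message broker
    last_processed_id = change["id"]    # advance progress only after publishing

print(published)          # ['order_paid', 'order_shipped']
print(last_processed_id)  # 3
```

Because progress is committed only after publishing, a crash can cause a re-publish but never a lost change, which is why consumers should be idempotent.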


Change Data Capture without specific tools:




The biggest drawback of change data capture is the separation of business logic. Change capture procedures will most likely live in your codebase separate from the change logic itself — which is inconvenient. The most well-known application of change data capture is domain-agnostic change replication such as sharing data with a data warehouse. For domain events, it’s better to employ a different mechanism such as sending events explicitly.

Event-First

Let's turn the single source of truth upside down. What if, instead of writing to the database first, we trigger an event and share it both with our own service and with other services? In this case, the event becomes the single source of truth. This would be a form of event sourcing, where the state of our own service effectively becomes a read model and each event is a write model.

Event-first approach:



On the one hand, it’s a command query responsibility segregation (CQRS) pattern where we separate the read and write models, but CQRS by itself doesn’t focus on the most important part of the solution — consuming the events with multiple services.

In contrast, event-driven architectures focus on events consumed by multiple systems but don’t emphasize the fact that events are the only atomic pieces of data update. So I’d like to introduce “event-first” as a name to this approach: updating the internal state of the microservice by emitting a single event — both to our own service and any other interested microservices.

The challenges with an "event-first" approach are also the challenges of CQRS itself. Imagine that before making an order we want to check item availability. What if two instances concurrently receive an order for the same item? Both will concurrently check the inventory in a read model and emit an order event. Without some sort of covering scenario, we could run into trouble.

The usual way to handle these cases is optimistic concurrency: place the read-model version into the event, and ignore the event on the consumer side if the read model there has already moved past that version. The other solution is pessimistic concurrency control, such as taking a lock on an item while checking its availability.
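The optimistic-concurrency variant can be sketched as follows. The read model, event shapes, and the retry contract are hypothetical:

```python
# Sketch of optimistic concurrency for the event-first approach: each event
# carries the read-model version it was based on, and the consumer rejects
# events built on a stale version (the producer would then retry).

read_model = {"item-1": {"stock": 1, "version": 7}}

def apply_order_event(event) -> str:
    item = read_model[event["item_id"]]
    if event["based_on_version"] != item["version"]:
        return "rejected"          # read model moved on; caller must retry
    item["stock"] -= 1
    item["version"] += 1
    return "accepted"

# Two instances concurrently ordered the last unit, both at version 7.
e1 = {"item_id": "item-1", "based_on_version": 7}
e2 = {"item_id": "item-1", "based_on_version": 7}
print(apply_order_event(e1))  # accepted
print(apply_order_event(e2))  # rejected — the second order cannot oversell
```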

The other challenge of the “event-first” approach is a challenge of any event-driven architecture — the order of events. Processing events in the wrong order by multiple concurrent consumers might give us another kind of consistency issue, for example processing an order of a customer who hasn’t been created yet.

Data streaming solutions such as Kafka or AWS Kinesis can guarantee that events related to a single entity are processed sequentially (such as creating an order for a customer only after the user is created). In Kafka, for example, you can partition topics by user ID so that all events related to a single user are handled by the single consumer assigned to that partition, and thus processed sequentially. In traditional message brokers, by contrast, queues preserve order, but multiple concurrent consumers make processing messages in a given order hard, if not impossible, and you can run into concurrency issues.

In practice, an “event-first” approach is hard to implement in scenarios when linearizability is required or in scenarios with many data constraints such as uniqueness checks. But it really shines in other scenarios. However, due to its asynchronous nature, challenges with concurrency and race conditions still need to be addressed.

Consistency by Design

There are many ways to split a system into multiple services. We strive to match separate microservices with separate domains. But how granular are the domains? Sometimes it's hard to differentiate domains from subdomains or aggregate roots. There is no simple rule for defining your microservice split.

Rather than focusing only on domain-driven design, I suggest being pragmatic and considering all the implications of the design options. One of those implications is how well the microservice split aligns with transaction boundaries. A system where transactions reside only within microservices doesn't require any of the solutions above. We should definitely consider transaction boundaries while designing the system. In practice, it might be hard to design the whole system in this manner, but I think we should aim to minimize data consistency challenges.

Accepting Inconsistency

While it’s crucial to match the account balance, there are many use cases where consistency is much less important. Imagine gathering data for analytics or statistics purposes. Even if we lose 10% of data from the system randomly, most likely the business value from analytics won’t be affected.


Sharing data with events:




Which Solution to Choose

Atomic update of data requires a consensus between two different systems: an agreement on whether a single value is 0 or 1. When it comes to microservices, this comes down to the problem of consistency between two participants, and all practical solutions follow a single rule of thumb: at any given moment, for each data record, you need to know which data source is trusted by your system.

The source of truth could be events, the database or one of the services. Achieving consistency in microservice systems is the developers’ responsibility. My approach is the following:

  1. Try to design a system that doesn’t require distributed consistency. Unfortunately, that’s barely possible for complex systems.

  2. Try to reduce the number of inconsistencies by modifying one data source at a time.

  3. Consider event-driven architecture. A big strength of event-driven architecture in addition to loose coupling is a natural way of achieving data consistency by having events as a single source of truth or producing events as a result of change data capture.

  4. More complex scenarios might still require synchronous calls between services, failure handling, and compensations. Know that sometimes you may have to reconcile afterward.

  5. Design your service capabilities to be reversible, decide how you will handle failure scenarios and achieve consistency early in the design phase.

I will be sharing more thoughts on this topic at Voxxed Days Microservices in Paris. Join us!

