在现代分布式系统中,确保数据的一致性和可靠性是核心挑战之一。当一项操作需要跨越多个独立的计算节点(例如,不同的数据库实例、微服务或存储系统)时,我们就面临着分布式事务的问题。分布式事务的目标是维护ACID特性(原子性、一致性、隔离性、持久性),尤其是在面对网络延迟、节点故障等不确定因素时,确保事务的原子性——即所有参与者要么全部成功提交,要么全部失败回滚。
分布式事务的基石与CAP定理的权衡
要深入理解三阶段提交(3PC),我们首先需要回顾分布式系统中的一个基本原则:CAP定理。CAP定理指出,在一个分布式系统中,不可能同时满足一致性(Consistency)、可用性(Availability)和分区容忍性(Partition Tolerance)这三个特性,最多只能同时满足其中两个。
在分布式事务的语境下,我们通常会优先追求强一致性(Consistency)和分区容忍性(Partition Tolerance),这意味着在网络分区发生时,系统可能会牺牲一部分可用性。而像两阶段提交(2PC)和三阶段提交(3PC)这样的协议,正是为了在分布式环境中尽可能地实现事务的原子性和强一致性而设计的,它们在不同程度上对可用性做出了权衡。
两阶段提交(2PC):原理与困境
两阶段提交(Two-Phase Commit, 2PC)是实现分布式事务原子性最经典的协议。它涉及一个协调者(Coordinator)和多个参与者(Participants)。
2PC的核心思想
2PC通过两个阶段来确保所有参与者对事务的最终结果达成一致:
-
提交请求阶段 (Prepare Phase / Vote Phase)
- 协调者向所有参与者发送
PREPARE消息,询问它们是否能够提交事务。 - 每个参与者接收到
PREPARE消息后,会执行事务的准备工作(例如,将操作写入日志,但不真正提交),并锁定相关资源。 - 如果参与者能够成功准备,它会向协调者发送
VOTE_YES消息;如果不能(例如,资源不足、冲突),则发送VOTE_NO消息。
- 协调者向所有参与者发送
-
提交执行阶段 (Commit Phase / Execution Phase)
- 协调者收集所有参与者的投票。
- 如果所有参与者都投了
VOTE_YES: 协调者向所有参与者发送GLOBAL_COMMIT消息,指示它们正式提交事务。参与者收到后执行提交操作,并释放资源,然后向协调者发送ACK消息。 - 如果有任何一个参与者投了
VOTE_NO,或者协调者在等待投票时超时: 协调者向所有参与者发送GLOBAL_ROLLBACK消息,指示它们回滚事务。参与者收到后执行回滚操作,并释放资源,然后向协调者发送ACK消息。
2PC的阻塞问题
尽管2PC能够保证事务的原子性,但它存在严重的阻塞问题,这使得它在生产环境中并不总是理想的选择:
-
协调者单点故障:
- 故障点1:在提交执行阶段,协调者发送
GLOBAL_COMMIT消息之后,但部分参与者尚未收到消息就崩溃。 此时,已收到消息的参与者会提交事务,而未收到消息的参与者则会一直处于PREPARED状态,无限期地等待协调者的指令。它们持有的资源将一直被锁定,导致其他事务无法访问,形成阻塞。 - 故障点2:在提交请求阶段,协调者收集投票时崩溃。 此时,参与者可能已经进入
PREPARED状态并锁定了资源。如果协调者无法恢复,参与者将无法得知最终结果,也可能导致资源永久锁定。
- 故障点1:在提交执行阶段,协调者发送
-
网络分区:
- 如果网络发生分区,协调者可能无法与部分参与者通信。这可能导致协调者无法收集到所有投票,或者无法将最终的提交/回滚指令传递给所有参与者,从而导致事务卡住或不一致。
-
同步阻塞: 2PC是一个同步协议。在等待所有参与者响应的过程中,协调者会一直阻塞。同样,参与者在
PREPARED状态下也会阻塞,直到收到协调者的最终指令。
2PC 伪代码示例
为了更好地理解2PC的运作和其阻塞点,我们用伪代码来模拟协调者和参与者的行为。
# 定义事务状态
class TransactionState:
INITIAL = "INITIAL"
PREPARED = "PREPARED"
COMMITTED = "COMMITTED"
ROLLED_BACK = "ROLLED_BACK"
ABORTED = "ABORTED" # 用于内部中止
# 参与者模拟
class Participant:
def __init__(self, id):
self.id = id
self.state = TransactionState.INITIAL
self.locked_resources = [] # 模拟锁定的资源
def prepare(self, transaction_id):
print(f"Participant {self.id}: Received PREPARE for TX {transaction_id}")
# 模拟执行准备工作,例如写入undo/redo日志
# 假设这里总是成功,但在实际中可能因资源冲突等失败
can_commit = True
if can_commit:
self.state = TransactionState.PREPARED
self.locked_resources.append(transaction_id) # 模拟锁定资源
print(f"Participant {self.id}: Prepared for TX {transaction_id}, locked resources. Voting YES.")
return True
else:
print(f"Participant {self.id}: Failed to prepare for TX {transaction_id}. Voting NO.")
return False
def commit(self, transaction_id):
if self.state == TransactionState.PREPARED:
print(f"Participant {self.id}: Received GLOBAL_COMMIT for TX {transaction_id}. Committing.")
# 模拟真正提交数据
self.state = TransactionState.COMMITTED
self.locked_resources.remove(transaction_id) # 释放资源
print(f"Participant {self.id}: Committed TX {transaction_id}, released resources.")
return True
else:
print(f"Participant {self.id}: Cannot commit TX {transaction_id} from state {self.state}.")
return False
def rollback(self, transaction_id):
if self.state == TransactionState.PREPARED or self.state == TransactionState.INITIAL:
print(f"Participant {self.id}: Received GLOBAL_ROLLBACK for TX {transaction_id}. Rolling back.")
# 模拟回滚数据
self.state = TransactionState.ROLLED_BACK
if transaction_id in self.locked_resources:
self.locked_resources.remove(transaction_id) # 释放资源
print(f"Participant {self.id}: Rolled back TX {transaction_id}, released resources.")
return True
else:
print(f"Participant {self.id}: Cannot rollback TX {transaction_id} from state {self.state}.")
return False
# 协调者模拟
class Coordinator:
def __init__(self, participants):
self.participants = participants
self.transaction_id_counter = 0
def start_transaction(self):
self.transaction_id_counter += 1
transaction_id = f"TX-{self.transaction_id_counter}"
print(f"nCoordinator: Starting transaction {transaction_id}")
return transaction_id
def two_phase_commit(self, transaction_id):
# Phase 1: Prepare
print(f"Coordinator: Initiating Phase 1 (Prepare) for TX {transaction_id}")
prepared_participants = []
all_voted_yes = True
for participant in self.participants:
try:
# 模拟网络通信和超时
if participant.prepare(transaction_id):
prepared_participants.append(participant)
else:
all_voted_yes = False
break # 任何一个NO,则全部回滚
except Exception as e:
print(f"Coordinator: Error during PREPARE for participant {participant.id}: {e}")
all_voted_yes = False
break
# Simulate coordinator crash AFTER collecting some votes but BEFORE deciding
# This is where 2PC is vulnerable. If it crashes now, participants are blocked.
# if random.random() < 0.1: # 10% chance of crash
# print("COORDINATOR CRASHES AFTER PREPARE, BEFORE DECISION!")
# # In a real scenario, this would mean the coordinator process stops.
# # Participants in PREPARED state would be blocked indefinitely.
# return TransactionState.ABORTED
# Phase 2: Commit/Rollback
if all_voted_yes:
print(f"Coordinator: All participants voted YES for TX {transaction_id}. Initiating Phase 2 (Commit).")
for participant in prepared_participants: # Only commit those that prepared
try:
# Simulate coordinator crash AFTER sending COMMIT to some, but not all
# This is another critical blocking point for 2PC.
# if random.random() < 0.1: # 10% chance of crash
# print("COORDINATOR CRASHES DURING COMMIT PHASE!")
# # Some participants commit, others are blocked in PREPARED.
# return TransactionState.ABORTED
participant.commit(transaction_id)
except Exception as e:
print(f"Coordinator: Error during COMMIT for participant {participant.id}: {e}")
# Even if one fails to commit, the others might have. Inconsistent state.
return TransactionState.ABORTED
print(f"Coordinator: Transaction {transaction_id} successfully COMMITTED.")
return TransactionState.COMMITTED
else:
print(f"Coordinator: Not all participants voted YES for TX {transaction_id}. Initiating Phase 2 (Rollback).")
# Inform all participants (even those not prepared if needed, but typically only prepared ones)
for participant in self.participants:
try:
participant.rollback(transaction_id)
except Exception as e:
print(f"Coordinator: Error during ROLLBACK for participant {participant.id}: {e}")
print(f"Coordinator: Transaction {transaction_id} ROLLED_BACK.")
return TransactionState.ROLLED_BACK
# --- 运行示例 ---
# p1 = Participant("P1")
# p2 = Participant("P2")
# p3 = Participant("P3")
# coordinator = Coordinator([p1, p2, p3])
#
# tx1 = coordinator.start_transaction()
# result1 = coordinator.two_phase_commit(tx1)
# print(f"Final result for {tx1}: {result1}")
#
# # 模拟一个参与者拒绝提交的情况
# print("n--- Simulating a participant voting NO ---")
# # Hack: temporarily make P2 vote NO for the next transaction
# original_prepare = Participant.prepare
# def mock_prepare_no(self, transaction_id):
# if self.id == "P2":
# print(f"Participant {self.id}: FORCING VOTE NO for TX {transaction_id}.")
# return False
# return original_prepare(self, transaction_id)
# Participant.prepare = mock_prepare_no
#
# tx2 = coordinator.start_transaction()
# result2 = coordinator.two_phase_commit(tx2)
# print(f"Final result for {tx2}: {result2}")
#
# # Restore original prepare method
# Participant.prepare = original_prepare
从上面的代码和解释中可以看到,2PC在协调者或网络发生故障时,很容易导致参与者长时间阻塞,持有关键资源,从而严重影响系统的可用性。这就是3PC试图解决的核心问题。
三阶段提交(3PC):引入预提交阶段
三阶段提交(Three-Phase Commit, 3PC)是为了解决2PC在协调者故障时可能导致的阻塞问题而提出的。3PC在2PC的两个阶段之间插入了一个额外的“预提交”阶段,旨在通过增加更多状态和超时机制,使参与者在协调者失败时能够做出更安全的决策。
3PC的核心思想
3PC的改进之处在于,它通过在正式提交之前引入一个中间的“预提交”状态,使得在特定故障场景下,参与者能够推断出其他参与者的状态,并据此自行决定提交或回滚,从而避免无限期阻塞。
3PC的三个阶段
-
CanCommit 阶段 (询问阶段,类似2PC的Prepare)
- 协调者向所有参与者发送
CanCommit?消息,询问它们是否能够提交事务。 - 参与者收到
CanCommit?后,会执行事务的准备工作(例如,日志记录、资源锁定),但不真正提交。 - 如果参与者能够成功准备,它会向协调者发送
VOTE_YES消息;否则发送VOTE_NO消息。 - 超时机制: 如果协调者在预设时间内没有收到所有参与者的响应,它会假定某些参与者投了
VOTE_NO,并中止事务。
- 协调者向所有参与者发送
-
PreCommit 阶段 (预提交阶段,3PC特有)
- 如果协调者在CanCommit阶段收到所有参与者的
VOTE_YES: 协调者向所有参与者发送PreCommit!消息。 - 参与者收到
PreCommit!消息后,进入PRE-COMMITTED状态。在这个状态下,参与者知道所有其他参与者都已同意提交,并且已经准备好提交。它们会再次写入持久化日志,但仍然不真正提交事务(不释放锁)。 - 参与者向协调者发送
ACK消息,确认已进入PRE-COMMITTED状态。 - 超时机制: 如果参与者在
PREPARED状态下,长时间没有收到协调者的PreCommit!或Abort!消息,它会超时并回滚事务。这是3PC防止阻塞的关键点之一:如果协调者在CanCommit阶段之后、PreCommit阶段之前崩溃,所有参与者将因为超时而回滚。 - 如果协调者在CanCommit阶段收到任何
VOTE_NO,或超时: 协调者向所有参与者发送Abort!消息。参与者收到后回滚事务。
- 如果协调者在CanCommit阶段收到所有参与者的
-
DoCommit 阶段 (提交阶段)
- 如果协调者在PreCommit阶段收到所有参与者的
ACK: 协调者向所有参与者发送DoCommit!消息。 - 参与者收到
DoCommit!消息后,正式提交事务(释放资源),并向协调者发送ACK消息。 - 超时机制: 如果参与者在
PRE-COMMITTED状态下,长时间没有收到协调者的DoCommit!或Abort!消息,它会超时并自行提交事务。这是3PC解决2PC阻塞问题的另一个关键机制。因为参与者收到了PreCommit!,它知道所有其他参与者都已准备好提交,并且协调者已经决定提交。因此,即使协调者崩溃,它也可以安全地假设最终会提交。 - 如果协调者在PreCommit阶段没有收到所有
ACK,或超时: 协调者向所有参与者发送Abort!消息。参与者收到后回滚事务。
- 如果协调者在PreCommit阶段收到所有参与者的
Pre-Commit 阶段如何解决2PC的阻塞问题?
核心在于PRE-COMMITTED状态的语义和超时机制:
-
协调者在CanCommit阶段之后、PreCommit阶段之前崩溃:
- 在2PC中,如果协调者在所有参与者都
PREPARED之后但在发出GLOBAL_COMMIT之前崩溃,参与者会无限期阻塞。 - 在3PC中,如果协调者崩溃,参与者在
PREPARED状态下会因为超时(未收到PreCommit!或Abort!)而自动回滚。这是一个安全的回滚,因为协调者尚未发出PreCommit!,这意味着它还没有确认所有参与者都同意提交。因此,回滚不会导致不一致。
- 在2PC中,如果协调者在所有参与者都
-
协调者在PreCommit阶段之后、DoCommit阶段之前崩溃:
- 这是3PC真正解决2PC阻塞的关键场景。
- 在2PC中,如果协调者在发出
GLOBAL_COMMIT到部分参与者之后崩溃,那些收到消息的会提交,未收到的会阻塞。 - 在3PC中,如果协调者崩溃,部分或全部参与者可能已经收到了
PreCommit!消息并进入PRE-COMMITTED状态。当这些参与者在PRE-COMMITTED状态下等待DoCommit!或Abort!超时时,它们会自动提交事务。 - 为什么可以安全地自动提交?因为收到
PreCommit!消息意味着:- 协调者已经确认所有其他参与者都投了
VOTE_YES。 - 协调者已经决定要提交事务。
- 没有任何一个参与者会自行回滚(因为它们都收到了
PreCommit!,或者在等待PreCommit!时超时并回滚,但这种情况已经被CanCommit阶段的超时处理了)。
- 协调者已经确认所有其他参与者都投了
- 因此,即使协调者崩溃,处于
PRE-COMMITTED状态的参与者也可以独立地推进到提交状态,从而避免了阻塞。
3PC的假设与局限性
尽管3PC在理论上解决了2PC的一些阻塞问题,但它并非完美无缺,并引入了自身的复杂性和限制:
- 无网络分区假设: 3PC的一个核心假设是,网络在任何时候都不会发生分区,或者分区是短暂的,并且分区后的网络最终会恢复。如果网络持续分区,导致部分参与者与协调者或其他参与者隔离,3PC仍然可能导致不一致。例如,在协调者发送
PreCommit!之后发生分区,部分参与者进入PRE-COMMITTED状态,而其他参与者未收到PreCommit!。如果协调者随后崩溃,未收到PreCommit!的参与者会超时回滚,而收到PreCommit!的参与者会超时提交,导致不一致。 - 消息传递延迟有界: 3PC依赖于超时机制来检测故障和进行恢复。这意味着消息传递必须在有限的时间内完成,否则超时判断可能不准确。
- 增加了消息数量和延迟: 引入了一个额外的阶段意味着更多的消息交换和更长的事务延迟。这增加了系统的复杂性和性能开销。
- 实际应用较少: 由于上述假设和复杂性,以及存在更健壮的分布式一致性算法(如Paxos、Raft),3PC在实际的数据库或分布式系统中并不像2PC那样被广泛实现。2PC虽然有阻塞问题,但其实现相对简单,且在许多商业数据库的XA事务中得到了优化和应用。
3PC 伪代码示例
为了更深入地理解3PC的各个阶段和超时处理,我们来编写更详细的伪代码。
import time
import random
from enum import Enum
# 定义事务状态
class TransactionState(Enum):
INITIAL = "INITIAL"
PREPARED = "PREPARED" # 参与者准备好但未收到PreCommit
PRE_COMMITTED = "PRE_COMMITTED" # 参与者收到PreCommit,知道可以提交
COMMITTED = "COMMITTED"
ROLLED_BACK = "ROLLED_BACK"
ABORTED = "ABORTED" # 内部中止状态
# 消息类型
class MessageType(Enum):
CAN_COMMIT_REQUEST = "CAN_COMMIT_REQUEST"
VOTE_YES = "VOTE_YES"
VOTE_NO = "VOTE_NO"
PRE_COMMIT_REQUEST = "PRE_COMMIT_REQUEST"
ACK_PRE_COMMIT = "ACK_PRE_COMMIT"
DO_COMMIT_REQUEST = "DO_COMMIT_REQUEST"
ACK_COMMIT = "ACK_COMMIT"
ABORT_REQUEST = "ABORT_REQUEST"
ACK_ABORT = "ACK_ABORT"
# 参与者模拟
class Participant:
def __init__(self, id, coordinator_ref, can_commit_timeout_sec=5, pre_commit_timeout_sec=5):
self.id = id
self.coordinator = coordinator_ref
self.state = TransactionState.INITIAL
self.current_transaction_id = None
self.locked_resources = [] # 模拟锁定的资源
self.can_commit_timeout = can_commit_timeout_sec
self.pre_commit_timeout = pre_commit_timeout_sec
self.timer = None
def start_timer(self, duration_sec, callback_func):
# 实际系统中会是异步定时器或线程
self.timer = time.time() + duration_sec
print(f"Participant {self.id}: Timer started for {duration_sec}s for {self.current_transaction_id}. Callback: {callback_func.__name__}")
def check_timer(self):
if self.timer and time.time() >= self.timer:
self.timer = None
return True
return False
def handle_message(self, message_type, transaction_id):
if transaction_id != self.current_transaction_id and message_type != MessageType.CAN_COMMIT_REQUEST:
print(f"Participant {self.id}: Ignoring message {message_type} for old TX {transaction_id}, current is {self.current_transaction_id}")
return
print(f"Participant {self.id}: Handling message {message_type} for TX {transaction_id} in state {self.state}")
if message_type == MessageType.CAN_COMMIT_REQUEST:
self.current_transaction_id = transaction_id
self.state = TransactionState.INITIAL # Reset state for new transaction
# Simulate preparation work
can_prepare = random.choice([True, True, True, False]) # 75% chance to say YES
if can_prepare:
self.state = TransactionState.PREPARED
self.locked_resources.append(transaction_id)
print(f"Participant {self.id}: Prepared for TX {transaction_id}. Voting YES.")
self.coordinator.receive_vote(self.id, transaction_id, MessageType.VOTE_YES)
# Start timer for PreCommit or Abort from coordinator
self.start_timer(self.can_commit_timeout, self.timeout_in_prepared_state)
else:
self.state = TransactionState.ROLLED_BACK # Immediately abort locally
print(f"Participant {self.id}: Failed to prepare for TX {transaction_id}. Voting NO.")
self.coordinator.receive_vote(self.id, transaction_id, MessageType.VOTE_NO)
elif message_type == MessageType.PRE_COMMIT_REQUEST:
if self.state == TransactionState.PREPARED:
self.state = TransactionState.PRE_COMMITTED
print(f"Participant {self.id}: Entered PRE_COMMITTED state for TX {transaction_id}.")
self.coordinator.receive_ack(self.id, transaction_id, MessageType.ACK_PRE_COMMIT)
# Start timer for DoCommit or Abort from coordinator
self.start_timer(self.pre_commit_timeout, self.timeout_in_pre_committed_state)
else:
print(f"Participant {self.id}: Received PRE_COMMIT_REQUEST in unexpected state {self.state}. Responding with ACK_ABORT for safety.")
# This could indicate a coordinator retry or out-of-order message after participant timeout
self.coordinator.receive_ack(self.id, transaction_id, MessageType.ACK_ABORT)
self.rollback_local(transaction_id) # Ensure consistency
elif message_type == MessageType.DO_COMMIT_REQUEST:
if self.state == TransactionState.PRE_COMMITTED:
self.commit_local(transaction_id)
self.coordinator.receive_ack(self.id, transaction_id, MessageType.ACK_COMMIT)
else:
print(f"Participant {self.id}: Received DO_COMMIT_REQUEST in unexpected state {self.state}. Transaction might be inconsistent.")
# Force commit if we are in PREPARED state as well, as coordinator must have decided commit
# This is a recovery logic based on the assumption of 3PC safety
if self.state == TransactionState.PREPARED:
print(f"Participant {self.id}: Forcing commit as coordinator sent DO_COMMIT while in PREPARED.")
self.commit_local(transaction_id)
self.coordinator.receive_ack(self.id, transaction_id, MessageType.ACK_COMMIT)
else:
self.coordinator.receive_ack(self.id, transaction_id, MessageType.ACK_ABORT) # Cannot commit
elif message_type == MessageType.ABORT_REQUEST:
self.rollback_local(transaction_id)
self.coordinator.receive_ack(self.id, transaction_id, MessageType.ACK_ABORT)
def timeout_in_prepared_state(self):
# If coordinator crashed after CanCommit but before PreCommit
# Participant was in PREPARED state, means coordinator has not yet decided GLOBAL_COMMIT
# So it's safe to abort.
if self.state == TransactionState.PREPARED:
print(f"Participant {self.id}: TIMEOUT in PREPARED state for TX {self.current_transaction_id}. Rolling back.")
self.rollback_local(self.current_transaction_id)
# Optionally inform coordinator of rollback (if coordinator later recovers)
self.coordinator.receive_ack(self.id, self.current_transaction_id, MessageType.ACK_ABORT)
def timeout_in_pre_committed_state(self):
# This is the key non-blocking part of 3PC
# If coordinator crashed after PreCommit but before DoCommit
# Participant was in PRE_COMMITTED state, which means:
# 1. All participants voted YES in CanCommit.
# 2. Coordinator decided to commit and sent PreCommit.
# So, it's safe to self-commit.
if self.state == TransactionState.PRE_COMMITTED:
print(f"Participant {self.id}: TIMEOUT in PRE_COMMITTED state for TX {self.current_transaction_id}. Self-committing!")
self.commit_local(self.current_transaction_id)
# Optionally inform coordinator of commit (if coordinator later recovers)
self.coordinator.receive_ack(self.id, self.current_transaction_id, MessageType.ACK_COMMIT)
def commit_local(self, transaction_id):
if self.state != TransactionState.COMMITTED:
print(f"Participant {self.id}: Committing TX {transaction_id}. Releasing resources.")
self.state = TransactionState.COMMITTED
if transaction_id in self.locked_resources:
self.locked_resources.remove(transaction_id)
self.current_transaction_id = None # Transaction finished
self.timer = None
def rollback_local(self, transaction_id):
if self.state != TransactionState.ROLLED_BACK:
print(f"Participant {self.id}: Rolling back TX {transaction_id}. Releasing resources.")
self.state = TransactionState.ROLLED_BACK
if transaction_id in self.locked_resources:
self.locked_resources.remove(transaction_id)
self.current_transaction_id = None # Transaction finished
self.timer = None
# 协调者模拟
class Coordinator:
def __init__(self, participants, coordinator_timeout_sec=10):
self.participants = {p.id: p for p in participants}
self.transaction_id_counter = 0
self.transactions = {} # Stores transaction state
self.coordinator_timeout = coordinator_timeout_sec
def start_transaction(self):
self.transaction_id_counter += 1
transaction_id = f"TX-{self.transaction_id_counter}"
self.transactions[transaction_id] = {
"state": TransactionState.INITIAL,
"votes": {}, # Participant ID -> Vote
"acks": {}, # Participant ID -> ACK type
"start_time": time.time()
}
print(f"nCoordinator: Starting transaction {transaction_id}")
return transaction_id
def three_phase_commit(self, transaction_id):
tx_data = self.transactions.get(transaction_id)
if not tx_data:
print(f"Coordinator: Error, transaction {transaction_id} not found.")
return TransactionState.ABORTED
# Phase 1: CanCommit
tx_data["state"] = TransactionState.INITIAL
print(f"Coordinator: Initiating Phase 1 (CanCommit) for TX {transaction_id}")
for p_id, participant in self.participants.items():
# Simulate sending message over network
# In a real system, this would be asynchronous
participant.handle_message(MessageType.CAN_COMMIT_REQUEST, transaction_id)
# Wait for votes or timeout
start_wait = time.time()
while time.time() - start_wait < self.coordinator_timeout:
# Simulate checking for votes (real system uses event loop or queue)
if len(tx_data["votes"]) == len(self.participants):
break
time.sleep(0.1) # Small delay to simulate async processing
# Evaluate votes
all_voted_yes = True
if len(tx_data["votes"]) != len(self.participants):
all_voted_yes = False # Timeout happened or not all replied
print(f"Coordinator: Timeout or not all votes received in CanCommit for TX {transaction_id}.")
else:
for vote in tx_data["votes"].values():
if vote == MessageType.VOTE_NO:
all_voted_yes = False
break
if not all_voted_yes:
print(f"Coordinator: Aborting TX {transaction_id} (not all YES votes or timeout).")
self._send_abort(transaction_id)
tx_data["state"] = TransactionState.ROLLED_BACK
return tx_data["state"]
# Phase 2: PreCommit
tx_data["state"] = TransactionState.PRE_COMMITTED # Coordinator's state
tx_data["acks"] = {} # Reset acks for this phase
print(f"Coordinator: All YES votes for TX {transaction_id}. Initiating Phase 2 (PreCommit).")
for p_id, participant in self.participants.items():
participant.handle_message(MessageType.PRE_COMMIT_REQUEST, transaction_id)
# Wait for ACKs or timeout
start_wait = time.time()
while time.time() - start_wait < self.coordinator_timeout:
if len(tx_data["acks"]) == len(self.participants):
break
time.sleep(0.1)
all_pre_committed_ack = True
if len(tx_data["acks"]) != len(self.participants):
all_pre_committed_ack = False # Timeout or not all replied
print(f"Coordinator: Timeout or not all ACK_PRE_COMMIT received for TX {transaction_id}.")
else:
for ack in tx_data["acks"].values():
if ack == MessageType.ACK_ABORT: # If any participant decided to abort
all_pre_committed_ack = False
break
# Simulate coordinator crash AFTER sending PreCommit, but BEFORE receiving all ACKs or sending DoCommit
# This is where 3PC handles coordinator failure
# if random.random() < 0.1: # 10% chance of crash
# print("COORDINATOR CRASHES AFTER PRE-COMMIT, BEFORE DO-COMMIT!")
# tx_data["state"] = TransactionState.ABORTED # Coordinator's view
# return tx_data["state"]
if not all_pre_committed_ack:
print(f"Coordinator: Aborting TX {transaction_id} (not all ACK_PRE_COMMIT or timeout).")
self._send_abort(transaction_id)
tx_data["state"] = TransactionState.ROLLED_BACK
return tx_data["state"]
# Phase 3: DoCommit
tx_data["state"] = TransactionState.COMMITTED # Coordinator's state
tx_data["acks"] = {} # Reset acks for this phase
print(f"Coordinator: All ACK_PRE_COMMIT received for TX {transaction_id}. Initiating Phase 3 (DoCommit).")
for p_id, participant in self.participants.items():
participant.handle_message(MessageType.DO_COMMIT_REQUEST, transaction_id)
# Wait for ACKs or timeout (less critical here, as participants can self-commit)
start_wait = time.time()
while time.time() - start_wait < self.coordinator_timeout:
if len(tx_data["acks"]) == len(self.participants):
break
time.sleep(0.1)
print(f"Coordinator: Transaction {transaction_id} successfully COMMITTED.")
return tx_data["state"]
def _send_abort(self, transaction_id):
print(f"Coordinator: Sending ABORT for TX {transaction_id} to all participants.")
for p_id, participant in self.participants.items():
participant.handle_message(MessageType.ABORT_REQUEST, transaction_id)
def receive_vote(self, participant_id, transaction_id, vote):
tx_data = self.transactions.get(transaction_id)
if tx_data and participant_id not in tx_data["votes"]:
tx_data["votes"][participant_id] = vote
print(f"Coordinator: Received vote {vote} from {participant_id} for TX {transaction_id}.")
def receive_ack(self, participant_id, transaction_id, ack_type):
tx_data = self.transactions.get(transaction_id)
if tx_data and participant_id not in tx_data["acks"]:
tx_data["acks"][participant_id] = ack_type
print(f"Coordinator: Received ACK {ack_type} from {participant_id} for TX {transaction_id}.")
def simulate_participant_timers(self):
# In a real system, participants would run their own timer loops
for p_id, p in self.participants.items():
if p.check_timer():
if p.state == TransactionState.PREPARED:
p.timeout_in_prepared_state()
elif p.state == TransactionState.PRE_COMMITTED:
p.timeout_in_pre_committed_state()
# --- 运行示例 ---
# p1 = Participant("P1", None) # Coordinator ref will be set after coordinator creation
# p2 = Participant("P2", None)
# p3 = Participant("P3", None)
# participants_list = [p1, p2, p3]
#
# coordinator = Coordinator(participants_list)
# for p in participants_list:
# p.coordinator = coordinator # Set coordinator reference
# Normal successful commit
# tx1 = coordinator.start_transaction()
# result1 = coordinator.three_phase_commit(tx1)
# print(f"nFinal result for {tx1}: {result1}")
# coordinator.simulate_participant_timers() # Check for any lingering timers
# --- 模拟故障场景 ---
# Scenario 1: Coordinator crashes after CanCommit, before PreCommit (participants should rollback)
# print("n--- Simulating Coordinator crash after CanCommit, before PreCommit ---")
# # Hack: Temporarily disable coordinator's PreCommit and DoCommit logic
# original_3pc = coordinator.three_phase_commit
# def mock_crash_after_can_commit(self, transaction_id):
# tx_data = self.transactions.get(transaction_id)
# # Phase 1: CanCommit (same as before)
# tx_data["state"] = TransactionState.INITIAL
# print(f"Coordinator: Initiating Phase 1 (CanCommit) for TX {transaction_id}")
# for p_id, participant in self.participants.items():
# participant.handle_message(MessageType.CAN_COMMIT_REQUEST, transaction_id)
#
# # Wait for votes or timeout
# start_wait = time.time()
# while time.time() - start_wait < self.coordinator_timeout:
# if len(tx_data["votes"]) == len(self.participants):
# break
# time.sleep(0.1)
#
# all_voted_yes = True
# if len(tx_data["votes"]) != len(self.participants):
# all_voted_yes = False
# else:
# for vote in tx_data["votes"].values():
# if vote == MessageType.VOTE_NO:
# all_voted_yes = False
# break
#
# if all_voted_yes:
# print(f"Coordinator: All YES votes for TX {transaction_id}. COORDINATOR CRASHES NOW (before PreCommit)!")
# tx_data["state"] = TransactionState.ABORTED # Coordinator's internal state
# return tx_data["state"] # Coordinator effectively stops here
# else:
# print(f"Coordinator: Aborting TX {transaction_id} (not all YES votes or timeout).")
# self._send_abort(transaction_id)
# tx_data["state"] = TransactionState.ROLLED_BACK
# return tx_data["state"]
#
# coordinator.three_phase_commit = mock_crash_after_can_commit
#
# tx2 = coordinator.start_transaction()
# print("--- Participants are now in PREPARED state, waiting for coordinator... ---")
# # Coordinator "crashes" here. Participants will eventually timeout and rollback.
# time.sleep(p1.can_commit_timeout + 1) # Wait for participants to timeout
# coordinator.simulate_participant_timers()
#
# print(f"Participant P1 state: {p1.state}, locked: {p1.locked_resources}")
# print(f"Participant P2 state: {p2.state}, locked: {p2.locked_resources}")
# print(f"Participant P3 state: {p3.state}, locked: {p3.locked_resources}")
# # Expected: All participants should be ROLLED_BACK
#
# coordinator.three_phase_commit = original_3pc # Restore original method
# Scenario 2: Coordinator crashes after PreCommit, before DoCommit (participants should self-commit)
# print("n--- Simulating Coordinator crash after PreCommit, before DoCommit ---")
# def mock_crash_after_pre_commit(self, transaction_id):
# tx_data = self.transactions.get(transaction_id)
# # Phase 1: CanCommit (same as before)
# tx_data["state"] = TransactionState.INITIAL
# print(f"Coordinator: Initiating Phase 1 (CanCommit) for TX {transaction_id}")
# for p_id, participant in self.participants.items():
# participant.handle_message(MessageType.CAN_COMMIT_REQUEST, transaction_id)
# start_wait = time.time()
# while time.time() - start_wait < self.coordinator_timeout:
# if len(tx_data["votes"]) == len(self.participants): break
# time.sleep(0.1)
# all_voted_yes = True
# if len(tx_data["votes"]) != len(self.participants): all_voted_yes = False
# else:
# for vote in tx_data["votes"].values():
# if vote == MessageType.VOTE_NO: all_voted_yes = False; break
#
# if not all_voted_yes:
# self._send_abort(transaction_id); tx_data["state"] = TransactionState.ROLLED_BACK; return tx_data["state"]
#
# # Phase 2: PreCommit (same as before)
# tx_data["state"] = TransactionState.PRE_COMMITTED
# tx_data["acks"] = {}
# print(f"Coordinator: All YES votes for TX {transaction_id}. Initiating Phase 2 (PreCommit).")
# for p_id, participant in self.participants.items():
# participant.handle_message(MessageType.PRE_COMMIT_REQUEST, transaction_id)
#
# start_wait = time.time()
# while time.time() - start_wait < self.coordinator_timeout:
# if len(tx_data["acks"]) == len(self.participants): break
# time.sleep(0.1)
#
# all_pre_committed_ack = True
# if len(tx_data["acks"]) != len(self.participants): all_pre_committed_ack = False
# else:
# for ack in tx_data["acks"].values():
# if ack == MessageType.ACK_ABORT: all_pre_committed_ack = False; break
#
# if all_pre_committed_ack:
# print(f"Coordinator: All ACK_PRE_COMMIT received for TX {transaction_id}. COORDINATOR CRASHES NOW (before DoCommit)!")
# tx_data["state"] = TransactionState.ABORTED # Coordinator's internal state
# return tx_data["state"] # Coordinator effectively stops here
# else:
# self._send_abort(transaction_id); tx_data["state"] = TransactionState.ROLLED_BACK; return tx_data["state"]
#
# coordinator.three_phase_commit = mock_crash_after_pre_commit
#
# tx3 = coordinator.start_transaction()
# print("--- Participants are now in PRE_COMMITTED state, waiting for coordinator... ---")
# # Coordinator "crashes" here. Participants will eventually timeout and self-commit.
# time.sleep(p1.pre_commit_timeout + 1) # Wait for participants to timeout
# coordinator.simulate_participant_timers()
#
# print(f"Participant P1 state: {p1.state}, locked: {p1.locked_resources}")
# print(f"Participant P2 state: {p2.state}, locked: {p2.locked_resources}")
# print(f"Participant P3 state: {p3.state}, locked: {p3.locked_resources}")
# # Expected: All participants should be COMMITTED
#
# coordinator.three_phase_commit = original_3pc # Restore original method
在上面的伪代码中,我们模拟了协调者和参与者的消息交换、状态转换以及超时处理。特别注意Participant类中的timeout_in_prepared_state和timeout_in_pre_committed_state方法,它们是3PC解决阻塞问题的核心逻辑。
- 当参与者在
PREPARED状态超时时,它会安全地回滚,因为协调者尚未发出PreCommit,这意味着事务尚未被协调者完全批准。 - 当参与者在
PRE_COMMITTED状态超时时,它会安全地提交,因为PreCommit消息已经确认了协调者的提交意图以及所有其他参与者的准备就绪。
2PC 与 3PC 对比
| 特性 | 两阶段提交 (2PC) | 三阶段提交 (3PC) |
|---|---|---|
| 阶段数量 | 2个:Prepare, Commit/Rollback | 3个:CanCommit, PreCommit, DoCommit/Abort |
| 阻塞问题 | 协调者故障或网络分区可能导致参与者无限期阻塞 | 引入PreCommit阶段和超时机制,旨在解决协调者故障时的阻塞问题 |
| 消息数量 | 较少 | 较多 (额外一轮消息交换) |
| 延迟 | 较低 | 较高 (额外一轮等待和超时) |
| 原子性 | 强保证 | 强保证 (在无网络分区假设下) |
| 可用性 | 较差 (易阻塞) | 较好 (降低了特定故障模式下的阻塞) |
| 一致性 | 强一致性 | 强一致性 (在无网络分区假设下) |
| 故障场景 | 协调者在Commit阶段崩溃导致部分参与者阻塞 | 协调者在PreCommit阶段崩溃,参与者可自行提交;协调者在CanCommit阶段崩溃,参与者可自行回滚 |
| 网络分区 | 易导致不一致和阻塞 | 在网络分区下仍可能导致不一致,是其主要缺陷 |
| 复杂性 | 相对简单 | 较复杂,尤其是在恢复逻辑上 |
| 实际应用 | 广泛应用于XA事务、传统数据库分布式事务 | 理论意义大于实际应用,较少被通用数据库实现 |
超越3PC:现代分布式事务的实践
尽管3PC在理论上改进了2PC,但其对“无网络分区”的强假设以及更高的复杂性,使得它在实际生产环境中并没有得到广泛应用。现代分布式系统在处理事务时,通常会采用以下更为实用和健壮的方法:
-
Sagas 模式: 针对微服务架构,Sagas将一个长事务分解为一系列本地事务,每个本地事务都有一个对应的补偿操作。如果某个本地事务失败,可以通过执行之前已成功本地事务的补偿操作来回滚整个Saga。Sagas牺牲了强一致性,通常提供最终一致性,但极大地提高了可用性。
-
分布式共识算法 (Paxos / Raft): 这些算法旨在解决分布式系统中的一致性问题,它们通常被用于构建高可用的分布式状态机或复制日志。通过将事务协调者的状态复制到多个节点,即使协调者崩溃,也可以通过选举新的领导者来继续处理事务,从而避免阻塞。这比2PC/3PC提供了更强的容错性,但实现起来也更为复杂。
-
消息队列 + 幂等操作: 对于不需要严格实时一致性的场景,可以通过可靠消息队列和幂等操作来实现事务。例如,将操作分解为一系列事件,通过消息队列异步处理。接收方需要确保即使多次收到相同消息也能正确处理(幂等性)。这种方式也实现了最终一致性。
-
两阶段提交的增强: 许多商业数据库和事务管理器(如JTA/XA)仍然使用2PC,但通过各种优化和增强(例如持久化协调者日志、手动恢复工具、更智能的超时和故障检测机制)来缓解其阻塞问题。
结语
三阶段提交协议通过引入Pre-Commit阶段和精细的超时机制,在理论上提升了分布式事务在协调者故障时的非阻塞性。它在2PC的基础上,通过增加通信轮次和状态,使得参与者在特定故障场景下能够做出更安全的独立决策。然而,其对网络分区的高度敏感性以及由此带来的复杂性,限制了它在广泛实践中的应用。在构建高可靠的分布式系统时,理解2PC和3PC的原理和局限性是至关重要的,但更重要的是结合具体的业务需求和系统架构,选择或设计更为合适的分布式事务解决方案,如Sagas或基于分布式共识算法的方案,以在性能、可用性和一致性之间取得最佳平衡。