SDR Transaction Reversibility and Crash Recovery
This document provides a deep dive into ION's SDR (Simple Data Recorder) transaction reversibility mechanism, including how transactions work, how crash recovery is triggered, and how daemons coordinate during recovery.
Overview
ION's SDR provides ACID-like transaction semantics for its shared memory dataspace. The reversibility mechanism ensures that if a daemon crashes mid-transaction, the dataspace can be restored to a consistent state. This is achieved through:
- Write-Ahead Logging (WAL) - Original data is logged before any modification
- Transaction Ownership Tracking - Only one task can own a transaction at a time
- Automatic Reversal - Incomplete transactions are automatically rolled back
- Coordinated Recovery - ionrestart orchestrates daemon restart after crashes
Transaction Lifecycle
Starting a Transaction
sdr_begin_xn() performs:
- Acquires the SDR semaphore to serialize access
- Records the owner task ID and thread ID in SdrState
- Initializes xnDepth to 1 (supports nested transactions)
- Clears xnCanceled and modified flags
Committing a Transaction
sdr_end_xn() performs:
- Decrements xnDepth
- When xnDepth reaches 0, calls terminateXn() to finalize
- If successful, truncates the transaction log to zero length
- Releases the SDR semaphore
Canceling a Transaction
sdr_cancel_xn() performs:
- Sets xnCanceled flag to 1
- Decrements xnDepth
- Triggers terminateXn() when xnDepth reaches 0
- terminateXn() calls reverseTransaction() to undo all modifications
Exiting a Transaction (Defensive)
sdr_exit_xn() is used when unwinding from errors:
- If no modifications were made: safely releases the transaction
- If modifications were made (modified=1): automatically triggers cancellation
- Handles the case where the call stack is still unwinding after a crash
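The lifecycle rules above can be modeled as a small single-process state machine. This is a hypothetical sketch, not ION's sdrxn.c: the names ToyXn, toy_begin, toy_end, toy_cancel and toy_exit are invented, a plain int owner stands in for the SDR semaphore plus the owner task and thread IDs, and it assumes that a nested begin by the current owner increments xnDepth. It shows that only the outermost end finalizes the transaction, and that a cancel at any depth leads to reversal once the depth reaches zero.

```c
#include <assert.h>

/* Toy model of SdrState's transaction fields (hypothetical, simplified).
 * owner == 0 means "no transaction"; it stands in for the SDR semaphore
 * plus sdrOwnerTask/sdrOwnerThread. */
typedef struct {
    int owner;
    int xnDepth;
    int xnCanceled;
    int modified;
    int commits;    /* counters so the outcome can be observed */
    int reversals;
} ToyXn;

/* Called when xnDepth reaches 0: reverse if canceled, else commit. */
static void toy_terminate(ToyXn *x) {
    if (x->xnCanceled) {
        x->reversals++;   /* reverseTransaction() would undo the log here */
    } else {
        x->commits++;     /* the log would be truncated here */
    }
    x->xnCanceled = 0;
    x->modified = 0;
    x->owner = 0;         /* release: another task may now begin */
}

/* Returns 0 on success, -1 if another task holds the transaction
 * (the real call would block on the semaphore instead). */
static int toy_begin(ToyXn *x, int task) {
    if (x->owner == task) {   /* nested begin by the current owner */
        x->xnDepth++;
        return 0;
    }
    if (x->owner != 0) {
        return -1;
    }
    x->owner = task;
    x->xnDepth = 1;
    x->xnCanceled = 0;
    x->modified = 0;
    return 0;
}

static void toy_end(ToyXn *x) {
    assert(x->xnDepth > 0);
    if (--x->xnDepth == 0) {
        toy_terminate(x);
    }
}

static void toy_cancel(ToyXn *x) {
    assert(x->xnDepth > 0);
    x->xnCanceled = 1;
    if (--x->xnDepth == 0) {
        toy_terminate(x);
    }
}

/* Defensive exit: anything modified gets canceled, otherwise just end. */
static void toy_exit(ToyXn *x) {
    if (x->modified) {
        toy_cancel(x);
    } else {
        toy_end(x);
    }
}
```

For example, begin, begin, cancel, end leaves the model with one reversal and no commit: the inner cancel only marks the transaction, and the outer end then reverses it.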
Write-Ahead Logging
The transaction log is the core mechanism enabling reversibility. Every write to the SDR dataspace is preceded by logging the original data.
Log Entry Format
Each log entry contains:
| Offset | Size | Content |
|---|---|---|
| 0 | W bytes | Start address within SDR heap |
| W | W bytes | Length (L) of data being written |
| 2W | L bytes | Original data at that address |
Where W = WORD_SIZE (4 on 32-bit, 8 on 64-bit systems)
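The entry layout in the table can be written and read back with plain memcpy calls. This sketch assumes W equals sizeof(size_t) and that fields are stored in native byte order; log_entry_encode and log_entry_decode are hypothetical helpers, not ION functions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for WORD_SIZE (assumption): 8 on typical 64-bit builds,
 * 4 on 32-bit. */
#define LOG_WORD sizeof(size_t)

/* Hypothetical helper: writes one entry at buf+offset and returns the
 * offset just past it. The caller must ensure buf has room for
 * 2*LOG_WORD + length bytes. */
static size_t log_entry_encode(unsigned char *buf, size_t offset,
                               size_t address, const unsigned char *orig,
                               size_t length) {
    assert(orig != NULL);
    memcpy(buf + offset, &address, LOG_WORD);            /* [0, W)     */
    memcpy(buf + offset + LOG_WORD, &length, LOG_WORD);  /* [W, 2W)    */
    memcpy(buf + offset + 2 * LOG_WORD, orig, length);   /* [2W, 2W+L) */
    return offset + 2 * LOG_WORD + length;
}

/* Hypothetical helper: parses the entry at buf+offset; *orig points
 * into buf (no copy). Returns the offset of the next entry. */
static size_t log_entry_decode(const unsigned char *buf, size_t offset,
                               size_t *address, size_t *length,
                               const unsigned char **orig) {
    memcpy(address, buf + offset, LOG_WORD);
    memcpy(length, buf + offset + LOG_WORD, LOG_WORD);
    *orig = buf + offset + 2 * LOG_WORD;
    return offset + 2 * LOG_WORD + *length;
}
```

Because each entry carries its own length, the offset returned by one decode is where the next entry starts, which is what lets a log be scanned from the beginning.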
Log Storage Options
File-based logging (logSize == 0):
- Non-volatile, survives power cycles and crashes
- Location: <pathName>/<sdrname>.sdrlog
- Required for crash recovery across restarts
Memory-based logging (logSize > 0):
- Volatile: held in shared memory, so it does not survive a reboot or power loss
- Faster, suitable for SDRs that don't require crash recovery
- Allocated in shared memory
Write Operation Flow
When sdr_write() is called:
- Ownership Validation: Verify current task still owns the transaction
- Log Control Write: Write address and length to log
- Data Capture: Write original data from target address to log
- Entry Registration: Insert log offset into logEntries list
- Actual Write: Write new data to target address in dataspace
```
┌─────────────────────────────────────────────────────────────┐
│ sdr_write() Flow │
├─────────────────────────────────────────────────────────────┤
│ 1. Check sdrOwnerTask == current task │
│ └─ If not, silently return (defensive) │
│ │
│ 2. Write to transaction log: │
│ ┌────────────┬────────────┬─────────────────┐ │
│ │ Address │ Length │ Original Data │ │
│ │ (W bytes) │ (W bytes) │ (L bytes) │ │
│ └────────────┴────────────┴─────────────────┘ │
│ │
│ 3. Add log entry offset to logEntries list │
│ │
│ 4. Write new data to SDR heap address │
└─────────────────────────────────────────────────────────────┘
```
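The four steps can be sketched against an in-memory toy SDR. ToySdr and toy_write are hypothetical; the log, heap and entry list are fixed-size arrays, and the ownership check compares a plain int in place of sdrOwnerTask. The point it illustrates is ordering: once step 2 completes, the original bytes are in the log, so the heap write in step 4 can always be undone.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HEAP_SIZE 64
#define LOG_SIZE 1024
#define MAX_ENTRIES 32

/* Hypothetical in-memory stand-in for an SDR with a memory-based log. */
typedef struct {
    int ownerTask;                  /* stands in for sdrOwnerTask */
    unsigned char heap[HEAP_SIZE];  /* the dataspace              */
    unsigned char log[LOG_SIZE];    /* transaction log bytes      */
    size_t logLen;
    size_t entries[MAX_ENTRIES];    /* offsets, like logEntries   */
    int entryCount;
} ToySdr;

/* Log-then-write, following the four steps of the flow above. */
static void toy_write(ToySdr *s, int task, size_t addr,
                      const void *data, size_t len) {
    /* 1. Ownership check: silently ignore writes from a non-owner. */
    if (s->ownerTask != task) return;
    assert(addr + len <= HEAP_SIZE);
    assert(s->logLen + 2 * sizeof(size_t) + len <= LOG_SIZE);
    assert(s->entryCount < MAX_ENTRIES);

    /* 2. Log address, length, then the ORIGINAL bytes at addr. */
    size_t off = s->logLen;
    memcpy(s->log + off, &addr, sizeof addr);
    memcpy(s->log + off + sizeof addr, &len, sizeof len);
    memcpy(s->log + off + 2 * sizeof(size_t), s->heap + addr, len);
    s->logLen = off + 2 * sizeof(size_t) + len;

    /* 3. Register the entry's offset. */
    s->entries[s->entryCount++] = off;

    /* 4. Only now overwrite the dataspace. */
    memcpy(s->heap + addr, data, len);
}
```

A write from any task other than ownerTask returns without touching the heap or the log.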
Transaction Reversal
When a transaction is canceled (explicitly or due to crash), reverseTransaction() restores the dataspace to its pre-transaction state.
Reversal Algorithm
- Iterate in Reverse Order: Process log entries LIFO (last-in, first-out)
- Read Log Entry: Parse address, length, and original data from log
- Restore Data: Write original data back to the heap address
- Dual-Site Consistency: If using file-backed SDR, write to both shared memory and file
```c
for (elt = sm_list_last(logEntries); elt; elt = sm_list_prev(elt)) {
    // Read log entry: address, length, original_data
    // Write original_data back to heap[address]
    // If file-backed, also write to dsfile
}
```
The reverse order is critical for correctness: if the same address was written more than once, its earliest log entry holds the pre-transaction value, so that entry must be restored last.
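The effect of processing order is easiest to see with one byte written twice. In this hypothetical sketch, UndoEntry is an already-decoded log entry and the file-backed second write of dual-site consistency is omitted. Heap byte 0 starts as 'A', is overwritten with 'B' (logging 'A') and then with 'C' (logging 'B'); replaying the undo records newest-first restores 'A', while oldest-first leaves the stale intermediate 'B'.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical decoded log entry: where, how much, and the bytes that
 * were there before the write. */
typedef struct {
    size_t addr;
    size_t len;
    unsigned char orig[8];
} UndoEntry;

/* Apply undo entries last-to-first, as reverseTransaction() does. */
static void reverse_lifo(unsigned char *heap, const UndoEntry *undo, int n) {
    for (int i = n - 1; i >= 0; i--) {
        assert(undo[i].len <= sizeof undo[i].orig);
        memcpy(heap + undo[i].addr, undo[i].orig, undo[i].len);
    }
}

/* The same entries applied first-to-last, for contrast (incorrect). */
static void reverse_fifo(unsigned char *heap, const UndoEntry *undo, int n) {
    for (int i = 0; i < n; i++) {
        assert(undo[i].len <= sizeof undo[i].orig);
        memcpy(heap + undo[i].addr, undo[i].orig, undo[i].len);
    }
}
```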
Transaction Ownership and Hijacking
Owner Tracking
The SdrState structure tracks transaction ownership:
```c
typedef struct {
    sm_SemId  sdrSemaphore;    // Mutex for serialization
    int       sdrOwnerTask;    // Task ID holding transaction
    pthread_t sdrOwnerThread;  // Thread ID for nested calls
    int       xnDepth;         // Nesting level
    int       xnCanceled;      // Flag: marked for reversal
    int       modified;        // Flag: dataspace modified
    int       halted;          // Flag: daemons initializing
    time_t    restartTime;     // Time of last ionrestart (used for restart loop detection)
    char      restartCmd[32];  // Command to invoke ionrestart
    // ... (abridged: other SdrState fields omitted)
} SdrState;
```
Ownership Validation
Every SDR operation validates ownership:
```c
int sdr_in_xn(Sdr sdrv) {
    return (sdrv->sdr->sdrOwnerTask == sm_TaskIdSelf()
            && pthread_equal(sdrv->sdr->sdrOwnerThread, pthread_self()));
}
```
Defensive Owner Check During Writes
A critical safety feature prevents stale writes after ownership transfer:
```c
// In _sdrput():
if (sdr->sdrOwnerTask != sm_TaskIdSelf()) {
    // Task no longer owns transaction
    // Silently return without corrupting new owner's data
    return;
}
```
This handles the scenario where:
1. Daemon A crashes mid-transaction
2. ionrestart hijacks the transaction
3. Daemon A's call stack continues unwinding, attempting more writes
4. These writes are silently ignored because ownership transferred
Crash Detection and Recovery
Automatic Recovery at Startup
When sdr_load_profile() is called (typically during ionAttach()):
- Open Log File: Check if log file exists and has content
- Reload Log Entries: Reconstruct logEntries list from log file
- Detect Incomplete Transaction: Non-empty log indicates crash
- Automatic Reversal: Call reverseTransaction() to restore consistency
- Truncate Log: Clear log after successful reversal
```
┌─────────────────────────────────────────────────────────────┐
│ sdr_load_profile() Recovery │
├─────────────────────────────────────────────────────────────┤
│ 1. Open log file │
│ └─ If empty: no recovery needed │
│ │
│ 2. If log has content: │
│ ├─ reloadLogEntries() - scan log, rebuild list │
│ ├─ reverseTransaction() - restore original data │
│ └─ Truncate log file │
│ │
│ 3. Load dataspace from file into shared memory │
└─────────────────────────────────────────────────────────────┘
```
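The startup path in the diagram can be sketched over an in-memory copy of the log: scan it once to rebuild the list of entry offsets, undo the entries newest-first, then truncate. reload_entries and recover are hypothetical stand-ins for reloadLogEntries() and the reversal and truncation steps. They assume the native-byte-order layout from the Log Entry Format table with W = sizeof(size_t), and they reject a log that ends partway through a record rather than guessing how ION treats that case.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_ENTRIES 64

/* Hypothetical: scan [addr(W) | len(W) | orig(len)] records and collect
 * each record's starting offset. Returns the count, or -1 if the log
 * ends mid-record or holds more than MAX_ENTRIES records. */
static int reload_entries(const unsigned char *logBuf, size_t logLen,
                          size_t offsets[MAX_ENTRIES]) {
    const size_t W = sizeof(size_t);
    size_t off = 0;
    int n = 0;
    while (off < logLen) {
        size_t len;
        if (logLen - off < 2 * W || n == MAX_ENTRIES) return -1;
        memcpy(&len, logBuf + off + W, W);
        if (logLen - off - 2 * W < len) return -1;
        offsets[n++] = off;
        off += 2 * W + len;
    }
    return n;
}

/* Hypothetical: undo every logged write newest-first, then truncate.
 * Returns the number of entries reversed (0 for an empty log, meaning
 * no recovery was needed), or -1 on a malformed log. */
static int recover(unsigned char *heap, const unsigned char *logBuf,
                   size_t *logLen) {
    size_t offsets[MAX_ENTRIES];
    int n = reload_entries(logBuf, *logLen, offsets);
    if (n < 0) return -1;
    for (int i = n - 1; i >= 0; i--) {
        size_t addr, len;
        memcpy(&addr, logBuf + offsets[i], sizeof addr);
        memcpy(&len, logBuf + offsets[i] + sizeof addr, sizeof len);
        assert(heap != NULL);
        memcpy(heap + addr, logBuf + offsets[i] + 2 * sizeof(size_t), len);
    }
    *logLen = 0;   /* "truncate" the log */
    return n;
}
```

For a heap that read "abZ" before a crashed transaction wrote 'X' at offset 0 and then "XY" at offset 0, the log holds 'a' and then "Xb"; recover() applies "Xb" first and 'a' last, giving back "abZ".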
ionrestart Recovery Orchestration
When a daemon crash is detected, ionrestart coordinates full system recovery:
Phase 1: Transaction Hijacking
```c
// Hijack transaction ownership from crashed task
sdrv->sdr->sdrOwnerTask = sm_TaskIdSelf();
sdrv->sdr->sdrOwnerThread = pthread_self();
// Block new transactions from crashed task
sm_SemId sdrSemaphore = sdrv->sdr->sdrSemaphore;
sdrv->sdr->sdrSemaphore = -1;
// Wait for crashed task to terminate
snooze(RESTART_GRACE_PERIOD);
// Re-enable transactions
sdrv->sdr->sdrSemaphore = sdrSemaphore;
```
Phase 2: Stop All Daemons
```c
// Stop protocol daemons in order
cfdp_stop();
bp_stop();
ltp_stop();
rfx_stop();
// Wait for daemons to terminate, force SIGKILL if needed
```
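The "wait, then force SIGKILL" step follows a common POSIX pattern, sketched below. This is not ionrestart's code: the helper name reap_or_kill, the 10 ms polling interval, and the assumption that the daemon is a child of the caller (so waitpid() can observe and reap it) are all illustrative, and how ionrestart actually waits on its daemons is not described in this document.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: waits up to grace_ms for child `pid` to exit,
 * polling every 10 ms, then falls back to SIGKILL. Returns 0 if it
 * terminated on its own, 1 if SIGKILL was needed, -1 on error. */
static int reap_or_kill(pid_t pid, long grace_ms) {
    const struct timespec tick = {.tv_sec = 0, .tv_nsec = 10L * 1000 * 1000};
    int status;
    assert(pid > 0);
    for (long waited = 0; waited < grace_ms; waited += 10) {
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid) return 0;   /* exited within the grace period */
        if (r < 0) return -1;     /* not our child, or already reaped */
        nanosleep(&tick, NULL);
    }
    if (kill(pid, SIGKILL) != 0) return -1;
    if (waitpid(pid, &status, 0) != pid) return -1;
    /* It may have exited by itself just before the kill was sent. */
    return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 1 : 0;
}
```

Polling with WNOHANG keeps the grace period bounded; the final status check avoids reporting a forced kill for a process that exited on its own at the last moment.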
Phase 3: Drop and Rebuild Volatile Databases
```c
// Terminate SDR semaphore to interrupt waiting tasks
sm_SemEnd(sdrv->sdr->sdrSemaphore);
// Drop corrupted volatile databases
cfdpDropVdb();
bpDropVdb();
ltpDropVdb();
ionDropVdb();
// Restore semaphore
sm_SemUnend(sdrv->sdr->sdrSemaphore);
sm_SemGive(sdrv->sdr->sdrSemaphore);
// Reconstruct volatile databases from persistent SDR
ionRaiseVdb();
ltpRaiseVdb();
bpRaiseVdb();
cfdpRaiseVdb();
```
Phase 4: Restart Daemons
```c
// Set halted flag to allow safe daemon initialization
sdrv->sdr->halted = 1;
// Start daemons in order
rfx_start();
ltpStart();
bpStart();
cfdpStart();
// Clear halted flag when all daemons initialized
sdrv->sdr->halted = 0;
```
Daemon Coordination During Recovery
The Halted Flag
The halted flag in SdrState coordinates daemon startup:
- Set by ionrestart before starting daemons
- Cleared by ionrestart after all daemons are initialized
- Checked by sdrFetchSafe() to allow safe SDR access during startup
sdrFetchSafe() Check
Read-only SDR functions use sdrFetchSafe() for safe access:
Two conditions allow safe reads:
1. In Transaction: Current task owns the transaction
2. Heap Halted: Daemons are starting up under ionrestart control
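The two conditions can be expressed directly. This is a hypothetical sketch rather than ION's sdrFetchSafe(): ToyState holds only the three fields involved, task IDs are plain ints passed in by the caller instead of coming from sm_TaskIdSelf(), and toy_list_first shows how a guarded read-only accessor returns 0 when neither condition holds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical subset of SdrState; task IDs are plain ints here. */
typedef struct {
    int sdrOwnerTask;
    pthread_t sdrOwnerThread;
    int halted;
} ToyState;

/* Ownership test in the style of sdr_in_xn(). */
static bool toy_in_xn(const ToyState *s, int selfTask) {
    return s->sdrOwnerTask == selfTask
        && pthread_equal(s->sdrOwnerThread, pthread_self());
}

/* Safe to read if we own the transaction or the heap is halted. */
static bool toy_fetch_safe(const ToyState *s, int selfTask) {
    return toy_in_xn(s, selfTask) || s->halted != 0;
}

/* Hypothetical guarded read-only accessor: returns 0 ("no object")
 * when reading is not safe. */
static long toy_list_first(const ToyState *s, int selfTask,
                           const long *items, int count) {
    assert(items != NULL || count == 0);
    if (!toy_fetch_safe(s, selfTask)) return 0;
    return count > 0 ? items[0] : 0;
}
```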
Defensive SDR List Functions
The SDR list functions (sdr_list_first, sdr_list_next, etc.) check sdrFetchSafe() defensively:
```c
Object sdr_list_first(Sdr sdrv, Object list) {
    if (!sdrFetchSafe(sdrv)) {
        writeMemoNote("[?] sdr_list_first called but SDR not accessible",
                      itoa(list));
        return 0;  // Defensive: SDR not accessible
    }
    // ... normal operation
}
```
This prevents cascading crashes during recovery when newly restarted daemons attempt SDR operations before fully initialized.
Crash Recovery Flow
Complete Recovery Scenario
```
┌─────────────────────────────────────────────────────────────┐
│ Daemon Crash Recovery Timeline │
├─────────────────────────────────────────────────────────────┤
│ │
│ T0: Daemon A holds transaction, executes sdr_write() │
│ └─ Log entries accumulate │
│ │
│ T1: Daemon A crashes (SIGSEGV, assertion, etc.) │
│ └─ Transaction incomplete, log file has entries │
│ │
│ T2: Crash detected, ionrestart invoked │
│ └─ ionAttach() → sdr_load_profile() │
│ └─ reloadLogEntries() reconstructs log entry list │
│ └─ reverseTransaction() restores original data │
│ │
│ T3: ionrestart hijacks transaction │
│ └─ sdrOwnerTask = ionrestart's task ID │
│ └─ sdrSemaphore = -1 (block crashed task) │
│ │
│ T4: Crashed daemon's call stack unwinds │
│ └─ Stale sdr_write() calls check ownership │
│ └─ Owner mismatch → silently return (no corruption) │
│ │
│ T5: Grace period expires │
│ └─ Crashed daemon fully terminated │
│ └─ Semaphore re-enabled │
│ │
│ T6: Volatile databases dropped and rebuilt │
│ └─ Corrupted in-memory state discarded │
│ └─ Clean state reconstructed from persistent SDR │
│ │
│ T7: Daemons restarted with halted=1 │
│ └─ sdrFetchSafe() returns true for all daemons │
│ └─ Safe initialization without transaction │
│ │
│ T8: halted=0, normal operation resumes │
│ │
└─────────────────────────────────────────────────────────────┘
```
Restart Loop Prevention
To prevent infinite restart loops when crashes recur rapidly:
```c
time_t prevRestartTime = sdrv->sdr->restartTime;
sdrv->sdr->restartTime = getCtime();
if ((sdrv->sdr->restartTime - prevRestartTime) < RESTART_LOOP_INTERVAL) {
    writeMemo("Inferred restart loop. Tasks not restarted.");
    return;  // Skip daemon restart
}
```
If ionrestart is invoked too frequently (within RESTART_LOOP_INTERVAL), daemon restart is skipped to prevent thrashing.
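The check above can be isolated into a small function for testing. The 30-second TOY_RESTART_LOOP_INTERVAL and the name restart_loop_inferred are hypothetical. Like the original, it records the new restart time before comparing, so a run of restarts each closer together than the interval keeps suppressing daemon restart until the crashes space out.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical interval; ION's actual RESTART_LOOP_INTERVAL value is
 * not given in this document. */
#define TOY_RESTART_LOOP_INTERVAL 30

/* Records `now` as the latest restart and reports whether the previous
 * restart was too recent, in which case daemons should not be restarted. */
static bool restart_loop_inferred(time_t *lastRestart, time_t now) {
    assert(lastRestart != NULL);
    time_t prev = *lastRestart;
    *lastRestart = now;   /* always record this attempt */
    return (now - prev) < TOY_RESTART_LOOP_INTERVAL;
}
```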
Configuration
Enabling Reversibility
SDR reversibility is enabled via the configFlags parameter in the .ionconfig file.
The flags are a bitmask:
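- Bit 0 (1): SDR_IN_DRAM - Use shared memory for dataspace
- Bit 1 (2): SDR_IN_FILE - Use file for persistent storage
- Bit 2 (4): SDR_REVERSIBLE - Enable transaction logging
- Bit 3 (8): SDR_BOUNDED - Enforce object bounding
configFlags 13 = 1 + 4 + 8 = DRAM + REVERSIBLE + BOUNDED. A file-backed reversible SDR also needs SDR_IN_FILE, e.g. configFlags 15 = 1 + 2 + 4 + 8.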
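A small decoder makes the bit arithmetic concrete. The flag values follow ION's sdr.h (SDR_IN_DRAM 1, SDR_IN_FILE 2, SDR_REVERSIBLE 4, SDR_BOUNDED 8) and are worth checking against the header of the ION release in use. recoverable_across_restarts is a hypothetical helper encoding this document's guidance that recovery across restarts needs reversibility, a file-backed dataspace, and file-based logging.

```c
#include <assert.h>
#include <stdbool.h>

/* configFlags bits, using the values from ION's sdr.h; repeated here
 * so the sketch compiles on its own. */
#define SDR_IN_DRAM    1   /* dataspace in shared memory     */
#define SDR_IN_FILE    2   /* dataspace also written to file */
#define SDR_REVERSIBLE 4   /* transaction logging enabled    */
#define SDR_BOUNDED    8   /* object bounding enforced       */

/* Hypothetical check: recovery that survives a system restart needs
 * reversibility, a file-backed dataspace, and logSize 0 (file log). */
static bool recoverable_across_restarts(int configFlags, long logSize) {
    assert(logSize >= 0);
    return (configFlags & SDR_REVERSIBLE) != 0
        && (configFlags & SDR_IN_FILE) != 0
        && logSize == 0;
}
```

With these values, 13 passes the reversibility test but not the file test, while 15 with logSize 0 passes both.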
Log Size Configuration
- logSize 0: Use file-based logging (persistent)
- logSize > 0: Use memory-based logging with the specified size (volatile)
For crash recovery across system restarts, use file-based logging (logSize 0). If you use memory-based logging instead, make logSize large enough for the transaction workload.
Safety Features Summary
| Feature | Purpose |
|---|---|
| Write-Ahead Logging | Log original data before modification |
| LIFO Reversal | Correct order for multi-write transactions |
| Ownership Tracking | Prevent concurrent transaction conflicts |
| Defensive Owner Check | Ignore stale writes after ownership transfer |
| Dual-Site Consistency | Keep shared memory and file synchronized |
| Halted Flag | Safe daemon initialization during restart |
| sdrFetchSafe() | Allow reads during halted state |
| Restart Loop Detection | Prevent infinite crash/restart cycles |
| Grace Period | Allow crashed task to fully terminate |
Key Functions Reference
| Function | File | Purpose |
|---|---|---|
| sdr_begin_xn | sdrxn.c | Start transaction, acquire lock |
| sdr_end_xn | sdrxn.c | Commit transaction, release lock |
| sdr_cancel_xn | sdrxn.c | Rollback transaction |
| sdr_exit_xn | sdrxn.c | Safely exit (with auto-cancel if modified) |
| sdr_in_xn | sdrxn.c | Check if current task owns transaction |
| sdr_heap_is_halted | sdrxn.c | Check if heap is in halted state |
| sdrFetchSafe | sdrxn.c | Check if safe to read (in-xn OR halted) |
| terminateXn | sdrxn.c | Finalize transaction (commit or reverse) |
| reverseTransaction | sdrxn.c | Undo all writes using log |
| reloadLogEntries | sdrxn.c | Reconstruct log entry list from file |
| restartION | ionrestart.c | Full system recovery orchestration |