# When Blockchain Meets PostgreSQL: Solving the Dual-Write Dilemma in Our Rust Service
## Introduction: Two Sources of Truth, One Big Headache
Picture this: you're building a diploma verification system. The requirements seem straightforward—data must be immutable (hello, blockchain) while remaining queryable at speed (hello, PostgreSQL). The obvious solution? Write to both. But as every battle-scarred engineer knows, the devil lurks in the implementation details.
Our project uses the dual-write pattern:
- Solana — guarantees immutability and transparency for issued diplomas
- PostgreSQL (Supabase) — enables fast queries and complex analytics
Looks great on architecture diagrams, but production tells a different story. The killer issue? Partial failures. Your Solana transaction succeeds, the diploma is forever etched into the blockchain, but then PostgreSQL decides to take a coffee break. The user gets their confirmation, but half your system has no idea their diploma exists.
Today I'll walk you through how we faced this beast head-on and the patterns we deployed to tame it.
## Anatomy of a Failure: Where Things Break
Let's dive into the actual code from our `internal/api/handlers.rs`. The `issue_diploma` function is where the magic happens... and where everything can go sideways:
```rust
pub async fn issue_diploma(
    State(state): State<Arc<AppState>>,
    mut multipart: Multipart,
) -> Result<Json<IssueResponse>, AppError> {
    // ... multipart data parsing ...

    // Generate the diploma hash
    let hash = hashing::generate_hash(
        &file_bytes,
        &req.issuer_id,
        &req.recipient_id,
        issued_at,
        req.serial.as_deref(),
    );

    // Sign the hash with the issuer's private key
    let signature = hashing::sign_hash(&hash, &state.issuer_keypair)?;

    let diploma = Diploma {
        hash: hash.clone(),
        issuer_id: req.issuer_id.clone(),
        recipient_id: req.recipient_id.clone(),
        signature: Some(signature.clone()),
        issued_at,
        serial: req.serial.clone(),
        ipfs_cid: None,
    };

    // CRITICAL POINT #1: Writing to the Solana blockchain.
    // This operation can take 1-3 seconds and costs money (transaction fees).
    let chain_record = state.chain_client.write_hash(&hash, &diploma).await?;

    // Prepare data for the database
    let credential_data = serde_json::json!({
        "hash": &diploma.hash,
        "issuer_id": &diploma.issuer_id,
        "recipient_id": &diploma.recipient_id,
        "solana_tx_id": &chain_record.tx_id,
        "issued_at": diploma.issued_at.to_rfc3339(),
    });

    // CRITICAL POINT #2: Writing to PostgreSQL.
    // And here's where disaster can strike.
    let db_response = state
        .db_client
        .from("credentials")
        .insert(credential_data.to_string())
        .execute()
        .await;

    // Handle transport-level database errors
    let db_response = match db_response {
        Ok(response) => response,
        Err(e) => {
            tracing::error!("Database request failed: {}", e);
            return Err(AppError::Database(format!("Database request failed: {}", e)));
        }
    };

    if !db_response.status().is_success() {
        let status = db_response.status();
        let error_body = db_response.text().await.unwrap_or_default();
        // HERE IT IS - THE CRITICAL INCONSISTENCY!
        tracing::error!(
            "CRITICAL INCONSISTENCY: Failed to save to Supabase after successful Solana transaction. \
             tx_id: {}, hash: {}. Status: {}. Body: {}",
            chain_record.tx_id,
            hash,
            status,
            error_body
        );
        // We've already written to the blockchain; there is no rollback!
        return Err(AppError::Internal(
            "Failed to save credential record after blockchain confirmation.".to_string(),
        ));
    }

    // If all is well, return the success response
    Ok(Json(IssueResponse {
        hash,
        tx_id: chain_record.tx_id,
        signature: Some(signature),
        issued_at,
    }))
}
```
Here's the sequence of operations and the failure point visualized:
```
┌──────────┐ ┌────────────┐ ┌─────────┐ ┌────────────┐
│ Client │────▶│ Rust API │────▶│ Solana │────▶│ SUCCESS │
└──────────┘ └────────────┘ └─────────┘ └────────────┘
│ │
│ ▼
│ ┌────────────┐
└───────────────────────────▶│ PostgreSQL │
└────────────┘
│
▼
┌────────────┐
│ FAIL! │
└────────────┘
│
▼
┌──────────────────────┐
│ DATA DRIFT: │
│ • Blockchain: ✓ │
│ • Database: ✗ │
└──────────────────────┘
```
### The Fallout
What happens after such a failure? Several unpleasant scenarios:
1. Users can't find their diploma through API queries to the database
2. Analytics become impossible — database has incomplete data
3. Audit nightmares — blockchain has the record, reports don't
4. Duplication on retry — users might attempt to issue the diploma again (a guard against this is sketched below)
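For scenario 4 specifically, a cheap pre-flight guard helps. This is a sketch only: `already_issued` is a hypothetical helper, not part of our handler, and it relies on the fact that PostgREST returns a JSON array for a filtered select:

```rust
use postgrest::Postgrest;

// Hypothetical pre-flight check: if this hash is already recorded, a
// client retry should return the existing record instead of issuing a
// second, fee-paying on-chain transaction.
async fn already_issued(db_client: &Postgrest, hash: &str) -> Result<bool, AppError> {
    let response = db_client
        .from("credentials")
        .select("hash")
        .eq("hash", hash)
        .execute()
        .await
        .map_err(|e| AppError::Database(e.to_string()))?;

    let body = response
        .text()
        .await
        .map_err(|e| AppError::Database(e.to_string()))?;

    // PostgREST returns a JSON array; a non-empty array means the hash exists
    Ok(body.trim() != "[]")
}
```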
## Theory Meets Practice: Patterns to the Rescue
### The Consistency Problem: Choosing a Strategy
In distributed systems, there are two main approaches to consistency:
1. Strong Consistency — all nodes see identical data at the same moment. This is expensive and complex, especially when one node is a public blockchain.
2. Eventual Consistency — data may temporarily differ, but will eventually converge to a consistent state.
We chose eventual consistency. Why? Once a Solana transaction is confirmed, it's irreversible. There's no rollback. So we need to guarantee that PostgreSQL will eventually receive this data.
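To make "eventually" concrete, every credential row carries an explicit status, and only a few transitions are legal. Here's a minimal sketch; the variant names mirror the statuses used later in the article, but the transition table itself is illustrative:

```rust
/// Lifecycle of a credential row in PostgreSQL. The blockchain is the
/// source of truth; the database converges toward it.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum CredentialStatus {
    PendingBlockchain,       // row exists, on-chain write not yet confirmed
    Confirmed,               // on-chain write confirmed, tx_id recorded
    RecoveredFromBlockchain, // row re-created by the reconciliation job
    Failed,                  // gave up after retries; needs manual review
}

impl CredentialStatus {
    /// Which transitions we allow; anything else signals a bug or drift.
    fn can_transition_to(self, next: CredentialStatus) -> bool {
        use CredentialStatus::*;
        matches!(
            (self, next),
            (PendingBlockchain, Confirmed)
                | (PendingBlockchain, Failed)
                | (Failed, Confirmed) // a late retry finally succeeded
        )
    }
}
```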
### The Saga Pattern: Long-Running Transactions with Compensation
The Saga pattern breaks a distributed transaction into a sequence of local transactions. Each step can have a compensating transaction for rollback.
Here's how it could look in our case:
```rust
// Pseudocode for the diploma issuance saga
enum SagaStep {
    SaveToDatabase,       // Step 1
    WriteToBlockchain,    // Step 2
    UpdateDatabaseStatus, // Step 3
}

async fn issue_diploma_saga(diploma: Diploma) -> Result<(), SagaError> {
    // Step 1: Save to DB with "pending" status
    let db_record = match save_to_database_with_status(&diploma, "pending").await {
        Ok(record) => record,
        Err(e) => {
            // Nothing to roll back, just exit
            return Err(SagaError::DatabaseFailed(e));
        }
    };

    // Step 2: Write to blockchain
    let tx_id = match write_to_blockchain(&diploma).await {
        Ok(tx) => tx,
        Err(e) => {
            // Compensating transaction: mark the record as failed
            mark_database_record_failed(&db_record.id).await?;
            return Err(SagaError::BlockchainFailed(e));
        }
    };

    // Step 3: Update the DB status to "confirmed"
    match update_database_status(&db_record.id, "confirmed", &tx_id).await {
        Ok(_) => Ok(()),
        Err(e) => {
            // Compensation is tricky here: the blockchain already has the record.
            // We can only mark it for manual intervention.
            mark_for_manual_reconciliation(&db_record.id, &tx_id).await?;
            Err(SagaError::InconsistentState(e))
        }
    }
}
```
The problem with Saga on a blockchain: compensating transactions on Solana cost money (transaction fees) and don't actually remove previous entries—they append new ones. This makes the pattern expensive and complex.
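To make that concrete, here's roughly what a compensating step would look like under these constraints. This is a sketch only: `revoke_hash` is a hypothetical method, not part of our `ChainClient`; the point is that "rollback" is itself a brand-new, fee-paying transaction:

```rust
// Sketch only: `revoke_hash` is an assumed API. On an append-only ledger,
// compensation means publishing a *new* record that marks the old one as
// revoked; the original entry remains on-chain forever.
async fn compensate_blockchain_write(
    chain_client: &ChainClient,
    hash: &str,
) -> Result<String, AppError> {
    // A second transaction: another fee paid, more latency added.
    let revocation = chain_client.revoke_hash(hash).await?;
    Ok(revocation.tx_id)
}
```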
### Idempotency and Retries
Idempotency is the property of an operation yielding the same result on repeated calls. In our context, it's critical.
Here's how we could add a retry mechanism:
```rust
use postgrest::Postgrest;
use tokio::time::{sleep, Duration};

async fn write_to_database_with_retry(
    db_client: &Postgrest,
    data: serde_json::Value,
    max_retries: u32,
) -> Result<(), AppError> {
    let mut retries = 0;
    let mut backoff = Duration::from_millis(100);

    loop {
        match db_client
            .from("credentials")
            .insert(data.to_string())
            .execute()
            .await
        {
            Ok(response) if response.status().is_success() => {
                return Ok(());
            }
            Ok(response) if response.status() == 409 => {
                // Conflict: the record already exists (idempotency!)
                tracing::info!("Record already exists, treating it as success");
                return Ok(());
            }
            Ok(_) | Err(_) if retries < max_retries => {
                retries += 1;
                tracing::warn!(
                    "Database write failed, retry {}/{} after {:?}",
                    retries, max_retries, backoff
                );
                sleep(backoff).await;
                backoff *= 2; // Exponential backoff
            }
            _ => {
                return Err(AppError::Database(
                    "Failed after maximum retries".to_string(),
                ));
            }
        }
    }
}
```
The downside: if the database is down for an extended period (say, scheduled maintenance), the user waits. Meanwhile, the blockchain transaction is already done!
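One mitigation, sketched below under assumptions (the two-second budget and the background handoff are illustrative, not our production code), is to bound the synchronous wait and push the remaining retries into a background task:

```rust
use std::sync::Arc;
use tokio::time::{timeout, Duration};

// Bound how long the request handler blocks on the database; if the
// deadline passes, keep retrying in the background instead.
async fn write_with_deadline(
    db_client: Arc<Postgrest>,
    data: serde_json::Value,
) -> Result<(), AppError> {
    match timeout(
        Duration::from_secs(2), // arbitrary budget for the synchronous path
        write_to_database_with_retry(&db_client, data.clone(), 3),
    )
    .await
    {
        Ok(result) => result,
        Err(_elapsed) => {
            // Deadline hit: retry in the background and surface "pending"
            // to the caller instead of blocking the request further.
            tokio::spawn(async move {
                let _ = write_to_database_with_retry(&db_client, data, 20).await;
            });
            Err(AppError::Internal(
                "Database write still pending; retrying in background".to_string(),
            ))
        }
    }
}
```

This keeps the request snappy, but the user still gets an error for a write that has effectively happened on-chain, which is exactly what pushed us toward the next section.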
## Our Solution: Outbox Pattern with Background Reconciliation
After analyzing various approaches, we settled on combining two patterns:
### The Transactional Outbox Pattern
The essence of the Outbox pattern: instead of writing directly to two systems, we make a single atomic transaction against the primary storage that saves both the business record and an event row in an outbox table.
Here's how our architecture changes:
```rust
// New structure for the outbox
#[derive(Serialize, Deserialize)]
struct OutboxEvent {
    id: Uuid,
    event_type: String,
    payload: serde_json::Value,
    status: String, // "pending", "processing", "completed", "failed"
    created_at: DateTime<Utc>,
    processed_at: Option<DateTime<Utc>>,
    retry_count: u32,
    error_message: Option<String>,
}

// Modified issue_diploma function
pub async fn issue_diploma_with_outbox(
    State(state): State<Arc<AppState>>,
    mut multipart: Multipart,
) -> Result<Json<IssueResponse>, AppError> {
    // ... parsing and hash generation as before ...

    // IMPORTANT: First write to the DB in one transaction.
    // NOTE: the PostgREST client exposes no client-side transactions;
    // `begin_transaction` here stands in for an atomic write, e.g. a
    // single Postgres function invoked via RPC that inserts both rows.
    let mut transaction = state.db_client.begin_transaction().await?;

    // Save the diploma with "pending_blockchain" status
    let credential_data = serde_json::json!({
        "hash": &diploma.hash,
        "issuer_id": &diploma.issuer_id,
        "recipient_id": &diploma.recipient_id,
        "status": "pending_blockchain",
        "issued_at": diploma.issued_at.to_rfc3339(),
    });

    transaction
        .from("credentials")
        .insert(credential_data.to_string())
        .execute()
        .await?;

    // Add the event to the outbox
    let outbox_event = serde_json::json!({
        "id": Uuid::new_v4(),
        "event_type": "WRITE_TO_BLOCKCHAIN",
        "payload": serde_json::to_value(&diploma)?,
        "status": "pending",
        "created_at": Utc::now(),
        "retry_count": 0,
    });

    transaction
        .from("outbox_events")
        .insert(outbox_event.to_string())
        .execute()
        .await?;

    // Commit the transaction - all or nothing!
    transaction.commit().await?;

    // Return the response to the user immediately
    Ok(Json(IssueResponse {
        hash: diploma.hash,
        tx_id: "pending".to_string(), // Will be updated asynchronously
        signature: Some(signature),
        issued_at: diploma.issued_at,
    }))
}
```
Now we need a background processor for outbox events:
```rust
// Background worker for outbox processing
async fn outbox_processor(state: Arc<AppState>) {
    loop {
        // Fetch unprocessed events (status = "pending")
        let events = fetch_pending_outbox_events(&state.db_client).await;

        for event in events {
            match event.event_type.as_str() {
                "WRITE_TO_BLOCKCHAIN" => {
                    process_blockchain_write(event, &state).await;
                }
                _ => {
                    tracing::warn!("Unknown event type: {}", event.event_type);
                }
            }
        }

        // Sleep before the next iteration
        tokio::time::sleep(Duration::from_secs(5)).await;
    }
}

async fn process_blockchain_write(event: OutboxEvent, state: &Arc<AppState>) {
    // A malformed payload must not crash the whole worker loop
    let diploma: Diploma = match serde_json::from_value(event.payload.clone()) {
        Ok(diploma) => diploma,
        Err(e) => {
            tracing::error!("Failed to deserialize diploma for event {}: {}", event.id, e);
            return;
        }
    };

    // Attempt the blockchain write
    match state.chain_client.write_hash(&diploma.hash, &diploma).await {
        Ok(chain_record) => {
            // Success! Update both statuses in one transaction
            // (`unwrap` is kept for brevity here)
            let mut transaction = state.db_client.begin_transaction().await.unwrap();

            // Update the credential
            transaction
                .from("credentials")
                .update(serde_json::json!({
                    "status": "confirmed",
                    "solana_tx_id": chain_record.tx_id,
                }).to_string())
                .eq("hash", &diploma.hash)
                .execute()
                .await
                .unwrap();

            // Mark the event as processed
            transaction
                .from("outbox_events")
                .update(serde_json::json!({
                    "status": "completed",
                    "processed_at": Utc::now(),
                }).to_string())
                .eq("id", event.id.to_string())
                .execute()
                .await
                .unwrap();

            transaction.commit().await.unwrap();
        }
        Err(e) => {
            // Error: increment the retry counter and record the message
            update_outbox_event_retry(&state.db_client, event.id, e.to_string()).await;
        }
    }
}
```
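Wiring this up is mostly plumbing: the processor runs as a detached task next to the HTTP server. A minimal sketch, assuming a hypothetical `build_app_state` constructor (the `reconciliation_job` referenced here is shown in the next section):

```rust
use std::sync::Arc;

#[tokio::main]
async fn main() {
    // `build_app_state` is a stand-in for however AppState is constructed
    let state = Arc::new(build_app_state().await);

    // Long-lived background loops, independent of request handling
    tokio::spawn(outbox_processor(state.clone()));
    tokio::spawn(reconciliation_job(state.clone()));

    // ... build the axum router with `state` and serve as usual ...
}
```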
### Reconciliation Job: The Safety Net
Even with the Outbox pattern, things can go wrong. So we added a background reconciliation process:
```rust
// Periodic data reconciliation between Solana and PostgreSQL
async fn reconciliation_job(state: Arc<AppState>) {
    loop {
        tracing::info!("Starting reconciliation check...");

        // Fetch recent transactions from Solana (last hour)
        let cutoff_time = Utc::now() - Duration::from_secs(3600);
        let blockchain_records =
            fetch_recent_blockchain_transactions(&state.chain_client, cutoff_time).await;

        for record in blockchain_records {
            // Check whether the record exists in PostgreSQL
            let db_result = state
                .db_client
                .from("credentials")
                .select("hash")
                .eq("hash", &record.hash)
                .single()
                .execute()
                .await;

            let record_exists = match db_result {
                Ok(response) => response.status().is_success(),
                Err(_) => false,
            };

            if !record_exists {
                // Record missing in the DB: recover it
                tracing::warn!(
                    "Found orphaned blockchain record: hash={}, tx_id={}",
                    record.hash,
                    record.tx_id
                );

                // Recover the record from the blockchain
                let recovery_data = serde_json::json!({
                    "hash": record.hash,
                    "solana_tx_id": record.tx_id,
                    "status": "recovered_from_blockchain",
                    "recovered_at": Utc::now(),
                    // Other fields come from transaction metadata
                });

                match state
                    .db_client
                    .from("credentials")
                    .insert(recovery_data.to_string())
                    .execute()
                    .await
                {
                    Ok(_) => {
                        tracing::info!("Successfully recovered record: {}", record.hash);
                        // Alert the team
                        send_alert(
                            "Data inconsistency detected and fixed",
                            &format!("Recovered hash {} from blockchain", record.hash),
                        )
                        .await;
                    }
                    Err(e) => {
                        tracing::error!("Failed to recover record: {}", e);
                    }
                }
            }
        }

        // Run reconciliation every 5 minutes
        tokio::time::sleep(Duration::from_secs(300)).await;
    }
}
```
Visualizing the new approach:
```
┌──────────┐ ┌────────────┐ ┌─────────────┐
│ Client │────▶│ Rust API │────▶│ PostgreSQL │
└──────────┘ └────────────┘ │ + Outbox │
└─────────────┘
│
▼
┌─────────────┐
│ SUCCESS │
│ (Atomic) │
└─────────────┘
│
┌──────────────────┼──────────────────┐
▼ ▼ ▼
┌─────────────────┐ ┌──────────────┐ ┌──────────────┐
│ Outbox Processor│ │Reconciliation│ │ Monitoring │
│ (Async) │ │ Job │ │ & Alerts │
└─────────────────┘ └──────────────┘ └──────────────┘
│ │
▼ ▼
┌──────────┐ ┌──────────┐
│ Solana │◀─────│ Check │
└──────────┘ └──────────┘
```
## Room for Improvement: Looking Ahead
### Queue with Retries
Instead of a simple outbox in the DB, we could use a proper message queue:
```rust
// Integration with Redis Streams for more reliable delivery
use redis::streams::{StreamReadOptions, StreamReadReply};
use redis::AsyncCommands;

async fn publish_to_queue(
    redis_client: &redis::Client,
    diploma: &Diploma,
) -> Result<(), AppError> {
    let mut conn = redis_client.get_async_connection().await?;

    let event = serde_json::json!({
        "type": "WRITE_TO_BLOCKCHAIN",
        "payload": diploma,
        "timestamp": Utc::now().to_rfc3339(),
        "retry_count": 0,
    });

    // Add to the Redis Stream with an auto-generated ID ("*")
    let _id: String = conn
        .xadd(
            "diploma:outbox",
            "*",
            &[("event", serde_json::to_string(&event)?)],
        )
        .await?;

    Ok(())
}

// Consumer with a group for guaranteed (at-least-once) delivery
async fn consume_from_queue(redis_client: &redis::Client, state: Arc<AppState>) {
    let mut conn = redis_client.get_async_connection().await.unwrap();

    // Create the consumer group (ignore the error if it already exists)
    let _: Result<(), redis::RedisError> = conn
        .xgroup_create_mkstream("diploma:outbox", "blockchain_writers", "$")
        .await;

    loop {
        // Read new events for this consumer
        let opts = StreamReadOptions::default()
            .group("blockchain_writers", "worker_1")
            .count(1)
            .block(5000);
        let reply: StreamReadReply = conn
            .xread_options(&["diploma:outbox"], &[">"], &opts)
            .await
            .unwrap();

        for stream in reply.keys {
            for event in stream.ids {
                // Process and acknowledge
                process_event(&event, &state).await;
                let _: i64 = conn
                    .xack("diploma:outbox", "blockchain_writers", &[&event.id])
                    .await
                    .unwrap();
            }
        }
    }
}
```
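One design consequence worth spelling out: consumer groups give at-least-once delivery, not exactly-once. If the worker crashes between `process_event` and `xack`, the event will be redelivered to the group, so processing must remain idempotent; that's exactly the property the 409-as-success handling earlier provides.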
### Monitoring and Alerting
It's critical to track the system's state:
```rust
use lazy_static::lazy_static;
use prometheus::{register_counter_vec, register_histogram_vec, CounterVec, HistogramVec};

lazy_static! {
    static ref INCONSISTENCY_COUNTER: CounterVec = register_counter_vec!(
        "diploma_inconsistencies_total",
        "Total number of data inconsistencies detected",
        &["type"]
    ).unwrap();

    static ref RECONCILIATION_DURATION: HistogramVec = register_histogram_vec!(
        "reconciliation_duration_seconds",
        "Time taken to reconcile records",
        &["status"]
    ).unwrap();
}

// Using the metrics in code
async fn monitor_inconsistency(inconsistency_type: &str) {
    INCONSISTENCY_COUNTER
        .with_label_values(&[inconsistency_type])
        .inc();

    // Alert if there are too many inconsistencies
    let total = INCONSISTENCY_COUNTER
        .with_label_values(&[inconsistency_type])
        .get();

    if total > 10.0 {
        send_critical_alert(
            "High inconsistency rate detected",
            &format!("Type: {}, Count: {}", inconsistency_type, total),
        )
        .await;
    }
}
```
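For these metrics to be scraped, the default registry has to be exposed over HTTP. A minimal handler sketch using the `prometheus` crate's text encoder (the route wiring is assumed):

```rust
use prometheus::{Encoder, TextEncoder};

// Renders all registered metrics in the Prometheus text format;
// mount this under GET /metrics in the router.
async fn metrics_handler() -> String {
    let encoder = TextEncoder::new();
    let metric_families = prometheus::gather();
    let mut buffer = Vec::new();
    encoder
        .encode(&metric_families, &mut buffer)
        .expect("metrics encoding failed");
    String::from_utf8(buffer).expect("metrics are valid UTF-8")
}
```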
### Event Sourcing for Full Traceability
We could go further and store all events as an immutable log:
```rust
#[derive(Serialize, Deserialize)]
enum DiplomaEvent {
    Created {
        hash: String,
        issuer_id: String,
        recipient_id: String,
        timestamp: DateTime<Utc>,
    },
    BlockchainWriteRequested {
        hash: String,
        timestamp: DateTime<Utc>,
    },
    BlockchainWriteCompleted {
        hash: String,
        tx_id: String,
        timestamp: DateTime<Utc>,
    },
    BlockchainWriteFailed {
        hash: String,
        error: String,
        retry_count: u32,
        timestamp: DateTime<Utc>,
    },
    ReconciliationDetected {
        hash: String,
        source: String, // "blockchain" or "database"
        timestamp: DateTime<Utc>,
    },
}

impl DiplomaEvent {
    /// The variant name, stored in the event_type column
    fn variant_name(&self) -> &'static str {
        match self {
            DiplomaEvent::Created { .. } => "Created",
            DiplomaEvent::BlockchainWriteRequested { .. } => "BlockchainWriteRequested",
            DiplomaEvent::BlockchainWriteCompleted { .. } => "BlockchainWriteCompleted",
            DiplomaEvent::BlockchainWriteFailed { .. } => "BlockchainWriteFailed",
            DiplomaEvent::ReconciliationDetected { .. } => "ReconciliationDetected",
        }
    }
}

// This gives us complete history for every diploma
async fn append_event(
    db_client: &Postgrest,
    event: DiplomaEvent,
) -> Result<(), AppError> {
    let event_data = serde_json::json!({
        "event_type": event.variant_name(),
        "payload": serde_json::to_value(&event)?,
        "timestamp": Utc::now(),
    });

    db_client
        .from("diploma_events")
        .insert(event_data.to_string())
        .execute()
        .await?;

    Ok(())
}
```
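The payoff is that current state becomes a pure function of history. An illustrative fold over one diploma's events (the status strings here are assumptions, not our schema):

```rust
// Rebuild the current status of one diploma by folding over its events
// in order; with the full log stored, this can be recomputed at any time.
fn project_status(events: &[DiplomaEvent]) -> &'static str {
    let mut status = "unknown";
    for event in events {
        status = match event {
            DiplomaEvent::Created { .. } => "created",
            DiplomaEvent::BlockchainWriteRequested { .. } => "pending_blockchain",
            DiplomaEvent::BlockchainWriteCompleted { .. } => "confirmed",
            DiplomaEvent::BlockchainWriteFailed { .. } => "retrying",
            DiplomaEvent::ReconciliationDetected { .. } => "reconciled",
        };
    }
    status
}
```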
## Conclusion: Lessons from the Trenches
Working with dual-writes between Solana and PostgreSQL taught us several hard lessons:
1. Never trust sequential calls — just because the first one succeeds doesn't guarantee the second will. Especially when the first is an irreversible blockchain operation.
2. Design for failure — it's not a question of if the system will fail, but when. The Outbox pattern and background reconciliation aren't redundancy; they're necessities.
3. Eventual consistency is your friend — don't try to achieve strong consistency between blockchain and traditional databases. It's expensive, complex, and often impossible.
4. Monitoring is critical — better to get an alert about drift within a minute than hear about it from a user a week later.
5. Idempotency saves lives — design operations so they can be safely retried. This simplifies recovery from failures.
For fellow engineers working with Web3 backends in Rust: blockchain isn't a silver bullet. It's a powerful tool, but it requires careful system design. Dual-writes seem simple until you hit your first production failure at 3 AM.
Remember: in distributed systems, everything that can go wrong will go wrong. Design accordingly.
## Useful Links
- [Saga Pattern - Microservices.io](https://microservices.io/patterns/data/saga.html)
- [Transactional Outbox Pattern](https://microservices.io/patterns/data/transactional-outbox.html)
- [The Rust Programming Language](https://doc.rust-lang.org/book/)
- [Solana Documentation](https://docs.solana.com/)
- [PostgreSQL](https://www.postgresql.org/)
- [Designing Data-Intensive Applications](https://dataintensive.net/) — the bible for distributed systems
If you have experience solving similar problems or questions about implementation, let's discuss in the comments. I'm particularly interested in hearing about alternative approaches to syncing blockchain with traditional databases.