usage-plan-review-and-durable-execution-idempotency
 ▐▛███▜▌   Claude Code v2.1.63
▝▜█████▛▘  Opus 4.6 · Claude Max
  ▘▘ ▝▝    ~/Code/linkmind
❯ we are working on refining the plan @notes/usage-billing-design.md, here's
  a review given by codex @notes/usage-billing-review.md, what do you think
  of each suggestion, and do you have any other ones?
  ⎿  Read notes/usage-billing-design.md (658 lines)
  ⎿  Read notes/usage-billing-review.md (112 lines)
⏺ Let me also quickly check the current codebase to ground my opinions in the
  actual scale and architecture.
⏺ Explore(Explore pipeline and DB patterns) Sonnet 4.6
  ⎿  Done (14 tool uses · 66.5k tokens · 1m 1s)
  (ctrl+o to expand)
⏺ Good context. Here's my take on each review suggestion, calibrated to the
  actual system — invite-only, ~10 users, concurrency 2, Absurd durable 
  execution:

  ---
  Critical

  1. TOCTOU race between check and spend

  Agree it's real, but severity is low. With concurrency 2 globally and 2
  records/user/minute drip-feed, the race window is tiny. A user would need to
  spam links simultaneously to overshoot by one pipeline's worth (~$0.002).

  My recommendation: Skip the SELECT FOR UPDATE / reservation mechanism for now.
   The simple pre-check is fine. Just document it as a known soft-cap behavior.
  If you want a minimal safeguard, re-check inside recordTransaction and log a
  warning if over-limit (but still allow the in-flight pipeline to finish —
  killing it mid-way wastes the already-spent money).

  2. No idempotency for recordTransaction under retries

  Agree, and Absurd makes this real. Absurd replays from the last completed step
   on crash. If the process dies after the DB insert but before Absurd
  checkpoints the step, the transaction gets double-recorded on replay.

  My recommendation: Worth implementing, but keep it simple. Use record_id + 
  label (e.g. record_id=42, label='summary') as a natural idempotency key with a
   unique index. No need for a separate UUID-based scheme. Each pipeline step
  for a record only runs once logically, so this composite key is sufficient.
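  A minimal in-memory sketch of these semantics — the Map stands in for the
  (record_id, label) unique index, and recordTransaction here is a stand-in
  for the real DB-backed function, not its actual implementation:

```typescript
// The Map plays the role of the UNIQUE (record_id, label) index: a replayed
// call with the same key becomes a no-op instead of a double charge.
type Tx = { recordId: number; label: string; amountUsd: number };

const ledger = new Map<string, Tx>();
let cycleUsageUsd = 0;

function recordTransaction(tx: Tx): boolean {
  const key = `${tx.recordId}:${tx.label}`;
  if (ledger.has(key)) return false; // ON CONFLICT DO NOTHING analogue
  ledger.set(key, tx);
  cycleUsageUsd += tx.amountUsd; // insert + increment together, as in the real DB transaction
  return true;
}
```

  Calling it twice for record_id=42, label='summary' charges the balance once.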

  3. Concurrent lazy reset corruption

  Disagree on severity. With concurrency 2, two workers resetting the same
  user's cycle simultaneously is extremely unlikely (requires two tasks for the
  same user crossing a month boundary at the exact same second). The
  UNIQUE(user_id, cycle_start) constraint will catch it — just handle the
  conflict with ON CONFLICT DO NOTHING and re-read. No need for FOR UPDATE
  locking.

  4. resetUserCycle races with in-flight usage

  Disagree on relevance. Payment reset is a manual/admin action, not happening
  during pipeline execution. The admin can just... not reset while the user has
  active pipelines. Or if they do, a few cents attributed to the wrong cycle is
  not a billing disaster at this scale.

  Skip for now. Not worth the complexity.

  5. No guarantee balance matches ledger

  Agree this is important. This is the one that actually matters for
  correctness. Use a DB transaction: insert into usage_transactions and UPDATE 
  user_balances SET current_cycle_usage_usd = current_cycle_usage_usd + $amount
  in one transaction. This is simple, Kysely supports db.transaction(), and it
  eliminates drift.

  My recommendation: This is the one critical item to implement properly. Single
   transaction for record + increment. Add a CLI reconcile-usage command that
  compares SUM(usage_transactions) vs current_cycle_usage_usd — useful for
  debugging, cheap to build.
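  The real command would run SUM(amount_usd) in SQL; the comparison itself is
  trivial, something like this (function name illustrative):

```typescript
// Compare the transaction ledger (source of truth) against the cached
// per-cycle counter; nonzero drift means the two have diverged.
function reconcile(
  txAmountsUsd: number[],
  cachedUsageUsd: number,
): { sumUsd: number; driftUsd: number } {
  const sumUsd = txAmountsUsd.reduce((a, b) => a + b, 0);
  return { sumUsd, driftUsd: cachedUsageUsd - sumUsd };
}
```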

  6. Time boundary semantics

  Partially agree. Timezone matters, but overcomplicating it doesn't. The system
   has ~10 users, likely all in the same timezone (CN).

  My recommendation: Use TIMESTAMPTZ for created_at on transactions (already the
   case). For cycle boundaries, keep DATE but compute them in a fixed timezone
  (e.g. Asia/Shanghai, configurable via env var BILLING_TIMEZONE). Don't store
  timezone per-user — overkill.
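  Computing the cycle start in a fixed timezone needs no library — Intl is
  enough. A sketch (function name illustrative; the real code would read
  BILLING_TIMEZONE from env):

```typescript
// Current cycle start (first of the month) in a fixed billing timezone, so a
// UTC timestamp late on the 28th can still land in the next month's cycle.
function cycleStartFor(now: Date, timeZone: string): string {
  // The en-CA locale formats dates as YYYY-MM-DD.
  const ymd = new Intl.DateTimeFormat("en-CA", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).format(now);
  return ymd.slice(0, 8) + "01"; // clamp to the first of that month
}
```

  For 2026-02-28T17:00Z, Asia/Shanghai (UTC+8) is already in March, so the
  cycle start differs from the UTC answer.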

  ---
  Important

  1. Hard cap only at entry, not during pipeline

  Agree with the soft-cap policy. The pipeline costs ~$0.001-0.003 per run.
  Aborting mid-pipeline wastes the already-spent portion and leaves records in
  broken states. Let the current job finish, block new ones. This is already
  what the design does — just document it explicitly as "soft cap."

  2. Pricing model/versioning

  Disagree for now. Prices change rarely, and you can always recalculate from
  usage_transactions.metadata (which already stores model, tokens, etc.).
  Snapshotting unit_price + fx_rate into every transaction row adds clutter for
  a problem that doesn't exist yet.

  Maybe later: If/when prices change, add a price_version field then. The
  metadata already has enough info to retroactively recompute.

  3. Floating-point risk

  Agree, but the practical risk is negligible. Amounts are tiny ($0.000019), and
   NUMERIC(10,6) in Postgres is exact. Accumulation happens via
  current_cycle_usage_usd + amount in SQL (Postgres NUMERIC arithmetic), not in
  JS. The only JS float math is in calculateLLMCost() etc., which produces
  values stored as NUMERIC immediately.

  My recommendation: Use Number() carefully when reading back from DB (Kysely
  returns strings for NUMERIC). No need for a decimal library. The design is
  already fine here.
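  A minimal read-boundary helper, assuming pg's default behavior of returning
  NUMERIC columns as strings (helper name illustrative):

```typescript
// pg hands NUMERIC back as a string so precision loss is an explicit choice;
// convert once at the read boundary and fail loudly on unexpected input.
function numericToNumber(value: string): number {
  const n = Number(value);
  if (value.trim() === "" || !Number.isFinite(n)) {
    throw new Error(`unexpected NUMERIC value: ${JSON.stringify(value)}`);
  }
  return n;
}
```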

  4. Schema constraints too loose

  Agree, easy win. Add CHECK constraints in the migration:

  CHECK (type IN ('llm', 'crawler', 'embedding'))
  CHECK (amount_usd >= 0)
  CHECK (monthly_limit_usd >= 0)

  Skip provider whitelist (new providers get added; free text is fine with
  app-level validation).

  5. Cycle archival semantics

  Agree. Define cycle_end as the exclusive upper bound (= next cycle_start).
  Document once, done. Rename column to cycle_end without "exclusive" suffix —
  just note in the design doc that intervals are [start, end).

  6. Missing behavior for absent usage data

  Good point. Some LLM APIs may not return token counts (streaming, errors,
  timeouts).

  My recommendation: If usage is missing from the API response, log a warning
  and record the transaction with amount_usd = 0 and metadata.estimate = 
  'missing_usage'. Don't block the pipeline. This is better than silently
  skipping the charge.
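  A sketch of that fallback, with hypothetical shapes for the usage payload and
  cost callback:

```typescript
// If the provider returned no usage block, record a $0 transaction tagged in
// metadata instead of silently skipping the charge.
type TokenUsage = { inputTokens: number; outputTokens: number };

function usageToCharge(
  usage: TokenUsage | undefined,
  costOf: (u: TokenUsage) => number,
): { amountUsd: number; metadata: Record<string, string> } {
  if (!usage) {
    return { amountUsd: 0, metadata: { estimate: "missing_usage" } };
  }
  return { amountUsd: costOf(usage), metadata: {} };
}
```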

  7. User creation path assumptions

  Agree. getOrCreateBalance() already handles this — it should be called in
  checkUsageLimit(), making it lazy-initialized like the cycle reset. No need
  for a DB trigger. Just make sure getOrCreateBalance is the only path, and
  checkUsageLimit calls it internally.

  8. Testing: concurrency and failure cases

  Disagree for now. Given concurrency 2 and the low stakes ($1 limit), these
  edge cases are theoretical. The lifecycle test + date boundary tests cover the
   real-world scenarios. Add concurrency tests if/when you scale up.

  ---
  Minor

  1. Rename monthly_limit_usd → cycle_limit_usd

  Agree. Easy rename, more accurate. Do it now before any code is written.

  2. Index improvements

  Skip. With ~10 users and maybe 1000 transactions/month, indexing is
  irrelevant.

  3. Metadata shape validation

  Skip. JSONB is fine for this scale. App-level TypeScript types provide enough
  structure.

  4. File paths

  Agree. Use actual paths: server/src/db/migrations/, server/src/usage.ts, etc.

  ---
  My Additional Suggestions

  1. Don't create usage_cycles table yet. It's premature optimization. You can
  always SELECT SUM(...) FROM usage_transactions WHERE created_at BETWEEN $start
   AND $end for historical data. With <1000 rows/month, this is instant. Add the
   archival table when you actually need it (e.g. for a dashboard). This
  simplifies the cycle reset logic significantly — no archival step, just update
   current_cycle_start and zero the counter.
  2. recordTransaction should calculate cost internally. The design has the
  pipeline passing amountUsd to recordTransaction, meaning cost calculation is
  scattered across pipeline steps. Instead, recordTransaction should accept raw
  usage data (tokens, provider, model) and call calculateLLMCost() /
  calculateCrawlerCost() internally. Single responsibility, harder to get wrong.
  3. Consider combining getOrCreateBalance + checkUsageLimit + lazy reset into 
  one function. The design has these as separate concerns, but they always run
  together. One function checkAndGetBudget(userId) that: creates balance if
  missing → resets cycle if needed → returns { allowed, used, limit, cycleStart,
   cycleEnd }. Simpler call sites.
  4. The record_id on usage_transactions should be nullable (already is in the
  design with REFERENCES records(id)). Good — note processing won't always have
  a record_id. But also consider: some charges might not map to a single record
  (e.g. future batch operations). Keep it nullable, don't make it a hard FK
  constraint initially.
  5. Add cycle_limit_usd to the checkUsageLimit return value so the bot message
  can show the actual limit. The design already returns limitUsd — just
  confirming this is correct.
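  Suggestion 2's input shape could be sketched as a discriminated union — the
  per-unit rates here are made up for illustration, not the project's actual
  pricing:

```typescript
// Callers pass raw usage; the cost formula lives in one place.
type UsageInput =
  | { type: "llm"; model: string; inputTokens: number; outputTokens: number }
  | { type: "crawler"; pages: number }
  | { type: "embedding"; tokens: number };

// Illustrative per-unit prices in USD (hypothetical, not real rates).
const USD_PER_INPUT_TOKEN = 0.0000001;
const USD_PER_OUTPUT_TOKEN = 0.0000004;
const USD_PER_PAGE = 0.001;
const USD_PER_EMBEDDING_TOKEN = 0.00000002;

function costOf(input: UsageInput): number {
  switch (input.type) {
    case "llm":
      return (
        input.inputTokens * USD_PER_INPUT_TOKEN +
        input.outputTokens * USD_PER_OUTPUT_TOKEN
      );
    case "crawler":
      return input.pages * USD_PER_PAGE;
    case "embedding":
      return input.tokens * USD_PER_EMBEDDING_TOKEN;
  }
}
```

  The switch is exhaustive over the union, so adding a new transaction type
  forces a compile error until its cost formula is written.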

  ---
  Summary: What to Actually Implement

  For this scale, I'd accept: Critical #2 (idempotency), #5 (atomic
  transaction), #6 (timezone); Important #1 (soft cap policy, document only), #4
   (CHECK constraints), #5 (cycle_end semantics), #6 (missing usage fallback);
  Minor #1 and #4.

  I'd skip/defer: Critical #1 (FOR UPDATE locking), #3 (concurrent reset
  locking), #4 (payment race); Important #2 (price versioning), #3 (decimal
  lib), #7 (DB trigger), #8 (concurrency tests); Minor #2, #3.

  Want me to update the design doc with these decisions?
✻ Cogitated for 2m 22s
❯ yes, agree with all the accept and skip, and all the additional           
  suggestions, please update the design doc with these                      
⏺ Let me re-read the current file one more time to make sure I have the exact
  content, then rewrite it.
⏺ Read 1 file (ctrl+o to expand)
⏺ Now I'll rewrite the doc with all the agreed changes applied.
⏺ Write(notes/usage-billing-design.md)
  ⎿  Added 238 lines, removed 146 lines
⏺ Done. Here's a summary of all changes applied:

  Structural changes:
  - Removed usage_cycles table entirely — historical data stays in
  usage_transactions, queryable by date range
  - Renamed monthly_limit_usd → cycle_limit_usd throughout
  - Renamed checkUsageLimit → checkAndGetBudget (combines getOrCreateBalance +
  lazy reset + limit check)
  - Added new 设计决策 (design decisions) section at the top documenting the
  rationale

  Schema changes:
  - Added step TEXT column to usage_transactions for idempotency
  - Added CHECK constraints on type, amount_usd >= 0, cycle_limit_usd >= 0
  - Added partial unique index (record_id, step) for Absurd replay safety
  - Added detailed explanation of the idempotency mechanism

  Business logic changes:
  - recordTransaction now accepts raw usage data (tokens, provider, model) and
  calculates cost internally via discriminated union types
  - Atomic DB transaction: insert + balance increment in one transaction, with
  ON CONFLICT DO NOTHING for idempotency
  - Documented soft cap policy explicitly
  - Added BILLING_TIMEZONE env var (default Asia/Shanghai)
  - Added missing usage handling (pass 0 tokens, log warning)
  - Simplified cycle reset (just zero balance + update cycle_start, no archival)
  - Added reconcile-usage CLI command to management section

  Test changes:
  - Updated recordTransaction calls to use new signature (type-specific params,
  no amountUsd)
  - Assertions now use calculateXCost() helpers for expected values
  - Removed all usage_cycles / getUsageCycles assertions
  - Added Phase 5: idempotency test (duplicate call doesn't double-charge)
  - Added zero-tokens cost calculation test
  - Fixed file paths to actual repo layout (server/src/...)
✻ Worked for 7m 51s
❯ there's one thing i still have a question about, regarding your thoughts on
  this:
  >  2. No idempotency for recordTransaction under retries
  >
  >  Agree, and Absurd makes this real. Absurd replays from the last
  >  completed step on crash. If the process dies after the DB insert but
  >  before Absurd checkpoints the step, the transaction gets double-recorded
  >  on replay.
  >
  >  My recommendation: Worth implementing, but keep it simple. Use
  >  record_id + label (e.g. record_id=42, label='summary') as a natural
  >  idempotency key with a unique index. No need for a separate UUID-based
  >  scheme. Each pipeline step for a record only runs once logically, so
  >  this composite key is sufficient.

  if recordTransaction is the last thing to do in the relevant step, and
  it's within the same db transaction to update related data, then i think
  it should be safe to not double-record when the step retries, right?
⏺ Good question. Almost, but not quite — there's still a tiny window.

  The sequence is:

  1. Absurd starts step callback
  2. ... do work (LLM call, etc.) ...
  3. recordTransaction commits DB transaction (insert + balance increment)
  4. Step callback returns
  5. Absurd checkpoints the step result (separate DB write)

  If the process crashes between 3 and 5 — your DB transaction committed, but
  Absurd didn't checkpoint. On replay, Absurd re-runs the entire step from
  scratch, and recordTransaction inserts a duplicate row + increments the
  balance again.

  The DB transaction guarantees atomicity of your writes, but it can't cover
  Absurd's checkpoint because that's a separate operation outside your
  transaction.

  That said — for this system, the risk is practically zero. That window is
  maybe a few milliseconds, and it requires a process crash at exactly that
  moment. You could totally skip the idempotency index and be fine.

  I'd still keep it because it's one line of DDL (CREATE UNIQUE INDEX ...) and
  one ON CONFLICT DO NOTHING clause — near-zero implementation cost for
  correctness insurance. But if you'd rather not, it's a defensible choice at
  this scale.

  The one way to make it truly safe without the index would be if Absurd
  supported checkpointing within your DB transaction (same Postgres, same
  transaction), but that would require Absurd SDK internals we don't control.
✻ Cogitated for 49s
❯ if we use other durable execution tool like inngest, which uses separate  
  db and service to handle its checkpoint, will this not be a problem?      
⏺ It's the same problem regardless of the tool. The core issue is the two-system
   commit problem:

  1. Your Postgres commits the usage transaction
  2. The durable execution engine (Absurd, Inngest, Temporal, whatever) records
  its checkpoint

  These are two separate writes to two separate systems. No matter how you
  arrange them, there's a window where one succeeds and the other doesn't. If
  the checkpoint fails after your DB committed, the step replays and you
  double-charge.

  Ironically, Absurd using the same Postgres is actually the closest you can get
   to solving this — theoretically you could put the checkpoint in the same DB
  transaction. With Inngest (separate service), that's impossible; you'd need
  distributed two-phase commit, which nobody does for this.

  This is a fundamental property of durable execution: the "exactly-once"
  guarantee only applies to the engine's own state. Side effects (your DB
  writes, API calls) are always "at-least-once" unless you add your own
  idempotency.

  So yeah — the unique index is the standard solution across all durable
  execution frameworks. Cheap and correct.
✻ Cogitated for 42s
❯ you mean, in other words, all db operations in durable execution workflow
  steps must be idempotent?
⏺ Yes, exactly. All side effects in durable execution steps must be idempotent —
   DB writes, API calls, sending emails, everything. The engine guarantees your
  step function runs at least once, not exactly once. Making it effectively
  exactly-once is your responsibility via idempotency keys, upserts, ON CONFLICT
   DO NOTHING, etc.

  This is true for Absurd, Inngest, Temporal, Durable Objects — all of them.
❯ can you provide me some articles or documents from the internet to        
  support this argument?                                                    
⏺ firecrawl - firecrawl_search (MCP)(query: "durable execution idempotency side
                                    effects at-least-once exactly-once temporal
                                    inngest", limit: 5, sources:
                                    [{"type":"web"}])
  ⎿  {
       "web": [
         {
     … +23 lines (ctrl+o to expand)
⏺ firecrawl - firecrawl_search (MCP)(query: "temporal.io idempotent activity
                                    side effects \"at least once\"", limit: 5,
                                    sources: [{"type":"web"}])
  ⎿  {
       "web": [
         {
     … +25 lines (ctrl+o to expand)
⏺ firecrawl - firecrawl_search (MCP)(query: "inngest idempotency steps side
                                    effects documentation", limit: 5, sources:
                                    [{"type":"web"}])
  ⎿  {
       "web": [
         {
     … +22 lines (ctrl+o to expand)
⏺ Let me grab the key content from the most authoritative sources.
⏺ firecrawl - firecrawl_scrape (MCP)(url: "https://temporal.io/blog/idempotency-
                                    and-durable-execution", formats:
                                    ["markdown"], onlyMainContent: true)
  ⎿  {
       "markdown": "[Series D at $5B Valuation \\| AI Drives Demand for Dur
     able Execution](https://temporal.io/blog/temporal-raises-usd300m-serie
     … +389 lines (ctrl+o to expand)
⏺ firecrawl - firecrawl_scrape (MCP)(url: "https://www.inngest.com/docs/guides/h
                                    andling-idempotency", formats: ["markdown"],
                                     onlyMainContent: true)
  ⎿  {
       "markdown": "Search... `Ctrl+K`\n\n[Contact sales](https://www.innge
     st.com/contact?ref=docs-header) [Sign Up](https://app.inngest.com/sign
     … +161 lines (ctrl+o to expand)
⏺ firecrawl - firecrawl_scrape (MCP)(url: "https://www.inngest.com/blog/principl
                                    es-of-durable-execution", formats:
                                    ["markdown"], onlyMainContent: true)
  ⎿  {
       "markdown": "![Featured image for The Principles of Durable Executio
     n Explained blog post](https://www.inngest.com/_next/image?url=%2Fasse
     … +228 lines (ctrl+o to expand)
⏺ Here are the key sources:

  1. Temporal — Official blog on idempotency

  Understanding idempotency in distributed systems (Temporal, Feb 2024)

  Key quotes:

  Temporal recommends Activities be idempotent, or at least if they aren't, that
   executing an Activity more than once does not cause any unexpected or 
  undesired side effects.

  Temporal is able to guarantee "at least once" execution. [...] Activity 
  execution is not atomic due to factors such as failures, timeouts, environment
   failure, or other conditions that lead to partial success.

  The article also shows the exact SQL pattern we're using — an operations table
   with a uniqueness constraint + INSERT ... ON CONFLICT DO NOTHING, then
  conditionally performing the actual write only if the insert succeeded.

  2. Inngest — Official docs on handling idempotency

  Handling idempotency

  Inngest provides idempotency mechanisms at the event level and function level,
   precisely because steps can be retried. Their error handling docs state
  explicitly:

  Re-running a step upon error requires its code to be idempotent, which means 
  that running the same code multiple times won't have any side effect.

  3. Inngest — Principles of Durable Execution

  The Principles of Durable Execution Explained

  Describes the three core principles: incremental execution, state persistence,
   fault tolerance. On fault tolerance:

  A given step of a function can fail and the code can be retried and resumed 
  from the point of failure. All previous steps are skipped and the results are 
  memoized using the persisted state.

  This confirms: completed steps are skipped via memoization, but the currently 
  executing step runs again in full — including any DB writes it already made.
  That's the window where idempotency matters.

  ---
  The pattern is universal: durable execution = at-least-once step execution =
  you need idempotent side effects. The (record_id, step) unique index is the
  standard solution.
✻ Worked for 48s
❯ /kb write note for the valuable knowledges in this session 
⏺ Bash(mkdir -p /Users/reorx/Code/linkmind/kb/notes)
  ⎿  Done
⏺ Bash(date +%Y-%m-%d)
  ⎿  2026-03-04
⏺ Write(kb/notes/2026-03-04-durable-execution-idempotency.md)
  ⎿  Wrote 66 lines to kb/notes/2026-03-04-durable-execution-idempotency.md
       1 ---
       2 created: 2026-03-04
       3 tags:
       4   - durable-execution
       5   - idempotency
       6   - architecture
       7   - billing
       8 ---
       9 
      10 # Durable Execution and Idempotency of Side Effects
     … +56 lines (ctrl+o to expand)
⏺ Written to kb/notes/2026-03-04-durable-execution-idempotency.md. Covers the
  core insight (at-least-once = idempotent side effects required), the
  two-system commit problem, the standard solution pattern, and the three
  authoritative sources from Temporal and Inngest.
✻ Cogitated for 34s