SPDK From First Principles

SPDK deep learning path

Chapter 10: Reactors, `spdk_thread`, Messages, Pollers


Source: drafts/runtime/10-reactors-threads-messages-pollers.md

Reader Promise

By the end of this chapter, a beginner should be able to explain the difference between an OS thread, an SPDK reactor, and an spdk_thread; trace a message sent by spdk_thread_send_msg(); understand how pollers run; and diagnose wrong-thread assertions, blocked reactors, leaked pollers, and thread-exit hangs.

This is one of the most important chapters in the book. Many SPDK bugs are not algorithm bugs. They are ownership bugs: code runs on the wrong spdk_thread, blocks a reactor, keeps an io_channel too long, or forgets to unregister a poller.

Mental Model

Use this vocabulary precisely:

  • OS thread: the kernel-scheduled execution context, usually pinned to a CPU core by DPDK/SPDK.
  • Reactor: SPDK's per-core event loop object. A reactor owns a list of lightweight SPDK threads and event queues.
  • spdk_thread: a lightweight cooperative context. It has message queues, pollers, io_channels, stats, and a cpumask.
  • Message: a function pointer plus context enqueued to an spdk_thread.
  • Poller: a callback that runs repeatedly on an spdk_thread, either every loop or on a timer.

Prose diagram:

CPU core 3
  OS thread "reactor_3"
    reactor object for lcore 3
      spdk_thread "app_thread"
        message ring
        active pollers
        timed pollers
        io_channels
      spdk_thread "nvmf_tgt_poll_group_3"
        message ring
        active pollers
        io_channels

An OS thread runs the reactor loop. The reactor polls each spdk_thread in turn, and each spdk_thread drains its messages and runs its pollers.

Flowchart:

  OS thread pinned to core --> SPDK reactor loop
  SPDK reactor loop --> spdk_thread: app_thread
  SPDK reactor loop --> spdk_thread: nvmf poll group
  spdk_thread: app_thread --> message queue, active and timed pollers, io_channels
  spdk_thread: nvmf poll group --> message queue, transport pollers, transport and bdev channels
  another spdk_thread --(spdk_thread_send_msg)--> app_thread message queue

Source Anchors

  • include/spdk_internal/event.h: struct spdk_reactor, spdk_reactors_init(), spdk_reactors_start(), spdk_reactors_stop()
  • lib/event/reactor.c: spdk_reactors_init(), reactor_construct(), spdk_reactors_start(), reactor_run(), _reactor_run(), reactor_post_process_lw_thread(), spdk_reactors_stop()
  • include/spdk/thread.h: spdk_thread_create(), spdk_thread_poll(), spdk_thread_send_msg(), spdk_for_each_thread(), spdk_poller_register(), spdk_poller_unregister(), spdk_thread_exit()
  • lib/thread/thread.c: struct spdk_thread, spdk_thread_create(), spdk_set_thread(), spdk_get_thread(), spdk_thread_poll(), thread_poll(), msg_queue_run_batch(), spdk_thread_send_msg(), poller_register(), thread_execute_poller(), thread_execute_timed_poller(), spdk_poller_unregister(), spdk_for_each_thread(), spdk_thread_exit(), thread_exit()
  • lib/event/app_rpc.c: rpc_framework_get_reactors(), _rpc_framework_get_reactors()

Reactor Initialization

lib/event/reactor.c:spdk_reactors_init() creates the event framework's reactor state.

It:

  • creates g_spdk_event_mempool
  • allocates the g_reactors array aligned to 64 bytes
  • initializes the thread library with spdk_thread_lib_init_ext()
  • constructs a reactor for each env core
  • records the scheduling reactor
  • sets reactor state to initialized

The thread library call is crucial. Reactors cannot run spdk_thread objects until the thread library exists, because spdk_thread uses message mempools, message rings, poller queues, and io_channel registries.
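The 64-byte alignment of the g_reactors array is presumably about cache lines: per-core state that different cores update should not share a line. Here is a toy sketch of that allocation pattern, assuming a 64-byte cache line. `toy_reactor_slot` and `toy_reactors_init()` are hypothetical names, not SPDK code.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical per-core slot. alignas(64) on the first member makes the whole
 * struct 64-byte aligned, so sizeof() rounds up to one full cache line and
 * neighbouring cores never write the same line (no false sharing). */
struct toy_reactor_slot {
	alignas(64) uint32_t lcore;
	uint64_t busy_tsc;
	uint64_t idle_tsc;
};

static_assert(sizeof(struct toy_reactor_slot) == 64, "one slot per cache line");

/* Allocate one zeroed, 64-byte-aligned slot per core; NULL on failure. */
static struct toy_reactor_slot *
toy_reactors_init(uint32_t num_cores)
{
	size_t bytes = (size_t)num_cores * sizeof(struct toy_reactor_slot);
	struct toy_reactor_slot *slots;

	if (num_cores == 0) {
		return NULL;
	}
	/* C11 aligned_alloc(): bytes is a multiple of 64 because sizeof is. */
	slots = aligned_alloc(64, bytes);
	if (slots == NULL) {
		return NULL;
	}
	memset(slots, 0, bytes);
	for (uint32_t i = 0; i < num_cores; i++) {
		slots[i].lcore = i;
	}
	return slots;
}
```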

Reactor Start

lib/event/reactor.c:spdk_reactors_start() sets the reactor state to running, launches a reactor OS thread on every selected core except the current core, and then runs the current core's reactor inline.

That last detail explains why spdk_app_start() blocks: the main OS thread becomes a reactor runner until shutdown.

lib/event/reactor.c:reactor_run() is the long-running loop. It:

  • names the POSIX thread reactor_<lcore>
  • registers trace ownership
  • repeatedly runs either interrupt mode handling or _reactor_run()
  • periodically performs scheduler work if enabled
  • exits when reactor state changes
  • drains and destroys remaining spdk_thread objects
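The start-up shape (spawn reactors on the other cores, then run the current core's loop inline) is what makes the caller block. A pthreads toy sketch under stated assumptions: no CPU pinning, no spdk_threads, and a hypothetical `toy_reactor_run()` loop that exits when a shared state leaves RUNNING. `toy_reactors_start()`, `g_toy_state`, and `g_toy_loops` are illustrative names, not SPDK APIs.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define TOY_MAX_REACTORS 8

enum { TOY_RUNNING, TOY_EXITING };

static atomic_int g_toy_state = TOY_RUNNING;               /* shared reactor state */
static atomic_uint_fast64_t g_toy_loops[TOY_MAX_REACTORS];  /* iterations per reactor */

/* Hypothetical reactor loop: iterate until the shared state changes.
 * A real reactor would poll its spdk_threads on every iteration. */
static void *
toy_reactor_run(void *arg)
{
	uintptr_t lcore = (uintptr_t)arg;

	while (atomic_load(&g_toy_state) == TOY_RUNNING) {
		uint_fast64_t n = atomic_fetch_add(&g_toy_loops[lcore], 1) + 1;

		/* Stand-in for a shutdown request handled on the main core. */
		if (lcore == 0 && n >= 1000) {
			atomic_store(&g_toy_state, TOY_EXITING);
		}
	}
	return NULL;
}

/* Start reactors 1..n-1 on new OS threads, then run reactor 0 on the calling
 * thread. Returns only after reactor 0's loop ends and the others are joined. */
static void
toy_reactors_start(uint32_t num_reactors)
{
	pthread_t tids[TOY_MAX_REACTORS];

	assert(num_reactors >= 1 && num_reactors <= TOY_MAX_REACTORS);
	for (uint32_t i = 1; i < num_reactors; i++) {
		if (pthread_create(&tids[i], NULL, toy_reactor_run, (void *)(uintptr_t)i) != 0) {
			abort();
		}
	}
	toy_reactor_run((void *)(uintptr_t)0);
	for (uint32_t i = 1; i < num_reactors; i++) {
		pthread_join(tids[i], NULL);
	}
}
```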

What _reactor_run() Does

lib/event/reactor.c:_reactor_run() is the normal polling loop body.

It:

  1. Runs a batch of reactor events.
  2. If the reactor has no SPDK threads, accounts idle time and returns.
  3. For each lightweight thread on the reactor:
       • gets the spdk_thread
       • calls spdk_thread_poll(thread, 0, reactor->tsc_last)
       • updates reactor busy or idle time based on the return code
       • post-processes the lightweight thread

The important point: a reactor does not call arbitrary module code directly. It calls spdk_thread_poll(), and the thread runs messages and pollers.
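The busy/idle bookkeeping in steps 2 and 3 can be modeled in a few lines. This toy sketch assumes a fake tick counter `g_fake_tsc` in place of the TSC and plain callbacks in place of spdk_threads, and it skips the event batch and post-processing. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t g_fake_tsc;   /* hypothetical clock; real code reads the TSC */

/* Hypothetical lightweight thread: poll() returns >0 busy or 0 idle and
 * advances g_fake_tsc to model the time it spent. */
struct toy_lw_thread {
	int (*poll)(void);
};

struct toy_reactor {
	struct toy_lw_thread *threads;
	size_t num_threads;
	uint64_t tsc_last, busy_tsc, idle_tsc;
};

static void
toy_reactor_run_once(struct toy_reactor *r)
{
	uint64_t now;

	if (r->num_threads == 0) {
		/* Nothing to poll: all elapsed time counts as idle. */
		now = g_fake_tsc;
		r->idle_tsc += now - r->tsc_last;
		r->tsc_last = now;
		return;
	}
	for (size_t i = 0; i < r->num_threads; i++) {
		int rc = r->threads[i].poll();

		/* Charge the ticks this thread used to busy or idle. */
		now = g_fake_tsc;
		if (rc == 0) {
			r->idle_tsc += now - r->tsc_last;
		} else if (rc > 0) {
			r->busy_tsc += now - r->tsc_last;
		}
		r->tsc_last = now;
	}
}

/* Two sample threads: one does 30 ticks of work, one idles for 5 ticks. */
static int toy_busy_thread(void) { g_fake_tsc += 30; return 1; }
static int toy_idle_thread(void) { g_fake_tsc += 5; return 0; }
```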

spdk_thread Structure

lib/thread/thread.c:struct spdk_thread contains:

  • active pollers queue
  • timed pollers tree
  • paused pollers queue
  • message ring
  • local message cache
  • critical message slot
  • io_channel tree
  • cpumask
  • state
  • lock count
  • interrupt-mode state
  • trace ID
  • user context

This is why spdk_thread is more than "a callback queue." It is the unit of SPDK ownership for pollers, messages, and per-thread device resources.

Creating An spdk_thread

lib/thread/thread.c:spdk_thread_create():

  • allocates cache-line-aligned memory
  • copies or initializes the cpumask
  • initializes io_channel and poller containers
  • creates a message ring
  • fills a local message cache from g_spdk_msg_mempool if possible
  • assigns a name and trace ID
  • assigns a monotonic thread ID
  • inserts the thread into the global thread list
  • calls the reactor thread-op hook so the event framework can schedule it
  • marks the thread running
  • records the first created thread as the app thread
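The bookkeeping part of that list (a monotonic ID, insertion into a global list, and "first thread becomes the app thread") fits in a short single-threaded toy. Names such as `toy_thread_create()` and `g_toy_app_thread` are hypothetical. The real function also allocates the ring, poller containers, and message cache, and must synchronize access to the global list.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Heavily trimmed, hypothetical thread record. */
struct toy_thread {
	uint64_t id;
	char name[32];
	struct toy_thread *next;   /* link in the global thread list */
};

static struct toy_thread *g_toy_threads;     /* global thread list head */
static struct toy_thread *g_toy_app_thread;  /* first thread ever created */
static uint64_t g_toy_next_id = 1;           /* monotonic, never reused */

/* No locking: this toy assumes a single creating thread. */
static struct toy_thread *
toy_thread_create(const char *name)
{
	struct toy_thread *t = calloc(1, sizeof(*t));

	if (t == NULL) {
		return NULL;
	}
	t->id = g_toy_next_id++;
	snprintf(t->name, sizeof(t->name), "%s", name);
	t->next = g_toy_threads;
	g_toy_threads = t;
	if (g_toy_app_thread == NULL) {
		g_toy_app_thread = t;   /* the first created thread is the app thread */
	}
	return t;
}
```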

The event framework creates the app thread in lib/event/app.c:spdk_app_start(). Other modules create their own SPDK threads when they need separate lightweight contexts.

Messages

spdk_thread_send_msg(thread, fn, ctx) is the standard cross-thread handoff.

lib/thread/thread.c:spdk_thread_send_msg():

  1. Checks that the target thread is not exited.
  2. Tries to take a message object from the sender's local cache.
  3. Falls back to g_spdk_msg_mempool.
  4. Stores fn and ctx.
  5. Enqueues the message to the target thread's message ring.
  6. Sends a notification if needed.

The function is asynchronous. It does not call fn. It only queues the work.

The message will run when the target thread is polled by its reactor and msg_queue_run_batch() drains messages inside thread_poll().
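A single-threaded toy of the send/drain split shows why the sender never calls fn: sending only fills a message object and enqueues it, and the target runs queued messages later in a batch. Assumptions: a fixed-size global pool and a FIFO ring stand in for g_spdk_msg_mempool and the real message ring, the per-thread message cache is omitted, and every name (`toy_send_msg()`, `toy_run_batch()`, and so on) is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef void (*toy_msg_fn)(void *ctx);

struct toy_msg {
	toy_msg_fn fn;
	void *ctx;
};

#define TOY_POOL_SIZE 4
#define TOY_RING_SIZE 4

/* Hypothetical global message pool, kept as a stack of free objects. */
static struct toy_msg g_pool_storage[TOY_POOL_SIZE];
static struct toy_msg *g_pool_free[TOY_POOL_SIZE];
static size_t g_pool_count;

/* Hypothetical target-thread mailbox: a fixed-size FIFO ring. */
struct toy_ring {
	struct toy_msg *slots[TOY_RING_SIZE];
	size_t head, tail, count;
};

static void
toy_pool_init(void)
{
	for (size_t i = 0; i < TOY_POOL_SIZE; i++) {
		g_pool_free[i] = &g_pool_storage[i];
	}
	g_pool_count = TOY_POOL_SIZE;
}

/* Queue fn(ctx) for later; never calls fn. Returns 0 or a negative errno. */
static int
toy_send_msg(struct toy_ring *target, toy_msg_fn fn, void *ctx)
{
	struct toy_msg *msg;

	if (g_pool_count == 0) {
		return -ENOMEM;          /* no message object available */
	}
	if (target->count == TOY_RING_SIZE) {
		return -EIO;             /* target mailbox is full */
	}
	msg = g_pool_free[--g_pool_count];
	msg->fn = fn;
	msg->ctx = ctx;
	target->slots[target->tail] = msg;
	target->tail = (target->tail + 1) % TOY_RING_SIZE;
	target->count++;
	return 0;
}

/* Runs on the target: execute up to max queued messages, return how many ran. */
static size_t
toy_run_batch(struct toy_ring *ring, size_t max)
{
	size_t n = 0;

	while (ring->count > 0 && n < max) {
		struct toy_msg *msg = ring->slots[ring->head];

		ring->head = (ring->head + 1) % TOY_RING_SIZE;
		ring->count--;
		msg->fn(msg->ctx);
		g_pool_free[g_pool_count++] = msg;   /* recycle the message object */
		n++;
	}
	return n;
}

static void
toy_increment(void *ctx)
{
	(*(int *)ctx)++;
}
```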

Beginner rule:

If you need code to run on a different spdk_thread, send a message. Do not call the function directly unless the function explicitly allows it.

Pollers

A poller is a callback registered on the current spdk_thread.

lib/thread/thread.c:poller_register() requires spdk_get_thread() to be non-NULL. It allocates a struct spdk_poller, names it, records the callback and argument, assigns a per-thread poller ID, converts the period from microseconds to ticks, initializes interrupt support if needed, and inserts it into either:

  • active pollers, if period is zero
  • timed pollers, if period is nonzero

Public wrappers:

  • include/spdk/thread.h:spdk_poller_register()
  • include/spdk/thread.h:spdk_poller_register_named()
  • include/spdk/thread.h:SPDK_POLLER_REGISTER()
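The active-versus-timed split in poller_register() comes down to one branch on the period. A toy sketch, assuming a hypothetical tick rate `ticks_hz`, a `g_current` pointer standing in for spdk_get_thread(), and singly linked lists in place of SPDK's poller queue and RB tree. `toy_poller_register()` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef int (*toy_poller_fn)(void *arg);

struct toy_poller {
	toy_poller_fn fn;
	void *arg;
	uint64_t period_ticks;    /* 0 for active pollers */
	uint64_t next_run_tick;   /* only meaningful for timed pollers */
	struct toy_poller *next;
};

struct toy_thread {
	struct toy_poller *active;  /* run on every poll */
	struct toy_poller *timed;   /* run once next_run_tick has passed */
	uint64_t ticks_hz;          /* hypothetical ticks per second */
	uint64_t now;               /* hypothetical current tick */
};

static struct toy_thread *g_current;  /* stand-in for spdk_get_thread() */

static struct toy_poller *
toy_poller_register(toy_poller_fn fn, void *arg, uint64_t period_us)
{
	struct toy_thread *t = g_current;
	struct toy_poller *p;

	if (t == NULL) {
		return NULL;   /* SPDK asserts here; the toy just refuses */
	}
	p = calloc(1, sizeof(*p));
	if (p == NULL) {
		return NULL;
	}
	p->fn = fn;
	p->arg = arg;
	if (period_us == 0) {
		p->next = t->active;
		t->active = p;
	} else {
		/* Convert microseconds to ticks, then schedule the first run. */
		p->period_ticks = period_us * t->ticks_hz / 1000000;
		p->next_run_tick = t->now + p->period_ticks;
		p->next = t->timed;
		t->timed = p;
	}
	return p;
}

static int toy_noop(void *arg) { (void)arg; return 0; }
```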

Poller return values matter:

  • 0 means idle.
  • Positive means busy.
  • Negative values are tolerated (a debug build may log a -1 return) but do not mean "unregister me."

The reactor and thread stats use idle and busy return values to track work.

How A Poller Runs

Inside lib/thread/thread.c:thread_poll():

  1. A critical message runs first if present.
  2. A batch of regular messages is drained.
  3. Active pollers are executed.
  4. Post-poller handlers run if registered.
  5. Timed pollers whose deadline has passed are executed.

Active pollers are round-robin by queue movement. Timed pollers live in an RB tree keyed by next run time.
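The ordering above, together with the busy/idle return convention, can be modeled as one toy poll pass. Assumptions: small fixed arrays replace the ring, queue, and RB tree (timed pollers are found by linear scan), post-poller handlers are omitted, callbacks take no arguments, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef void (*toy_msg_fn)(void);
typedef int (*toy_poller_fn)(void);

struct toy_timed {
	toy_poller_fn fn;
	uint64_t period;
	uint64_t next_run;
};

struct toy_thread {
	toy_msg_fn critical;        /* at most one pending critical message */
	toy_msg_fn msgs[8];
	size_t num_msgs;
	toy_poller_fn active[4];
	size_t num_active;
	struct toy_timed timed[4];
	size_t num_timed;
};

/* One pass in the documented order; returns >0 if any work was done. */
static int
toy_thread_poll(struct toy_thread *t, uint64_t now)
{
	int rc = 0;

	if (t->critical != NULL) {                    /* critical message first */
		t->critical();
		t->critical = NULL;
		rc = 1;
	}
	for (size_t i = 0; i < t->num_msgs; i++) {    /* then the message batch */
		t->msgs[i]();
		rc = 1;
	}
	t->num_msgs = 0;
	for (size_t i = 0; i < t->num_active; i++) {  /* then every active poller */
		int prc = t->active[i]();
		if (prc > rc) {
			rc = prc;
		}
	}
	for (size_t i = 0; i < t->num_timed; i++) {   /* then timed pollers that are due */
		struct toy_timed *tp = &t->timed[i];
		if (now >= tp->next_run) {
			int prc = tp->fn();
			if (prc > rc) {
				rc = prc;
			}
			tp->next_run = now + tp->period;
		}
	}
	return rc;
}

/* Sample callbacks that record the order in which they ran. */
static char g_trace[32];
static size_t g_len;
static void on_critical(void) { g_trace[g_len++] = 'C'; }
static void on_message(void) { g_trace[g_len++] = 'M'; }
static int on_active(void) { g_trace[g_len++] = 'A'; return 0; }  /* idle */
static int on_timed(void) { g_trace[g_len++] = 'T'; return 1; }   /* busy */
```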

thread_execute_poller() and thread_execute_timed_poller() both assert that thread->lock_count == 0 after the callback. This is the source of lock-count asserts when code holds an SPDK spinlock across a point where SPDK expects cooperative progress.
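The invariant behind those asserts is simple: a callback must not return to the reactor while it still holds a lock. This toy sketch models that check with a per-thread counter. It is not SPDK's spinlock implementation, and `toy_spin_lock()`, `toy_execute_poller()`, and the sample pollers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-thread count of held spinlocks. */
struct toy_thread {
	int lock_count;
};

struct toy_spinlock {
	bool held;
};

static void
toy_spin_lock(struct toy_thread *t, struct toy_spinlock *l)
{
	l->held = true;
	t->lock_count++;
}

static void
toy_spin_unlock(struct toy_thread *t, struct toy_spinlock *l)
{
	l->held = false;
	t->lock_count--;
}

typedef int (*toy_poller_fn)(struct toy_thread *t, struct toy_spinlock *l);

/* Run one poller and report whether it returned with no lock held.
 * SPDK asserts at this point; the toy returns false so it can be tested. */
static bool
toy_execute_poller(struct toy_thread *t, toy_poller_fn fn, struct toy_spinlock *l)
{
	fn(t, l);
	return t->lock_count == 0;
}

/* Correct: lock and unlock inside one callback. */
static int
balanced_poller(struct toy_thread *t, struct toy_spinlock *l)
{
	toy_spin_lock(t, l);
	/* ... touch shared state ... */
	toy_spin_unlock(t, l);
	return 1;
}

/* Bug: returns to the reactor while still holding the lock. */
static int
leaky_poller(struct toy_thread *t, struct toy_spinlock *l)
{
	toy_spin_lock(t, l);
	return 1;
}
```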

The No-Blocking Rule

A reactor is a cooperative event loop. If a poller blocks, that OS thread stops polling every other spdk_thread assigned to that reactor.

Do not:

  • sleep in a poller
  • perform blocking filesystem I/O in a hot callback
  • wait synchronously for an RPC response from the same framework
  • hold locks across callbacks that may pump SPDK threads
  • busy-loop inside a poller instead of returning and letting the reactor continue

Use:

  • messages for ownership handoff
  • pollers for repeated progress
  • async callbacks for completion
  • NOMEM or retry queues for resource pressure
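Instead of busy-looping until a long job finishes, a poller can do a bounded slice of work per call and return, letting the reactor service every other spdk_thread in between. A toy sketch with a hypothetical `toy_scan` job and a chunk size of 64 items. In real code the owner would call spdk_poller_unregister() once the poller reports no more work.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_CHUNK 64   /* hypothetical per-call work budget */

struct toy_scan {
	size_t next;         /* next item to process */
	size_t total;        /* items in the whole job */
	unsigned long sum;   /* stand-in result */
};

/* Poller body: process at most TOY_CHUNK items, then return. Returns 1 (busy)
 * if it processed items on this call, 0 (idle) once the job is done. */
static int
toy_scan_poller(void *arg)
{
	struct toy_scan *s = arg;
	size_t end;

	if (s->next >= s->total) {
		return 0;
	}
	end = s->next + TOY_CHUNK;
	if (end > s->total) {
		end = s->total;
	}
	for (; s->next < end; s->next++) {
		s->sum += s->next;   /* stand-in for real per-item work */
	}
	return 1;
}
```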

Wrong-Thread Assertions

SPDK APIs often require that operations happen on the same spdk_thread that owns the object.

lib/thread/thread.c:wrong_thread() logs the function, object name, current thread, and expected thread, then asserts.

Common causes:

  • unregistering a poller from a different thread than the one that registered it
  • putting an io_channel from the wrong thread
  • calling module-specific functions on a callback thread rather than the resource owner thread
  • mixing OS thread identity with spdk_thread identity
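A toy version of the ownership check makes the last cause concrete: the comparison is between spdk_thread identities, so two threads sharing one reactor and core still fail it. Assumptions: a `_Thread_local` pointer stands in for the current-thread pointer that spdk_get_thread() reads, and `toy_check_owner()` returns false instead of asserting. All names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

struct toy_thread {
	const char *name;
};

/* Stand-in for the thread-local "current spdk_thread" pointer. */
static _Thread_local struct toy_thread *tls_current;

/* Hypothetical owned object: remembers the thread that created it. */
struct toy_poller {
	const char *name;
	struct toy_thread *owner;
};

/* Log the function, object, current thread, and expected thread on mismatch. */
static bool
toy_check_owner(const char *func, const struct toy_poller *p)
{
	if (tls_current == p->owner) {
		return true;
	}
	fprintf(stderr, "%s: %s called on thread %s, owner is %s\n", func, p->name,
		tls_current != NULL ? tls_current->name : "(none)", p->owner->name);
	return false;
}
```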

Misconception to kill:

"I am on the same CPU core, so I am on the right SPDK thread." Not necessarily. Ownership is spdk_thread, not just core.

Thread Exit

lib/thread/thread.c:spdk_thread_exit() marks a thread as exiting. It does not instantly free the thread.

lib/thread/thread.c:thread_exit() waits until:

  • message ring is empty
  • no spdk_for_each_thread() or spdk_for_each_channel() operations are outstanding
  • active pollers are unregistered
  • timed pollers are unregistered
  • paused pollers are gone
  • io_channels are released
  • pending io_device unregisters are complete

Only then does the state become exited. lib/event/reactor.c:reactor_post_process_lw_thread() sees an exited and idle thread, removes it from the reactor, and destroys it.

If shutdown hangs, inspect the thread for remaining messages, pollers, io_channels, or outstanding foreach operations.
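The two-phase exit can be modeled as a small state machine: exit only marks the thread, and the transition to EXITED waits until every pinning resource is gone. A toy sketch with hypothetical counters standing in for the real queues, trees, and channel lists; `toy_thread_exit()` and `toy_thread_try_exit()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum toy_state { TOY_RUNNING, TOY_EXITING, TOY_EXITED };

/* Hypothetical counters for everything that keeps a thread alive. */
struct toy_thread {
	enum toy_state state;
	size_t queued_msgs;
	size_t foreach_ops;        /* outstanding for_each_thread/channel calls */
	size_t active_pollers;
	size_t timed_pollers;
	size_t paused_pollers;
	size_t io_channels;
	size_t pending_unregisters;
};

/* Only marks the thread as exiting; frees nothing. */
static void
toy_thread_exit(struct toy_thread *t)
{
	if (t->state == TOY_RUNNING) {
		t->state = TOY_EXITING;
	}
}

/* Checked on each poll of an exiting thread. Returns true once EXITED. */
static bool
toy_thread_try_exit(struct toy_thread *t)
{
	if (t->state != TOY_EXITING) {
		return t->state == TOY_EXITED;
	}
	if (t->queued_msgs || t->foreach_ops || t->active_pollers ||
	    t->timed_pollers || t->paused_pollers || t->io_channels ||
	    t->pending_unregisters) {
		return false;   /* something still pins the thread: a shutdown "hang" */
	}
	t->state = TOY_EXITED;
	return true;
}
```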

Interrupt Mode

SPDK's classic model is polling. This tree also supports interrupt mode. In reactor code, reactor_run() chooses reactor_interrupt_run() when reactor->in_interrupt is true. In thread code, spdk_thread_poll() waits on the thread fd group when the thread is in interrupt mode.

For beginners, the important distinction:

  • Poll mode calls pollers repeatedly, trading high CPU use for low latency.
  • Interrupt mode waits on file descriptors where supported, reducing CPU but adding complexity.

Do not assume every poller or device path has the same interrupt-mode behavior.
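The core idea of interrupt mode is that a waiter sleeps on a file descriptor and a sender writes to it to wake the waiter. Here is a Linux-only sketch of that wake-up pattern, assuming an eventfd as the notification channel and poll() as the wait. It is not SPDK's fd-group code, and `toy_notify()` and `toy_wait()` are hypothetical names.

```c
#include <assert.h>
#include <poll.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Sender: add 1 to the eventfd counter, waking any waiter. 0 on success. */
static int
toy_notify(int efd)
{
	uint64_t one = 1;

	return write(efd, &one, sizeof(one)) == (ssize_t)sizeof(one) ? 0 : -1;
}

/* Waiter: sleep in poll() for up to timeout_ms instead of spinning. Returns the
 * number of notifications consumed (read resets the counter), or 0. */
static uint64_t
toy_wait(int efd, int timeout_ms)
{
	struct pollfd pfd = { .fd = efd, .events = POLLIN };
	uint64_t count = 0;

	if (poll(&pfd, 1, timeout_ms) == 1 && (pfd.revents & POLLIN)) {
		if (read(efd, &count, sizeof(count)) != (ssize_t)sizeof(count)) {
			count = 0;
		}
	}
	return count;
}
```

The trade-off matches the bullets above: the waiter burns no CPU while idle, but every wake-up now costs a system call.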

Edge Cases And Failure Modes

  • Message mempool exhaustion: spdk_thread_send_msg() logs an error and returns -ENOMEM; the message is not queued.
  • Message ring enqueue failure: logs an error and returns -EIO.
  • Target thread exited: logs an error and returns -EIO without queuing.
  • Poller registered outside any spdk_thread: assert path.
  • Poller unregistered from the wrong thread: wrong-thread assert.
  • Poller callback blocks: reactor stalls.
  • Poller callback returns busy forever: stats show busy even if no useful work happens.
  • Thread exit with active pollers: exit waits and logs.
  • Thread exit with io_channels: exit waits and logs.
  • Reactor shutdown with non-app running threads: logs that spdk_thread_exit() was not called.

Misconceptions To Kill

  • "spdk_thread is a pthread." It is not. It is a lightweight SPDK context run by a reactor.
  • "Messages run immediately." They run later when the target thread polls.
  • "Pollers are background threads." They are callbacks on an spdk_thread.
  • "A timed poller runs exactly at its period." It runs when the thread is polled and its deadline has passed.
  • "Blocking only hurts my poller." Blocking hurts the whole reactor OS thread.
  • "Returning -1 from a poller unregisters it." Unregistration is explicit.

Diskengine Relevance

Diskengine integrations tend to cross boundaries: an external controller sends RPCs, SPDK translates them into bdev or transport work, and completions come back asynchronously. Bugs appear when a control path assumes synchronous behavior.

When reading diskengine-facing SPDK code, always annotate:

  • callback owner thread
  • resource owner thread
  • whether a function sends a message
  • whether a function registers a poller
  • where completion is delivered

That habit prevents most wrong-thread misunderstandings.

Prose Diagram: Message Delivery

Imagine a message as a sealed envelope:

  1. Sender writes function pointer and context into the envelope.
  2. Sender drops it into the target thread's mailbox.
  3. Reactor eventually visits that target thread.
  4. spdk_thread_poll() opens a batch of envelopes.
  5. Each function runs on the target thread.

The sender does not wait by the mailbox.

Source Reading Exercise

Read the loop from reactor to poller:

  1. lib/event/reactor.c:spdk_reactors_start()
  2. lib/event/reactor.c:reactor_run()
  3. lib/event/reactor.c:_reactor_run()
  4. lib/thread/thread.c:spdk_thread_poll()
  5. lib/thread/thread.c:thread_poll()
  6. lib/thread/thread.c:thread_execute_poller()
  7. lib/thread/thread.c:thread_execute_timed_poller()

Then read the message path:

  1. lib/thread/thread.c:spdk_thread_send_msg()
  2. lib/thread/thread.c:msg_queue_run_batch()
  3. lib/thread/thread.c:thread_poll()

Questions:

  • Where does the thread-local (TLS) current spdk_thread pointer get set?
  • What happens before active pollers run?
  • How does SPDK decide busy vs idle?
  • What causes a thread to be destroyed?

Operational Lab

Use RPC and logs:

  1. Start an SPDK target with a small reactor mask.
  2. Call framework_get_reactors.
  3. Identify reactors, their threads, busy ticks, idle ticks, and interrupt state.
  4. Add or enable a component that registers a poller.
  5. Call framework_get_reactors again to observe thread changes, and thread_get_pollers to list the pollers on each thread.

Source-only variation:

  • Pick one module that calls spdk_thread_send_msg() and trace why it needs to cross ownership boundaries.

Self-Check

  1. What is the difference between an OS thread, a reactor, and an spdk_thread?
  2. Why does spdk_thread_send_msg() not call the function directly?
  3. Where are active pollers stored?
  4. Where are timed pollers stored?
  5. Why must pollers avoid blocking?
  6. What conditions must be satisfied before an spdk_thread exits?
  7. Why can being on the same CPU core still be the wrong SPDK thread?

References

  • Local source: include/spdk_internal/event.h
  • Local source: lib/event/reactor.c
  • Local source: include/spdk/thread.h
  • Local source: lib/thread/thread.c
  • Local source: lib/event/app_rpc.c