SPDK From First Principles

Chapter 11: `io_device` And `io_channel`

Source: drafts/runtime/11-io-device-and-io-channel.md

Reader Promise

By the end of this chapter, a beginner should be able to explain why SPDK has io_device and io_channel, how channels provide per-thread resources, why spdk_get_io_channel() and spdk_put_io_channel() must happen on the owning thread, and how spdk_for_each_channel() safely visits every thread-local channel for a device.

This chapter is the bridge between the execution model and the bdev chapters. bdevs, NVMe-oF poll groups, accelerators, iobuf, and many other SPDK components use io_channels to avoid shared locks in hot I/O paths.

Mental Model

An io_device is the shared identity of something that can have per-thread I/O state.

An io_channel is the per-spdk_thread state for that device.

Prose diagram:

io_device: NVMe bdev controller or NVMf target or iobuf singleton
  shared data:
    name
    create_channel callback
    destroy_channel callback
    registered/unregistered state

spdk_thread A
  io_channel for this device
    private context for A

spdk_thread B
  io_channel for this device
    private context for B

The point is not object orientation. The point is hot-path locality. Each SPDK thread gets its own channel context so it can submit I/O without taking a global lock for every operation.

Source Anchors

  • include/spdk/thread.h: spdk_io_device_register(), spdk_io_device_unregister(), spdk_get_io_channel(), spdk_put_io_channel(), spdk_io_channel_get_ctx(), spdk_io_channel_from_ctx(), spdk_io_channel_get_thread(), spdk_for_each_channel(), spdk_for_each_channel_continue()
  • lib/thread/thread.c: struct io_device, struct spdk_io_channel, spdk_io_device_register(), spdk_io_device_unregister(), spdk_get_io_channel(), spdk_put_io_channel(), put_io_channel(), thread_get_io_channel(), spdk_io_channel_ref(), spdk_io_channel_get_ctx(), spdk_for_each_channel(), _call_channel(), spdk_for_each_channel_continue(), _call_completion(), __pending_unregister()
  • lib/thread/iobuf.c: spdk_iobuf_initialize(), spdk_iobuf_channel_init(), spdk_iobuf_channel_fini(), spdk_iobuf_get_stats()
  • lib/nvmf/nvmf.c: nvmf_tgt_create_poll_group(), nvmf_tgt_destroy_poll_group(), spdk_io_device_register() use for the NVMf target
  • lib/nvmf/transport.c: spdk_for_each_channel() use for transport listener and poll group operations

Why Channels Exist

Imagine a single shared NVMe controller object used by many reactor threads. Each reactor needs its own queue pair or poll-group state. If every I/O went through one global queue protected by a mutex, SPDK would lose much of its value.

Instead:

  • The controller or module registers an io_device.
  • Each spdk_thread asks for a channel when it needs to do I/O.
  • The create callback allocates per-thread resources.
  • The hot path uses the channel.
  • The destroy callback releases per-thread resources.

This keeps shared state small and moves hot I/O state into thread-local ownership.

Registering An io_device

lib/thread/thread.c:spdk_io_device_register() registers a device pointer plus callbacks:

  • io_device: caller-owned identity pointer
  • create_cb: called when a thread creates its first channel for the device
  • destroy_cb: called when a thread releases its last channel for the device
  • ctx_size: bytes of per-channel context to allocate after the spdk_io_channel header
  • name: debug name

It requires a current spdk_thread. Calling it from a non-SPDK thread logs an error and asserts.

The implementation allocates an internal struct io_device, initializes its thread tree and refcount, and inserts it into the global io_device tree under g_devlist_mutex.

Beginner rule:

The io_device pointer is a key. It must remain valid until unregistration has completed and every channel for the device has been destroyed.
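The registration contract can be sketched with a small self-contained model. This is a toy in plain C, not SPDK's code: every `toy_` name, the fixed-size table, and the integer return codes are assumptions made for illustration. It only shows two ideas from the text above: the caller's pointer is the lookup key, and registering the same pointer twice is rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not SPDK's real structures. */
typedef int (*toy_create_cb)(void *io_device, void *ctx_buf);
typedef void (*toy_destroy_cb)(void *io_device, void *ctx_buf);

struct toy_io_device {
	void *key;                  /* caller-owned identity pointer */
	toy_create_cb create_cb;    /* runs on a thread's first get */
	toy_destroy_cb destroy_cb;  /* runs on a thread's last put */
	size_t ctx_size;            /* bytes of per-channel context */
	const char *name;           /* debug name */
	int in_use;
};

#define TOY_MAX_DEVICES 8
static struct toy_io_device g_toy_devices[TOY_MAX_DEVICES];

/* Look a device up by its identity pointer; NULL if it is not registered. */
static struct toy_io_device *
toy_device_find(void *key)
{
	for (int i = 0; i < TOY_MAX_DEVICES; i++) {
		if (g_toy_devices[i].in_use && g_toy_devices[i].key == key) {
			return &g_toy_devices[i];
		}
	}
	return NULL;
}

/* Returns 0 on success, -1 if the key is already registered or the table is full. */
static int
toy_device_register(void *key, toy_create_cb create_cb, toy_destroy_cb destroy_cb,
		    size_t ctx_size, const char *name)
{
	assert(key != NULL);
	if (toy_device_find(key) != NULL) {
		return -1;
	}
	for (int i = 0; i < TOY_MAX_DEVICES; i++) {
		if (!g_toy_devices[i].in_use) {
			g_toy_devices[i] = (struct toy_io_device){
				.key = key, .create_cb = create_cb, .destroy_cb = destroy_cb,
				.ctx_size = ctx_size, .name = name, .in_use = 1,
			};
			return 0;
		}
	}
	return -1;
}
```

Because the key is only compared, never dereferenced, any stable address works as an identity, which is why a module can use the address of its own global object.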

Getting A Channel

lib/thread/thread.c:spdk_get_io_channel():

  1. Finds the registered io_device.
  2. Gets the current spdk_thread.
  3. Rejects exited threads.
  4. Checks whether this thread already has a channel for the device.
  5. If yes, increments the channel refcount and returns it.
  6. If no, allocates struct spdk_io_channel + ctx_size.
  7. Inserts the channel into the thread's io_channel tree.
  8. Increments the device refcount.
  9. Adds the thread to the device's thread tree.
  10. Calls the device create callback.
  11. On create failure, unwinds the insertion and refcount.

The channel context is accessed with spdk_io_channel_get_ctx(ch).

Subtle point:

The same thread can call spdk_get_io_channel() multiple times for the same device and receive the same channel with a higher refcount. That is why every successful get needs a matching put.
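The get path above can be modeled in a few lines. This is a hedged sketch, not the SPDK implementation: the `toy_` types are hypothetical, there is no locking, and each toy thread tracks a channel for one device only. It shows the header-plus-context allocation, the single create call per thread, and the refcount bump on repeated gets.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy model only: hypothetical names, one device per thread, no locking. */
struct toy_channel {
	void *device;   /* identity pointer of the device */
	int ref;        /* outstanding gets on this thread */
	/* ctx_size bytes of per-thread context follow this header */
};

struct toy_thread {
	struct toy_channel *ch;   /* this thread's channel, if any */
	int create_calls;         /* how often the create callback ran */
};

/* The context lives directly after the channel header. */
static void *
toy_channel_get_ctx(struct toy_channel *ch)
{
	return (char *)ch + sizeof(*ch);
}

/* Stand-in for a module's create callback: set up per-thread state. */
static int
toy_create_cb(struct toy_thread *t, void *ctx_buf)
{
	t->create_calls++;
	*(int *)ctx_buf = 0;      /* e.g. a per-thread I/O counter */
	return 0;
}

static struct toy_channel *
toy_get_channel(struct toy_thread *t, void *device, size_t ctx_size)
{
	assert(ctx_size >= sizeof(int));
	assert(t->ch == NULL || t->ch->device == device);

	if (t->ch != NULL) {
		t->ch->ref++;         /* existing channel: count the new user */
		return t->ch;
	}

	struct toy_channel *ch = calloc(1, sizeof(*ch) + ctx_size);
	if (ch == NULL) {
		return NULL;
	}
	ch->device = device;
	ch->ref = 1;
	t->ch = ch;

	if (toy_create_cb(t, toy_channel_get_ctx(ch)) != 0) {
		t->ch = NULL;         /* unwind on create failure */
		free(ch);
		return NULL;
	}
	return ch;
}
```

Two gets on the same toy thread return the same pointer with a refcount of 2, while a second toy thread receives a different channel with its own context.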

Putting A Channel

lib/thread/thread.c:spdk_put_io_channel():

  • verifies there is a current SPDK thread
  • verifies the channel belongs to this thread
  • decrements the channel refcount
  • if the refcount reaches zero, increments destroy_ref and sends a message to the same thread to run put_io_channel()

Why deferred destruction?

Because code may call spdk_put_io_channel() while still unwinding a stack that used the channel. Deferring the actual destruction to a later message keeps the channel valid until that stack has returned. It also handles an immediate re-get on the same thread cleanly: if a new reference appears before the message runs, put_io_channel() sees it and leaves the channel alive.

lib/thread/thread.c:put_io_channel() performs actual removal:

  • asserts it runs on the channel's owning thread
  • decrements destroy_ref
  • returns early if new references appeared
  • removes the channel from the thread tree
  • removes the thread link from the device
  • calls the destroy callback without holding the global device mutex
  • decrements device refcount
  • frees the device if it was unregistered and no channels remain
  • frees the channel
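The interplay between the refcount, destroy_ref, and the deferred message is easier to see in a toy model. The sketch below is an assumption-laden simplification: the `toy_` names are hypothetical, the message queue is reduced to a counter, and "destroyed" is just a flag where real code would call the destroy callback and free the channel.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names; the "message queue" is just a counter. */
struct toy_channel {
	int ref;            /* outstanding gets */
	int destroy_ref;    /* destroy messages queued but not yet run */
	bool destroyed;     /* set where the destroy callback would run */
};

static int g_pending_msgs;  /* messages waiting on the owning thread */

static void
toy_get(struct toy_channel *ch)
{
	ch->ref++;
}

/* Drop one reference; at zero, schedule destruction instead of doing it inline. */
static void
toy_put(struct toy_channel *ch)
{
	assert(ch->ref > 0);
	ch->ref--;
	if (ch->ref == 0) {
		ch->destroy_ref++;
		g_pending_msgs++;   /* stands in for sending a message to this thread */
	}
}

/* The deferred message: destroy only if nobody re-acquired the channel. */
static void
toy_deferred_destroy(struct toy_channel *ch)
{
	ch->destroy_ref--;
	if (ch->ref > 0 || ch->destroy_ref > 0) {
		return;             /* revived, or another destroy message is still queued */
	}
	ch->destroyed = true;
}

/* Run every queued message, as the thread's poll loop eventually would. */
static void
toy_poll(struct toy_channel *ch)
{
	while (g_pending_msgs > 0) {
		g_pending_msgs--;
		toy_deferred_destroy(ch);
	}
}
```

In this model a get, put, get sequence followed by a poll leaves the channel alive, because the deferred message finds a non-zero refcount and returns early. Only a final put followed by a poll marks it destroyed.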

Wrong-Thread Rules

The docs in include/spdk/thread.h say spdk_put_io_channel() must be called on the same thread that called spdk_get_io_channel().

The implementation enforces this through wrong_thread() in lib/thread/thread.c.

This rule surprises beginners because the channel pointer looks like an ordinary C pointer. It is not ordinary ownership. It is a thread-local capability.

Common wrong-thread bug:

Thread A gets channel
Thread A submits async operation
Completion runs on Thread B
Completion calls spdk_put_io_channel(channel_from_A)
wrong-thread assert

The fix is usually to send a message back to Thread A or design the operation so completion ownership is clear.
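The "send it back to the owner" fix can be sketched as follows. This is a toy, not SPDK code: the threads are plain structs polled by hand, all `toy_` names are hypothetical, and the wrong-thread case returns -1 instead of asserting so the behavior can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical names; "threads" are plain structs run by hand. */
typedef void (*toy_msg_fn)(void *arg);

struct toy_thread {
	toy_msg_fn fn[4];     /* tiny fixed-size message queue */
	void *arg[4];
	int count;
};

struct toy_channel {
	struct toy_thread *owner;   /* thread that did the get */
	int ref;
};

static struct toy_thread *g_current;   /* thread whose code is running now */

static int
toy_send_msg(struct toy_thread *t, toy_msg_fn fn, void *arg)
{
	if (t->count == 4) {
		return -1;
	}
	t->fn[t->count] = fn;
	t->arg[t->count] = arg;
	t->count++;
	return 0;
}

/* Run queued messages with g_current set to the polled thread. */
static void
toy_poll(struct toy_thread *t)
{
	struct toy_thread *prev = g_current;
	g_current = t;
	for (int i = 0; i < t->count; i++) {
		t->fn[i](t->arg[i]);
	}
	t->count = 0;
	g_current = prev;
}

/* Refuses to touch the refcount unless called on the owning thread. */
static int
toy_put(struct toy_channel *ch)
{
	if (g_current != ch->owner) {
		return -1;
	}
	ch->ref--;
	return 0;
}

static void
toy_put_msg(void *arg)
{
	int rc = toy_put(arg);
	assert(rc == 0);
	(void)rc;
}

/* Completion handler that may run anywhere: bounce the put to the owner. */
static void
toy_complete(struct toy_channel *ch)
{
	if (g_current == ch->owner) {
		toy_put_msg(ch);
	} else {
		toy_send_msg(ch->owner, toy_put_msg, ch);
	}
}
```

In real SPDK code the same bounce is typically a spdk_thread_send_msg() to the thread returned by spdk_io_channel_get_thread(), with the message handler performing the put.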

Unregistering An io_device

lib/thread/thread.c:spdk_io_device_unregister():

  1. Finds the device.
  2. Records the unregister callback and unregistering thread.
  3. If for_each_count > 0, marks pending unregister and returns.
  4. Marks the device unregistered.
  5. Removes it from the global tree so new lookups fail.
  6. If there are references, defers deletion.
  7. If no references remain, frees it or schedules unregister callback completion.

Important distinction:

Unregistering prevents new channels, but existing channels can keep the internal device alive until their refcounts drop.
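That distinction can be captured in a minimal model. The sketch below is illustrative only: the `toy_` names and flags are hypothetical, for_each handling is left out, and "freed" stands in for releasing the internal record and completing the unregister callback.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names; for_each handling is left out. */
struct toy_device {
	int refcnt;          /* channels that still exist across all threads */
	bool registered;     /* visible to new lookups */
	bool unregistered;   /* unregister was requested */
	bool freed;          /* internal record released */
};

/* New channels are only possible while the device is still registered. */
static bool
toy_channel_create(struct toy_device *dev)
{
	if (!dev->registered) {
		return false;
	}
	dev->refcnt++;
	return true;
}

static void
toy_unregister(struct toy_device *dev)
{
	dev->unregistered = true;
	dev->registered = false;      /* new lookups fail from here on */
	if (dev->refcnt == 0) {
		dev->freed = true;        /* nothing outstanding: finish now */
	}
	/* otherwise the last channel destruction finishes the job */
}

/* Called when a thread's channel is finally destroyed. */
static void
toy_channel_destroyed(struct toy_device *dev)
{
	assert(dev->refcnt > 0);
	dev->refcnt--;
	if (dev->unregistered && dev->refcnt == 0) {
		dev->freed = true;
	}
}
```

With one live channel, unregistering blocks further creates at once but leaves the record in place until toy_channel_destroyed() drops the count to zero.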

Iterating Channels With spdk_for_each_channel()

Some operations must touch every per-thread channel for a device. Examples:

  • pause a transport on every poll group
  • remove a bdev from every channel
  • collect iobuf stats
  • disconnect qpairs across all NVMf poll groups

lib/thread/thread.c:spdk_for_each_channel() creates an iterator and sends _call_channel to the first thread that has a channel for the device.

_call_channel() checks whether the channel still exists on that thread. If it does, it calls the user callback. If not, it continues.

The callback must eventually call spdk_for_each_channel_continue(i, status).

spdk_for_each_channel_continue() moves to the next thread or sends completion back to the original thread.

Beginner rule:

If you use spdk_for_each_channel(), your per-channel callback owns progress. Forgetting spdk_for_each_channel_continue() hangs the whole iteration and can block unregister.
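The "callback owns progress" rule can be demonstrated with a toy iterator. This is a sketch under stated assumptions, not SPDK's iterator: the `toy_` names are hypothetical, the channels are plain integers in an array, message passing is replaced by a direct call, and in this toy a non-zero status ends the walk early.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names; message passing is a direct call. */
struct toy_iter;
typedef void (*toy_channel_fn)(struct toy_iter *i);
typedef void (*toy_done_fn)(struct toy_iter *i, int status);

struct toy_iter {
	int *channels;        /* one per-thread context per entry */
	size_t count;
	size_t cur;           /* index of the channel being visited */
	toy_channel_fn fn;    /* per-channel callback */
	toy_done_fn done;     /* completion callback */
	int status;
	bool finished;
};

static void
toy_visit_or_finish(struct toy_iter *i)
{
	if (i->status == 0 && i->cur < i->count) {
		i->fn(i);                   /* "send" the callback to the next thread */
	} else {
		i->finished = true;
		i->done(i, i->status);      /* completion on the originating thread */
	}
}

static void
toy_for_each_channel(struct toy_iter *i)
{
	i->cur = 0;
	i->status = 0;
	i->finished = false;
	toy_visit_or_finish(i);
}

/* The per-channel callback must call this once to move the walk on. */
static void
toy_for_each_channel_continue(struct toy_iter *i, int status)
{
	i->status = status;
	i->cur++;
	toy_visit_or_finish(i);
}

static int g_paused;

/* Correct callback: do the per-channel work, then continue. */
static void
toy_pause_channel(struct toy_iter *i)
{
	i->channels[i->cur] = 1;
	toy_for_each_channel_continue(i, 0);
}

/* Buggy callback: forgets to continue, so the walk never finishes. */
static void
toy_pause_channel_buggy(struct toy_iter *i)
{
	i->channels[i->cur] = 1;
}

static void
toy_pause_done(struct toy_iter *i, int status)
{
	(void)i;
	g_paused = (status == 0);
}
```

Running the walk with toy_pause_channel visits every entry and fires the completion. Running it with toy_pause_channel_buggy stops after the first entry and never reaches the completion, which is the hang described above.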

Pending Unregister Races

Unregistering an io_device and iterating its channels can overlap, and SPDK handles that overlap explicitly.

If unregister happens while spdk_for_each_channel() is active, unregister sets pending_unregister and returns. When the last iteration completes, spdk_for_each_channel_continue() sends __pending_unregister to the unregistering thread.

This prevents the device from disappearing while a multi-thread channel walk is in progress.

Edge case:

If a second unregister is attempted while one is pending and foreach work remains, the implementation treats it as an error.
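The parking behavior can be modeled with a counter and two flags. The sketch below is a toy: the `toy_` names and the 0 / 1 / -1 return codes are assumptions for illustration, and the replay is a direct call where SPDK sends a message to the unregistering thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: hypothetical names and return codes. */
struct toy_device {
	int for_each_count;        /* channel walks currently in flight */
	bool pending_unregister;   /* unregister parked behind a walk */
	bool unregistered;         /* unregister actually carried out */
};

/* Returns 0 if done now, 1 if parked behind a walk, -1 on a repeated request. */
static int
toy_unregister(struct toy_device *dev)
{
	if (dev->pending_unregister || dev->unregistered) {
		return -1;
	}
	if (dev->for_each_count > 0) {
		dev->pending_unregister = true;
		return 1;
	}
	dev->unregistered = true;
	return 0;
}

static void
toy_for_each_begin(struct toy_device *dev)
{
	dev->for_each_count++;
}

/* End of a walk: the last one out replays the parked unregister. */
static void
toy_for_each_end(struct toy_device *dev)
{
	assert(dev->for_each_count > 0);
	dev->for_each_count--;
	if (dev->pending_unregister && dev->for_each_count == 0) {
		dev->pending_unregister = false;
		int rc = toy_unregister(dev);
		assert(rc == 0);
		(void)rc;
	}
}
```

With two walks in flight, the first unregister is parked, a second attempt is rejected, and the device is only marked unregistered when the second walk ends.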

Example: iobuf Uses io_device Internally

lib/thread/iobuf.c:spdk_iobuf_initialize() registers a singleton io_device using &g_iobuf as the device pointer. spdk_iobuf_channel_init() gets an io_channel for &g_iobuf and stores it as the iobuf channel's parent. spdk_iobuf_channel_fini() puts that parent channel.

This gives iobuf per-thread cache state while the global iobuf module owns shared backing pools.

Example: NVMf Target Poll Groups

lib/nvmf/nvmf.c registers the NVMf target as an io_device. Its create callback creates poll-group state for a thread. Transport operations then use spdk_for_each_channel() to add, remove, pause, resume, or inspect poll groups across threads.

This pattern recurs throughout SPDK:

module-global object
  registered as io_device
per-thread channel
  contains poll group, qpair, queue, or cache state
foreach channel
  performs coordinated cross-thread operation

Edge Cases And Failure Modes

  • Register from non-SPDK thread: assert.
  • Register same device pointer twice: error.
  • Get unregistered or unknown device: returns NULL.
  • Get from no current spdk_thread: returns NULL.
  • Get from exited thread: returns NULL.
  • Create callback fails: get unwinds and returns NULL.
  • Put from wrong thread: wrong-thread assert.
  • Put too many times: refcount underflow risk; treat matching get/put as strict.
  • Destroy callback blocks: stalls the owning thread.
  • Forget spdk_for_each_channel_continue(): foreach never completes.
  • Unregister during foreach: deletion is deferred.
  • Thread exit with live channels: thread_exit() waits and logs.

Misconceptions To Kill

  • "io_channel is a hardware channel." Not always. It is SPDK per-thread context for any registered device.
  • "The channel context is shared by all threads." No. Each thread gets its own context.
  • "I can put a channel anywhere because I have the pointer." No. The owner thread must put it.
  • "Unregister immediately calls destroy on every channel." No. Existing channels live until their last put.
  • "spdk_for_each_channel() is synchronous." No. It is a message-driven asynchronous walk.
  • "The foreach callback can just return when done." It must call spdk_for_each_channel_continue().

Diskengine Relevance

Diskengine-triggered operations often look global: delete a volume, pause exports, remove a namespace, disconnect clients. Inside SPDK, global operations frequently become spdk_for_each_channel() walks over per-thread state.

When debugging a stuck disk deletion or target teardown:

  • identify the io_device
  • list which threads have channels
  • find whether a foreach is outstanding
  • confirm each per-channel callback calls continue
  • check whether unregister is pending behind foreach
  • check whether any user still holds a channel ref

Prose Diagram: Channel Lifetime

Think of an io_channel as a library book checked out by one SPDK thread:

  1. The device registers the book title.
  2. Thread A checks out its copy with spdk_get_io_channel().
  3. Thread A can check out the same copy again; the checkout count increases.
  4. Thread A returns each checkout with spdk_put_io_channel().
  5. When the count reaches zero, SPDK schedules final return processing.
  6. The destroy callback runs on Thread A.
  7. If the library title was removed and no copies remain, the device unregister completes.

No other thread may return Thread A's copy.

Source Reading Exercise

Read channel creation and destruction:

  1. lib/thread/thread.c:spdk_io_device_register()
  2. lib/thread/thread.c:spdk_get_io_channel()
  3. lib/thread/thread.c:spdk_put_io_channel()
  4. lib/thread/thread.c:put_io_channel()
  5. lib/thread/thread.c:spdk_io_device_unregister()

Then read foreach:

  1. lib/thread/thread.c:spdk_for_each_channel()
  2. lib/thread/thread.c:_call_channel()
  3. lib/thread/thread.c:spdk_for_each_channel_continue()
  4. lib/thread/thread.c:_call_completion()

Questions:

  • Which operations hold g_devlist_mutex?
  • Where does SPDK intentionally avoid holding the mutex while calling user callbacks?
  • Why is destruction sent as a message?
  • How is pending unregister completed?

Operational Lab

Source-only lab:

  1. Pick lib/thread/iobuf.c.
  2. Find where it registers &g_iobuf as an io_device.
  3. Find where an iobuf channel gets its parent io_channel.
  4. Find where the parent channel is put.
  5. Explain what per-thread state iobuf stores.

Runtime lab:

  1. Start an SPDK target with multiple reactors.
  2. Use the framework_get_reactors RPC to list the reactors and the threads running on each.
  3. Trigger a subsystem or transport that creates channels.
  4. Use logs or debugger breakpoints on spdk_get_io_channel() and spdk_put_io_channel() to observe thread ownership.

Self-Check

  1. What is an io_device?
  2. What is an io_channel?
  3. Why does each thread get its own channel context?
  4. Why must spdk_put_io_channel() run on the owning thread?
  5. Why is channel destruction deferred through a message?
  6. What must a spdk_for_each_channel() callback do before returning control?
  7. How does unregister handle outstanding channels?

References

  • Local source: include/spdk/thread.h
  • Local source: lib/thread/thread.c
  • Local source: lib/thread/iobuf.c
  • Local source: lib/nvmf/nvmf.c
  • Local source: lib/nvmf/transport.c