SPDK From First Principles

SPDK deep learning path

Chapter 34: Extending SPDK Source

Source: drafts/control-debug/33-extension-projects.md

Chapter Goal

This chapter gives practical extension projects for learning SPDK by changing it in small, controlled ways. The projects are intentionally scoped. They teach where to add code, how to test the change, and what source patterns to copy. The goal is not to invent a new storage product in one step. The goal is to build confidence by extending one surface at a time.

Beginner Mental Model

An SPDK extension should start at an existing seam:

  • a new RPC around existing state.
  • a new log or tracepoint.
  • a new statistic.
  • a tiny bdev module.
  • a wrapper virtual bdev.
  • a test-only fault injection path.
  • a documentation example.

Good first projects avoid changing core lifetime rules, have a simple success test, and can be reverted without touching unrelated modules.

small extension
  |
  +-- copy local pattern
  +-- add one behavior
  +-- add one focused test
  +-- document how to observe it

Source Anchors

  • module/bdev/malloc/bdev_malloc_rpc.c: compact RPC handler pattern.
  • module/bdev/null/bdev_null_rpc.c: simple bdev creation and resize RPC examples.
  • module/bdev/delay/vbdev_delay_rpc.c: virtual bdev RPC examples.
  • module/bdev/error/vbdev_error_rpc.c: error-injection virtual bdev pattern.
  • module/bdev/passthru/vbdev_passthru_rpc.c: wrapper bdev RPC pattern.
  • module/bdev/crypto/vbdev_crypto.c: virtual bdev function table and config output.
  • lib/bdev/bdev_rpc.c: query and stats JSON writing patterns.
  • lib/event/log_rpc.c: runtime log RPC patterns.
  • lib/trace/trace_rpc.c: trace mask RPC patterns.
  • lib/event/app_rpc.c: thread and poller stats RPC patterns.
  • include/spdk/bdev_module.h: bdev module interface.
  • include/spdk/thread.h: thread, message, poller, and I/O channel API.
  • test/unit/lib/bdev/bdev.c/bdev_ut.c: bdev test examples.
  • test/unit/lib/rpc/rpc.c/rpc_ut.c: RPC registration tests.
  • test/unit/lib/jsonrpc/jsonrpc_server.c/jsonrpc_server_ut.c: JSON-RPC parse tests.
  • doc/bdev_module.md: official bdev module guide, if generated in docs.
  • doc/bdev_pg.md: official bdev programmer guide.
  • doc/jsonrpc.md.jinja2: official RPC reference source.

Project Selection Rules

Choose a project that:

  • changes one subsystem.
  • has one user-visible output.
  • has a known source pattern to copy.
  • can be tested without special hardware if possible.
  • does not require rethinking thread ownership.
  • does not change ABI unless that is the explicit goal.
  • can fail safely.

Avoid first projects that:

  • change request lifetime in the hot path.
  • alter generic bdev completion behavior.
  • add global locks to fast paths.
  • require RDMA hardware to test.
  • require coordinating three subsystems at once.
  • change JSON config format without a migration reason.

KISS matters. Most useful SPDK contributions are small and precise.

Project 1: Add A Read-Only Query RPC

Goal: add a new query RPC that reports existing internal state. This is one of the safest first extensions. It teaches RPC registration, JSON writing, and unit testing.

Good source patterns:

  • lib/bdev/bdev_rpc.c: rpc_bdev_get_bdevs.
  • lib/event/app_rpc.c: rpc_thread_get_pollers.
  • lib/rpc/rpc.c: rpc_rpc_get_methods.

Design:

new RPC name
  |
  +-- optional params decoder
  +-- begin result
  +-- write JSON
  +-- end result

Checklist:

  • pick a name that matches existing naming style.
  • decide startup, runtime, or both.
  • add SPDK_RPC_REGISTER.
  • use existing JSON writer helpers.
  • return errors for invalid params.
  • keep output stable and documented.
  • add a unit test or script-level test if practical.

Edge cases:

  • object not found.
  • empty result list.
  • params omitted.
  • params provided but invalid.
  • object removed while query is in progress.
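
The design and edge cases above can be sketched in plain C. This is a minimal stand-in, not SPDK code: struct widget, g_widgets, and query_widgets are hypothetical names, and the JSON is built with snprintf only so the sketch compiles on its own. In a real handler the same steps would go through SPDK_RPC_REGISTER, the SPDK JSON writer helpers, and spdk_jsonrpc_begin_result / spdk_jsonrpc_end_result, with spdk_jsonrpc_send_error_response on the error branches.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical state owned by one module; not an SPDK structure. */
struct widget {
	const char *name;
	unsigned int size;
};

static const struct widget g_widgets[] = {
	{ "w0", 8 },
	{ "w1", 16 },
};

/* Append s to buf at *off. Returns 0, or -ENOMEM if buf is too small. */
static int
append_str(char *buf, size_t cap, size_t *off, const char *s)
{
	size_t len = strlen(s);

	if (*off + len + 1 > cap) {
		return -ENOMEM;
	}
	memcpy(buf + *off, s, len + 1);
	*off += len;
	return 0;
}

/*
 * Read-only query. name == NULL means "params omitted": list every widget.
 * Returns 0 with a JSON array in buf, -EINVAL for an empty name,
 * -ENOENT if a named widget does not exist, -ENOMEM if buf is too small.
 */
static int
query_widgets(const char *name, char *buf, size_t cap)
{
	char item[64];
	size_t off = 0, i, matched = 0;
	int rc;

	assert(buf != NULL);
	if (name != NULL && name[0] == '\0') {
		return -EINVAL;                         /* params provided but invalid */
	}
	rc = append_str(buf, cap, &off, "[");           /* begin result */
	for (i = 0; rc == 0 && i < sizeof(g_widgets) / sizeof(g_widgets[0]); i++) {
		if (name != NULL && strcmp(name, g_widgets[i].name) != 0) {
			continue;
		}
		snprintf(item, sizeof(item), "%s{\"name\":\"%s\",\"size\":%u}",
			 matched ? "," : "", g_widgets[i].name, g_widgets[i].size);
		rc = append_str(buf, cap, &off, item);  /* write JSON */
		matched++;
	}
	if (rc == 0) {
		rc = append_str(buf, cap, &off, "]");   /* end result */
	}
	if (rc != 0) {
		return rc;
	}
	return (name != NULL && matched == 0) ? -ENOENT : 0;
}
```

Calling query_widgets(NULL, buf, sizeof(buf)) lists both widgets, while a name that matches nothing returns -ENOENT instead of an empty array. Which of those two behaviors a real RPC should choose is a design decision worth documenting next to the method.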

Misconception:

"Read-only means no concurrency thinking." Even query RPCs need to run on the right thread or use safe iteration helpers.

Project 2: Add A Narrow Log Message

Goal: add one useful log message near a confusing failure branch. This teaches component flags and error diagnosis without changing behavior.

Good source patterns:

  • include/spdk/log.h.
  • lib/event/log_rpc.c.
  • nearby SPDK_ERRLOG or SPDK_DEBUGLOG in the target file.

Checklist:

  • use an existing component if possible.
  • avoid logging in the hottest loop unless gated narrowly.
  • include the object name or id when helpful.
  • include rc values and spdk_strerror(-rc) when appropriate.
  • do not log secrets such as keys.
  • do not add noisy success logs.

Example target:

RPC handler returns invalid params
  |
  +-- add a debug log with decoded field values

Test:

  • enable the component flag.
  • reproduce the branch.
  • confirm the message appears once.
  • disable the flag.

Edge cases:

  • logs in callbacks may run on different threads.
  • repeated errors can flood output.
  • pointer values are useful for correlation but not stable across runs.
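
As a rough illustration of the checklist, the sketch below formats one gated debug line for a failed params decode. It is plain C under stated assumptions: g_rpc_debug_enabled and log_decode_failure are hypothetical, and strerror stands in for spdk_strerror. Real code would use the nearby SPDK_DEBUGLOG or SPDK_ERRLOG pattern rather than a hand-rolled formatter.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical component flag; off by default so the message is opt-in. */
static bool g_rpc_debug_enabled = false;

/*
 * Format one debug line for a failed params decode.
 * rc is a negative errno. Returns 0 without touching out when the flag
 * is off, otherwise returns snprintf's result.
 */
static int
log_decode_failure(char *out, size_t cap, const char *method, const char *name, int rc)
{
	if (!g_rpc_debug_enabled) {
		return 0;               /* disabled path: one branch, no formatting */
	}
	assert(rc < 0);
	/* Include the object name and both the numeric and textual error. */
	return snprintf(out, cap, "%s: invalid params for '%s': rc=%d (%s)",
			method, name ? name : "<unset>", rc, strerror(-rc));
}
```

The line carries the method, the object name, and rc in both forms, which is usually enough to find the branch from the log alone. Nothing secret (such as a key) is passed to the formatter.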

Project 3: Add A Tracepoint

Goal: add a tracepoint around a state transition that is hard to debug with logs. This teaches trace registration and low-overhead event recording.

Good source patterns:

  • include/spdk/trace.h.
  • lib/trace/trace_flags.c.
  • lib/blob/blobstore.c: blob_trace.
  • lib/blob/request.c: spdk_trace_record.
  • include/spdk_internal/sock_module.h: spdk_trace_record.

Design:

define tpoint id
register description
record at state transition
decode with trace tool

Checklist:

  • choose an existing trace group if one fits.
  • record object id that links related events.
  • keep arguments small and meaningful.
  • avoid expensive formatting.
  • document what the event means.
  • test with trace mask disabled and enabled.

Edge cases:

  • trace buffers can overwrite.
  • tracepoint ids must not collide in a group.
  • the event may fire very often.
  • object pointer reuse can confuse long captures.
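
The overwrite and mask behavior can be modeled with a tiny ring buffer. This is a simplified sketch, not the SPDK trace implementation: struct trace_ring, trace_record, and TPOINT_WIDGET_STATE are hypothetical, and the ring is deliberately only four entries deep so the overwrite is easy to see. In SPDK the recording call is spdk_trace_record and decoding is done by the trace tool.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TRACE_RING_SIZE 4u          /* tiny on purpose, to show overwrite */
#define TPOINT_WIDGET_STATE 0x1u    /* hypothetical tracepoint id */

struct trace_entry {
	uint64_t seq;        /* monotonically increasing record number */
	uint16_t tpoint_id;
	uint64_t object_id;  /* links related events for one object */
	uint64_t arg;        /* small numeric argument, e.g. the new state */
};

struct trace_ring {
	uint64_t mask;       /* bit N set means tracepoint id N is enabled */
	uint64_t next_seq;
	struct trace_entry entries[TRACE_RING_SIZE];
};

/* Record one event. A disabled tracepoint costs a single mask test. */
static inline void
trace_record(struct trace_ring *ring, uint16_t tpoint_id, uint64_t object_id, uint64_t arg)
{
	struct trace_entry *e;

	assert(tpoint_id < 64);
	if ((ring->mask & (UINT64_C(1) << tpoint_id)) == 0) {
		return;
	}
	/* The slot index wraps, so old entries are silently overwritten. */
	e = &ring->entries[ring->next_seq % TRACE_RING_SIZE];
	e->seq = ring->next_seq++;
	e->tpoint_id = tpoint_id;
	e->object_id = object_id;
	e->arg = arg;
}

/* Oldest sequence number still present in the ring. */
static inline uint64_t
trace_oldest_seq(const struct trace_ring *ring)
{
	return ring->next_seq > TRACE_RING_SIZE ? ring->next_seq - TRACE_RING_SIZE : 0;
}
```

Only integers are stored at record time; any formatting happens later, when the capture is decoded. After six records into this four-entry ring, the two oldest events are gone, which is why a frequently firing tracepoint can hide the transition you wanted to see.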

Misconception:

"A tracepoint should explain everything." Tracepoints should mark a specific transition. Use source and stats to interpret them.

Project 4: Add A Counter To Existing Stats

Goal: add one counter to an existing stats RPC. This teaches where runtime state is stored and how stats are serialized.

Good source patterns:

  • lib/bdev/bdev_rpc.c: rpc_bdev_get_iostat.
  • lib/event/app_rpc.c: rpc_thread_get_stats.
  • lib/nvmf/nvmf.c: spdk_nvmf_poll_group_dump_stat.
  • module/accel/mlx5/accel_mlx5.c: accel_mlx5_dump_stats_json.

Design:

increment counter in exact branch
  |
  +-- store counter in per-thread or shared object
  |
  +-- expose through existing stats JSON
  |
  +-- verify delta under reproduction

Checklist:

  • choose counter ownership carefully.
  • prefer per-thread counters in hot paths.
  • avoid adding locks to hot I/O.
  • document units and reset behavior.
  • ensure JSON field name is clear.
  • update tests expecting exact JSON if any.

Edge cases:

  • counter overflow for long-running systems.
  • reset RPCs may need to reset the new counter.
  • per-channel stats need aggregation.
  • interrupt mode may change how idle/busy stats behave.
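
A minimal sketch of the per-thread ownership idea follows. The names (struct thread_stats, stats_count_retry, stats_total_retries, stats_reset) are hypothetical, and the model is single-threaded: it shows only the data layout and the increment, aggregate, and reset steps. In SPDK the aggregation step would typically visit each thread or channel with an existing iteration helper such as spdk_for_each_channel instead of reading other threads' memory directly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_THREADS 4

/* Hypothetical per-thread stats; each slot is written only by its owner. */
struct thread_stats {
	uint64_t retries;
};

static struct thread_stats g_stats[MAX_THREADS];

/* Hot path: plain increment on the caller's own slot, no lock, no atomic. */
static inline void
stats_count_retry(unsigned int thread_idx)
{
	assert(thread_idx < MAX_THREADS);
	g_stats[thread_idx].retries++;
}

/* Stats RPC path: aggregate all slots into the value for one JSON field. */
static uint64_t
stats_total_retries(void)
{
	uint64_t total = 0;
	size_t i;

	for (i = 0; i < MAX_THREADS; i++) {
		total += g_stats[i].retries;
	}
	return total;
}

/* Reset RPC path: the new counter must be cleared with the existing ones. */
static void
stats_reset(void)
{
	size_t i;

	for (i = 0; i < MAX_THREADS; i++) {
		g_stats[i].retries = 0;
	}
}
```

To verify the counter, read the total, reproduce the branch once, and check that the delta is exactly one. A 64-bit counter makes overflow a documentation note rather than a practical concern for most event rates.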

Misconception:

"Counters are free." Even an increment has cost and ownership implications in a hot path.

Project 5: Create A Tiny Virtual Bdev

Goal: create a wrapper bdev that passes I/O to a base bdev and adds one visible behavior. This is more advanced than a query RPC but still a manageable learning project.

Good source patterns:

  • module/bdev/passthru/vbdev_passthru_rpc.c.
  • module/bdev/crypto/vbdev_crypto.c.
  • module/bdev/delay/vbdev_delay_rpc.c.
  • module/bdev/error/vbdev_error_rpc.c.
  • include/spdk/bdev_module.h.

Possible behaviors:

  • count reads and writes.
  • reject writes above a configured LBA.
  • add artificial latency for a test bdev.
  • log first I/O after create.
  • expose base bdev name in driver_specific.

Design:

RPC creates wrapper
  |
  +-- wrapper opens base bdev
  +-- wrapper registers new bdev
  +-- I/O submit_request maps to base I/O
  +-- base completion completes original I/O

Checklist:

  • open base bdev with correct permissions.
  • claim base bdev if exclusivity is required.
  • create per-thread channel with base channel.
  • complete every I/O exactly once.
  • propagate base I/O status.
  • unregister cleanly.
  • write config JSON.
  • add delete RPC.

Edge cases:

  • base bdev removal.
  • outstanding I/O during delete.
  • no memory while submitting child I/O.
  • unsupported I/O types.
  • write zeroes, unmap, flush, reset.
  • metadata or zoned devices.
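
The submit and completion flow can be modeled without any SPDK types. Everything in this sketch is a hypothetical stand-in: struct io, struct base_dev, struct wrapper_dev, and the errno-style status values are assumptions made so the example compiles alone. A real module completes I/O with spdk_bdev_io_complete and a status enum, and on -ENOMEM would usually retry through spdk_bdev_queue_io_wait rather than fail the request.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum io_type { IO_READ, IO_WRITE, IO_UNMAP };

struct io {
	enum io_type type;
	int status;
	unsigned int completions;   /* must end at exactly 1 */
};

/* Stand-in for the base device; completes synchronously for simplicity. */
struct base_dev {
	int next_status;    /* status reported for the next I/O */
	bool fail_submit;   /* simulate "no memory while submitting child I/O" */
};

struct wrapper_dev {
	struct base_dev *base;
	uint64_t reads;     /* the one visible behavior: count reads and writes */
	uint64_t writes;
};

typedef void (*base_cb)(void *cb_arg, int status);

static void
io_complete(struct io *io, int status)
{
	io->status = status;
	io->completions++;
}

static int
base_submit(struct base_dev *base, base_cb cb, void *cb_arg)
{
	if (base->fail_submit) {
		return -ENOMEM;         /* callback will never run */
	}
	cb(cb_arg, base->next_status);
	return 0;
}

static bool
wrapper_io_type_supported(enum io_type type)
{
	return type == IO_READ || type == IO_WRITE;
}

static void
wrapper_base_done(void *cb_arg, int status)
{
	io_complete(cb_arg, status);    /* propagate base status unchanged */
}

static void
wrapper_submit(struct wrapper_dev *w, struct io *io)
{
	int rc;

	if (!wrapper_io_type_supported(io->type)) {
		io_complete(io, -ENOTSUP);      /* fail explicitly, never drop */
		return;
	}
	if (io->type == IO_READ) {
		w->reads++;
	} else {
		w->writes++;
	}
	rc = base_submit(w->base, wrapper_base_done, io);
	if (rc != 0) {
		io_complete(io, rc);            /* the only completion on this path */
	}
}
```

Each path through wrapper_submit ends in exactly one io_complete call: the unsupported-type branch, the base callback, or the failed-submit branch. Checking that property path by path is a useful review habit before touching a real submit_request function.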

Misconception:

"A wrapper can ignore I/O types it does not use." It must explicitly report supported types and fail unsupported requests correctly.

Project 6: Add An Error Injection Knob

Goal: add a test-only or debug-oriented failure path to make a rare branch reproducible.

Good source patterns:

  • module/bdev/error/vbdev_error_rpc.c.
  • module/bdev/error/vbdev_error.c.
  • module/bdev/nvme/bdev_nvme_rpc.c error injection RPCs.
  • unit tests under test/unit/lib/bdev/nvme/bdev_nvme.c/ (a directory named after the file it tests).

Design:

RPC configures injection rule
  |
  +-- hot path checks rule cheaply
  |
  +-- selected requests fail or delay
  |
  +-- stats/logs show injection happened

Checklist:

  • default must be off.
  • rule must be narrow.
  • expose get/remove path.
  • make behavior deterministic where possible.
  • avoid affecting unrelated bdevs.
  • test both injected and non-injected paths.

Edge cases:

  • injection during shutdown.
  • injection after target object is deleted.
  • repeated injection causing permanent state.
  • confusing injected failure with real failure.
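
One way to satisfy the checklist is a count-limited rule that expires by itself. The sketch below is a hypothetical model, not the bdev error module: struct inject_rule, inject_set, inject_clear, and inject_check are invented names, and -EIO is an assumed failure status.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

enum io_type { IO_READ, IO_WRITE };

/* Hypothetical injection rule for one device; zero-initialized means off. */
struct inject_rule {
	bool enabled;
	enum io_type type;      /* only this I/O type is affected */
	uint32_t remaining;     /* number of I/Os still to fail */
	uint64_t injected;      /* observable count of injected failures */
};

/* Configure path (an RPC in a real module). count == 0 leaves the rule off. */
static void
inject_set(struct inject_rule *rule, enum io_type type, uint32_t count)
{
	rule->enabled = count > 0;
	rule->type = type;
	rule->remaining = count;
}

/* Remove path: always leaves the rule disabled. */
static void
inject_clear(struct inject_rule *rule)
{
	rule->enabled = false;
	rule->remaining = 0;
}

/* Hot-path check: one branch when disabled. Returns 0 or the injected error. */
static inline int
inject_check(struct inject_rule *rule, enum io_type type)
{
	if (!rule->enabled) {
		return 0;
	}
	if (type != rule->type || rule->remaining == 0) {
		return 0;
	}
	rule->remaining--;
	rule->injected++;
	if (rule->remaining == 0) {
		rule->enabled = false;  /* rule expires, so no permanent state */
	}
	return -EIO;
}
```

The injected counter is what lets an operator tell an injected failure from a real one. Because the rule lives inside one device's state, it cannot affect unrelated bdevs, and it disappears with the device on delete.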

Misconception:

"Fault injection is only for tests." It is also a learning tool, but it must be safe and explicit.

Project 7: Write A Focused Unit Test

Goal: add or extend a unit test for a small branch. This may be the best first contribution if you are not ready to modify runtime behavior.

Good source patterns:

  • test/unit/lib/rpc/rpc.c/rpc_ut.c.
  • test/unit/lib/event/reactor.c/reactor_ut.c.
  • test/unit/lib/bdev/bdev.c/bdev_ut.c.
  • test/unit/lib/bdev/nvme/bdev_nvme.c/bdev_nvme_ut.c.
  • mk/spdk.unittest.mk.

Checklist:

  • find an existing unit test near the code.
  • add one test function.
  • use existing mock style.
  • keep the setup small.
  • assert the branch and the cleanup.
  • run only that unit first.
  • then run the relevant group.

Edge cases:

  • tests may include .c files directly.
  • static functions can be tested because of direct include.
  • global state must be reset between tests.
  • mocks may hide real integration constraints.
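
The "tiny mocked world" idea can be shown with a hand-written allocator mock. This is a plain-C sketch under stated assumptions: ut_calloc, ut_free, ut_reset, and the widget functions are hypothetical, and the code under test sits in the same block where a real SPDK unit test would pull it in by including the .c file. SPDK's own tests use CUnit and the project's mock macros instead of these helpers.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* --- mock layer (hypothetical) --- */
static int g_ut_allocs_until_fail = -1;  /* -1: never fail; N: fail after N allocations */
static int g_ut_live_allocs;             /* allocations not yet freed */

static void *
ut_calloc(size_t n, size_t size)
{
	if (g_ut_allocs_until_fail == 0) {
		return NULL;
	}
	if (g_ut_allocs_until_fail > 0) {
		g_ut_allocs_until_fail--;
	}
	g_ut_live_allocs++;
	return calloc(n, size);
}

static void
ut_free(void *p)
{
	if (p != NULL) {
		g_ut_live_allocs--;
		free(p);
	}
}

/* Global state must be reset between tests. */
static void
ut_reset(void)
{
	g_ut_allocs_until_fail = -1;
	g_ut_live_allocs = 0;
}

/* --- code under test --- */
struct widget {
	char *buf;
};

static int
widget_create(struct widget **out, size_t buf_size)
{
	struct widget *w = ut_calloc(1, sizeof(*w));

	if (w == NULL) {
		return -ENOMEM;
	}
	w->buf = ut_calloc(1, buf_size);
	if (w->buf == NULL) {
		ut_free(w);             /* the cleanup branch the test targets */
		return -ENOMEM;
	}
	*out = w;
	return 0;
}

static void
widget_destroy(struct widget *w)
{
	ut_free(w->buf);
	ut_free(w);
}
```

A test for the failure branch sets g_ut_allocs_until_fail to 1, calls widget_create, and asserts both the -ENOMEM return and that g_ut_live_allocs is back to zero. That second assertion is the "assert the cleanup" item from the checklist.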

Misconception:

"A unit test must start SPDK." Many SPDK unit tests build a tiny mocked world instead.

Project 8: Improve Generated Config Coverage

Goal: ensure a new object can be emitted by framework_get_config and replayed.

Good source patterns:

  • lib/init/subsystem_rpc.c: rpc_framework_get_config.
  • include/spdk_internal/init.h: write_config_json.
  • module/bdev/raid/bdev_raid.c: raid_bdev_write_config_json.
  • module/bdev/crypto/vbdev_crypto.c: vbdev_crypto_config_json.
  • lib/nvmf/nvmf.c: spdk_nvmf_tgt_write_config_json.

Checklist:

  • identify object owner.
  • implement or extend write_config_json.
  • emit the same RPCs a user would run.
  • preserve dependency order.
  • include non-default params needed for replay.
  • omit transient counters and runtime-only state.
  • replay generated config from clean process.

Edge cases:

  • secrets should not be dumped casually.
  • object names may not be stable across machines.
  • hardware-backed objects may not replay elsewhere.
  • generated config may need startup methods before runtime methods.
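
The checklist items "emit the same RPCs a user would run" and "omit transient counters" can be sketched as follows. The object, its fields, and the bdev_wrapper_create method name are all hypothetical, and snprintf replaces the SPDK JSON writer only so the example is self-contained. The method/params object shape mirrors how saved configuration entries are laid out.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical wrapper object. io_count is runtime-only state. */
struct wrapper_cfg {
	const char *name;
	const char *base_name;
	unsigned int limit_lba;   /* 0 is the default and is omitted from output */
	unsigned long io_count;   /* transient counter: never written to config */
};

/*
 * Write the create RPC a user would have issued for this object.
 * Returns the number of characters written, or -ENOMEM if out is too small.
 */
static int
wrapper_write_config(const struct wrapper_cfg *w, char *out, size_t cap)
{
	int n;

	n = snprintf(out, cap,
		     "{\"method\":\"bdev_wrapper_create\",\"params\":"
		     "{\"name\":\"%s\",\"base_bdev_name\":\"%s\"",
		     w->name, w->base_name);
	if (n < 0 || (size_t)n >= cap) {
		return -ENOMEM;
	}
	if (w->limit_lba != 0) {
		/* Non-default parameter: needed for a faithful replay. */
		n += snprintf(out + n, cap - (size_t)n, ",\"limit_lba\":%u", w->limit_lba);
		if ((size_t)n >= cap) {
			return -ENOMEM;
		}
	}
	n += snprintf(out + n, cap - (size_t)n, "}}");
	return (size_t)n >= cap ? -ENOMEM : n;
}
```

The entry names its base bdev, so the base bdev's own create entry has to appear earlier in the generated config for replay to succeed. The real test is still the last checklist item: start a clean process, load the generated config, and compare the resulting objects.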

Misconception:

"Config output should include all fields." It should include only what is needed, and safe, to recreate the configuration.

Project 9: Add Documentation For One Failure

Goal: write a short troubleshooting note grounded in one exact source branch. This is valuable when a failure is common but non-obvious.

Good source patterns:

  • doc/system_configuration.md.
  • doc/applications.md.
  • doc/bdev.md.
  • doc/nvmf.md.
  • doc/gdb_macros.md.

Checklist:

  • name the symptom.
  • show the exact error text.
  • explain likely causes.
  • point to the relevant RPC or command.
  • include one safe verification step.
  • avoid outdated environment assumptions.

Edge cases:

  • docs should not promise behavior not guaranteed by source.
  • hardware-specific advice should be marked as such.
  • configuration examples should be minimal.

Misconception:

"Docs are less technical than code." Good SPDK docs need source accuracy and operational clarity.

Project Review Questions

Before coding, answer:

  • What exact user-visible behavior changes?
  • Which local source pattern am I copying?
  • Which object owns the new state?
  • Which thread accesses the new state?
  • What happens on failure?
  • What happens on shutdown?
  • How do I test the success path?
  • How do I test the error path?
  • How do I observe it with logs, stats, or RPC?

If you cannot answer the thread ownership questions, do not write hot-path code yet. Start with a read-only RPC or documentation project.

Edge Cases Across Projects

ABI And API Stability

Public headers under include/spdk define the API that code outside the tree builds against, so changing public structs or function signatures there has wider impact than changing internal headers. Beginner projects should avoid public ABI changes.

JSON Field Compatibility

Renaming existing JSON fields can break clients. Add new optional fields when possible. Keep output names clear and stable.

Thread Affinity

Do not read or mutate an object from an arbitrary thread. Send the work to the owning thread (for example with spdk_thread_send_msg) or use the existing I/O channel patterns.
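
The message pattern can be modeled in a few lines. This is a single-threaded sketch with hypothetical names (struct owner_thread, thread_send_msg, thread_poll). It shows only the shape of the idea: callers enqueue a function and context, and only the owner runs it. A real cross-thread queue needs a thread-safe ring, which SPDK's thread library already provides.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MSG_QUEUE_DEPTH 8

typedef void (*msg_fn)(void *ctx);

struct msg {
	msg_fn fn;
	void *ctx;
};

/* Hypothetical owner thread with a fixed-size FIFO of pending messages. */
struct owner_thread {
	struct msg queue[MSG_QUEUE_DEPTH];
	size_t head;   /* next message to run */
	size_t tail;   /* next free slot */
};

static void
thread_init(struct owner_thread *t)
{
	t->head = 0;
	t->tail = 0;
}

/* Called by any thread: enqueue work instead of touching the object. */
static int
thread_send_msg(struct owner_thread *t, msg_fn fn, void *ctx)
{
	if (t->tail - t->head == MSG_QUEUE_DEPTH) {
		return -ENOMEM;         /* queue full */
	}
	t->queue[t->tail % MSG_QUEUE_DEPTH].fn = fn;
	t->queue[t->tail % MSG_QUEUE_DEPTH].ctx = ctx;
	t->tail++;
	return 0;
}

/* Called only by the owner from its poll loop. Returns messages run. */
static size_t
thread_poll(struct owner_thread *t)
{
	size_t n = 0;

	while (t->head != t->tail) {
		struct msg m = t->queue[t->head % MSG_QUEUE_DEPTH];

		t->head++;
		m.fn(m.ctx);
		n++;
	}
	return n;
}

/* Object state that only the owner thread may mutate. */
struct widget {
	int value;
};

static void
widget_increment(void *ctx)
{
	((struct widget *)ctx)->value++;
}
```

Nothing changes in the widget until the owner polls, which is the point: the mutation happens at a time and place the owner controls, so no lock is needed around the object itself.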

Hot Path Cost

Debug features can hurt performance if placed in hot paths. Make them off by default or cheap when disabled.

Shutdown

Every create path needs a destroy path. Every reference needs a matching release. Every outstanding I/O needs a completion or cancellation story.

Misconceptions To Kill

  • "A useful extension must be large." Small observability improvements are valuable.
  • "Copying a local pattern is uncreative." It is how you stay compatible with the codebase.
  • "Read-only RPCs need no tests." Output shape and object lookup still need coverage.
  • "Stats can be shared globals." Ownership and per-thread behavior matter.
  • "A virtual bdev only needs read and write." It must handle or reject the full interface correctly.
  • "Config output is optional." If users create it by RPC, they often expect it to replay.
  • "Debug code can be sloppy." Debug code runs during failures, when clarity matters most.
  • "One test is enough for lifecycle code." Create, use, delete, and failure paths need thought.

Lab: Pick A First Project

Choose one of the nine projects. Write down the target file, the source pattern you will copy, and the RPC, log, trace, or stats output you expect. Then describe one success test, one failure test, and one cleanup check.

Lab: Design A Query RPC

Invent a read-only RPC for a small module. Do not implement it yet. Write:

  • method name.
  • state mask.
  • params struct.
  • decoder fields.
  • result JSON shape.
  • empty result behavior.
  • invalid params behavior.
  • source file that should own it.

Then compare with lib/bdev/bdev_rpc.c and revise.

Lab: Design A Tracepoint

Pick a state transition that is hard to see. Write:

  • group name.
  • tracepoint name.
  • object id.
  • owner id.
  • arguments.
  • code location.
  • expected decode output.

Explain why a log line would be worse or better.

Lab: Virtual Bdev Checklist

Before writing a wrapper bdev, list every I/O type you will support. For unsupported types, specify the failure behavior. Draw the base bdev relationship. Draw the channel relationship. Draw deletion with one outstanding I/O. If any box is unclear, read module/bdev/passthru and module/bdev/error first.

Self-Check

  1. Why is a read-only query RPC a good first extension?
  2. What file defines the bdev module interface?
  3. Why should a tracepoint use a stable object id?
  4. What is dangerous about adding locks in a hot path?
  5. Why should every create path be designed with generated config in mind?
  6. What makes a fault injection feature safe?
  7. Why are local source patterns better than invented style?
  8. What question should stop you from editing hot-path code?

References

  • doc/bdev_pg.md for bdev programming.
  • doc/jsonrpc.md.jinja2 for RPC surface examples.
  • doc/tracing.md for trace extension context.
  • include/spdk/bdev_module.h for bdev module contracts.
  • include/spdk/thread.h for thread and channel rules.
  • module/bdev/error for error-injection patterns.
  • module/bdev/passthru for wrapper patterns.
  • test/unit/unittest.sh and mk/spdk.unittest.mk for unit-test layout.