Chapter Goal
This chapter teaches the SPDK control plane from the point of view of a new operator. By the end, you should know what an RPC request is, where it enters SPDK, how methods are registered, why some calls are startup-only, and how JSON configuration is saved and replayed. The chapter is source-grounded, so every important concept points to a local SPDK file or function.
Beginner Mental Model
SPDK applications are long-running storage engines. The data path moves I/O. The control plane changes the engine while it is running. JSON-RPC is the control-plane language. An RPC request says: call this named method, with these parameters, and return either a result or an error. The common client is scripts/rpc.py. The common server endpoint is a Unix domain socket such as /var/tmp/spdk.sock. The common target application is app/spdk_tgt/spdk_tgt.c. The core idea is simple:
operator or orchestrator
|
| JSON object over a socket
v
SPDK JSON-RPC server
|
| method lookup
v
registered C handler
|
| validates params and mutates state
v
subsystem, bdev, transport, or app framework
Do not confuse JSON-RPC with the I/O path. Creating a malloc bdev by RPC is control-plane work. Submitting a read to that bdev is data-path work. The two paths interact, because the control plane creates the objects that the data path uses, but they have different timing and safety rules.
Source Anchors
- include/spdk/rpc.h: public registration macros and RPC state constants.
- include/spdk/rpc.h: SPDK_RPC_REGISTER: macro used by many modules to register methods.
- lib/rpc/rpc.c: spdk_rpc_register_method: adds a method to the internal method list.
- lib/rpc/rpc.c: jsonrpc_handler: dispatches a parsed JSON-RPC method to a registered handler.
- lib/rpc/rpc.c: rpc_rpc_get_methods: implements rpc_get_methods.
- lib/init/rpc.c: spdk_rpc_initialize: initializes the framework RPC server.
- lib/jsonrpc/jsonrpc_server.c: jsonrpc_parse_request: parses raw JSON-RPC input.
- lib/jsonrpc/jsonrpc_server_tcp.c: spdk_jsonrpc_server_listen: listens on Unix or TCP sockets.
- lib/event/app.c: spdk_app_start: starts a normal SPDK application.
- lib/event/app.c: rpc_framework_start_init: starts delayed initialization when --wait-for-rpc is used.
- lib/event/app.c: rpc_framework_wait_init: lets a client wait for initialization completion.
- lib/init/subsystem.c: spdk_subsystem_init: initializes registered subsystems in dependency order.
- lib/init/json_config.c: spdk_subsystem_load_config: replays JSON config through RPC handlers.
- lib/init/subsystem_rpc.c: rpc_framework_get_config: implements framework_get_config.
- module/bdev/malloc/bdev_malloc_rpc.c: rpc_bdev_malloc_create: a small concrete bdev RPC.
- lib/bdev/bdev_rpc.c: rpc_bdev_get_bdevs: a common runtime query RPC.
- doc/jsonrpc.md.jinja2: generated official JSON-RPC reference source.
- doc/applications.md: official application command-line and configuration guide.
What A JSON-RPC Request Looks Like
SPDK follows the JSON-RPC 2.0 shape. The payload is usually one JSON object. It can be sent by scripts/rpc.py, by a service manager, or by a custom client.
{
"jsonrpc": "2.0",
"method": "bdev_malloc_create",
"params": {
"name": "Malloc0",
"num_blocks": 1024,
"block_size": 4096
},
"id": 1
}
The method name is just a string until SPDK looks it up. The params are method-specific. The id lets the client match a response to a request. A notification has no id, but most operational tools use ids.
The server does not know what num_blocks means. Only the target method handler knows. That is why each RPC handler has a decoder table. For example, bdev RPC handlers use spdk_json_decode_object with arrays of spdk_json_object_decoder. When decoding fails, the handler sends a JSON-RPC error rather than half-applying the requested change.
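The request shape above can be built and checked from a client with very little code. The following is a minimal Python sketch, not the code in scripts/rpc.py; the helper names build_request and match_response are hypothetical and exist only for this illustration.

```python
import itertools
import json

# Hypothetical id source for this sketch: a simple increasing counter.
_next_id = itertools.count(1)


def build_request(method, params=None):
    """Return a JSON-RPC 2.0 request dict, omitting params when none are given."""
    request = {"jsonrpc": "2.0", "method": method, "id": next(_next_id)}
    if params is not None:
        request["params"] = params
    return request


def match_response(request, response_text):
    """Decode a response, check that its id matches, and return the result."""
    response = json.loads(response_text)
    if response.get("id") != request["id"]:
        raise ValueError("response id does not match request id")
    if "error" in response:
        raise RuntimeError(response["error"].get("message", "unknown error"))
    return response["result"]


if __name__ == "__main__":
    req = build_request(
        "bdev_malloc_create",
        {"name": "Malloc0", "num_blocks": 1024, "block_size": 4096},
    )
    print(json.dumps(req, indent=2))
```

The id check is the part that matters operationally: a client that sends several requests on one socket needs it to pair each response with the request that caused it.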
Registration: How Methods Appear
SPDK does not keep its RPC methods in one central table. Each method is registered by the C file that implements it. The macro SPDK_RPC_REGISTER in include/spdk/rpc.h expands to a small function marked with the compiler's constructor attribute, so it runs when the binary is loaded, before main. That function calls spdk_rpc_register_method, which records the method name, handler function, and state mask.
The pattern looks like this:
SPDK_RPC_REGISTER("bdev_malloc_create", rpc_bdev_malloc_create, SPDK_RPC_RUNTIME)
Read it as:
- name: bdev_malloc_create
- handler: rpc_bdev_malloc_create
- valid phase: runtime
This is why adding a new RPC normally means editing a module-specific *_rpc.c file. The bdev malloc example lives in module/bdev/malloc/bdev_malloc_rpc.c. NVMe bdev RPCs live in module/bdev/nvme/bdev_nvme_rpc.c. NVMe-oF target RPCs live in lib/nvmf/nvmf_rpc.c and module/event/subsystems/nvmf/nvmf_rpc.c.
Dispatch: From Socket To Handler
The JSON-RPC server accepts bytes from a socket. jsonrpc_parse_request turns the bytes into JSON values. Then the generic RPC layer calls jsonrpc_handler. jsonrpc_handler checks whether the method exists and whether it is allowed in the current framework state. If the method is allowed, the handler receives:
- the request object, used to send the response.
- the params JSON value, or null if no params were supplied.
The handler then owns three jobs:
- validate params.
- call the subsystem API.
- send exactly one response or error.
The handler should not trust the client. Bad JSON, wrong types, missing fields, invalid names, duplicate objects, and impossible sizes are normal inputs. Source anchor: test/unit/lib/jsonrpc/jsonrpc_server.c/jsonrpc_server_ut.c tests valid, invalid, and partial parse cases.
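The three handler jobs can be sketched in Python as a toy handler with a decoder table. Everything here is illustrative: the table MALLOC_CREATE_DECODERS, the functions decode_object and handle_malloc_create, and the errno-style code -17 are assumptions for this example, not the contents of bdev_malloc_rpc.c. Only -32602 is the standard JSON-RPC 2.0 "invalid params" code.

```python
INVALID_PARAMS = -32602  # JSON-RPC 2.0 standard code
ALREADY_EXISTS = -17     # assumption: errno-style code for a semantic failure

# Hypothetical decoder table: field name -> (expected type, required?)
MALLOC_CREATE_DECODERS = {
    "name": (str, False),
    "num_blocks": (int, True),
    "block_size": (int, True),
}


def decode_object(params, decoders):
    """Return fully validated params or raise ValueError; never partial data."""
    if not isinstance(params, dict):
        raise ValueError("params must be an object")
    unknown = set(params) - set(decoders)
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    decoded = {}
    for key, (expected, required) in decoders.items():
        if key not in params:
            if required:
                raise ValueError(f"missing field: {key}")
            continue
        value = params[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise ValueError(f"wrong type for field: {key}")
        decoded[key] = value
    return decoded


def handle_malloc_create(params, bdevs):
    """Validate, then mutate, and return exactly one response dict."""
    try:
        decoded = decode_object(params, MALLOC_CREATE_DECODERS)
    except ValueError as exc:
        return {"error": {"code": INVALID_PARAMS, "message": str(exc)}}
    name = decoded.get("name", f"Malloc{len(bdevs)}")
    if name in bdevs:
        return {"error": {"code": ALREADY_EXISTS, "message": f"{name} already exists"}}
    bdevs[name] = decoded
    return {"result": name}
```

Decoding finishes before any state is touched, which is what keeps a rejected request from half-applying a change. Every path through the handler ends in a single return, the Python analogue of sending exactly one response.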
Startup RPCs And Runtime RPCs
SPDK has phases. Some choices are only safe before the subsystem starts. Some changes are safe after the system is running. This is encoded in state masks such as SPDK_RPC_STARTUP and SPDK_RPC_RUNTIME.
Startup RPC examples:
- bdev_set_options in lib/bdev/bdev_rpc.c.
- iobuf_set_options in module/event/subsystems/iobuf/iobuf_rpc.c.
- nvmf_set_config in module/event/subsystems/nvmf/nvmf_rpc.c.
- sock_impl_set_options in lib/sock/sock_rpc.c.
Runtime RPC examples:
- bdev_get_bdevs in lib/bdev/bdev_rpc.c.
- bdev_malloc_create in module/bdev/malloc/bdev_malloc_rpc.c.
- thread_get_stats in lib/event/app_rpc.c.
- nvmf_get_subsystems in lib/nvmf/nvmf_rpc.c.
The distinction matters because startup options often size pools, choose implementations, or set global behavior. Changing them after the relevant subsystem starts would be ambiguous or unsafe. If an RPC fails with a state error, the method name may be correct and the JSON may be valid. The failure can still be correct because the timing is wrong.
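The state check is a bitmask test, which the following Python sketch models. The table METHODS and the function check_call are hypothetical. The mask values and the INVALID_STATE code are assumed to mirror SPDK's headers and should be confirmed against include/spdk/rpc.h and include/spdk/jsonrpc.h in your tree.

```python
# Assumed to mirror include/spdk/rpc.h.
SPDK_RPC_STARTUP = 0x1
SPDK_RPC_RUNTIME = 0x2

METHOD_NOT_FOUND = -32601  # JSON-RPC 2.0 standard code
INVALID_STATE = -1         # assumption: SPDK-specific state error code

# Toy method table: name -> state mask, using examples from this chapter.
METHODS = {
    "bdev_set_options": SPDK_RPC_STARTUP,
    "bdev_malloc_create": SPDK_RPC_RUNTIME,
    "rpc_get_methods": SPDK_RPC_STARTUP | SPDK_RPC_RUNTIME,
}


def check_call(method, current_state):
    """Return 0 if the call may proceed, otherwise a JSON-RPC error code."""
    mask = METHODS.get(method)
    if mask is None:
        return METHOD_NOT_FOUND
    if not (mask & current_state):
        return INVALID_STATE
    return 0
```

For example, check_call("bdev_set_options", SPDK_RPC_RUNTIME) returns the state error even though the name is registered, which is exactly the "correct JSON, wrong timing" failure described above.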
--wait-for-rpc
Normal startup parses config and initializes subsystems before the app enters steady state. --wait-for-rpc changes that. With --wait-for-rpc, the app starts the RPC server early and waits for an explicit framework_start_init RPC. This allows an external orchestrator to send startup RPCs before subsystem initialization.
without --wait-for-rpc:
  process starts
  load config if provided
  init subsystems
  runtime begins
with --wait-for-rpc:
  process starts
  RPC server opens
  orchestrator sends startup RPCs
  orchestrator sends framework_start_init
  init subsystems
  runtime begins
Source anchors:
- include/spdk/event.h: spdk_app_start documents delayed initialization behavior.
- lib/event/app.c: rpc_framework_start_init starts initialization from RPC.
- lib/event/app.c: rpc_framework_wait_init reports initialization completion.
The main misconception is that --wait-for-rpc means the application is fully ready. It does not. It means the RPC server is ready while the application is intentionally not fully initialized. Only startup-safe calls should be sent before framework_start_init.
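An orchestrator that drives an app started with --wait-for-rpc has to order its calls around framework_start_init. The sketch below shows only that ordering logic in Python; plan_calls is a hypothetical helper, and the parameter values in the usage example are illustrative rather than recommended settings.

```python
def plan_calls(startup_calls, runtime_calls):
    """Return the ordered (method, params) list for a --wait-for-rpc launch.

    startup_calls are sent before initialization, runtime_calls after it.
    The framework_start_init call is inserted between the two groups.
    """
    startup_calls = list(startup_calls)
    runtime_calls = list(runtime_calls)
    for method, _params in startup_calls + runtime_calls:
        if method == "framework_start_init":
            raise ValueError("framework_start_init is inserted automatically")
    return startup_calls + [("framework_start_init", None)] + runtime_calls


if __name__ == "__main__":
    plan = plan_calls(
        [("bdev_set_options", {"bdev_io_pool_size": 65536})],
        [("bdev_malloc_create", {"name": "Malloc0", "num_blocks": 1024, "block_size": 4096})],
    )
    for method, params in plan:
        print(method, params)
```

Keeping the two groups as separate inputs makes the phase of each call an explicit decision by the operator instead of something discovered from a state error.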
Configuration Files Are RPC Sequences
SPDK JSON configuration is best understood as a sequence of RPC calls. A config file does not bypass the RPC layer. It is loaded by spdk_subsystem_load_config, which replays methods through the same handler model.
That means a config file has the same constraints as live RPC:
- method names must exist.
- params must decode.
- startup-only methods must run during startup.
- object creation order matters.
- references must point to objects that already exist or can be discovered later.
A simplified config shape looks like this:
{
"subsystems": [
{
"subsystem": "bdev",
"config": [
{
"method": "bdev_malloc_create",
"params": {
"name": "Malloc0",
"num_blocks": 1024,
"block_size": 4096
}
}
]
}
]
}
The exact generated output varies by subsystem. Always inspect real output from framework_get_config rather than assuming a hand-written format is canonical.
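Because a config file is a sequence of RPC calls, it can be flattened into that sequence for inspection. The Python sketch below assumes only the "subsystems", "subsystem", "config", "method", and "params" keys shown in the simplified shape above; config_to_calls is a hypothetical helper, and it treats a missing or null "config" value as an empty list as a defensive assumption.

```python
import json


def config_to_calls(config_text):
    """Flatten a JSON config into an ordered list of (subsystem, method, params)."""
    config = json.loads(config_text)
    calls = []
    for subsystem in config.get("subsystems", []):
        # Assumption: a subsystem with nothing to replay may carry null here.
        for entry in subsystem.get("config") or []:
            calls.append((subsystem["subsystem"], entry["method"], entry.get("params")))
    return calls


SAMPLE_CONFIG = """
{
  "subsystems": [
    {
      "subsystem": "bdev",
      "config": [
        {
          "method": "bdev_malloc_create",
          "params": {"name": "Malloc0", "num_blocks": 1024, "block_size": 4096}
        }
      ]
    }
  ]
}
"""

if __name__ == "__main__":
    for subsystem, method, params in config_to_calls(SAMPLE_CONFIG):
        print(subsystem, method, params)
```

Running the same function over real framework_get_config output is a quick way to see the order in which SPDK would replay the methods.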
framework_get_config
framework_get_config asks each subsystem to write configuration JSON for the state it owns. The implementation starts in lib/init/subsystem_rpc.c: rpc_framework_get_config. Subsystems provide writer callbacks through structures declared in or near include/spdk_internal/init.h. Concrete writers appear across the tree.
Useful source anchors:
- lib/nvmf/nvmf.c: spdk_nvmf_tgt_write_config_json.
- module/event/subsystems/nvmf/nvmf_tgt.c: nvmf_subsystem_write_config_json.
- module/bdev/raid/bdev_raid.c: raid_bdev_write_config_json.
- module/bdev/crypto/vbdev_crypto.c: vbdev_crypto_config_json.
- module/bdev/nvme/bdev_mdns_client.c: bdev_nvme_mdns_discovery_config_json.
The output is not a database snapshot. It is a best-effort replay recipe. If an object cannot be represented as RPCs, it may not appear the way you expect. If runtime state is intentionally transient, it may not be included.
Config Replay Timeline
1. binary starts
2. app framework parses app options
3. RPC methods have been registered by constructors
4. JSON config is loaded if provided
5. startup RPCs are replayed
6. subsystem initialization begins
7. subsystem init callbacks create base framework state
8. runtime RPCs become available, and runtime methods from the config are replayed
9. external clients mutate or inspect runtime state
The replay model explains many startup errors. If a config tries to create a RAID bdev before its base bdevs exist, replay can fail. If a config uses a startup-only option after runtime begins, replay is too late. If an RPC method was not compiled in because a feature was disabled, replay cannot find it.
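A replay sequence can be linted for ordering problems before it is loaded. The Python sketch below is a conservative check under stated assumptions: it treats any method ending in _create with a "name" parameter as creating an object, and it treats a "base_bdevs" list as naming dependencies. Both conventions are guesses made for this illustration, and real SPDK may accept some sequences this lint flags, for example when a base bdev is discovered by a later examine step.

```python
def lint_replay_order(calls):
    """Return warnings for calls whose base_bdevs were not created earlier.

    calls is an ordered list of (method, params) tuples.
    """
    created = set()
    warnings = []
    for index, (method, params) in enumerate(calls):
        params = params or {}
        # Assumption: dependencies are listed by name under "base_bdevs".
        for base in params.get("base_bdevs", []):
            if base not in created:
                warnings.append(f"call {index} ({method}): {base} not created yet")
        # Assumption: *_create methods create the object named in "name".
        if method.endswith("_create") and "name" in params:
            created.add(params["name"])
    return warnings


GOOD_ORDER = [
    ("bdev_malloc_create", {"name": "Malloc0", "num_blocks": 1024, "block_size": 4096}),
    ("bdev_malloc_create", {"name": "Malloc1", "num_blocks": 1024, "block_size": 4096}),
    ("bdev_raid_create", {"name": "Raid0", "raid_level": "raid0",
                          "base_bdevs": ["Malloc0", "Malloc1"]}),
]

if __name__ == "__main__":
    print(lint_replay_order(GOOD_ORDER))
    print(lint_replay_order(list(reversed(GOOD_ORDER))))
```

The lint returns warnings instead of raising, because an out-of-order reference is a reason to look closer, not proof that replay will fail.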
Method Discovery
Use rpc_get_methods to ask the server what it currently supports. Source anchor: lib/rpc/rpc.c: rpc_rpc_get_methods. The method can optionally filter by current state. That is useful when debugging why a call is refused.
Recommended beginner workflow:
1. start app with the features you expect
2. call rpc_get_methods
3. check whether the method exists
4. check whether it is allowed in the current state
5. then debug params
This avoids a common mistake: spending time on JSON syntax when the method was never registered.
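The workflow above reduces to two rpc_get_methods calls and a set comparison. The Python sketch below builds the two requests and classifies a method from their results; the function names are hypothetical, and the "current" parameter is the state filter mentioned earlier in this section.

```python
def discovery_requests():
    """Return two requests: every registered method, then only currently allowed ones."""
    return [
        {"jsonrpc": "2.0", "method": "rpc_get_methods", "id": 1},
        {"jsonrpc": "2.0", "method": "rpc_get_methods", "params": {"current": True}, "id": 2},
    ]


def diagnose_method(method, all_methods, current_methods):
    """Classify a method using the results of the two discovery requests."""
    if method not in all_methods:
        return "not registered: check build options and linked modules"
    if method not in current_methods:
        return "registered but not allowed in the current state"
    return "available: debug params next"


if __name__ == "__main__":
    # Illustrative results, as a server still waiting for framework_start_init might report.
    all_methods = ["bdev_set_options", "bdev_malloc_create", "rpc_get_methods"]
    current = ["bdev_set_options", "rpc_get_methods"]
    for name in ("bdev_malloc_create", "bdev_set_options", "bdev_foo_create"):
        print(name, "->", diagnose_method(name, all_methods, current))
```

Only the third outcome justifies looking at the params, which is why this check comes before any JSON debugging.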
RPC Error Classes
An SPDK RPC failure usually belongs to one of these classes:
- transport error: could not connect to the socket.
- parse error: JSON is malformed or incomplete.
- method error: method name is unknown.
- state error: method exists but is not allowed now.
- params error: JSON is valid but does not match the decoder.
- semantic error: params decode, but requested state is invalid.
- asynchronous error: operation started but later completion reports failure.
Treat the error text as a clue, not a complete diagnosis. Many handlers include targeted messages through spdk_jsonrpc_send_error_response or formatted variants. Search the method handler for the string to find the exact branch.
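A client can make a first guess at the error class from the error code. The Python sketch below uses the standard JSON-RPC 2.0 codes, which are certain, plus one assumption: that SPDK reports a state error with code -1. Handlers may also return errno-style negative codes, so -1 is ambiguous in principle and the message text should still be read. The function classify_failure is hypothetical, and asynchronous errors are not distinguishable from the code alone.

```python
# Standard JSON-RPC 2.0 error codes mapped to the classes in this section.
STANDARD_CODES = {
    -32700: "parse error",
    -32600: "invalid request",
    -32601: "method error",
    -32602: "params error",
    -32603: "internal error",
}

# Assumption: SPDK's state error code; verify in include/spdk/jsonrpc.h.
INVALID_STATE = -1


def classify_failure(failure):
    """Classify an exception raised while connecting or a JSON-RPC error object."""
    if isinstance(failure, OSError):
        # Covers refused connections and missing socket paths.
        return "transport error"
    code = failure.get("code")
    if code in STANDARD_CODES:
        return STANDARD_CODES[code]
    if code == INVALID_STATE:
        return "state error"
    # Anything else is treated as a handler-specific, semantic failure.
    return "semantic error"
```

The classification is a starting point for the search described above: once the class is known, the handler source shows which branch produced the message.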
Edge Cases
Socket Exists But The Server Is Gone
A stale Unix socket path can remain after an abnormal process exit. The client may report connection failure even though the path exists. Check the process, not just the file.
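The difference between a path that exists and a server that listens can be tested directly. The Python sketch below uses only the standard library; probe_unix_socket is a hypothetical helper, and the three labels it returns are names chosen for this example.

```python
import os
import socket


def probe_unix_socket(path, timeout=1.0):
    """Return "missing", "stale", or "listening" for a Unix socket path."""
    if not os.path.exists(path):
        return "missing"
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    sock.settimeout(timeout)
    try:
        sock.connect(path)
    except OSError:
        # The path exists but nothing accepts connections on it.
        return "stale"
    else:
        return "listening"
    finally:
        sock.close()


if __name__ == "__main__":
    print(probe_unix_socket("/var/tmp/spdk.sock"))
```

A "stale" result means the file is left over from an earlier process, so the next step is to check whether the target application is running at all.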
The Method Exists In Documentation But Not In Your Binary
SPDK features can depend on configure options and linked libraries. If a module is not built in, its registration macro never runs. rpc_get_methods is more reliable than memory.
Startup Method Sent Too Late
The JSON shape can be perfect and still fail. Look for SPDK_RPC_STARTUP registrations. If the app has already completed init, the call belongs in a config file or before framework_start_init.
Runtime Method Sent Too Early
With --wait-for-rpc, runtime state may not exist. Do not create runtime objects before subsystem init unless the method explicitly supports startup.
Save Config Does Not Preserve Everything
framework_get_config serializes configuration, not every runtime counter or transient queue. Stats, active I/O, poller run counts, and temporary reconnect state are not durable configuration.
Replay Order Is Real
Config is not declarative magic. It is closer to a script. If object B depends on object A, A must appear first or be discoverable by a later examine step.
Misconceptions To Kill
- "JSON-RPC is slow, so it must affect every I/O." The control plane is separate from the hot I/O path.
- "If the docs list a method, my binary has it." Build options and linked modules decide what is registered.
- "Startup RPC means run at process start only." It means valid before subsystem initialization completes.
- "Runtime RPC means safe at any instant." The handler still must validate object state and concurrency.
- "A config file is a dump of memory." It is a replayable RPC recipe.
- "Unknown method means typo." It can also mean the module was not compiled or not linked.
- "A successful create RPC means the whole stack is healthy." It means the handler accepted and completed that operation.
- "All RPCs are synchronous inside." Some handlers initiate work and respond from a completion callback.
Lab: Trace One RPC Handler
Pick bdev_malloc_create. Find its registration in module/bdev/malloc/bdev_malloc_rpc.c. Find the handler function. Find the decoder table. Write down each parameter accepted by the decoder. Find the call that creates the malloc bdev. Find where the handler sends success. Find where it sends an error. Now run the same reading exercise for bdev_get_bdevs in lib/bdev/bdev_rpc.c. Compare a mutating RPC with a query RPC.
Lab: Classify RPC Phase
Use rg to list registrations:
rg -n 'SPDK_RPC_REGISTER' lib module
For ten methods, classify them as startup, runtime, or both. For each startup method, write one sentence explaining why late mutation could be unsafe. For each runtime method, write one sentence explaining what state must already exist.
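The classification step can be partly automated once the rg output is saved. The Python sketch below parses single-line registrations of the form shown earlier in this chapter; the regular expression and the function classify_registrations are assumptions for this lab, and registrations that span lines or use other macros are not handled.

```python
import re

# Matches: SPDK_RPC_REGISTER("name", handler, MASK [| MASK])
REGISTER_RE = re.compile(
    r'SPDK_RPC_REGISTER\(\s*"([^"]+)"\s*,\s*(\w+)\s*,\s*([^)]+)\)'
)


def classify_registrations(source_text):
    """Return {method name: "startup" | "runtime" | "both" | "unknown"}."""
    phases = {}
    for name, _handler, mask in REGISTER_RE.findall(source_text):
        flags = {flag.strip() for flag in mask.split("|")}
        startup = "SPDK_RPC_STARTUP" in flags
        runtime = "SPDK_RPC_RUNTIME" in flags
        if startup and runtime:
            phases[name] = "both"
        elif startup:
            phases[name] = "startup"
        elif runtime:
            phases[name] = "runtime"
        else:
            phases[name] = "unknown"
    return phases


SAMPLE = '''
SPDK_RPC_REGISTER("bdev_set_options", rpc_bdev_set_options, SPDK_RPC_STARTUP)
SPDK_RPC_REGISTER("bdev_get_bdevs", rpc_bdev_get_bdevs, SPDK_RPC_RUNTIME)
SPDK_RPC_REGISTER("rpc_get_methods", rpc_rpc_get_methods, SPDK_RPC_STARTUP | SPDK_RPC_RUNTIME)
'''

if __name__ == "__main__":
    for method, phase in classify_registrations(SAMPLE).items():
        print(method, phase)
```

The script only sorts methods into phases. The one-sentence explanations the lab asks for still require reading each handler.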
Lab: Build A Minimal Config Replay
Start with a malloc bdev config. Add one bdev_malloc_create. Load it into spdk_tgt. Call bdev_get_bdevs. Call framework_get_config. Compare your input config to SPDK's output. Note differences in ordering, omitted defaults, and generated fields.
Debug Checklist
- Can the client connect to the socket?
- Does rpc_get_methods list the method?
- Is the method allowed in the current state?
- Does the JSON parse?
- Does the params object match the decoder?
- Does the handler require a named object that already exists?
- Is the feature compiled into this binary?
- Is the failure synchronous or completed from a callback?
- Does framework_get_config produce a replay that includes the object?
- Does replay succeed from a clean process?
Self-Check
- What local file contains SPDK_RPC_REGISTER?
- What function dispatches parsed RPC methods to registered handlers?
- Why can a startup RPC fail after the app is running?
- What does --wait-for-rpc delay?
- Why is framework_get_config not the same as a memory dump?
- How would you prove that a method is missing because of build options?
- Why does replay order matter?
- What is the difference between a params error and a semantic error?
References
- doc/jsonrpc.md.jinja2 for the generated official RPC reference.
- doc/applications.md for SPDK application options and JSON configuration behavior.
- doc/getting_started.md for first-run setup context.
- doc/bdev.md for block-device RPC examples.
- doc/nvmf.md for NVMe-oF target RPC examples.
- test/unit/lib/rpc/rpc.c/rpc_ut.c for RPC registration and method-list unit tests.
- test/unit/lib/jsonrpc/jsonrpc_server.c/jsonrpc_server_ut.c for JSON-RPC parsing tests.