SPDK From First Principles

SPDK deep learning path

Chapter 9: App Startup And Subsystems

Source: drafts/runtime/09-app-startup-and-subsystems.md

Reader Promise

By the end of this chapter, a beginner should be able to trace an SPDK event-framework application from spdk_app_start() to the user start callback, explain why subsystem initialization is asynchronous, and debug common failures around JSON config, startup RPCs, runtime RPCs, --wait-for-rpc, and subsystem dependencies.

The key idea is simple: SPDK does not "just call main and then do I/O." It builds an execution environment, creates reactors, creates the app spdk_thread, loads startup configuration, initializes registered subsystems in dependency order, loads runtime configuration, resumes RPC, and only then calls the application start function.

Mental Model

An SPDK app has two beginnings:

  • The process beginning: C main() parses options and calls spdk_app_start().
  • The framework beginning: SPDK calls your start_fn later, from the app spdk_thread, after env, reactors, RPC state, JSON config, and subsystems are ready.

Beginners often expect spdk_app_start() to behave like a normal function that initializes and returns. It does not. It blocks inside spdk_reactors_start() until spdk_app_stop() is called. Your start_fn runs as a message on the app thread while reactors are active.
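The blocking behavior is easier to see in a toy model. The sketch below is not SPDK code: every toy_* name is hypothetical, and a single array-backed queue stands in for the app thread and its reactor. It only illustrates the shape described above: the start function is queued first, nothing runs until the loop polls, and the "start" call returns only after a stop is requested.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of the toy_* names are SPDK APIs. */
typedef void (*toy_msg_fn)(void *arg);

struct toy_msg {
	toy_msg_fn fn;
	void *arg;
};

#define TOY_QUEUE_LEN 8

static struct toy_msg g_queue[TOY_QUEUE_LEN];
static size_t g_head, g_tail;
static bool g_stop_requested;
static int g_exit_code;

/* Queue a message for the single toy "app thread". Returns -1 when full. */
int toy_send_msg(toy_msg_fn fn, void *arg)
{
	if (g_tail - g_head == TOY_QUEUE_LEN) {
		return -1;
	}
	g_queue[g_tail % TOY_QUEUE_LEN] = (struct toy_msg){ fn, arg };
	g_tail++;
	return 0;
}

/* Toy analogue of spdk_app_stop(): it only records the request. */
void toy_app_stop(int rc)
{
	g_exit_code = rc;
	g_stop_requested = true;
}

/* Toy analogue of spdk_app_start(): queue start_fn, then block in the loop. */
int toy_app_start(toy_msg_fn start_fn, void *arg)
{
	if (start_fn == NULL || toy_send_msg(start_fn, arg) != 0) {
		return -1;
	}
	/* start_fn has not run yet; it executes only once the loop polls. */
	while (!g_stop_requested) {
		if (g_head != g_tail) {
			struct toy_msg m = g_queue[g_head % TOY_QUEUE_LEN];
			g_head++;
			m.fn(m.arg);
		}
	}
	return g_exit_code;
}

/* Example start function: count the call, then ask the loop to stop. */
void toy_start_fn(void *arg)
{
	*(int *)arg += 1;
	toy_app_stop(7);
}
```

Calling toy_app_start(toy_start_fn, &counter) returns 7 with the counter incremented once. If toy_start_fn never requested a stop, the call would never return, which is the same surprise beginners hit with the real framework.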

Source Anchors

  • include/spdk/event.h: spdk_app_opts_init(), spdk_app_start(), spdk_app_stop(), spdk_app_fini()
  • lib/event/app.c: spdk_app_opts_init(), spdk_app_start(), app_setup_env(), bootstrap_fn(), app_do_spdk_subsystem_init(), app_subsystem_init_done(), app_start_application(), spdk_app_stop(), app_stop(), rpc_framework_start_init()
  • include/spdk/init.h: spdk_subsystem_init(), spdk_subsystem_load_config(), spdk_subsystem_fini()
  • include/spdk_internal/init.h: struct spdk_subsystem, SPDK_SUBSYSTEM_REGISTER(), SPDK_SUBSYSTEM_DEPEND()
  • lib/init/subsystem.c: spdk_add_subsystem(), spdk_add_subsystem_depend(), subsystem_sort(), spdk_subsystem_init(), spdk_subsystem_init_next(), spdk_subsystem_fini_next(), spdk_subsystem_fini()
  • lib/init/json_config.c: spdk_subsystem_load_config(), json_config_prepare_ctx(), app_json_config_load_subsystem(), app_json_config_load_subsystem_config_entry(), subsystem_init_done()
  • lib/event/reactor.c: spdk_reactors_init(), spdk_reactors_start(), spdk_reactors_stop()

Startup Timeline

Prose diagram:

main()
  spdk_app_opts_init()
  parse command line or fill opts
  spdk_app_start(opts, start_fn, arg)
    validate options
    app_setup_env()
      spdk_env_init()
    calculate message mempool size
    spdk_reactors_init()
      spdk_thread_lib_init_ext()
      construct reactors
    create first spdk_thread named "app_thread"
    load JSON data from file or memory, if provided
    send bootstrap_fn to app_thread
    spdk_reactors_start()
      launches reactor OS threads
      main OS thread enters reactor_run()

Then, on the app thread:

bootstrap_fn()
  if JSON exists:
    spdk_subsystem_load_config(... STARTUP state ...)
      on completion: app_do_spdk_subsystem_init()
  else:
    app_do_spdk_subsystem_init()

app_do_spdk_subsystem_init()
  spdk_rpc_initialize()
  if --wait-for-rpc:
    return and wait for framework_start_init
  pause RPC server
  spdk_subsystem_init()

app_subsystem_init_done()
  set RPC state to RUNTIME
  load runtime JSON config if present
  app_start_application()

app_start_application()
  resume RPC server
  call user start_fn

spdk_app_opts_init() Is Part Of The ABI Contract

lib/event/app.c:spdk_app_opts_init() zeroes the caller's option structure, stores opts_size, and fills defaults only for fields that fit in the caller-provided size.

This pattern matters because SPDK supports ABI compatibility across structures that grow over time. If old code passes a smaller opts_size, new fields are not blindly written past the old structure.

Beginner rule:

Always call spdk_app_opts_init(&opts, sizeof(opts)) before changing fields. If you allocate or copy these options yourself, preserve opts_size.
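A minimal sketch of the size-guarded defaults pattern, assuming a hypothetical struct toy_opts with one field "added in a later release". The struct, the macros, and toy_opts_init() are illustrative stand-ins, not the SPDK definitions; the point is only that a default is written when the field lies inside the size the caller reported.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical options struct; the real struct spdk_app_opts has many more fields. */
struct toy_opts {
	size_t opts_size;      /* size of the struct as the caller compiled it */
	int shm_id;
	bool enable_coredump;
	bool new_flag;         /* pretend this field was added in a later release */
};

/* True when 'field' lies entirely inside the first 'size' bytes of the struct. */
#define TOY_FIELD_FITS(field, size) \
	(offsetof(struct toy_opts, field) + sizeof(((struct toy_opts *)0)->field) <= (size))

/* Write a default only if the caller's struct is large enough to hold the field. */
#define TOY_SET_DEFAULT(opts, field, value) \
	do { \
		if (TOY_FIELD_FITS(field, (opts)->opts_size)) { \
			(opts)->field = (value); \
		} \
	} while (0)

/* Returns 0 on success, -1 if the arguments cannot be used safely. */
int toy_opts_init(struct toy_opts *opts, size_t opts_size)
{
	if (opts == NULL || opts_size < sizeof(opts->opts_size)) {
		return -1;
	}
	/* Never touch bytes beyond what the caller says it allocated. */
	memset(opts, 0, opts_size);
	opts->opts_size = opts_size;

	TOY_SET_DEFAULT(opts, shm_id, -1);
	TOY_SET_DEFAULT(opts, enable_coredump, true);
	TOY_SET_DEFAULT(opts, new_flag, true);
	return 0;
}
```

An "old" caller can be simulated by passing offsetof(struct toy_opts, new_flag) as the size: shm_id and enable_coredump receive defaults, while the bytes of new_flag are left untouched.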

Important defaults include:

  • coredump enabled
  • shm_id = -1
  • default memory size
  • default main core
  • default RPC address
  • default log print level
  • delay_subsystem_init = false
  • interrupt_mode = false
  • enforce_numa = false

spdk_app_start() Validates Before It Builds

lib/event/app.c:spdk_app_start() checks:

  • opts_user is not NULL.
  • opts_user->opts_size is nonzero.
  • opts_user->name is set.
  • start_fn is not NULL.
  • delay_subsystem_init is not used without an RPC server.
  • Neither lcore_map nor reactor_mask is specified. This case is not rejected; a default reactor mask is assigned instead.

Only after those checks does it initialize logging, env, CPU locks, reactors, tracing, app thread, signal handlers, JSON loading, and the bootstrap message.

The surprising part: spdk_app_start() sends bootstrap_fn to the app thread before starting reactors, then enters spdk_reactors_start(). Once reactors are running, the app thread message can execute.

The App Thread

The first spdk_thread created is special. lib/thread/thread.c:spdk_thread_create() stores the first created thread in g_app_thread. lib/event/app.c:spdk_app_start() creates it with the name app_thread after spdk_reactors_init().

Subsystem initialization asserts that it runs on the app thread. See lib/init/subsystem.c:spdk_subsystem_init() and spdk_subsystem_init_next(), both of which assert spdk_thread_is_app_thread(NULL).

Why this matters:

  • Initialization order is serialized.
  • Startup RPC state transitions are centralized.
  • Finalization can run from the same logical context as initialization.

Subsystem Registration

Subsystems are registered by C constructors. include/spdk_internal/init.h defines:

  • struct spdk_subsystem
  • struct spdk_subsystem_depend
  • SPDK_SUBSYSTEM_REGISTER(_name)
  • SPDK_SUBSYSTEM_DEPEND(_name, _depends_on)

SPDK_SUBSYSTEM_REGISTER() creates a constructor function that calls spdk_add_subsystem(). SPDK_SUBSYSTEM_DEPEND() creates a constructor function that calls spdk_add_subsystem_depend().
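The constructor mechanism can be sketched in isolation. This is a toy registry with hypothetical toy_* names, not the SPDK macros or structures, and it assumes a GCC or Clang toolchain because __attribute__((constructor)) is a compiler extension. Each registration macro expands to a function that runs before main() and links the subsystem into a global list.

```c
#include <stddef.h>
#include <string.h>

/* Toy registry; names are illustrative, not the SPDK definitions. */
struct toy_subsystem {
	const char *name;
	struct toy_subsystem *next;
};

static struct toy_subsystem *g_registry;

void toy_add_subsystem(struct toy_subsystem *ss)
{
	ss->next = g_registry;
	g_registry = ss;
}

/* Expands to a function that runs before main() and registers the subsystem.
 * __attribute__((constructor)) is a GCC/Clang extension. */
#define TOY_SUBSYSTEM_REGISTER(_name) \
	__attribute__((constructor)) static void _name##_register(void) \
	{ \
		toy_add_subsystem(&_name); \
	}

static struct toy_subsystem toy_bdev = { .name = "bdev" };
TOY_SUBSYSTEM_REGISTER(toy_bdev)

static struct toy_subsystem toy_nvmf = { .name = "nvmf" };
TOY_SUBSYSTEM_REGISTER(toy_nvmf)

/* Look a subsystem up by name; NULL means nothing registered under that name. */
struct toy_subsystem *toy_find_subsystem(const char *name)
{
	for (struct toy_subsystem *ss = g_registry; ss != NULL; ss = ss->next) {
		if (strcmp(ss->name, name) == 0) {
			return ss;
		}
	}
	return NULL;
}
```

By the time main() runs, toy_find_subsystem("bdev") and toy_find_subsystem("nvmf") both succeed, while a name whose object was never compiled in returns NULL. No central list names either subsystem.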

Misconception to kill:

"There must be one central list of subsystems in a config file." No. The list is built at process load time by constructors in the linked C objects. If an object file is not linked into the binary, its subsystem is never registered.

Dependency Sorting

lib/init/subsystem.c:subsystem_sort() sorts the registered subsystem list based on declared dependencies.

Its algorithm is easy to understand:

  1. Create a temporary sorted list.
  2. Walk the original list.
  3. Move a subsystem when it has no dependencies or all dependencies are already in the sorted list.
  4. Repeat until the original list is empty.
  5. Swap the sorted list back into g_subsystems.

Before sorting, spdk_subsystem_init() verifies that every dependency name and depended-on subsystem name was registered. If not, initialization fails before calling subsystem init functions.

Edge case:

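The sort assumes every pass over the original list can move at least one subsystem. With a circular dependency, no subsystem in the cycle ever becomes movable, so the original list never empties and the loop as described above would not terminate. Keep dependency declarations simple and acyclic.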
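The five steps above can be sketched with plain arrays. This is a toy version under stated assumptions: subsystems are just names, dependencies are (name, depends_on) pairs, and all identifiers are hypothetical. Unlike the algorithm as described, the sketch adds a guard that returns an error when a full pass moves nothing, so a cycle is reported instead of looping.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_MAX 8

struct toy_dep {
	const char *name;        /* subsystem that declares the dependency */
	const char *depends_on;  /* subsystem that must be initialized first */
};

static bool toy_in_list(const char **list, size_t n, const char *name)
{
	for (size_t i = 0; i < n; i++) {
		if (strcmp(list[i], name) == 0) {
			return true;
		}
	}
	return false;
}

/* Reorders names[0..n) in place so that dependencies come first.
 * Returns 0 on success, -1 if n is too large or a pass makes no progress. */
int toy_subsystem_sort(const char **names, size_t n,
		       const struct toy_dep *deps, size_t ndeps)
{
	const char *sorted[TOY_MAX];
	bool moved[TOY_MAX] = { false };
	size_t nsorted = 0;

	if (n > TOY_MAX) {
		return -1;
	}
	while (nsorted < n) {
		size_t before = nsorted;

		for (size_t i = 0; i < n; i++) {
			bool ready = true;

			if (moved[i]) {
				continue;
			}
			for (size_t d = 0; d < ndeps; d++) {
				if (strcmp(deps[d].name, names[i]) == 0 &&
				    !toy_in_list(sorted, nsorted, deps[d].depends_on)) {
					ready = false;
					break;
				}
			}
			if (ready) {
				sorted[nsorted++] = names[i];
				moved[i] = true;
			}
		}
		if (nsorted == before) {
			return -1; /* nothing became ready: circular dependency */
		}
	}
	/* Swap the sorted order back into the caller's list. */
	memcpy(names, sorted, n * sizeof(names[0]));
	return 0;
}
```

With names { "nvmf", "bdev", "accel" } and dependencies nvmf on bdev and bdev on accel, the result is accel, bdev, nvmf. With two subsystems that depend on each other, the function returns -1 and leaves the input order unchanged.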

Asynchronous Subsystem Init

Each struct spdk_subsystem has an init function pointer. The contract in include/spdk_internal/init.h says the user must call spdk_subsystem_init_next() when initialization is done.

That means subsystem init may do asynchronous work:

  • send messages
  • register pollers
  • issue internal RPCs
  • wait for device discovery
  • defer completion until another callback fires

lib/init/subsystem.c:spdk_subsystem_init_next() advances one subsystem at a time. If a subsystem has no init function, the framework advances to the next one immediately. If a subsystem reports an error by calling spdk_subsystem_init_next(rc) with a nonzero rc, the overall completion callback receives that error.

Beginner rule:

A subsystem init() function does not "return success." It eventually calls spdk_subsystem_init_next(0) or spdk_subsystem_init_next(error).
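The callback-driven chain can be modeled in a few lines. The sketch below is a toy under stated assumptions: all toy_* names are hypothetical, subsystems live in a plain array, and the "deferred" subsystem is completed by an explicit call that stands in for a later poller or message. It shows the contract, not the SPDK implementation.

```c
#include <stddef.h>

typedef void (*toy_init_done_fn)(int rc, void *ctx);

struct toy_subsystem {
	const char *name;
	void (*init)(void);  /* may be NULL; otherwise must end in toy_init_next() */
};

static struct toy_subsystem *g_list;
static size_t g_count, g_next;
static toy_init_done_fn g_done_fn;
static void *g_done_ctx;

/* Called by each subsystem when its init finishes, possibly much later. */
void toy_init_next(int rc)
{
	if (rc != 0) {
		g_done_fn(rc, g_done_ctx);  /* first failure ends the chain */
		return;
	}
	while (g_next < g_count) {
		struct toy_subsystem *ss = &g_list[g_next++];

		if (ss->init != NULL) {
			ss->init();  /* it will call toy_init_next() itself */
			return;
		}
		/* No init function: advance immediately. */
	}
	g_done_fn(0, g_done_ctx);
}

void toy_subsystem_init(struct toy_subsystem *list, size_t count,
			toy_init_done_fn done, void *ctx)
{
	g_list = list;
	g_count = count;
	g_next = 0;
	g_done_fn = done;
	g_done_ctx = ctx;
	toy_init_next(0);
}

/* A subsystem that finishes inside its init call. */
void toy_sync_init(void)
{
	toy_init_next(0);
}

/* A subsystem that returns without completing; something else finishes it. */
void toy_deferred_init(void)
{
}

/* Stand-in for the later callback that completes the deferred subsystem. */
void toy_deferred_complete(int rc)
{
	toy_init_next(rc);
}

/* Example completion callback: store the final result code. */
void toy_record_rc(int rc, void *ctx)
{
	*(int *)ctx = rc;
}
```

With the list { sync, no-init, deferred }, toy_subsystem_init() returns while the completion callback has not yet fired. Only toy_deferred_complete(0) triggers it. If that call never happens, nothing else in the model will ever report completion, which is the toy version of a startup hang.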

JSON Config Has Startup And Runtime Passes

lib/init/json_config.c:app_json_config_load_subsystem() documents the two-phase behavior.

SPDK JSON config is organized as a subsystems array. Each subsystem has a config array of RPC methods and params.

When initialization includes subsystem initialization, the config is applied in two passes with subsystem initialization between them:

  1. First pass: RPC state is STARTUP, so only STARTUP RPC methods are used. Other methods are ignored.
  2. The framework initializes subsystems with spdk_subsystem_init().
  3. Second pass: RPC state is RUNTIME, so runtime methods are used.

lib/event/app.c:bootstrap_fn() starts the first pass when JSON exists and RPC state is still SPDK_RPC_STARTUP. lib/event/app.c:app_subsystem_init_done() sets the state to runtime and loads runtime config.

This explains a common confusion:

If an RPC method appears in JSON but is registered for the wrong RPC state, it may not run in the pass you expect.
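A small model of state-filtered replay may help. Everything here is hypothetical: the method names, the table, and toy_run_pass() are invented for illustration, and the toy silently skips unknown methods, which is a simplification. The only idea carried over from the text is that each method is registered for a state and a pass executes just the entries allowed in the current state.

```c
#include <stddef.h>
#include <string.h>

/* Toy states standing in for the STARTUP and RUNTIME RPC states. */
enum toy_rpc_state {
	TOY_RPC_STARTUP = 1,
	TOY_RPC_RUNTIME = 2
};

struct toy_method {
	const char *name;
	unsigned state_mask;  /* states in which the method may run */
};

/* Hypothetical method table; these are not real SPDK RPC names. */
static const struct toy_method g_methods[] = {
	{ "toy_early_tuning", TOY_RPC_STARTUP },
	{ "toy_create_device", TOY_RPC_RUNTIME },
};

static const struct toy_method *toy_find_method(const char *name)
{
	for (size_t i = 0; i < sizeof(g_methods) / sizeof(g_methods[0]); i++) {
		if (strcmp(g_methods[i].name, name) == 0) {
			return &g_methods[i];
		}
	}
	return NULL;
}

/* Walk the config in order and record the methods allowed in 'state'.
 * 'executed' must have room for n entries. Returns how many ran. */
size_t toy_run_pass(enum toy_rpc_state state, const char **config, size_t n,
		    const char **executed)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		const struct toy_method *m = toy_find_method(config[i]);

		if (m != NULL && (m->state_mask & (unsigned)state) != 0) {
			executed[count++] = config[i];
		}
	}
	return count;
}
```

Given a config that lists toy_create_device before toy_early_tuning, the STARTUP pass runs only toy_early_tuning and the RUNTIME pass runs only toy_create_device. File order does not override the state filter.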

RPC Startup State And --wait-for-rpc

--wait-for-rpc maps to delay_subsystem_init.

When delay_subsystem_init is true:

  1. app_do_spdk_subsystem_init() initializes the RPC server.
  2. It returns without starting subsystems.
  3. The limited startup RPC server remains available.
  4. The user sends framework_start_init.
  5. lib/event/app.c:rpc_framework_start_init() pauses the RPC server and calls spdk_subsystem_init().
  6. rpc_framework_start_init_cpl() finishes startup and replies to the RPC.

Why this exists:

It lets an orchestrator create startup-only resources or push early configuration before SPDK subsystems move to runtime state.

Failure mode:

If --wait-for-rpc is set but no RPC server address is configured, spdk_app_start() rejects it because no one could send framework_start_init.
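The delayed path can be sketched as a tiny state machine. This is a toy with hypothetical names (struct toy_app, toy_app_begin(), toy_framework_start_init()); subsystem initialization completes instantly here, whereas the real one is asynchronous. It shows the rejected option combination and the fact that, in the delayed case, nothing initializes until the start request arrives.

```c
#include <stdbool.h>
#include <stddef.h>

enum toy_phase {
	TOY_PHASE_STARTUP_RPC,  /* waiting for the start request */
	TOY_PHASE_RUNTIME       /* subsystems initialized */
};

struct toy_app {
	const char *rpc_addr;       /* NULL means no RPC server */
	bool delay_subsystem_init;  /* what --wait-for-rpc sets */
	enum toy_phase phase;
	int subsystem_init_calls;
};

static void toy_subsystems_init(struct toy_app *app)
{
	/* Real initialization is asynchronous; the toy completes at once. */
	app->subsystem_init_calls++;
	app->phase = TOY_PHASE_RUNTIME;
}

/* Returns -1 for the invalid combination: delayed init with no RPC server. */
int toy_app_begin(struct toy_app *app)
{
	if (app->delay_subsystem_init && app->rpc_addr == NULL) {
		return -1;
	}
	app->phase = TOY_PHASE_STARTUP_RPC;
	if (!app->delay_subsystem_init) {
		toy_subsystems_init(app);
	}
	return 0;
}

/* Toy handler for the start request; valid only before subsystems start. */
int toy_framework_start_init(struct toy_app *app)
{
	if (app->phase != TOY_PHASE_STARTUP_RPC) {
		return -1;
	}
	toy_subsystems_init(app);
	return 0;
}
```

In the model, an app with delay_subsystem_init set and an RPC address stays in TOY_PHASE_STARTUP_RPC after toy_app_begin() and moves to TOY_PHASE_RUNTIME only when toy_framework_start_init() is called. A second start request is refused, which is a choice of this sketch rather than a statement about SPDK's reply.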

RPC Pause And Resume

In normal startup, app_do_spdk_subsystem_init() initializes RPC and pauses the server before subsystem initialization. After initialization and runtime config load, app_start_application() resumes the RPC server and calls the user start function.

The pause prevents clients from racing runtime RPCs into a half-initialized framework.

Operational clue:

If the RPC socket exists but runtime RPCs hang or fail during startup, check whether the framework is paused, delayed for framework_start_init, or stuck in subsystem initialization.

Shutdown Path

spdk_app_stop(rc) sends app_stop to the app thread. app_stop() records the return code, frees pending JSON data, logs deprecation hits, and starts subsystem finalization.

_start_subsystem_fini() waits if scheduling is in progress, then calls spdk_subsystem_fini(). When subsystem fini completes, subsystem_fini_done() calls spdk_rpc_finish() and spdk_reactors_stop().

The design mirrors startup:

  • init runs on app thread
  • fini runs on app thread
  • reactor stop happens only after subsystem finalization completes
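The ordering can be checked with a toy trace. The sketch below uses hypothetical toy_* functions and records step names into an array instead of doing real work; finalization completes synchronously here, which the real path does not. It also models the "second stop is ignored" behavior by keeping the first return code, which is a simplification chosen for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_TRACE_MAX 8

static const char *g_trace[TOY_TRACE_MAX];
static size_t g_trace_len;
static bool g_stopping;
static int g_rc;

static void toy_trace(const char *step)
{
	if (g_trace_len < TOY_TRACE_MAX) {
		g_trace[g_trace_len++] = step;
	}
}

/* Runs only after subsystem finalization has completed. */
static void toy_subsystem_fini_done(void)
{
	toy_trace("rpc_finish");
	toy_trace("reactors_stop");
}

static void toy_subsystem_fini(void (*done)(void))
{
	toy_trace("subsystem_fini");
	done();  /* the real finalization is asynchronous */
}

/* Returns true if this call started shutdown, false if it was ignored. */
bool toy_app_stop(int rc)
{
	if (g_stopping) {
		return false;  /* second stop: ignored */
	}
	g_stopping = true;
	g_rc = rc;
	toy_subsystem_fini(toy_subsystem_fini_done);
	return true;
}

int toy_app_rc(void)
{
	return g_rc;
}

size_t toy_trace_len(void)
{
	return g_trace_len;
}

const char *toy_trace_at(size_t i)
{
	return i < g_trace_len ? g_trace[i] : NULL;
}
```

After toy_app_stop(3) followed by toy_app_stop(9), the trace holds exactly subsystem_fini, rpc_finish, reactors_stop in that order, and toy_app_rc() still reports 3.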

Edge Cases And Failure Modes

  • Missing opts.name: spdk_app_start() fails before env init.
  • Missing start_fn: spdk_app_start() fails.
  • --wait-for-rpc without RPC server: rejected.
  • JSON config file and JSON data both set: rejected.
  • JSON file cannot be read: spdk_app_start() fails while loading it, before spdk_reactors_start() is reached.
  • Startup JSON method fails and ignore-errors is false: initialization stops.
  • A subsystem never calls spdk_subsystem_init_next(): startup hangs.
  • A subsystem dependency is registered but the depended-on subsystem was not linked: spdk_subsystem_init() fails.
  • RPC state mismatch: an RPC in config may be skipped in the wrong phase.
  • RPC server paused: clients may connect but not get expected runtime behavior until resume.
  • spdk_app_stop() called twice: logged and ignored after first stop path begins.

Misconceptions To Kill

  • "My start_fn is called directly by spdk_app_start()." It is called later, from the app thread.
  • "Subsystem init is synchronous because the function is named init." It is callback-driven.
  • "JSON config is just replayed top to bottom once." It is separated by RPC startup/runtime state.
  • "A subsystem exists if its source file exists." It exists if the object is linked and its constructor registered it.
  • "RPC server up means SPDK is ready." Startup RPC availability and runtime readiness are different states.

Diskengine Relevance

For diskengine, startup state is externally visible. A control loop may be waiting for SPDK RPC to accept commands, but SPDK can be in several different states:

  • Process not started.
  • Env initialized but reactors not running.
  • Startup RPC active because --wait-for-rpc is set.
  • Subsystems initializing.
  • Runtime config replaying.
  • Runtime RPC active and app start callback called.

Treat these as separate states in logs and readiness checks. A vague "SPDK unavailable" error hides the most useful clue.
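One way to keep these states distinct is to give each a name in the control loop. The enum and helpers below are a hypothetical sketch for a diskengine-side readiness check; the identifiers and log strings are invented here and do not come from SPDK or diskengine. How the control loop detects each state is left out on purpose.

```c
#include <stdbool.h>

/* One value per externally visible startup state listed above. */
enum toy_spdk_state {
	TOY_STATE_NOT_STARTED,
	TOY_STATE_ENV_READY,
	TOY_STATE_STARTUP_RPC,
	TOY_STATE_SUBSYSTEM_INIT,
	TOY_STATE_RUNTIME_CONFIG,
	TOY_STATE_RUNTIME_READY
};

/* Stable, greppable names for logs. */
const char *toy_state_name(enum toy_spdk_state s)
{
	switch (s) {
	case TOY_STATE_NOT_STARTED:
		return "process-not-started";
	case TOY_STATE_ENV_READY:
		return "env-ready-reactors-not-running";
	case TOY_STATE_STARTUP_RPC:
		return "startup-rpc-waiting";
	case TOY_STATE_SUBSYSTEM_INIT:
		return "subsystems-initializing";
	case TOY_STATE_RUNTIME_CONFIG:
		return "runtime-config-replaying";
	case TOY_STATE_RUNTIME_READY:
		return "runtime-ready";
	}
	return "unknown";
}

/* Only the final state should pass a readiness check. */
bool toy_state_is_ready(enum toy_spdk_state s)
{
	return s == TOY_STATE_RUNTIME_READY;
}
```

A log line that prints toy_state_name() instead of a generic "unavailable" message tells the operator whether to send a start request, wait for initialization, or look for a stuck subsystem.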

Prose Diagram: Two-Pass JSON Loading

Imagine the JSON config as a binder with tabs by subsystem. SPDK reads the binder twice:

First pass:

  • It opens each tab.
  • It only executes pages labeled STARTUP.
  • At the end, it initializes the subsystems.

Second pass:

  • It returns to the first tab.
  • It only executes pages labeled RUNTIME.
  • At the end, it starts the application.

If a page carries the wrong label, it will not run in the pass you expect.

Source Reading Exercise

Trace a normal startup with JSON:

  1. lib/event/app.c:spdk_app_start()
  2. lib/event/app.c:bootstrap_fn()
  3. lib/init/json_config.c:spdk_subsystem_load_config()
  4. lib/init/json_config.c:app_json_config_load_subsystem()
  5. lib/event/app.c:app_do_spdk_subsystem_init()
  6. lib/init/subsystem.c:spdk_subsystem_init()
  7. lib/init/subsystem.c:spdk_subsystem_init_next()
  8. lib/event/app.c:app_subsystem_init_done()
  9. lib/event/app.c:app_start_application()

Questions:

  • Which functions assert they are on the app thread?
  • Where does RPC state move to RUNTIME?
  • Where is the RPC server paused?
  • Where is it resumed?
  • What callback is invoked if subsystem initialization fails?

Operational Lab

Use an SPDK app in a dev environment, such as app/spdk_tgt, with no real NVMe device required.

  1. Start with --wait-for-rpc.
  2. Confirm that runtime commands are not yet available.
  3. Send framework_start_init.
  4. Observe when runtime RPC methods become available.
  5. Repeat with a small JSON config containing one startup-safe RPC and one runtime RPC.
  6. Identify which pass executes each method.

Debug variation:

  • Add a deliberately invalid runtime RPC to the config and compare behavior with ignore-errors enabled and disabled.

Self-Check

  1. Why does spdk_app_start() block?
  2. What makes the first spdk_thread special?
  3. How are subsystems registered?
  4. Why must subsystem init call spdk_subsystem_init_next()?
  5. What is the difference between STARTUP and RUNTIME RPC state?
  6. Why does --wait-for-rpc require an RPC server?
  7. What symptom would you expect if a subsystem init hangs?

References

  • Local source: include/spdk/event.h
  • Local source: include/spdk/init.h
  • Local source: include/spdk_internal/init.h
  • Local source: lib/event/app.c
  • Local source: lib/init/subsystem.c
  • Local source: lib/init/json_config.c
  • Local source: lib/event/reactor.c