Beginner Mental Model
lvol is SPDK's logical-volume layer on top of blobstore. A logical volume store, or lvolstore, is a blobstore plus lvol-specific metadata. An lvol is a blob plus lvol-specific identity, name, UUID, lifecycle state, and helper operations. The lvol bdev module, vbdev_lvol, turns lvols into normal SPDK bdevs so the rest of SPDK can read, write, export, snapshot, clone, resize, and delete them through the bdev abstraction.
Three layers are easy to confuse:
base bdev
|
| module/blob/bdev/blob_bdev.c wraps it as struct spdk_bs_dev
v
blobstore
|
| lib/lvol/lvol.c stores lvolstore metadata and lvol blobs
v
lvol / lvolstore
|
| module/bdev/lvol/vbdev_lvol.c registers bdevs for lvols
v
lvol bdevs visible to bdev users
Blobstore is the allocator and metadata engine. lvol is the volume manager API. vbdev_lvol is the bdev adapter and RPC-facing module.
Why This Matters For diskengine/excloud
In a cloud disk service, a user-visible volume tends to behave like a bdev: it can be exported over NVMe-oF, attached to a VM through vhost/vfio-user, rate-limited, snapshotted, resized, or deleted. Internally, the fast volume operations are often blobstore operations. lvol is the translation layer between "cloud volume object" and "blobstore blob."
The most important operational consequences:
- lvol create may produce a bdev only after asynchronous blob creation and bdev registration complete.
- lvolstore load may auto-discover existing lvols through bdev_examine.
- lvol delete must unregister the bdev before destroying the underlying blob.
- lvol resize must update the blob and then notify the bdev layer of the new block count.
- external-snapshot clones can become degraded if their parent bdev is unavailable.
- names and UUIDs matter because RPCs typically identify an lvolstore by its name or its UUID, and an lvol by its bdev name.
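As a sketch of why both identifiers matter, the model below resolves an lvolstore from RPC-style parameters where the caller supplies either a UUID or a name. The "exactly one of the two" rule is an assumption modeled on lvol RPCs that accept either form. The model_* names and the error codes are hypothetical, not SPDK's.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical lvolstore record: both identifiers are user-visible. */
struct model_lvs {
	const char *name;
	const char *uuid;
};

/* Resolve an lvolstore from RPC-style parameters. Exactly one of uuid/name
 * must be supplied (an assumption, not a documented SPDK contract). */
int model_find_lvs(struct model_lvs *all, size_t n, const char *uuid,
		   const char *name, struct model_lvs **out)
{
	if ((uuid == NULL) == (name == NULL)) {
		return -EINVAL;                      /* neither or both given */
	}
	const char *want = uuid != NULL ? uuid : name;
	for (size_t i = 0; i < n; i++) {
		const char *key = uuid != NULL ? all[i].uuid : all[i].name;
		if (strcmp(key, want) == 0) {
			*out = &all[i];
			return 0;
		}
	}
	return -ENODEV;                              /* no such lvolstore */
}
```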
Public lvol API
Primary API anchors:
- include/spdk/lvol.h:spdk_lvs_init
- include/spdk/lvol.h:spdk_lvs_load
- include/spdk/lvol.h:spdk_lvs_load_ext
- include/spdk/lvol.h:spdk_lvs_unload
- include/spdk/lvol.h:spdk_lvs_destroy
- include/spdk/lvol.h:spdk_lvol_create
- include/spdk/lvol.h:spdk_lvol_open
- include/spdk/lvol.h:spdk_lvol_close
- include/spdk/lvol.h:spdk_lvol_destroy
- include/spdk/lvol.h:spdk_lvol_create_snapshot
- include/spdk/lvol.h:spdk_lvol_create_clone
- include/spdk/lvol.h:spdk_lvol_create_esnap_clone
- include/spdk_internal/lvolstore.h:spdk_lvol_resize
- include/spdk/lvol.h:spdk_lvol_inflate
- include/spdk/lvol.h:spdk_lvol_decouple_parent
- include/spdk/lvol.h:spdk_lvol_set_parent
- include/spdk/lvol.h:spdk_lvol_set_external_parent
- include/spdk/lvol.h:spdk_lvol_is_degraded
Implementation anchors:
- lib/lvol/lvol.c:spdk_lvs_init
- lib/lvol/lvol.c:spdk_lvs_load
- lib/lvol/lvol.c:spdk_lvs_load_ext
- lib/lvol/lvol.c:load_next_lvol
- lib/lvol/lvol.c:lvs_read_uuid
- lib/lvol/lvol.c:spdk_lvol_create
- lib/lvol/lvol.c:lvol_create_cb
- lib/lvol/lvol.c:spdk_lvol_destroy
- lib/lvol/lvol.c:lvol_delete_blob_cb
- lib/lvol/lvol.c:spdk_lvol_resize
- lib/lvol/lvol.c:spdk_lvol_create_snapshot
- lib/lvol/lvol.c:spdk_lvol_create_clone
- lib/lvol/lvol.c:spdk_lvol_create_esnap_clone
lvolstore Lifecycle
An lvolstore is initialized on a blobstore device. The public initializer takes a struct spdk_bs_dev *, not a bdev name. The bdev adapter creates that spdk_bs_dev from the base bdev.
Creation path:
bdev_lvol_create_lvstore RPC
-> module/bdev/lvol/vbdev_lvol_rpc.c:rpc_bdev_lvol_create_lvstore
-> module/bdev/lvol/vbdev_lvol.c:vbdev_lvs_create
-> module/blob/bdev/blob_bdev.c:spdk_bdev_create_bs_dev_ext
-> lib/lvol/lvol.c:spdk_lvs_init
-> lib/blob/blobstore.c:spdk_bs_init
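Each step in this chain completes through a callback rather than a return value. That is why lvolstore creation is a chain and not a single call. The self-contained model below imitates that style with a hypothetical single-threaded completion queue. Nothing finishes until a poller drains the queue, and each step's callback starts the next step. The model_* names are illustrative only. Real SPDK code relies on SPDK threads and pollers, and lvol's post-init metadata work is collapsed here into one step.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical completion queue standing in for an SPDK thread. */
typedef void (*model_cb)(void *arg, int rc);

struct model_event { model_cb fn; void *arg; int rc; };
static struct model_event g_queue[16];
static size_t g_head, g_tail;

/* Defer a completion instead of calling it inline. */
void model_post(model_cb fn, void *arg, int rc)
{
	assert(g_tail < 16);
	g_queue[g_tail++] = (struct model_event){ fn, arg, rc };
}

/* Drain completions the way a poller would; returns how many ran. */
size_t model_poll(void)
{
	size_t n = 0;
	while (g_head < g_tail) {
		struct model_event ev = g_queue[g_head++];
		ev.fn(ev.arg, ev.rc);
		n++;
	}
	return n;
}

struct model_create_ctx { int stage; int rc; };

/* Role of the lvol-level completion: the lvolstore is usable only now. */
static void lvs_init_done(void *arg, int rc)
{
	struct model_create_ctx *ctx = arg;
	ctx->rc = rc;
	ctx->stage = 2;
}

/* Role of the blobstore-init completion: on success, schedule lvol's own
 * metadata work (modeled as one more asynchronous step). */
static void bs_init_done(void *arg, int rc)
{
	struct model_create_ctx *ctx = arg;
	ctx->stage = 1;
	if (rc != 0) {
		ctx->rc = rc;
		return;
	}
	model_post(lvs_init_done, ctx, 0);
}

/* Kick off creation: returns immediately; nothing completes until polled. */
void model_create_lvstore(struct model_create_ctx *ctx)
{
	ctx->stage = 0;
	ctx->rc = 0;
	model_post(bs_init_done, ctx, 0);
}
```

Calling model_create_lvstore() and then model_poll() runs both completions in order. Code that uses the lvolstore before the final callback fires would see stage 0.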
Load/discovery path:
new base bdev appears
-> lib/bdev/bdev.c:bdev_examine
-> module/bdev/lvol/vbdev_lvol.c:vbdev_lvs_examine_disk
-> module/bdev/lvol/vbdev_lvol.c:_vbdev_lvs_examine
-> module/blob/bdev/blob_bdev.c:spdk_bdev_create_bs_dev_ext
-> lib/lvol/lvol.c:spdk_lvs_load_ext
-> lib/blob/blobstore.c:spdk_bs_load
-> lib/lvol/lvol.c:load_next_lvol
-> module/bdev/lvol/vbdev_lvol.c:_create_lvol_disk
Source anchors:
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvs_examine_disk
- module/bdev/lvol/vbdev_lvol.c:_vbdev_lvs_examine
- module/bdev/lvol/vbdev_lvol.c:_vbdev_lvs_examine_cb
- module/bdev/lvol/vbdev_lvol.c:_vbdev_lvs_examine_finish
- module/bdev/lvol/vbdev_lvol.c:_create_lvol_disk
- lib/lvol/lvol.c:spdk_lvs_load_ext
- lib/lvol/lvol.c:load_next_lvol
Beginner misconception to kill: lvolstore discovery is not a global scan done by lvol itself. The bdev subsystem calls each module's examine callbacks. vbdev_lvol receives a candidate bdev, tries to load a blobstore/lvolstore from it, claims the base bdev if successful, then creates child bdevs for the lvols.
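The examine contract described above can be sketched as a self-contained model. The bdev layer offers each bdev to the module. The module claims only the bdevs that hold an lvolstore, and it reports examine completion either way. All model_* names are hypothetical. The has_lvs flag stands in for a successful spdk_lvs_load_ext(), and examine_done stands in for spdk_bdev_module_examine_done().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical candidate bdev. has_lvs marks whether loading an
 * lvolstore from it would succeed. */
struct model_bdev {
	const char *name;
	bool has_lvs;
	int lvol_count;      /* lvols recorded in that lvolstore */
	bool claimed;
	bool examine_done;
};

/* Role of the vbdev_lvol examine callback: try to load, claim on success,
 * create child bdevs, and always report examine completion. */
int model_lvs_examine(struct model_bdev *bdev)
{
	int children = 0;
	if (bdev->has_lvs) {
		bdev->claimed = true;           /* base bdev now owned by the lvol module */
		children = bdev->lvol_count;    /* one lvol bdev per loaded lvol */
	}
	bdev->examine_done = true;          /* reported on success and on failure */
	return children;
}

/* Role of the bdev layer: offer every new bdev to the module. */
int model_examine_all(struct model_bdev *bdevs, size_t n)
{
	int total = 0;
	for (size_t i = 0; i < n; i++) {
		total += model_lvs_examine(&bdevs[i]);
	}
	return total;
}
```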
lvol Metadata And Identity
An lvol has:
- An lvolstore pointer.
- A blob ID.
- A name.
- A UUID and string form.
- A unique ID used for bdev naming.
- A reference count.
- A struct spdk_blob * when opened.
- Flags for pending actions or degraded external snapshots.
Source anchors:
- include/spdk_internal/lvolstore.h:struct spdk_lvol_store
- include/spdk_internal/lvolstore.h:struct spdk_lvol
- lib/lvol/lvol.c:lvol_alloc
- lib/lvol/lvol.c:lvol_get_xattr_value
- lib/lvol/lvol.c:lvs_verify_lvol_name
- lib/lvol/lvol.c:lvs_get_lvol_by_blob_id
The lvol name and UUID are persisted as blob xattrs. During load, lib/lvol/lvol.c:load_next_lvol opens each blob, reads xattrs, reconstructs lvol objects, and appends them to the lvolstore's lvol list.
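The load loop can be sketched as a self-contained model. It iterates over blobs, skips the lvolstore's own super blob, and rebuilds an lvol from each remaining blob's xattrs. The model_* names are hypothetical. So are the policies: a missing name is treated as a load error, and a missing UUID is tolerated. These approximate load_next_lvol rather than reproduce it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical on-disk view: each blob carries optional name/uuid xattrs. */
struct model_blob {
	uint64_t id;
	const char *name_xattr;   /* NULL if missing */
	const char *uuid_xattr;   /* NULL if missing */
};

struct model_lvol {
	uint64_t blob_id;
	const char *name;
	const char *uuid;
};

/* Iterate blobs, skip the lvolstore super blob, and rebuild one in-memory lvol
 * per remaining blob. Returns the number rebuilt, or -1 on a nameless blob
 * or a full output array (both treated as load errors in this model). */
int model_load_lvols(const struct model_blob *blobs, size_t n, uint64_t super_blob_id,
		     struct model_lvol *out, size_t out_cap)
{
	size_t count = 0;
	for (size_t i = 0; i < n; i++) {
		if (blobs[i].id == super_blob_id) {
			continue;                 /* holds lvolstore metadata, not an lvol */
		}
		if (blobs[i].name_xattr == NULL || count == out_cap) {
			return -1;
		}
		out[count].blob_id = blobs[i].id;
		out[count].name = blobs[i].name_xattr;
		out[count].uuid = blobs[i].uuid_xattr;  /* missing UUID tolerated here */
		count++;
	}
	return (int)count;
}
```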
lvol Create
Public API:
include/spdk/lvol.h:spdk_lvol_create
Implementation:
- lib/lvol/lvol.c:spdk_lvol_create
- lib/lvol/lvol.c:lvol_alloc
- lib/lvol/lvol.c:lvol_get_xattr_value
- lib/lvol/lvol.c:lvol_create_cb
- lib/lvol/lvol.c:lvol_create_open_cb
- lib/blob/blobstore.c:spdk_bs_create_blob_ext
- lib/blob/blobstore.c:spdk_bs_open_blob_ext
spdk_lvol_create() verifies the lvolstore and name, allocates an in-memory lvol, prepares blob options, sets lvol xattrs, and calls spdk_bs_create_blob_ext(). When blob creation completes, it opens the blob and moves the lvol from the pending list into the lvolstore lvol list.
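The name check and the pending-to-active transition can be sketched as follows. In this model, a name is rejected if it is empty, too long, or already taken, and names of lvols still pending count as taken. This is one plausible reading of why a pending list exists, since two concurrent creates with the same name should collide. MODEL_NAME_MAX is assumed to mirror SPDK_LVOL_NAME_MAX including the terminating NUL. Every model_* name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_NAME_MAX 64   /* assumption: mirrors SPDK_LVOL_NAME_MAX incl. NUL */
#define MODEL_MAX_LVOLS 8

struct model_lvs {
	char names[MODEL_MAX_LVOLS][MODEL_NAME_MAX];
	bool pending[MODEL_MAX_LVOLS];   /* true while blob create/open is in flight */
	size_t count;
};

/* Reject empty, too-long, or duplicate names; pending lvols count as taken. */
int model_verify_name(const struct model_lvs *lvs, const char *name)
{
	if (name == NULL || name[0] == '\0' || strlen(name) >= MODEL_NAME_MAX) {
		return -EINVAL;
	}
	for (size_t i = 0; i < lvs->count; i++) {
		if (strcmp(lvs->names[i], name) == 0) {
			return -EEXIST;
		}
	}
	return 0;
}

/* Start a create: record the name as pending; returns a slot index or -errno. */
int model_create_begin(struct model_lvs *lvs, const char *name)
{
	int rc = model_verify_name(lvs, name);
	if (rc != 0) {
		return rc;
	}
	if (lvs->count == MODEL_MAX_LVOLS) {
		return -ENOMEM;
	}
	strcpy(lvs->names[lvs->count], name);   /* length checked above */
	lvs->pending[lvs->count] = true;
	return (int)lvs->count++;
}

/* Blob create and open completed: the lvol leaves the pending state. */
void model_create_done(struct model_lvs *lvs, int slot)
{
	lvs->pending[slot] = false;
}
```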
At the bdev layer:
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_create
- module/bdev/lvol/vbdev_lvol.c:_vbdev_lvol_create_cb
- module/bdev/lvol/vbdev_lvol.c:_create_lvol_disk
_create_lvol_disk() fills in a struct spdk_bdev: name, aliases, block length, block count, supported operations, fn table, module pointer, product name, and context. Then it registers the bdev.
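Two of those fields can be computed in isolation. The sketch below assumes, without checking the source, that the block length equals the blobstore IO unit size, that the block count is the blob's allocated size divided by that length, and that the alias has the form "lvs_name/lvol_name". The struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical inputs that bdev registration would read from the lvol/blobstore. */
struct model_geom {
	uint64_t num_clusters;   /* clusters currently sized into the blob */
	uint64_t cluster_size;   /* bytes per cluster in this lvolstore */
	uint32_t io_unit_size;   /* assumed to be used as the bdev block length */
};

/* Block count exposed to bdev users: blob size in bytes / block length. */
uint64_t model_blockcnt(const struct model_geom *g)
{
	assert(g->io_unit_size != 0);
	return g->num_clusters * g->cluster_size / g->io_unit_size;
}

/* Alias "<lvs name>/<lvol name>"; returns snprintf's would-be length. */
int model_alias(char *buf, size_t len, const char *lvs_name, const char *lvol_name)
{
	return snprintf(buf, len, "%s/%s", lvs_name, lvol_name);
}
```

For example, 5 clusters of 4 MiB with a 512-byte block length give 40960 blocks.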
lvol bdev IO Path
The bdev-facing IO path starts at:
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_submit_request
- module/bdev/lvol/vbdev_lvol.c:lvol_read
- module/bdev/lvol/vbdev_lvol.c:lvol_write
- module/bdev/lvol/vbdev_lvol.c:lvol_unmap
- module/bdev/lvol/vbdev_lvol.c:lvol_write_zeroes
- module/bdev/lvol/vbdev_lvol.c:lvol_reset
- module/bdev/lvol/vbdev_lvol.c:lvol_seek_data
- module/bdev/lvol/vbdev_lvol.c:lvol_seek_hole
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_fn_table
Prose diagram:
SPDK bdev user submits write to lvol bdev
-> lib/bdev/bdev.c:bdev_submit_request
-> module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_submit_request
-> module/bdev/lvol/vbdev_lvol.c:lvol_write
-> lib/blob/blobstore.c:spdk_blob_io_write
-> lib/blob/blobstore.c:blob_request_submit_op
-> struct spdk_bs_dev write/writev on base bdev adapter
-> completion propagates back to spdk_bdev_io_complete()
The lvol bdev does not implement its own allocator. It forwards reads, writes, unmaps, write-zeroes, and seeks to blobstore. That is why understanding blobstore thin allocation and snapshot backing matters before debugging lvol bdev IO.
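The "forward, don't allocate" shape of the submit path can be sketched with a toy blob. The dispatcher only validates the request and hands it to a blob-level operation, and all storage decisions live below it. Every model_* name is hypothetical. The assumption that writes and unmaps to a read-only blob (such as a snapshot) are refused is labeled in the code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_BLOCK 512
#define MODEL_BLOCKS 8

/* Hypothetical IO types; real lvol bdevs also handle write-zeroes, reset and seeks. */
enum model_io_type { MODEL_IO_READ, MODEL_IO_WRITE, MODEL_IO_UNMAP, MODEL_IO_OTHER };

/* Toy blob: fixed array of blocks plus a read-only flag. */
struct model_blob {
	uint8_t data[MODEL_BLOCKS][MODEL_BLOCK];
	bool read_only;
};

/* Blob-level ops: the only code that touches storage in this model. */
static void blob_read(const struct model_blob *b, uint64_t blk, void *buf)
{
	memcpy(buf, b->data[blk], MODEL_BLOCK);
}

static void blob_write(struct model_blob *b, uint64_t blk, const void *buf)
{
	memcpy(b->data[blk], buf, MODEL_BLOCK);
}

static void blob_unmap(struct model_blob *b, uint64_t blk)
{
	memset(b->data[blk], 0, MODEL_BLOCK);   /* toy: unmapped blocks read as zero */
}

/* Role of vbdev_lvol_submit_request(): validate, then forward to the blob. */
int model_submit(struct model_blob *b, enum model_io_type type, uint64_t blk, void *buf)
{
	if (blk >= MODEL_BLOCKS) {
		return -1;
	}
	switch (type) {
	case MODEL_IO_READ:
		blob_read(b, blk, buf);
		return 0;
	case MODEL_IO_WRITE:
	case MODEL_IO_UNMAP:
		if (b->read_only) {
			return -1;   /* assumed: modifying IO to a read-only blob is refused */
		}
		if (type == MODEL_IO_WRITE) {
			blob_write(b, blk, buf);
		} else {
			blob_unmap(b, blk);
		}
		return 0;
	default:
		return -1;           /* unsupported in this toy */
	}
}
```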
lvol Snapshots And Clones
Public lvol API:
- include/spdk/lvol.h:spdk_lvol_create_snapshot
- include/spdk/lvol.h:spdk_lvol_create_clone
- include/spdk/lvol.h:spdk_lvol_iter_immediate_clones
Implementation:
- lib/lvol/lvol.c:spdk_lvol_create_snapshot
- lib/lvol/lvol.c:spdk_lvol_create_clone
- lib/lvol/lvol.c:spdk_lvol_iter_immediate_clones
- lib/blob/blobstore.c:spdk_bs_create_snapshot
- lib/blob/blobstore.c:spdk_bs_create_clone
bdev/RPC layer:
- module/bdev/lvol/vbdev_lvol_rpc.c:rpc_bdev_lvol_snapshot
- module/bdev/lvol/vbdev_lvol_rpc.c:rpc_bdev_lvol_clone
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_create_snapshot
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_create_clone
Snapshot and clone creation both use the same _vbdev_lvol_create_cb path to create the new lvol bdev after the underlying lvol/blob operation succeeds.
Beginner misconception to kill: because bdev_lvol_snapshot takes an lvol bdev name and produces another lvol bdev, it is tempting to treat the snapshot as a bdev-level copy. It is still a blobstore snapshot under the hood. vbdev_lvol is only responsible for making the result visible as a bdev.
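The metadata-only nature of snapshots and clones can be sketched with a toy cluster map. Taking a snapshot moves the original's cluster map into a new read-only blob, and the original becomes an empty thin blob backed by it. A clone is another thin blob backed by the snapshot. Reads fall through to the parent for unallocated clusters, and writes allocate only in the writer (copy-on-write). In the model, an int per cluster stands in for cluster pointers, and all model_* names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CLUSTERS 4

/* Toy blob: one int per cluster stands in for a cluster's contents/pointer. */
struct model_blob {
	int data[MODEL_CLUSTERS];
	bool allocated[MODEL_CLUSTERS];
	const struct model_blob *parent;   /* backing snapshot, or NULL */
	bool read_only;
};

/* Use our own cluster if allocated, else defer to the parent chain;
 * a cluster allocated nowhere reads as zero. */
int model_read(const struct model_blob *b, size_t c)
{
	for (; b != NULL; b = b->parent) {
		if (b->allocated[c]) {
			return b->data[c];
		}
	}
	return 0;
}

/* Copy-on-write: allocate in this blob only; the parent is never modified. */
int model_write(struct model_blob *b, size_t c, int value)
{
	if (b->read_only) {
		return -1;
	}
	b->allocated[c] = true;
	b->data[c] = value;
	return 0;
}

/* The snapshot takes over vol's cluster map (a metadata move in blobstore),
 * and vol becomes an empty thin blob backed by the snapshot. */
void model_snapshot(struct model_blob *vol, struct model_blob *snap)
{
	*snap = *vol;
	snap->read_only = true;
	for (size_t c = 0; c < MODEL_CLUSTERS; c++) {
		vol->allocated[c] = false;
	}
	vol->parent = snap;
}

/* A clone starts empty and thin, backed by the snapshot. */
void model_clone(const struct model_blob *snap, struct model_blob *clone)
{
	*clone = (struct model_blob){ .parent = snap };
}
```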
External Snapshot Clones
External snapshot support crosses all three layers: blobstore, lvol, and vbdev_lvol.
Public lvol API:
- include/spdk/lvol.h:spdk_lvol_create_esnap_clone
- include/spdk/lvol.h:spdk_lvol_set_external_parent
lvol implementation:
- lib/lvol/lvol.c:spdk_lvol_create_esnap_clone
- lib/lvol/lvol.c:lvs_esnap_bs_dev_create
- lib/lvol/lvol.c:spdk_lvs_esnap_missing_add
- lib/lvol/lvol.c:spdk_lvs_esnap_missing_remove
- lib/lvol/lvol.c:lvs_esnap_degraded_hotplug
- lib/lvol/lvol.c:spdk_lvol_is_degraded
- include/spdk_internal/lvolstore.h:spdk_lvs_notify_hotplug
- lib/lvol/lvol.c:spdk_lvs_notify_hotplug
vbdev_lvol implementation:
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_create_bdev_clone
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_esnap_dev_create
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvs_examine_config
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvs_hotplug
- module/bdev/lvol/vbdev_lvol.c:create_esnap_clone_lvol_disks
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_set_external_parent
External snapshot flow:
bdev_lvol_clone_bdev
-> open external bdev read-only
-> parse external bdev UUID
-> create esnap clone blob with external snapshot ID
-> lvol create callback opens blob
-> blobstore asks lvol/vbdev_lvol to create a bs_dev for esnap ID
-> vbdev_lvol opens the parent bdev and claims it with a shared/read claim
-> child lvol bdev is registered
If the external bdev is missing during load, lvol tracks the missing esnap in lvs->degraded_lvol_sets_tree. When a bdev with the matching UUID appears, vbdev_lvs_examine_config() calls spdk_lvs_notify_hotplug(), and the lvolstore can attempt to attach the external parent and create child bdevs that were previously withheld or degraded.
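The track-then-hotplug behavior can be sketched with a flat array in place of lvs->degraded_lvol_sets_tree. Clones whose parent is missing are remembered by the parent's UUID instead of being dropped. When a bdev with a matching UUID appears, every clone waiting on it is re-attached. The model_* names and the flat array are simplifications chosen for the sketch, not SPDK's data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MODEL_MAX 8

/* Hypothetical esnap clone: records its external parent's UUID. */
struct model_esnap_clone {
	const char *name;
	const char *esnap_uuid;
	bool parent_attached;   /* false => degraded */
};

/* Flat stand-in for the degraded set: clones waiting for a parent. */
struct model_missing {
	struct model_esnap_clone *waiting[MODEL_MAX];
	size_t count;
};

/* During load: a clone whose parent bdev is absent is tracked, not dropped. */
void model_track_missing(struct model_missing *m, struct model_esnap_clone *c)
{
	assert(m->count < MODEL_MAX);
	c->parent_attached = false;
	m->waiting[m->count++] = c;
}

/* Role of spdk_lvs_notify_hotplug(): a bdev with `uuid` appeared. Attach every
 * waiting clone that references it; returns how many recovered. */
size_t model_notify_hotplug(struct model_missing *m, const char *uuid)
{
	size_t recovered = 0, keep = 0;
	for (size_t i = 0; i < m->count; i++) {
		struct model_esnap_clone *c = m->waiting[i];
		if (strcmp(c->esnap_uuid, uuid) == 0) {
			c->parent_attached = true;   /* child bdev could be registered now */
			recovered++;
		} else {
			m->waiting[keep++] = c;      /* still missing */
		}
	}
	m->count = keep;
	return recovered;
}
```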
Degraded lvols
Source anchors:
- include/spdk/lvol.h:spdk_lvol_is_degraded
- lib/lvol/lvol.c:spdk_lvol_is_degraded
- include/spdk/blob.h:spdk_blob_is_degraded
- lib/blob/blobstore.c:spdk_blob_is_degraded
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_get_memory_domains
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_esnap_dev_create
An lvol is degraded if it has no open blob or the blob is degraded. For an esnap clone, missing external snapshot state can make the blob degraded. A degraded lvol cannot perform normal IO. Delete and close paths have explicit handling so degraded metadata can still be cleaned up.
Important edge case: module/bdev/lvol/vbdev_lvol.c:_vbdev_lvol_destroy checks spdk_lvol_is_degraded(lvol). If degraded, it closes the lvol instead of unregistering a bdev that may not exist.
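The degraded rule and the destroy branch that depends on it fit in a few lines. The sketch restates the rule given above: an lvol is degraded if it has no open blob or its blob is degraded. It then chooses the first destroy step accordingly. The types and enum values are hypothetical simplifications.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_blob { bool degraded; };
struct model_lvol { struct model_blob *blob; };   /* blob is NULL when not open */

/* Degraded if there is no open blob or the blob itself is degraded. */
bool model_lvol_is_degraded(const struct model_lvol *lvol)
{
	return lvol->blob == NULL || lvol->blob->degraded;
}

enum model_destroy_step { MODEL_UNREGISTER_BDEV, MODEL_CLOSE_LVOL };

/* A degraded lvol may have no registered bdev, so it is closed directly
 * instead of going through bdev unregistration. */
enum model_destroy_step model_destroy_first_step(const struct model_lvol *lvol)
{
	return model_lvol_is_degraded(lvol) ? MODEL_CLOSE_LVOL : MODEL_UNREGISTER_BDEV;
}
```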
Resize
Public API:
include/spdk_internal/lvolstore.h:spdk_lvol_resize
Implementation:
- lib/lvol/lvol.c:spdk_lvol_resize
- lib/lvol/lvol.c:lvol_blob_resize_cb
- lib/lvol/lvol.c:lvol_resize_done
- lib/blob/blobstore.c:spdk_blob_resize
bdev adapter:
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_resize
- module/bdev/lvol/vbdev_lvol.c:_vbdev_lvol_resize_cb
- module/bdev/lvol/vbdev_lvol_rpc.c:rpc_bdev_lvol_resize
The lvol layer converts bytes to cluster count using the lvolstore cluster size, calls spdk_blob_resize(), then syncs blob metadata. The bdev adapter updates the visible bdev block count and notifies the bdev layer.
Edge cases:
- Resize of a read-only snapshot fails at blobstore metadata checks.
- Resize while another locked blob operation is in progress fails with -EBUSY.
- Growing a clone beyond its parent is allowed at the lvol/blob level, but backing reads past the end of the parent need special handling. See lib/blob/blob_bs_dev.c:blob_bs_is_range_valid.
- Resize of an exported lvol bdev may require consumers to observe bdev resize events correctly.
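The resize arithmetic and the two failure cases above can be sketched together. The requested byte size is rounded up to whole clusters so the new size always covers the request, and the resulting block count is what the bdev layer would be told about. The model_* names are hypothetical, and -EPERM for the read-only case is an assumed errno. The text above only says it fails at blobstore metadata checks.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

struct model_lvol {
	uint64_t num_clusters;
	bool read_only;        /* snapshots are read-only */
	bool op_in_progress;   /* another locked blob operation is running */
};

/* Bytes -> clusters, rounding up so the requested size is always covered. */
uint64_t model_bytes_to_clusters(uint64_t bytes, uint64_t cluster_size)
{
	return (bytes + cluster_size - 1) / cluster_size;
}

/* Resize the blob, then report the block count the bdev layer must learn. */
int model_resize(struct model_lvol *l, uint64_t new_bytes, uint64_t cluster_size,
		 uint32_t block_len, uint64_t *new_blockcnt)
{
	if (l->read_only) {
		return -EPERM;     /* assumed errno for the read-only case */
	}
	if (l->op_in_progress) {
		return -EBUSY;
	}
	l->num_clusters = model_bytes_to_clusters(new_bytes, cluster_size);
	*new_blockcnt = l->num_clusters * cluster_size / block_len;
	return 0;
}
```

With 4 MiB clusters, a 5 MiB request becomes 2 clusters, so the bdev grows to 8 MiB rather than exactly 5 MiB.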
Delete
Public API:
- include/spdk/lvol.h:spdk_lvol_deletable
- include/spdk/lvol.h:spdk_lvol_destroy
Implementation:
- lib/lvol/lvol.c:spdk_lvol_deletable
- lib/lvol/lvol.c:spdk_lvol_destroy
- lib/lvol/lvol.c:lvol_delete_blob_cb
- lib/blob/blobstore.c:spdk_bs_delete_blob
- lib/blob/blobstore.c:bs_is_blob_deletable
bdev adapter:
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_destroy
- module/bdev/lvol/vbdev_lvol.c:_vbdev_lvol_destroy
- module/bdev/lvol/vbdev_lvol.c:_vbdev_lvol_destroy_cb
- module/bdev/lvol/vbdev_lvol_rpc.c:rpc_bdev_lvol_delete
Delete is layered:
bdev_lvol_delete
-> find bdev by name
-> map bdev to struct spdk_lvol
-> module/bdev/lvol/vbdev_lvol.c:vbdev_lvol_destroy
-> unregister lvol bdev or close degraded lvol
-> lib/lvol/lvol.c:spdk_lvol_destroy
-> lib/blob/blobstore.c:spdk_bs_delete_blob
Deletion rules:
- spdk_lvol_destroy() fails with -EBUSY if the lvol is still open.
- vbdev_lvol refuses delete when the snapshot has more than one clone.
- Blobstore may allow a snapshot with exactly one clone to be deleted by updating the clone.
- If the lvolstore itself is being removed, lvol bdev deletion follows the lvolstore teardown path rather than the normal RPC delete path.
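The normal-path deletion rules reduce to a small decision function. In this model, a snapshot with more than one clone is refused, an open lvol is busy, and a snapshot with exactly one clone is deletable only by also updating that clone. The model_* names are hypothetical. -EBUSY for the open case comes from the rules above, while -EPERM for the multi-clone refusal is an assumed errno.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical bookkeeping for one lvol. */
struct model_lvol {
	int ref_count;     /* > 0 while the lvol is open */
	int clone_count;   /* immediate clones if this lvol is a snapshot */
};

enum model_delete_plan { MODEL_DELETE_PLAIN, MODEL_DELETE_AND_UPDATE_CLONE };

/* Returns 0 and sets *plan if the delete may proceed, else a negative errno. */
int model_check_delete(const struct model_lvol *l, enum model_delete_plan *plan)
{
	if (l->clone_count > 1) {
		return -EPERM;     /* bdev-layer refusal; errno assumed */
	}
	if (l->ref_count > 0) {
		return -EBUSY;     /* lvol-layer refusal: still open */
	}
	*plan = l->clone_count == 1 ? MODEL_DELETE_AND_UPDATE_CLONE : MODEL_DELETE_PLAIN;
	return 0;
}
```

The one-clone case is allowed because the single clone can absorb the snapshot's role. With two or more clones, there is no single blob to hand the snapshot's clusters to.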
Misconceptions To Kill
- "An lvol is a bdev." Not exactly. An lvol is a library object backed by a blob; vbdev_lvol exposes it as a bdev.
- "Creating an lvolstore creates lvol bdevs immediately." It creates the lvolstore. lvol bdevs appear when lvols are created or loaded and registered.
- "lvol resize is just changing a bdev field." It resizes the underlying blob and then updates the bdev.
- "lvol snapshots copy data." They call blobstore snapshot logic, which is metadata/COW based.
- "External snapshot clones are independent volumes." They depend on an external parent until decoupled or inflated.
- "If an external snapshot parent is missing, the lvolstore cannot load at all." The lvol layer can track degraded lvols and hotplug the parent later.
Source Reading Exercise
Trace lvolstore auto-discovery:
- lib/bdev/bdev.c:bdev_examine
- module/bdev/lvol/vbdev_lvol.c:vbdev_lvs_examine_disk
- module/bdev/lvol/vbdev_lvol.c:_vbdev_lvs_examine
- lib/lvol/lvol.c:spdk_lvs_load_ext
- lib/lvol/lvol.c:load_next_lvol
- module/bdev/lvol/vbdev_lvol.c:_vbdev_lvs_examine_cb
- module/bdev/lvol/vbdev_lvol.c:_create_lvol_disk
Write down:
- Where the base bdev is claimed.
- Where lvol xattrs are read.
- Where each lvol bdev is registered.
- Where spdk_bdev_module_examine_done() is called.
Operational Lab
Use local test scripts as lab guides:
- test/lvol/basic.sh
- test/lvol/resize.sh
- test/lvol/snapshot_clone.sh
- test/lvol/external_snapshot.sh
- test/lvol/hotremove.sh
Suggested RPC lab:
1. Start SPDK app with JSON-RPC enabled.
2. Create a malloc bdev.
3. Create an lvolstore on the malloc bdev.
4. Create a thin lvol.
5. Run bdev_get_bdevs and identify:
- base malloc bdev
- lvol bdev
- claim state of the base bdev
6. Snapshot the lvol.
7. Clone the snapshot.
8. Resize the clone.
9. Delete the clone, the snapshot, and the original in different orders, and record the expected failures.
10. Restart the app and verify lvolstore/lvol bdevs are recreated through examine.
Debug prompt:
- If an lvol bdev does not appear after restart, check whether vbdev_lvs_examine_disk() ran, whether spdk_lvs_load_ext() succeeded, whether spdk_bs_bdev_claim() failed, and whether _create_lvol_disk() returned an error.
Self-Check
- What is the difference between spdk_lvol_create() and vbdev_lvol_create()?
- Why does vbdev_lvol_submit_request() call blobstore IO functions?
- Which source function turns an lvol into a bdev?
- What has to happen before an lvolstore loaded from disk can expose lvol bdevs?
- Why can external snapshot clones become degraded?
- What prevents deleting an open lvol?
- Why does deleting a snapshot with one clone differ from deleting a snapshot with two clones?
- Which path handles bdev resize notification after lvol resize?
References
- Local API: include/spdk/lvol.h
- Local lvol implementation: lib/lvol/lvol.c
- Local lvol bdev implementation: module/bdev/lvol/vbdev_lvol.c
- Local lvol RPCs: module/bdev/lvol/vbdev_lvol_rpc.c
- Local tests: test/lvol/basic.sh, test/lvol/resize.sh, test/lvol/snapshot_clone.sh, test/lvol/external_snapshot.sh, test/lvol/hotremove.sh