Removing OSD Drives from a Ceph Reef Cluster
Author: Ramon Grullon
This guide covers how to safely remove one or more OSD drives from a
Ceph Reef cluster managed by cephadm and ceph orch. Following
these steps helps prevent data loss, avoids triggering nearfull
thresholds, and ensures the cluster can fully recover before the
physical drive is pulled.
Prerequisites
Before beginning an OSD removal, confirm all of the following:
Cluster Health
The cluster must be in HEALTH_OK, or at most HEALTH_WARN with no active recovery or backfill in progress.
ceph -s
ceph health detail
Available Capacity
Confirm the cluster has enough free space on the remaining OSDs to absorb all PGs currently hosted on the target OSD after it is removed. As a rule of thumb, free capacity on the remaining OSDs should be at least as large as the data hosted on the OSD being removed.
ceph osd df tree
ceph df detail

Danger: Removing an OSD when the cluster is at or near its nearfull ratio (default 85%) can cause writes to be blocked or OSDs to be marked full during backfill. Confirm the nearfull and full thresholds before proceeding.
ceph osd dump | grep -E "full_ratio|nearfull_ratio|backfillfull_ratio"
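As a rough headroom check, you can compare the target OSD's stored data against the free space that will remain once it leaves the cluster. The sketch below is illustrative only: it assumes osd.5 is the target and relies on the kb_used, kb_avail, and summary.total_kb_avail fields of the Reef JSON output, which may differ between releases.
# After removal, the OSD's own free space leaves the cluster and its
# data must fit into the free space remaining on the other OSDs.
ceph osd df --format json | jq -r '
  (.nodes[] | select(.name == "osd.5")) as $t
  | "data to migrate: \($t.kb_used) KiB",
    "free space after removal: \(.summary.total_kb_avail - $t.kb_avail - $t.kb_used) KiB"'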
Replication and Erasure Coding
For replicated pools, the cluster must have at least as many active OSDs as the pool's size after the removal. Removing an OSD that would drop a pool below its min_size will cause I/O to pause for the affected PGs.
ceph osd pool ls detail | grep -E "pool|size|min_size"
For erasure-coded pools, verify the removal does not violate the EC profile's minimum chunk requirements (k+m OSDs must remain available).
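To check the k and m values for the profiles in use, list the profiles and inspect each one (<profile-name> is a placeholder):
ceph osd erasure-code-profile ls
ceph osd erasure-code-profile get <profile-name>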
Cephadm and Ceph orch Availability
Confirm ceph-mgr and ceph orch are healthy and that cephadm can reach the target host:
ceph mgr stat
ceph orch host ls
ceph orch status
OSD ID and Device Mapping
Know the exact OSD ID(s) and physical device path(s) you intend to remove:
ceph osd tree
ceph orch device ls <hostname>
Safety Warnings
Physically removing a drive before the OSD has been fully drained and decommissioned will cause data loss if PGs were not fully migrated. Always complete the full drain and removal workflow before touching the hardware.
Removing more than one OSD at a time multiplies recovery load and
increases the risk of hitting nearfull or full thresholds
mid-recovery. Remove and confirm recovery for one OSD at a time unless
you have confirmed sufficient headroom.
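Before taking several OSDs out of service together, you can ask the cluster whether stopping them would leave any PGs unable to serve I/O. Note this checks availability only, not long-term durability:
ceph osd ok-to-stop <osd-id> [<osd-id> ...]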
Setting noout prevents OSDs from being marked out when they go down,
which is useful during maintenance. However, leaving noout set
permanently masks real failures. Always unset it after maintenance is
complete.
Because ceph orch manages the OSD lifecycle, simply stopping an OSD
service is not sufficient — cephadm will redeploy it. You must
use ceph orch osd rm (not manual service stops) to properly
decommission an OSD.
Removal Procedure
Step 1: Set noout Flag
Setting noout prevents OSDs that restart or briefly go down during maintenance from being marked out, which would otherwise trigger immediate re-replication of their PGs and unnecessary backfill load.
ceph osd set noout
Verify it is set:
ceph osd dump | grep noout
Remember to unset this flag when you are done. Leaving it set will hide real OSD failures.
Step 2: Identify the Target OSD
Find the OSD ID corresponding to the drive you want to remove.
ceph osd tree
ceph orch ps --daemon-type osd | grep <hostname>
Note the OSD ID (e.g., osd.5) and confirm it maps to the expected
physical device:
ceph osd metadata <osd-id> | grep -E "hostname|devices|bluefs_dedicated_db|bluestore_bdev_path"
ls -la /dev/disk/by-id/ | grep nvme | grep -v part
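If jq is available, a minimal sketch for pulling just the mapping-relevant fields (assuming osd.5 is the target; exact field names can vary by release and OSD backend):
ceph osd metadata 5 | jq -r '.hostname, .devices, .device_ids'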
Step 3: Initiate the OSD Removal via Ceph Orchestrator
Use ceph orch osd rm <osd-id> to safely drain and decommission the OSD. This
command marks the OSD out and waits for all PGs to migrate away, then removes
the OSD daemon. The --zap flag is intentionally omitted here — the drive will
be zapped explicitly in Step 7 after confirming full removal.
ceph orch osd rm <osd-id>
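If the drive will be replaced in the same slot, ceph orch osd rm also accepts a --replace flag, which marks the OSD as destroyed and preserves its ID for the replacement device instead of deleting it from the CRUSH map:
ceph orch osd rm <osd-id> --replace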
Step 4: Monitor the Drain
The OSD will be marked out and PGs will begin migrating. Monitor progress:
# Watch OSD removal status
ceph orch osd rm status
# Watch PG migration
watch ceph -s
# Check the specific OSD's PG count
ceph pg ls-by-osd <osd-id> | wc -l
Wait until ceph orch osd rm status shows no pending removals and
ceph -s returns to HEALTH_OK before proceeding.
Do not proceed to the next step until PG migration is complete.
ceph -s must show 0 degraded or misplaced PGs, and the OSD's
PG count must reach 0.
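As an extra guard, you can poll the drain and then ask Ceph directly whether the OSD can be destroyed without risking data. A minimal sketch, using the same <osd-id> placeholder as above:
# Loop until the OSD hosts no PGs, then confirm it is safe to destroy.
while [ "$(ceph pg ls-by-osd <osd-id> | wc -l)" -gt 0 ]; do
    sleep 30
done
ceph osd safe-to-destroy <osd-id>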
Step 5: Confirm the OSD Is Fully Removed
Verify the OSD no longer appears in the OSD tree or daemon list:
ceph osd tree
ceph orch ps --daemon-type osd | grep <hostname>
ceph osd ls
ceph osd dump | grep destroyed
Step 6: Prevent Re-ingestion via the unmanaged Flag
If your cluster uses a filter-based OSD spec (e.g., size: '1200GB:'),
the orchestrator's reconciliation loop will automatically re-ingest any
device that matches the filter — including one that was just zapped.
Setting unmanaged: true on the OSD service suspends reconciliation
for that service, preventing the orchestrator from re-adopting the
device after the zap completes.
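If you do not have the applied spec on disk, you can export it from the orchestrator and edit the result (spec.yaml is just an example filename):
ceph orch ls --service-type osd --export > spec.yaml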
Apply the change via your configuration management tooling, or directly if the spec is not managed:
service_type: osd
service_id: osd_spec_default
placement:
  label: osd
unmanaged: true
spec:
  config:
    osd_memory_target: 4GB
  data_devices:
    size: '1200GB:'
Apply the updated spec:
ceph orch apply -i spec.yaml
Verify the service is now unmanaged:
ceph orch ls --service-type osd
The output should show unmanaged in the flags column for
osd_spec_default. Do not proceed until this is confirmed.
Step 7: Zap Drive
Zapping wipes the data structures that identify the device as a Ceph OSD, so that both the OS and the Ceph orchestrator treat it as a blank device.
ceph orch device zap <hostname> /dev/disk/by-id/<device>
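Some cephadm releases refuse to zap a device without an explicit confirmation flag; if the command above is rejected, retry with --force:
ceph orch device zap <hostname> /dev/disk/by-id/<device> --force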
Step 8: Physical Drive Removal
Only after the above steps are complete and the cluster is healthy should you physically remove the drive from the host. Coordinate with your data center or hardware team as appropriate.
After physical removal, verify no ghost devices or stale entries remain:
ceph orch device ls <hostname>
ceph osd tree
Step 9: Re-enable Orchestrator Management
After inserting the replacement drive and confirming it is detected by the OS,
re-enable orchestrator management by setting unmanaged: false (or removing
the field) in your spec and re-applying it:
ceph orch apply -i spec.yaml
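Then verify the unmanaged flag is gone and reconciliation has resumed:
ceph orch ls --service-type osd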
Step 10: Unset noout
If you set noout in Step 1, unset it now:
ceph osd unset noout
Confirm the cluster is healthy:
ceph -s
ceph health detail
Post-Removal Checks
Run a final health check to confirm everything is stable:
ceph -s
ceph df detail
ceph osd df tree
ceph health detail
Check that:
- Cluster is HEALTH_OK
- No PGs are in a degraded, incomplete, or stale state
- Remaining OSDs are not approaching nearfull thresholds after the capacity reduction
- The noout flag is unset
Troubleshooting
OSD Removal Is Stuck
If ceph orch osd rm status shows the OSD stuck in draining:
Check if PGs are blocked waiting on a down or unavailable OSD:
ceph pg dump_stuck
ceph health detail

Check if the full or nearfull ratio is being hit, blocking writes:
ceph df
ceph osd dump | grep -E "full_ratio|nearfull_ratio"

If the OSD is already down and you need to force-remove it, confirm data integrity first, then:
ceph osd out <osd-id>
ceph osd purge <osd-id> --yes-i-really-mean-it

Danger: osd purge is destructive and permanent. Only run this if the OSD is confirmed down and all PGs have sufficient replicas on other OSDs. Verify with ceph pg dump | grep <osd-id> before proceeding.
PG Count Not Reaching Zero
If an OSD's PG count stalls:
Check the CRUSH map — the OSD may be the only OSD in a CRUSH bucket with no alternative placement:
ceph osd crush tree

Check if the PG autoscaler is interfering with expected PG counts:
ceph osd pool autoscale-status
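If a specific PG refuses to move, query it directly to see which OSDs it is peering or backfilling against (<pg-id> is a placeholder for an ID reported by ceph pg ls-by-osd):
ceph pg <pg-id> query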
cephadm Keeps Redeploying the OSD
If cephadm redeploys the OSD after you stop the service manually,
this is expected behavior. You must use the ceph orch osd rm workflow
— not systemctl stop — to properly decommission an OSD through the
orchestrator.