Removing OSD Drives from a Ceph Reef Cluster

Author: Ramon Grullon

This guide covers how to safely remove one or more OSD drives from a Ceph Reef cluster managed by cephadm and ceph orch. Following these steps helps prevent data loss, avoids triggering nearfull thresholds, and ensures the cluster can fully recover before the physical drive is pulled.

Prerequisites

Before beginning an OSD removal, confirm all of the following:

Cluster Health

  • The cluster must be in HEALTH_OK or at most HEALTH_WARN with no active recovery or backfill in progress.

    ceph -s
    ceph health detail

Available Capacity

  • Confirm the cluster has enough free space to absorb all PGs currently hosted on the target OSD after it is removed. As a rule of thumb, available capacity should be at least as large as the data hosted on the OSD being removed.

    ceph osd df tree
    ceph df detail
    Danger: Removing an OSD when the cluster is at or near its nearfull ratio (default 85%) can cause writes to be blocked or OSDs to be marked full during backfill. Confirm the nearfull and full thresholds before proceeding.

    ceph osd dump | grep -E "full_ratio|nearfull_ratio|backfillfull_ratio"

Replication and Erasure Coding

  • For replicated pools, the cluster must retain at least as many active OSDs (in distinct failure domains) as the pool's size after the removal. Removing an OSD that drops a PG below min_size will pause I/O for the affected PGs.

    ceph osd pool ls detail | grep -E "pool|size|min_size"
  • For erasure-coded pools, verify the removal does not violate the EC profile's minimum chunk requirements (k + m OSDs must remain available).
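    To inspect the k and m values in use, list the cluster's EC profiles and dump the relevant one (the profile name is cluster-specific):

    ceph osd erasure-code-profile ls
    ceph osd erasure-code-profile get <profile-name>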

Cephadm and Ceph orch Availability

  • Confirm ceph-mgr and ceph orch are healthy and that cephadm can reach the target host:

    ceph mgr stat
    ceph orch host ls
    ceph orch status

OSD ID and Device Mapping

  • Know the exact OSD ID(s) and physical device path(s) you intend to remove:

    ceph osd tree
    ceph orch device ls <hostname>

Safety Warnings

Do Not Hot-Pull Without Completing the Drain

Physically removing a drive before the OSD has been fully drained and decommissioned will cause data loss if PGs were not fully migrated. Always complete the full drain and removal workflow before touching the hardware.

Avoid Removing Multiple OSDs Simultaneously

Removing more than one OSD at a time multiplies recovery load and increases the risk of hitting nearfull or full thresholds mid-recovery. Remove and confirm recovery for one OSD at a time unless you have confirmed sufficient headroom.

noout Flag Side Effects

Setting noout prevents OSDs from being marked out when they go down, which is useful during maintenance. However, leaving noout set permanently masks real failures. Always unset it after maintenance is complete.
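While noout is set, the cluster surfaces a health warning, which makes a forgotten flag easy to spot:

ceph health detail
# Expect something along these lines (exact wording varies by release):
# HEALTH_WARN noout flag(s) set
# [WRN] OSDMAP_FLAGS: noout flag(s) set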

cephadm Will Attempt to Redeploy

Because ceph orch manages the OSD lifecycle, simply stopping an OSD service is not sufficient — cephadm will redeploy it. You must use ceph orch osd rm (not manual service stops) to properly decommission an OSD.

Removal Procedure

Step 1: Set noout Flag

Setting noout prevents OSDs that go down during maintenance from being marked out, which would otherwise trigger immediate re-replication and unnecessary backfill load.

ceph osd set noout

Verify it is set:

ceph osd dump | grep noout
Danger: Remember to unset this flag when you are done. Leaving it set will hide real OSD failures.

Step 2: Identify the Target OSD

Find the OSD ID corresponding to the drive you want to remove.

ceph osd tree
ceph orch ps --daemon-type osd | grep <hostname>

Note the OSD ID (e.g., osd.5) and confirm it maps to the expected physical device:

ceph osd metadata <osd-id> | grep -E "hostname|devices|bluefs_dedicated_db|bluestore_bdev_path"
ls -la /dev/disk/by-id/ | grep nvme | grep -v part
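As an optional cross-check, device health telemetry can map drive serial numbers to the daemons on a host:

ceph device ls-by-host <hostname>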

Step 3: Initiate the OSD Removal via Ceph Orchestrator

Use ceph orch osd rm <osd-id> to safely drain and decommission the OSD. This command marks the OSD out and waits for all PGs to migrate away, then removes the OSD daemon. The --zap flag is intentionally omitted here — the drive will be zapped explicitly in Step 7 after confirming full removal.

ceph orch osd rm <osd-id>
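For example, with a hypothetical osd.5 (multiple IDs are accepted, but drain one at a time unless you have confirmed headroom):

ceph orch osd rm 5

# If a replacement drive will reuse the same OSD ID, the orchestrator also
# accepts --replace, which marks the OSD "destroyed" rather than deleting it
# (not used in this guide):
# ceph orch osd rm 5 --replace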

Step 4: Monitor the Drain

The OSD will be marked out and PGs will begin migrating. Monitor progress:

# Watch OSD removal status
ceph orch osd rm status

# Watch PG migration
watch ceph -s

# Check the specific OSD's PG count (output includes a header line)
ceph pg ls-by-osd <osd-id> | wc -l

Wait until ceph orch osd rm status shows no pending removals and ceph -s returns to HEALTH_OK before proceeding.
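If you want to wait unattended, a simple polling loop is enough. This sketch assumes the "No OSD remove/replace operations reported" message that current cephadm releases print when the removal queue is empty; check your release's wording first:

# Poll until the orchestrator reports no pending removals (message text assumed above)
until ceph orch osd rm status | grep -q "No OSD remove"; do
    sleep 60
done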

Danger: Do not proceed to the next step until PG migration is complete. ceph -s must show 0 degraded or misplaced PGs, and the OSD's PG count must reach 0.

Step 5: Confirm the OSD Is Fully Removed

Verify the OSD no longer appears in the OSD tree or daemon list:

ceph osd tree
ceph orch ps --daemon-type osd | grep <hostname>
ceph osd ls
ceph osd dump | grep destroyed
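A quick scripted check for a hypothetical osd.5:

ceph osd ls | grep -wq 5 && echo "osd.5 still present" || echo "osd.5 fully removed"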

Step 6: Prevent Re-ingestion via the unmanaged Flag

If your cluster uses a filter-based OSD spec (e.g., size: '1200GB:'), the orchestrator's reconciliation loop will automatically re-ingest any device that matches the filter — including one that was just zapped. Setting unmanaged: true on the OSD service suspends reconciliation for that service, preventing the orchestrator from re-adopting the device after the zap completes.

Apply the change via your configuration management tooling, or edit and apply the spec directly if it is not under configuration management:

service_type: osd
service_id: osd_spec_default
placement:
  label: osd
unmanaged: true
spec:
  config:
    osd_memory_target: 4GB
  data_devices:
    size: '1200GB:'

Apply the updated spec:

ceph orch apply -i spec.yaml

Verify the service is now unmanaged:

ceph orch ls --service-type osd

The osd_spec_default service should now show as <unmanaged> (in the PLACEMENT column on recent releases). Do not proceed until this is confirmed.

Step 7: Zap the Drive

Zap wipes all data structures that identify the device as a Ceph OSD to both the OS and the Ceph orchestrator. cephadm requires the --force flag to confirm the destructive operation.

ceph orch device zap <hostname> /dev/disk/by-id/<device> --force

Step 8: Physical Drive Removal

Only after the above steps are complete and the cluster is healthy should you physically remove the drive from the host. Coordinate with your data center or hardware team as appropriate.

After physical removal, verify no ghost devices or stale entries remain:

ceph orch device ls <hostname>
ceph osd tree

Step 9: Re-enable Orchestrator Management

After inserting the replacement drive and confirming it is detected by the OS, re-enable orchestrator management by setting unmanaged: false (or removing the field) in your spec and re-applying it:

ceph orch apply -i spec.yaml
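Then confirm the service is managed again and the replacement device is visible to the orchestrator:

ceph orch ls --service-type osd
ceph orch device ls <hostname> --refresh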

Step 10: Unset noout

If you set noout in Step 1, unset it now:

ceph osd unset noout

Confirm the cluster is healthy:

ceph -s
ceph health detail

Post-Removal Checks

Run a final health check to confirm everything is stable:

ceph -s
ceph df detail
ceph osd df tree
ceph health detail

Check that:

  • Cluster is HEALTH_OK
  • No PGs are in a degraded, incomplete, or stale state
  • Remaining OSDs are not approaching nearfull thresholds after the capacity reduction
  • noout flag is unset

Troubleshooting

OSD Removal Is Stuck

If ceph orch osd rm status shows the OSD stuck in draining:

  1. Check if PGs are blocked waiting on a down or unavailable OSD:

    ceph pg dump_stuck
    ceph health detail
  2. Check if the full or nearfull ratio is being hit, blocking writes:

    ceph df
    ceph osd dump | grep -E "full_ratio|nearfull_ratio"
  3. If the OSD is already down and you need to force-remove it, confirm data integrity first, then:

    ceph osd out <osd-id>
    ceph osd purge <osd-id> --yes-i-really-mean-it
    Danger: osd purge is destructive and permanent. Only run this if the OSD is confirmed down and all PGs have sufficient replicas on other OSDs. Verify with ceph pg dump | grep <osd-id> before proceeding.
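    In a cephadm-managed cluster, the daemon entry may also need to be removed through the orchestrator after a manual purge (hypothetical osd.5):

    ceph orch daemon rm osd.5 --force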

PG Count Not Reaching Zero

If an OSD's PG count stalls:

  • Check the CRUSH map — the OSD may be the only OSD in a CRUSH bucket with no alternative placement:

    ceph osd crush tree
  • Check if the PG autoscaler is interfering with expected PG counts:

    ceph osd pool autoscale-status

cephadm Keeps Redeploying the OSD

If cephadm redeploys the OSD after you stop the service manually, this is expected behavior. You must use the ceph orch osd rm workflow — not systemctl stop — to properly decommission an OSD through the orchestrator.
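For illustration, with a hypothetical osd.5 on a cephadm host:

# Wrong: the orchestrator's reconciliation loop will restart the daemon
systemctl stop ceph-<fsid>@osd.5.service

# Right: decommission through the orchestrator
ceph orch osd rm 5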