Storage Migration from VMware to OpenStack + Ceph: Tips, Tools & Pitfalls


Moving workloads from VMware to OpenStack isn’t primarily a compute challenge—it’s a storage challenge. Your VMs can be re-instantiated quickly. Your networking can be reconfigured in an afternoon. But your storage layer—your persistent data, your stateful workloads, your multi-terabyte databases—that’s where migrations stall, fail, or drag on for months.

If you’re a storage architect or platform engineer tasked with migrating off VMware ESXi, vSAN, or VMFS to an OpenStack environment backed by Ceph, you’re facing a fundamentally different storage architecture. This isn’t a lift-and-shift. It’s a re-architecture of how block storage, shared filesystems, and object storage are provisioned, accessed, and managed. This guide walks through the migration methods, tooling, validation steps, and pitfalls that matter when you’re moving production storage workloads—not lab environments.


VMware Storage Model vs OpenStack + Ceph Storage Model

VMware’s storage stack is tightly integrated with ESXi’s hypervisor layer. Whether you’re using local VMFS datastores, shared NFS mounts, or vSAN’s distributed object store, the storage abstraction lives inside VMware’s control plane. Virtual disks are VMDK files. Storage policies are enforced by vCenter. Snapshots, clones, and thin provisioning are all managed through VMware’s APIs.

OpenStack with Ceph operates differently. Ceph is a software-defined storage system that provides block storage (RBD), shared filesystems (CephFS), and object storage (RADOS Gateway) through a unified cluster. OpenStack’s Cinder (block), Manila (file shares), and Swift/S3 (object) services interface with Ceph, but Ceph itself is hypervisor-agnostic. Virtual disks are stored as RADOS Block Device (RBD) images, not VMDK files. Snapshots are COW (copy-on-write) operations at the Ceph layer. Storage policies are defined in CRUSH maps and Ceph pools, not vCenter.

| Aspect | VMware (ESXi/vSAN/VMFS) | OpenStack + Ceph (RBD/CephFS/Object) |
|---|---|---|
| Virtual disk format | VMDK (monolithic or split) | RBD image (object-striped across OSDs) |
| Storage provisioning | vCenter datastores | Cinder volumes backed by Ceph pools |
| Shared file storage | NFS/vSAN file services | CephFS mounted via kernel or FUSE |
| Object storage | vSAN object store (limited) | RADOS Gateway (S3/Swift-compatible) |
| Snapshot mechanism | VMDK delta files | Ceph COW snapshots at RBD layer |
| Thin provisioning | VMDK thin disks | RBD thin provisioning (default) |
| Replication/HA | vSAN erasure coding or mirroring | Ceph replica pools or erasure coding |
| CLI tooling | vmkfstools, esxcli | rbd, ceph, rados |

Ceph is not VMFS with different branding. It’s a distributed object store that exposes block and file interfaces on top of RADOS. You’ll need to adjust your mental model for how storage is allocated, how data is replicated, and how failure domains are defined. VMware admins expect storage to be “attached” to a cluster. Ceph storage is distributed across nodes, and failure domains are defined by CRUSH topologies—not vSphere clusters.
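
Before planning pool layouts, inspect how the target cluster actually defines its failure domains. A few read-only checks (the pool name matches the example used later in this guide):

ceph osd tree
ceph osd pool get openstack-volumes crush_rule
ceph osd crush rule dump

The first command shows the CRUSH hierarchy (root, racks, hosts, OSDs); the other two show which rule a pool uses and how that rule selects failure domains. If every OSD sits under one host or one rack bucket, fix the topology before you import any data.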

Migration Method Options

You have four primary approaches when migrating storage from VMware to OpenStack. Each has different downtime requirements, tooling complexity, and risk profiles.

  • Cold migration involves shutting down the VM, exporting the VMDK, converting it to a raw or qcow2 image, and importing it into Ceph as an RBD volume. This is the simplest method, but it requires full downtime for the workload. Acceptable for dev/test environments or workloads with scheduled maintenance windows.
  • Live migration uses tools like virt-v2v or commercial platforms (Hystax, Trilio) to sync block-level changes while the VM remains running in VMware. A cutover window is still required, but it’s measured in minutes rather than hours. This method requires network bandwidth, intermediate storage, and careful handling of I/O consistency.
  • Block streaming involves attaching the source VMDK as a backing file to the destination RBD volume and streaming blocks on-demand as they’re accessed. This minimizes initial downtime but can cause performance degradation during the migration window. Rarely used in production due to complexity.
  • Rebuild means standing up a new VM in OpenStack, installing the OS, and migrating application data separately (rsync, database replication, object sync). This is the cleanest method for stateless workloads or when you’re modernizing the stack during migration. It’s also the most time-consuming.

| Method | Downtime | Complexity | Best For |
|---|---|---|---|
| Cold migration | Hours to days | Low | Non-critical workloads, scheduled maintenance windows |
| Live migration | Minutes | Medium | Production databases, stateful apps |
| Block streaming | Seconds (initial) | High | Experimental; rarely used |
| Rebuild | Variable | Medium | Stateless apps, modernization efforts |

  • Choose cold migration when downtime is acceptable and tooling simplicity matters.
  • Choose live migration when uptime SLAs are strict and you have the bandwidth to sync deltas.
  • Choose rebuild when you’re refactoring the application stack or when the workload doesn’t justify VMDK conversion.
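
If you take the rebuild path, the disk image never moves; only application data does. A minimal rsync sketch, run from the source VM (the hostnames, paths, and service name are illustrative, not part of any tool's defaults):

# initial bulk copy while the application is still serving traffic
rsync -aHAX --numeric-ids /var/lib/app/ migrate@new-openstack-vm:/var/lib/app/

# at cutover: stop the application so no new writes land, then sync only the deltas
sudo systemctl stop app
rsync -aHAX --numeric-ids --delete /var/lib/app/ migrate@new-openstack-vm:/var/lib/app/

For databases, replace the second pass with native replication (for example, a replica in OpenStack promoted at cutover) rather than copying live data files.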

Tools

The tooling landscape for VMware-to-OpenStack storage migration ranges from open-source CLI utilities to commercial platforms. Your choice depends on scale, automation requirements, and tolerance for manual intervention.

  • qemu-img is the workhorse for VMDK-to-raw or VMDK-to-qcow2 conversion. It’s free, well-documented, and handles most disk formats. It doesn’t migrate metadata (VM config, network settings), so you’ll need to recreate those in OpenStack manually or via scripting.
  • virt-v2v (part of libguestfs) automates more of the process. It converts VMDKs, injects virtio drivers (critical for performance on KVM), and can push images directly to OpenStack via Glance. It’s purpose-built for VMware-to-KVM migrations, but it requires access to the VMware API or exported OVF files.
  • rbd is Ceph’s native block device CLI. You’ll use it to import raw disk images into Ceph pools, create snapshots, clone volumes, and manage RBD mappings. It’s fast, but you need to ensure your disk images are in raw format before importing. On modern Ceph deployments managed by cephadm (standard on OpenMetal v3.0.0+ environments), Ceph services run in Docker containers. You can execute rbd commands either from the host if ceph-common is installed, or via the containerized environment using cephadm shell -- rbd <command>.
  • ovftool (VMware’s OVF export utility) packages VMDKs and VM metadata into OVF/OVA archives. Useful when you need to export VMs from vCenter in a structured format before conversion. It doesn’t handle the Ceph import step—just the export from VMware.
  • Hystax, Trilio, Storware are commercial migration and disaster recovery platforms. They offer live migration capabilities, automated cutover, and delta sync. They’re expensive, but they reduce manual labor for large-scale migrations (50+ VMs). Hystax specifically supports VMware-to-OpenStack workflows.

| Tool | Use Case | License | Live Migration? | Ceph Integration |
|---|---|---|---|---|
| qemu-img | VMDK conversion | Open source | No | Manual rbd import |
| virt-v2v | Automated V2V conversion | Open source | No | Via Glance/Cinder |
| rbd | Ceph block device mgmt | Open source | No | Native |
| ovftool | VMware VM export | Free (VMware) | No | None |
| Hystax | Enterprise migration | Commercial | Yes | Via OpenStack APIs |
| Trilio | Backup and migration | Commercial | Yes | Native Ceph support |
| Storware | Backup and DR | Commercial | Yes | Ceph plugin available |

  • For small migrations (under 20 VMs), stick with qemu-img and rbd.
  • For mid-size migrations (20–100 VMs), virt-v2v will save time.
  • For large migrations (100+ VMs) or when you need live cutover, evaluate Hystax or Trilio. Don’t assume a commercial tool will solve architectural mismatches—they won’t convert VMFS-specific features (like Storage DRS policies) into Ceph equivalents.

Example Command Workflows

Here’s a typical cold migration workflow using open-source tools. This assumes you’ve already exported the VM from VMware and have SSH access to a machine with Ceph client tools installed.

Note for OpenMetal v3.0.0+ deployments: Ceph services run in containers managed by cephadm. Execute rbd commands via cephadm shell -- <command> or install ceph-common on the host for direct CLI access. The examples below show direct CLI usage for clarity—prefix them with cephadm shell -- if you’re working in a containerized environment.
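
For example, to confirm the client side can reach the cluster before importing anything (assuming root access on a host where cephadm is installed):

sudo cephadm shell -- ceph -s
sudo cephadm shell -- rbd ls openstack-volumes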

Export VM from VMware with ovftool

ovftool vi://vcenter.example.com/Datacenter/vm/production-db01 \
  /mnt/staging/production-db01.ova

This exports the VM as an OVA file. Extract the VMDK from the OVA:

tar -xvf /mnt/staging/production-db01.ova

Convert VMDK to raw format with qemu-img

qemu-img convert -f vmdk -O raw \
  production-db01-disk1.vmdk \
  production-db01-disk1.raw

Check the converted image size and format:

qemu-img info production-db01-disk1.raw

Import raw image into Ceph RBD

rbd import --pool openstack-volumes \
  production-db01-disk1.raw \
  production-db01-disk1

Verify the RBD image exists:

rbd ls openstack-volumes
rbd info openstack-volumes/production-db01-disk1

Create a Cinder volume from the image

Note that openstack volume create --image resolves a Glance image name, not an RBD image sitting directly in the Cinder pool. Either register the converted raw file in Glance first (see the sketch below) or adopt the rbd-imported image through Cinder’s volume manage workflow.

openstack volume create \
  --size 100 \
  --image production-db01-disk1 \
  production-db01-volume

Attach the volume to a new OpenStack instance or boot directly from the volume using Nova.
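
A minimal sketch of the Glance registration that precedes that volume create, plus the final boot from the migrated volume (the flavor and network names are placeholders for your environment):

openstack image create \
  --disk-format raw \
  --container-format bare \
  --file production-db01-disk1.raw \
  production-db01-disk1

openstack server create \
  --flavor m1.large \
  --volume production-db01-volume \
  --network private \
  production-db01

With both Glance and Cinder backed by Ceph, the volume create is a COW clone from the image pool into the volumes pool, so it completes quickly even for large disks.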

Benchmarking

Before you migrate production workloads, benchmark your Ceph cluster to confirm it meets performance expectations. VMware admins are accustomed to vSAN’s predictable latency profiles. Ceph performance depends on OSD count, network topology, disk types (NVMe vs SSD vs HDD), and CRUSH map configuration.

Use fio to test block-level I/O performance on an RBD volume:

fio --name=rbd-randwrite \
  --ioengine=rbd \
  --pool=openstack-volumes \
  --rbdname=test-volume \
  --rw=randwrite \
  --bs=4k \
  --iodepth=32 \
  --numjobs=4 \
  --runtime=60 \
  --group_reporting

This tests random 4K writes with 32 outstanding I/Os. Compare the IOPS and latency results to your VMware baseline. If you’re seeing >10ms p99 latency on NVMe-backed Ceph, investigate network bottlenecks or OSD configuration.
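
If your fio build lacks the rbd ioengine, testing against a mapped RBD device is a close approximation. A sketch (the test volume must already exist, the write test destroys any data on it, and rbd map prints the actual device path, which may not be /dev/rbd0):

rbd map openstack-volumes/test-volume
fio --name=rbd-randwrite \
  --filename=/dev/rbd0 \
  --ioengine=libaio \
  --direct=1 \
  --rw=randwrite \
  --bs=4k \
  --iodepth=32 \
  --numjobs=4 \
  --runtime=60 \
  --group_reporting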

Use rados bench to test raw Ceph cluster performance (bypassing RBD):

rados bench -p openstack-volumes 60 write --no-cleanup
rados bench -p openstack-volumes 60 seq

This writes objects directly to the pool for 60 seconds, then reads them back sequentially. It helps isolate whether performance issues are in Ceph itself or in the RBD/Cinder layer.
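
rbd bench sits between the two layers: it exercises RBD without involving a guest or Cinder. A sketch using the same test image as above, followed by cleanup of the objects left behind by rados bench:

rbd bench --io-type write --io-size 4K --io-threads 16 --io-total 1G \
  openstack-volumes/test-volume
rados -p openstack-volumes cleanup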

Data Integrity Validation

Migrating storage without validating data integrity is asking for corruption issues weeks after cutover. Always checksum your data before and after migration.

Generate a SHA256 hash of the converted raw image

Hash the converted raw image rather than the source VMDK. The VMDK is a container format (stream-optimized in OVA exports), so its checksum will never match the raw data you import into Ceph:

sha256sum production-db01-disk1.raw > raw-hash.txt

After importing to Ceph, map the RBD volume and hash the block device (rbd map prints the device path; it may not be /dev/rbd0 if other images are mapped):

rbd map openstack-volumes/production-db01-disk1
sha256sum /dev/rbd0 > rbd-hash.txt

Compare the hashes:

diff raw-hash.txt rbd-hash.txt

If the hashes don’t match, you have a corruption or conversion issue. Don’t proceed to cutover until you’ve identified the cause. Common culprits include incomplete VMDK exports, qemu-img version mismatches, or network interruptions during rbd import.

For large volumes, consider block-level validation tools like virt-diff (part of libguestfs) or filesystem-level checksums (e.g., ZFS checksums if your source datastore supports it).
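
For multi-terabyte volumes, a single sha256sum pass over the whole device is slow and tells you nothing about where a mismatch lives. A rough chunk-by-chunk comparison, shown here as a sketch that assumes the raw file and the mapped device at /dev/rbd0 are the same size:

#!/bin/bash
# Compare the converted raw file and the mapped RBD device in 1 GiB chunks.
SRC=production-db01-disk1.raw
DST=/dev/rbd0
CHUNK_MB=1024
SIZE=$(stat -c%s "$SRC")
CHUNKS=$(( (SIZE + CHUNK_MB*1024*1024 - 1) / (CHUNK_MB*1024*1024) ))

for ((i=0; i<CHUNKS; i++)); do
  a=$(dd if="$SRC" bs=1M skip=$((i*CHUNK_MB)) count=$CHUNK_MB 2>/dev/null | sha256sum | cut -d' ' -f1)
  b=$(dd if="$DST" bs=1M skip=$((i*CHUNK_MB)) count=$CHUNK_MB 2>/dev/null | sha256sum | cut -d' ' -f1)
  [ "$a" = "$b" ] || echo "mismatch in chunk $i (offset $((i*CHUNK_MB)) MiB)"
done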

Rollback Planning

No rollback = no migration. You need a tested rollback path before you cut over production workloads. Ceph snapshots make this straightforward, but you need to plan the workflow in advance.

Before cutover, take a snapshot of the original VMDK in VMware. Keep the VM powered off but don’t delete it. In OpenStack, create a Ceph snapshot of the newly imported RBD volume immediately after import:

rbd snap create openstack-volumes/production-db01-disk1@pre-cutover

If the cutover fails (application doesn’t start, data corruption discovered, performance unacceptable), you have two rollback options:

  1. Roll back to VMware: Power on the original VM in vCenter. You’re back to the pre-migration state within minutes.
  2. Roll back the Ceph volume: Detach the volume from the OpenStack instance, revert the RBD image to the snapshot, and troubleshoot offline:

rbd snap rollback openstack-volumes/production-db01-disk1@pre-cutover

Define your rollback SLA before migration. For Tier 1 workloads, you should be able to roll back within 15 minutes. Test the rollback procedure in a dev environment before attempting it in production. Keep the source VMDKs and VMware VMs intact for at least 30 days post-migration.
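
A quick sanity check before the cutover window opens: confirm the pre-cutover snapshot actually exists on the volume you expect.

rbd snap ls openstack-volumes/production-db01-disk1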

Common Pitfalls

| Pitfall | Symptom | Solution |
|---|---|---|
| Missing virtio drivers | VM boots slowly or not at all | Inject virtio drivers via virt-v2v or install manually |
| Thin VMDK converted to thick | Ceph volume consumes full allocated size | Use qemu-img with sparse flag; preallocate=off |
| Network MTU mismatch | High packet loss during migration | Set jumbo frames (MTU 9000) on migration network |
| Ceph replication lag | RBD import stalls or times out | Check OSD health; reduce concurrent migrations |
| Incorrect CRUSH map | Data on wrong failure domain (e.g., all on one rack) | Review CRUSH rules before migration; reweight OSDs |
| No I/O scheduler tuning | Poor performance post-migration | Set mq-deadline or none scheduler on Ceph OSD nodes |
| Cinder volume type mismatch | Volume created in wrong pool or replication tier | Define Cinder volume types that map to correct Ceph pools |
| Incomplete VM metadata | VM boots but network/hostname wrong | Export and parse VMX file; recreate metadata in OpenStack |

The most common failure mode isn’t corruption—it’s performance degradation. Your workload boots, runs, but responds 2x slower than it did in VMware. This usually points to missing virtio drivers, suboptimal Ceph pool configuration (e.g., replica 2 instead of 3), or network bottlenecks (1Gbps instead of 10Gbps+). Benchmark early, benchmark often, and compare against your VMware baselines before declaring success.
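
A couple of quick checks cover the most common culprits, pool replication and the network path (the pool name matches the earlier examples; the interface name is a placeholder):

ceph osd pool get openstack-volumes size
ceph osd pool get openstack-volumes min_size
ethtool eth0 | grep -i speed

For a production replica pool you generally want size 3 and min_size 2; the ethtool output should confirm the storage network is actually linked at 10Gbps or faster.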

Another frequent issue: migrating VMDKs with snapshots or linked clones. qemu-img and virt-v2v don’t handle VMDK snapshots gracefully. Consolidate all snapshots in VMware before exporting the VM. If you have linked clones, convert them to full clones first.

Migration Timeline

A realistic storage migration timeline for a 50-VM production environment looks like this:

  • Weeks 1–2: Inventory and discovery. Identify VMDK sizes, snapshot dependencies, application dependencies, and downtime windows.
  • Weeks 3–4: Pilot migration of 5 non-critical VMs. Test tooling, validate performance, document workflows.
  • Weeks 5–8: Migrate dev/test workloads (20 VMs). Refine scripts, train team, identify performance gaps.
  • Weeks 9–12: Migrate Tier 2 production workloads (15 VMs). Schedule downtime windows, execute cold migrations, validate data integrity.
  • Weeks 13–16: Migrate Tier 1 production workloads (10 VMs). Use live migration tools if available, or schedule extended maintenance windows.
  • Weeks 17–20: Decommission VMware infrastructure. Archive VMDKs, power off ESXi hosts, reclaim licenses.

This timeline assumes you have a functioning Ceph cluster, competent OpenStack operators, and no major architectural surprises. If you’re also deploying Ceph and OpenStack from scratch, add 8–12 weeks to the front end. If you’re migrating 500+ VMs, scale the timeline linearly but add buffer for coordination overhead and troubleshooting.

Don’t rush the pilot phase. A poorly executed pilot will cascade into production failures. Use the pilot to identify gaps in your tooling, networking, or Ceph configuration—not to declare victory and accelerate the timeline.

Example Storage Migration Checklist

| Task | Owner | Status |
|---|---|---|
| ☐ Inventory all VMs, VMDK sizes, snapshot dependencies | Platform team | |
| ☐ Benchmark Ceph cluster (fio, rados bench) | Storage architect | |
| ☐ Test qemu-img/virt-v2v tooling on dev VM | Migration engineer | |
| ☐ Define rollback procedure and test in dev | Operations team | |
| ☐ Export VMDKs from vCenter with ovftool | VMware admin | |
| ☐ Convert VMDKs to raw format | Migration engineer | |
| ☐ Import raw images to Ceph RBD | Storage engineer | |
| ☐ Create Cinder volumes from RBD images | OpenStack operator | |
| ☐ Take pre-cutover snapshots (VMware + Ceph) | Operations team | |
| ☐ Boot OpenStack instance from migrated volume | Platform team | |
| ☐ Validate data integrity (checksums) | Storage engineer | |
| ☐ Run application smoke tests | Application owner | |
| ☐ Monitor performance for 48 hours post-cutover | Operations team | |
| ☐ Archive source VMDKs for 30 days | VMware admin | |
| ☐ Decommission VMware hosts after 30-day retention | Platform team | |

Why OpenMetal’s Hosted Private Cloud Works for VMware Migrations

If you’re planning a VMware-to-OpenStack migration, you need a stable, performant Ceph-backed landing zone. OpenMetal’s Hosted Private Cloud provides exactly that—without the operational burden of deploying and managing Ceph yourself.

OpenMetal’s infrastructure is built on NVMe storage, 25–100Gbps networking, and Ceph pools configured for production workloads. Starting with OpenMetal v3.0.0, deployments use cephadm for simplified cluster lifecycle management—making it easier to add OSDs, replace disks, or enable CephFS during or after your migration. You’re not inheriting someone else’s underprovisioned cluster. You get dedicated hardware with predictable, fixed-cost pricing—no surprise egress fees or noisy-neighbor performance drops.

For storage architects migrating off VMware, this means you can focus on the migration process itself—VMDK conversion, data validation, application cutover—rather than tuning CRUSH maps or troubleshooting OSD failures at 2 AM. You still get root access to the OpenStack control plane and Ceph cluster, so you maintain full operational control when you need it.


If you’re evaluating landing zones for your VMware workloads, consider OpenMetal as an alternative to hyperscaler cloud, DIY OpenStack, or proprietary converged infrastructure. You get Ceph, OpenStack, and the predictable cost model that makes storage migration planning feasible.

Contact Us


