Regularly testing your Ceph storage is key to identifying performance bottlenecks, optimizing configurations, and ensuring your cluster meets application demands. This guide, based on our experience at OpenMetal, will walk you through preparing your environment, selecting the right tools, and running benchmarks effectively.

Preparing Your Benchmarking Environment

The accuracy of your benchmarks hinges on how closely your test setup replicates production conditions, so mirror your production hardware, software versions, and configuration as closely as possible; a well-prepared environment is what makes the results reliable.

System Requirements and Prerequisites

To get started, make sure your Ceph cluster is running on supported hardware, with root-level access available for benchmarking. Most benchmarking tools require elevated privileges to directly interact with storage devices and system resources.

Hardware selection plays a major role in determining benchmark outcomes. At OpenMetal, our standard configurations use high-performance hardware, like Micron 7450 MAX NVMe drives for OSDs and low-latency, high-speed networking (up to 100 Gbps), which we have validated for Ceph performance. For best results, back the Ceph Monitor and Ceph Manager data directories, the CephFS Metadata Server metadata pool, and the Ceph Object Gateway (RGW) index pool with SSDs.

Storage drives should be dedicated to specific tasks: one for the operating system, another for OSD data, and a separate one for BlueStore WAL+DB. This separation minimizes interference and ensures smoother performance.
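
If you provision OSDs manually, ceph-volume can place the BlueStore DB and WAL on devices separate from the data drive. This is a sketch only; the device paths are placeholders for your own drives and partitions:

# /dev/sdb holds OSD data; the NVMe partitions are placeholders for DB and WAL
ceph-volume lvm create --bluestore --data /dev/sdb --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2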

Install benchmarking tools such as FIO, rados bench, and COSBench or GOSBench on dedicated client machines. These tools should not run on Ceph cluster nodes to avoid resource conflicts.

Ensure your Ceph storage interfaces—whether block, file, or object—are correctly configured and accessible from your test clients. For more on the distinctions between storage types, you can check out our article on Block Storage vs. Object Storage. Finally, create an isolated test environment that matches your production hardware to validate these prerequisites.

Test Environment Configuration

Once you’ve met the hardware and software requirements, isolate your testing environment to achieve consistent results. Workload isolation is key to preventing production traffic from interfering with your benchmarks, ensuring that your data reflects true storage performance.

Set up dedicated test nodes that closely match your production hardware. This includes aligning CPU cores, memory capacity, network interfaces, and storage types.

Use a dedicated network for Ceph cluster traffic to reduce latency. Split client and recovery traffic across separate network interfaces to avoid bandwidth contention. A 10 Gbps network may struggle under heavy loads, while upgrading to 100 Gbps can significantly improve performance.
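
Assuming an existing cluster where 10.0.0.0/24 and 10.0.1.0/24 stand in for your own public and cluster subnets, the split can be expressed like this (OSDs pick up the change after a restart):

ceph config set global public_network 10.0.0.0/24
ceph config set global cluster_network 10.0.1.0/24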

Your storage configuration should reflect the specifics of your production setup. Choose between erasure coding and replication based on your workload: replication generally delivers better performance for write-heavy and latency-sensitive workloads, while erasure coding trades some write performance for capacity efficiency and is better suited to read-heavy or archival data.
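
To compare the two schemes directly, you can create one pool of each type and run the same benchmark against both. The pool names, placement group counts, and k=4/m=2 profile below are illustrative only:

ceph osd pool create bench-replicated 128 128 replicated
ceph osd erasure-code-profile set bench-ec-profile k=4 m=2
ceph osd pool create bench-ec 128 128 erasure bench-ec-profile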

Adjust the number of placement groups per OSD to strike a balance between performance and resource usage. This tuning affects data distribution across your cluster and can influence your results.
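
On recent Ceph releases the placement group autoscaler can handle this for you, or you can disable it on a test pool and set pg_num explicitly to compare results. The values here are examples, not recommendations:

ceph osd pool autoscale-status
ceph osd pool set bench-replicated pg_autoscale_mode off
ceph osd pool set bench-replicated pg_num 128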

Set workload parameters that align with your actual use cases. Match I/O patterns, block sizes, queue depths, and concurrency levels to what your applications typically generate. Avoid relying on synthetic workloads that don’t represent real-world usage.
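
For example, a database-style workload might be approximated with a mixed random read/write FIO job; the 70/30 read ratio, 8k block size, and queue depth here are assumptions to adjust to your own application profile:

fio --name=db-like --rw=randrw --rwmixread=70 --bs=8k --iodepth=16 --numjobs=4 --direct=1 --ioengine=libaio --size=2G --runtime=120 --group_reporting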

Benchmarking Tools and Selection Methods

Picking the right benchmarking tool is key to gathering reliable performance data from your Ceph storage cluster.

Available Benchmarking Tools

  • RADOS bench is Ceph’s built-in tool for testing the RADOS layer.
  • FIO (Flexible I/O Tester) is a versatile tool for simulating I/O patterns on both CephFS and Ceph Block Devices.
  • COSBench and GOSBench are tailored for benchmarking the Ceph Object Gateway (RGW).
  • s3cmd provides a simpler way to benchmark the Ceph Object Gateway by timing get and put requests against a test bucket, as shown in the example below.
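
A quick sanity check with s3cmd might look like the following, assuming s3cmd is already configured with your RGW endpoint and credentials; the bucket name and 1 GB test file are placeholders:

dd if=/dev/zero of=testfile bs=1M count=1024
s3cmd mb s3://bench-bucket
time s3cmd put testfile s3://bench-bucket/testfile
time s3cmd get s3://bench-bucket/testfile testfile.down --force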

Choosing the Right Tool for Your Storage Type

  • For RADOS cluster testing, RADOS bench is your go-to option. It’s a native Ceph utility that provides a direct look at your cluster’s core performance. For a deeper dive, read our introduction to Ceph architecture.
  • For block storage benchmarking, use FIO for simulating complex I/O patterns that mimic real-world workloads.
  • For file storage performance with CephFS, FIO is the ideal tool.
  • For object storage benchmarking, tools like COSBench, GOSBench, or s3cmd are best.

How to Run Ceph Storage Benchmarks

Once your environment is set up, it’s time to start running benchmarks.

FIO Benchmark Testing

The FIO tool is ideal for testing Ceph Block Devices and CephFS. Start with a 4k block size and repeat the test at progressively larger sizes (e.g., 8k, 16k, 64k) to see how throughput and latency change, then focus on the block sizes your applications actually issue.

To test random write performance, use the following command:

fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=4k --iodepth=32 --size=5G --runtime=60 --group_reporting=1
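
Run the command from a directory on the mounted RBD image or CephFS share you want to test, or point it at a specific target with --filename. A matching random read test under the same assumptions looks like this:

fio --name=randread --rw=randread --direct=1 --ioengine=libaio --bs=4k --iodepth=32 --size=5G --runtime=60 --group_reporting=1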

RADOS Performance Testing with rados bench

The rados bench tool is used to measure the performance of the RADOS cluster itself. For write tests, use the --no-cleanup option to keep the test data for subsequent read tests. For example:

rados bench -p your_pool 600 write -t 16 --object_size=4MB --no-cleanup

Once the write test is complete, you can measure read performance:

rados bench -p your_pool 60 seq -t 16 --object_size=4MB
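
A random read test against the same objects is also worth running. When you are finished, remove the benchmark objects that --no-cleanup left behind so they don't skew later runs or consume capacity:

rados bench -p your_pool 60 rand -t 16
rados -p your_pool cleanup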

Object Storage Testing with COSBench and GOSBench

To benchmark Ceph’s object storage via the RADOS Gateway, tools like COSBench and GOSBench are commonly used. Both coordinate workers to perform operations like read, write, delete, and list on your object storage endpoints.
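
With COSBench, for example, you describe the endpoint, worker count, and operation mix in an XML workload file, then submit it from the controller node. The script names below follow the stock COSBench distribution, and the workload file name is a placeholder:

./start-all.sh
./cli.sh submit conf/rgw-benchmark.xml
./cli.sh info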

Best Practices for Reliable Results

Getting accurate benchmarking results requires a controlled environment and a consistent approach.

Workload Isolation Techniques

Workload isolation is important to prevent interference from other applications or background tasks. Container-based isolation with Docker, for example, lets you pin the benchmark client to dedicated CPU cores and memory so other processes on the host don't skew your results.
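
A minimal sketch of that approach, assuming a container image with FIO installed (the image name, CPU set, memory limit, and mount path are all placeholders):

docker run --rm --cpuset-cpus="0-7" --memory=16g --net=host -v /mnt/rbd-test:/data your-fio-image fio --name=isolated-randwrite --directory=/data --rw=randwrite --bs=4k --iodepth=32 --size=5G --runtime=60 --direct=1 --ioengine=libaio --group_reporting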

Creating Performance Baselines

Performance baselines are essential for evaluating whether configuration tweaks genuinely improve system performance. As Klara Systems notes, “Effective storage benchmarking requires a structured approach – defining scope, designing realistic tests, and ensuring repeatability.”
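
One practical way to do that is to capture each baseline run as machine-readable output that later runs can be compared against; FIO's JSON output works well for this, and the file name is just a convention:

fio --name=baseline-randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=4k --iodepth=32 --size=5G --runtime=60 --group_reporting --output-format=json --output=baseline-$(date +%Y%m%d).json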

Running Multiple Test Iterations

A single test run isn’t enough. System performance can vary, so running multiple iterations helps account for this variability and identifies outliers that could distort your data.
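
A simple loop with a pause between runs to let the cluster settle is usually enough; five iterations and a 60-second pause are arbitrary starting points:

for i in 1 2 3 4 5; do
  fio --name=randwrite-run$i --rw=randwrite --direct=1 --ioengine=libaio --bs=4k --iodepth=32 --size=5G --runtime=60 --group_reporting --output=run-$i.txt
  sleep 60
done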

Wrapping Up: Benchmarking Ceph Storage Performance

The benchmarking processes we’ve discussed are the same ones we use at OpenMetal to validate our own cloud infrastructure. This ensures that when you deploy an OpenMetal private cloud, you get a storage system that is already optimized for predictable, high performance, removing the complexity and guesswork for your team.

Regular benchmarking allows you to monitor the effects of configuration changes and ensures your Ceph storage continues to meet your data requirements. This process becomes even more important as your private cloud grows, helping you identify and address bottlenecks before they disrupt production workloads.

FAQs

What’s the difference between COSBench and GOSBench for testing Ceph object storage performance?

COSBench is a widely used, Java-based tool for assessing cloud object storage performance. GOSBench, written in Golang, is a more modern alternative that often delivers better performance and scalability in demanding scenarios.

How can I set up a benchmarking environment that accurately reflects my production setup?

To get reliable results, recreate your production environment as closely as possible. This includes using the same hardware configurations, network setup, and software versions. Simulating production workloads and including routine maintenance tasks will help ensure your testing environment reflects real-world conditions.

How can I isolate workloads during Ceph storage benchmarking?

Allocate specific hardware resources—like separate CPU cores and network interfaces—exclusively for the benchmarking process. Set up dedicated networks for distinct traffic types, such as cluster, public, and client traffic.


Discover the power of the OpenMetal Large v4 Storage Server with dual Intel Xeon Silver 4510 CPUs, 720TB HDD storage, 76.8TB NVMe flash, and 512GB DDR5 RAM. Perfect for building high-performance, scalable, and resilient storage clusters for cloud, AI/ML, and enterprise data lakes.