Regularly testing your Ceph storage is key to identifying performance bottlenecks, optimizing configurations, and ensuring your cluster meets application demands. This guide, based on our experience at OpenMetal, will walk you through preparing your environment, selecting the right tools, and running benchmarks effectively.
Preparing Your Benchmarking Environment
The accuracy of your benchmarks hinges on how closely your test setup replicates production conditions, so mirror your production environment as closely as possible; a well-prepared environment is what makes the results reliable.
System Requirements and Prerequisites
To get started, make sure your Ceph cluster is running on supported hardware, with root-level access available for benchmarking. Most benchmarking tools require elevated privileges to directly interact with storage devices and system resources.
Hardware selection plays a major role in determining benchmark outcomes. At OpenMetal, our standard configurations use high-performance hardware, like Micron 7450 MAX NVMe drives for OSDs and low-latency, high-speed networking (up to 100 Gbps), which we have validated for Ceph performance. For best results, use SSDs for the Ceph Monitor and Ceph Manager data stores, the CephFS Metadata Server metadata pool, and Ceph Object Gateway (RGW) index pools.
Storage drives should be dedicated to specific tasks: one for the operating system, another for OSD data, and a separate one for BlueStore WAL+DB. This separation minimizes interference and ensures smoother performance.
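If you deploy OSDs with ceph-volume directly, that layout can be expressed at creation time. This is a hedged sketch rather than a prescription: the device paths are placeholders, and cephadm-managed clusters would express the same split through an OSD service spec instead.
# Place OSD data, BlueStore DB, and WAL on separate devices (example device paths)
ceph-volume lvm create --bluestore \
  --data /dev/nvme1n1 \
  --block.db /dev/nvme0n1p1 \
  --block.wal /dev/nvme0n1p2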
Install benchmarking tools such as FIO, rados bench, and COSBench or GOSBench on dedicated client machines. These tools should not run on Ceph cluster nodes to avoid resource conflicts.
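As a rough sketch, on a Debian or Ubuntu client the core tools can be installed from the standard repositories (package names vary by distribution); COSBench and GOSBench are distributed separately through their project releases.
# Example for a Debian/Ubuntu test client; ceph-common provides the rados CLI
sudo apt install fio ceph-common s3cmd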
Ensure your Ceph storage interfaces—whether block, file, or object—are correctly configured and accessible from your test clients. For more on the distinctions between storage types, you can check out our article on Block Storage vs. Object Storage. Finally, create an isolated test environment that matches your production hardware to validate these prerequisites.
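A quick way to confirm block and file access from a test client, assuming the client already has a working ceph.conf and keyring; the pool, image, monitor address, and mount point below are placeholders.
# Block: create and map a test RBD image
rbd create bench-pool/bench-image --size 100G
sudo rbd map bench-pool/bench-image
# File: mount CephFS with the kernel client
sudo mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs-test -o name=admin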
Test Environment Configuration
Once you’ve met the hardware and software requirements, isolate your testing environment to achieve consistent results. Workload isolation is key to preventing production traffic from interfering with your benchmarks, ensuring that your data reflects true storage performance.
Set up dedicated test nodes that closely match your production hardware. This includes aligning CPU cores, memory capacity, network interfaces, and storage types.
Use a dedicated network for Ceph cluster traffic to reduce latency. Split client and recovery traffic across separate network interfaces to avoid bandwidth contention. A 10 Gbps network may struggle under heavy loads, while upgrading to 100 Gbps can significantly improve performance.
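One common way to express that split is through the public and cluster network settings in ceph.conf; the subnets below are examples only.
[global]
public_network  = 10.0.0.0/24   # client-facing traffic
cluster_network = 10.0.1.0/24   # replication and recovery traffic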
Your storage configuration should reflect the specifics of your production setup. Choose between erasure coding and replication based on your workload. Replication generally performs better in write-heavy scenarios, whereas erasure coding is better suited for read-heavy workloads.
Adjust the number of placement groups per OSD to strike a balance between performance and resource usage. This tuning affects data distribution across your cluster and can influence your results.
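As an illustration, test pools for both data protection schemes can be created with explicit placement group counts, or handed to the autoscaler; the pool names and PG counts here are examples, not recommendations.
# Replicated pool with an explicit PG count
ceph osd pool create bench-rep 128 128 replicated
ceph osd pool set bench-rep size 3
# Erasure-coded pool using the default profile
ceph osd pool create bench-ec 128 128 erasure
# Or let the autoscaler manage PG counts
ceph osd pool set bench-rep pg_autoscale_mode on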
Set workload parameters that align with your actual use cases. Match I/O patterns, block sizes, queue depths, and concurrency levels to what your applications typically generate. Avoid relying on synthetic workloads that don’t represent real-world usage.
Benchmarking Tools and Selection Methods
Picking the right benchmarking tool is key to gathering reliable performance data from your Ceph storage cluster.
Available Benchmarking Tools
- RADOS bench is Ceph’s built-in tool for testing the RADOS layer.
- FIO (Flexible I/O Tester) is a versatile tool for simulating I/O patterns on both CephFS and Ceph Block Devices.
- COSBench and GOSBench are tailored for benchmarking the Ceph Object Gateway (RGW).
- s3cmd provides a simpler way to benchmark the Ceph Object Gateway by measuring the speed of get and put requests.
Choosing the Right Tool for Your Storage Type
- For RADOS cluster testing, RADOS bench is your go-to option. It’s a native Ceph utility that provides a direct look at your cluster’s core performance. For a deeper dive, read our introduction to Ceph architecture.
- For block storage benchmarking, use FIO for simulating complex I/O patterns that mimic real-world workloads.
- For file storage performance with CephFS, FIO is the ideal tool.
- For object storage benchmarking, tools like COSBench, GOSBench, or s3cmd are best.
How to Run Ceph Storage Benchmarks
Once your environment is set up, it’s time to start running benchmarks.
FIO Benchmark Testing
The FIO tool is ideal for testing Ceph Block Devices and CephFS. Start with a 4k block size and gradually increase it (e.g., 4k, 8k, 16k) to determine the best size for your workload.
To test random write performance, use the following command:
fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=4k --iodepth=32 --size=5G --runtime=60 --group_reporting=1
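To run the block-size sweep described above, one option is a simple shell loop around the same job; the size and runtime values mirror the example command and should be adjusted to your workload.
for bs in 4k 8k 16k; do
  fio --name=randwrite-$bs --rw=randwrite --direct=1 --ioengine=libaio \
      --bs=$bs --iodepth=32 --size=5G --runtime=60 --group_reporting=1
done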
RADOS Performance Testing with rados bench
The rados bench tool measures the performance of the RADOS cluster itself. For write tests, use the --no-cleanup option to keep the test data for subsequent read tests. For example:
rados bench -p your_pool 600 write -t 16 --object_size=4MB --no-cleanup
Once the write test is complete, you can measure read performance:
rados bench -p your_pool 60 seq -t 16 --object_size=4MB
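If you also want a random-read figure, and to remove the objects left behind by --no-cleanup once testing is finished, the following commands cover both (the pool name is a placeholder):
rados bench -p your_pool 60 rand -t 16
rados -p your_pool cleanup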
Object Storage Testing with COSBench and GOSBench
To benchmark Ceph’s object storage via the RADOS Gateway, tools like COSBench and GOSBench are commonly used. Both coordinate workers to perform operations like read, write, delete, and list on your object storage endpoints.
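For the lighter-weight s3cmd approach mentioned in the tools list, a minimal sketch looks like this, assuming s3cmd has already been configured against your RGW endpoint; the bucket and file names are placeholders.
# Create a 1 GB test file, then time an upload and a download
dd if=/dev/zero of=testfile.bin bs=1M count=1024
time s3cmd put testfile.bin s3://bench-bucket/testfile.bin
time s3cmd get s3://bench-bucket/testfile.bin testfile-download.bin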
Best Practices for Reliable Results
Getting accurate benchmarking results requires a controlled environment and a consistent approach.
Workload Isolation Techniques
Workload isolation is important to prevent interference from other applications or background tasks. Using techniques like container-based isolation with Docker can provide a controlled environment for benchmarking.
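For example, a benchmark client can be pinned to specific CPU cores and given a memory limit when run in a container; the image name is a placeholder for any image with FIO installed, and the mount path is an example.
docker run --rm --cpuset-cpus=0-3 --memory=8g \
  -v /mnt/cephfs-test:/data \
  your-fio-image \
  fio --name=isolated-randread --directory=/data --rw=randread \
      --bs=4k --iodepth=32 --size=2G --runtime=60 --group_reporting=1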
Creating Performance Baselines
Performance baselines are essential for evaluating whether configuration tweaks genuinely improve system performance. As Klara Systems notes, “Effective storage benchmarking requires a structured approach – defining scope, designing realistic tests, and ensuring repeatability.”
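One simple way to make baselines comparable over time is to capture each run in a machine-readable format; this sketch uses FIO's JSON output, with the job parameters given purely as examples.
fio --name=baseline-randread --rw=randread --direct=1 --ioengine=libaio \
    --bs=4k --iodepth=32 --size=5G --runtime=60 --group_reporting=1 \
    --output-format=json --output=baseline-randread.json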
Running Multiple Test Iterations
A single test run isn’t enough. System performance can vary, so running multiple iterations helps account for this variability and identifies outliers that could distort your data.
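A minimal sketch of this, assuming a test pool named your_pool: run the same rados bench job several times and keep each log for comparison.
# Five identical 60-second write runs; review the logs for variance and outliers
for i in 1 2 3 4 5; do
  rados bench -p your_pool 60 write -t 16 > write-run-$i.log
done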
Wrapping Up: Benchmarking Ceph Storage Performance
The benchmarking processes we’ve discussed are the same ones we use at OpenMetal to validate our own cloud infrastructure. This ensures that when you deploy an OpenMetal private cloud, you get a storage system that is already optimized for predictable, high performance, removing the complexity and guesswork for your team.
Regular benchmarking allows you to monitor the effects of configuration changes and ensures your Ceph storage continues to meet your data requirements. This process becomes even more important as your private cloud grows, helping you identify and address bottlenecks before they disrupt production workloads.
FAQs
What’s the difference between COSBench and GOSBench for testing Ceph object storage performance?
COSBench is a widely used, Java-based tool for assessing cloud object storage performance. GOSBench, written in Golang, is a more modern alternative that often delivers better performance and scalability in demanding scenarios.
How can I set up a benchmarking environment that accurately reflects my production setup?
To get reliable results, recreate your production environment as closely as possible. This includes using the same hardware configurations, network setup, and software versions. Simulating production workloads and including routine maintenance tasks will help ensure your testing environment reflects real-world conditions.
How can I isolate workloads during Ceph storage benchmarking?
Allocate specific hardware resources—like separate CPU cores and network interfaces—exclusively for the benchmarking process. Set up dedicated networks for distinct traffic types, such as cluster, public, and client traffic.