Regularly testing your Ceph storage is key to identifying performance bottlenecks, optimizing configurations, and ensuring your cluster meets application demands. This guide, based on our experience at OpenMetal, will walk you through preparing your environment, selecting the right tools, and running benchmarks effectively.
Preparing Your Benchmarking Environment
The accuracy of your benchmarks hinges on how closely your test setup replicates production conditions, so mirror your production environment as closely as possible; a well-prepared environment is what makes the results reliable.
System Requirements and Prerequisites
To get started, make sure your Ceph cluster is running on supported hardware, with root-level access available for benchmarking. Most benchmarking tools require elevated privileges to directly interact with storage devices and system resources.
Hardware selection plays a major role in determining benchmark outcomes. At OpenMetal, our standard configurations use high-performance hardware, like Micron 7450 MAX NVMe drives for OSDs and low-latency, high-speed networking (up to 100 Gbps), which we have validated for Ceph performance. For best results, use SSDs for the Ceph Monitor and Ceph Manager data stores, the CephFS Metadata Server metadata pool, and Ceph Object Gateway (RGW) index pools.
Storage drives should be dedicated to specific tasks: one for the operating system, another for OSD data, and a separate one for BlueStore WAL+DB. This separation minimizes interference and ensures smoother performance.
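If you deploy OSDs with ceph-volume directly, that layout can be expressed at creation time. This is a hedged sketch rather than a prescription: the device paths are placeholders, and cephadm-managed clusters would express the same split through an OSD service spec instead.
# Place OSD data, BlueStore DB, and WAL on separate devices (example device paths)
ceph-volume lvm create --bluestore \
  --data /dev/nvme1n1 \
  --block.db /dev/nvme0n1p1 \
  --block.wal /dev/nvme0n1p2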
Install benchmarking tools such as FIO, rados bench, and COSBench or GOSBench on dedicated client machines. These tools should not run on Ceph cluster nodes to avoid resource conflicts.
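As a rough sketch, on a Debian or Ubuntu client the core tools can be installed from the standard repositories (package names vary by distribution); COSBench and GOSBench are distributed separately through their project releases.
# Example for a Debian/Ubuntu test client; ceph-common provides the rados CLI
sudo apt install fio ceph-common s3cmd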
Ensure your Ceph storage interfaces—whether block, file, or object—are correctly configured and accessible from your test clients. For more on the distinctions between storage types, you can check out our article on Block Storage vs. Object Storage. Finally, create an isolated test environment that matches your production hardware to validate these prerequisites.
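A quick way to confirm block and file access from a test client, assuming the client already has a working ceph.conf and keyring; the pool, image, monitor address, and mount point below are placeholders.
# Block: create and map a test RBD image
rbd create bench-pool/bench-image --size 100G
sudo rbd map bench-pool/bench-image
# File: mount CephFS with the kernel client
sudo mount -t ceph 10.0.0.1:6789:/ /mnt/cephfs-test -o name=admin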
Test Environment Configuration
Once you’ve met the hardware and software requirements, isolate your testing environment to achieve consistent results. Workload isolation is key to preventing production traffic from interfering with your benchmarks, ensuring that your data reflects true storage performance.
Set up dedicated test nodes that closely match your production hardware. This includes aligning CPU cores, memory capacity, network interfaces, and storage types.
Use a dedicated network for Ceph cluster traffic to reduce latency. Split client and recovery traffic across separate network interfaces to avoid bandwidth contention. A 10 Gbps network may struggle under heavy loads, while upgrading to 100 Gbps can significantly improve performance.
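One common way to express that split is through the public and cluster network settings in ceph.conf; the subnets below are examples only.
[global]
public_network  = 10.0.0.0/24   # client-facing traffic
cluster_network = 10.0.1.0/24   # replication and recovery traffic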
Your storage configuration should reflect the specifics of your production setup. Choose between erasure coding and replication based on your workload. Replication generally performs better in write-heavy scenarios, whereas erasure coding is better suited for read-heavy workloads.
Adjust the number of placement groups per OSD to strike a balance between performance and resource usage. This tuning affects data distribution across your cluster and can influence your results.
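As an illustration, test pools for both data protection schemes can be created with explicit placement group counts, or handed to the autoscaler; the pool names and PG counts here are examples, not recommendations.
# Replicated pool with an explicit PG count
ceph osd pool create bench-rep 128 128 replicated
ceph osd pool set bench-rep size 3
# Erasure-coded pool using the default profile
ceph osd pool create bench-ec 128 128 erasure
# Or let the autoscaler manage PG counts
ceph osd pool set bench-rep pg_autoscale_mode on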
Set workload parameters that align with your actual use cases. Match I/O patterns, block sizes, queue depths, and concurrency levels to what your applications typically generate. Avoid relying on synthetic workloads that don’t represent real-world usage.
Benchmarking Tools and Selection Methods
Picking the right benchmarking tool is key to gathering reliable performance data from your Ceph storage cluster.
Available Benchmarking Tools
- RADOS bench is Ceph’s built-in tool for testing the RADOS layer.
- FIO (Flexible I/O Tester) is a versatile tool for simulating I/O patterns on both CephFS and Ceph Block Devices.
- COSBench and GOSBench are tailored for benchmarking the Ceph Object Gateway (RGW).
- s3cmd provides a simpler way to benchmark the Ceph Object Gateway by measuring the speed of get and put requests.
Choosing the Right Tool for Your Storage Type
- For RADOS cluster testing, RADOS bench is your go-to option. It’s a native Ceph utility that provides a direct look at your cluster’s core performance. For a deeper dive, read our introduction to Ceph architecture.
- For block storage benchmarking, use FIO for simulating complex I/O patterns that mimic real-world workloads.
- For file storage performance with CephFS, FIO is the ideal tool.
- For object storage benchmarking, tools like COSBench, GOSBench, or s3cmd are best.
How to Run Ceph Storage Benchmarks
Once your environment is set up, it’s time to start running benchmarks.
FIO Benchmark Testing
The FIO tool is ideal for testing Ceph Block Devices and CephFS. Start with a 4k block size and gradually increase it (e.g., 4k, 8k, 16k) to determine the best size for your workload.
To test random write performance, use the following command:
fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=4k --iodepth=32 --size=5G --runtime=60 --group_reporting=1
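To run the block-size sweep described above, one option is a simple shell loop around the same job; the size and runtime values mirror the example command and should be adjusted to your workload.
for bs in 4k 8k 16k; do
  fio --name=randwrite-$bs --rw=randwrite --direct=1 --ioengine=libaio \
      --bs=$bs --iodepth=32 --size=5G --runtime=60 --group_reporting=1
done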
RADOS Performance Testing with rados bench
The rados bench tool measures the performance of the RADOS cluster itself. For write tests, use the --no-cleanup option to keep the test data for subsequent read tests. For example:
rados bench -p your_pool 600 write -t 16 --object_size=4MB --no-cleanup
Once the write test is complete, you can measure read performance:
rados bench -p your_pool 60 seq -t 16 --object_size=4MB
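If you also want a random-read figure, and to remove the objects left behind by --no-cleanup once testing is finished, the following commands cover both (the pool name is a placeholder):
rados bench -p your_pool 60 rand -t 16
rados -p your_pool cleanup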
Object Storage Testing with COSBench and GOSBench
To benchmark Ceph’s object storage via the RADOS Gateway, tools like COSBench and GOSBench are commonly used. Both coordinate workers to perform operations like read, write, delete, and list on your object storage endpoints.
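For the lighter-weight s3cmd approach mentioned in the tools list, a minimal sketch looks like this, assuming s3cmd has already been configured against your RGW endpoint; the bucket and file names are placeholders.
# Create a 1 GB test file, then time an upload and a download
dd if=/dev/zero of=testfile.bin bs=1M count=1024
time s3cmd put testfile.bin s3://bench-bucket/testfile.bin
time s3cmd get s3://bench-bucket/testfile.bin testfile-download.bin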
Best Practices for Reliable Results
Getting accurate benchmarking results requires a controlled environment and a consistent approach.
Workload Isolation Techniques
Workload isolation is important to prevent interference from other applications or background tasks. Using techniques like container-based isolation with Docker can provide a controlled environment for benchmarking.
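For example, a benchmark client can be pinned to specific CPU cores and given a memory limit when run in a container; the image name is a placeholder for any image with FIO installed, and the mount path is an example.
docker run --rm --cpuset-cpus=0-3 --memory=8g \
  -v /mnt/cephfs-test:/data \
  your-fio-image \
  fio --name=isolated-randread --directory=/data --rw=randread \
      --bs=4k --iodepth=32 --size=2G --runtime=60 --group_reporting=1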
Creating Performance Baselines
Performance baselines are essential for evaluating whether configuration tweaks genuinely improve system performance. As Klara Systems notes, “Effective storage benchmarking requires a structured approach – defining scope, designing realistic tests, and ensuring repeatability.”
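One simple way to make baselines comparable over time is to capture each run in a machine-readable format; this sketch uses FIO's JSON output, with the job parameters given purely as examples.
fio --name=baseline-randread --rw=randread --direct=1 --ioengine=libaio \
    --bs=4k --iodepth=32 --size=5G --runtime=60 --group_reporting=1 \
    --output-format=json --output=baseline-randread.json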
Running Multiple Test Iterations
A single test run isn’t enough. System performance can vary, so running multiple iterations helps account for this variability and identifies outliers that could distort your data.
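A minimal sketch of this, assuming a test pool named your_pool: run the same rados bench job several times and keep each log for comparison.
# Five identical 60-second write runs; review the logs for variance and outliers
for i in 1 2 3 4 5; do
  rados bench -p your_pool 60 write -t 16 > write-run-$i.log
done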
Wrapping Up: Benchmarking Ceph Storage Performance
The benchmarking processes we’ve discussed are the same ones we use at OpenMetal to validate our own cloud infrastructure. This ensures that when you deploy an OpenMetal private cloud, you get a storage system that is already optimized for predictable, high performance, removing the complexity and guesswork for your team.
Regular benchmarking allows you to monitor the effects of configuration changes and ensures your Ceph storage continues to meet your data requirements. This process becomes even more important as your private cloud grows, helping you identify and address bottlenecks before they disrupt production workloads.
FAQs
What’s the difference between COSBench and GOSBench for testing Ceph object storage performance?
COSBench is a widely used, Java-based tool for assessing cloud object storage performance. GOSBench, written in Golang, is a more modern alternative that often delivers better performance and scalability in demanding scenarios.
How can I set up a benchmarking environment that accurately reflects my production setup?
To get reliable results, recreate your production environment as closely as possible. This includes using the same hardware configurations, network setup, and software versions. Simulating production workloads and including routine maintenance tasks will help ensure your testing environment reflects real-world conditions.
How can I isolate workloads during Ceph storage benchmarking?
Allocate specific hardware resources—like separate CPU cores and network interfaces—exclusively for the benchmarking process. Set up dedicated networks for distinct traffic types, such as cluster, public, and client traffic.