Regularly testing your Ceph storage is key to identifying performance bottlenecks, optimizing configurations, and ensuring your cluster meets application demands. This guide, based on our experience at OpenMetal, will walk you through preparing your environment, selecting the right tools, and running benchmarks effectively.

Preparing Your Benchmarking Environment

The accuracy of your benchmarks hinges on how closely your test setup replicates production conditions, so mirror your production hardware, network configuration, and software versions as much as possible.

System Requirements and Prerequisites

To get started, make sure your Ceph cluster is running on supported hardware, with root-level access available for benchmarking. Most benchmarking tools require elevated privileges to directly interact with storage devices and system resources.

Hardware selection plays a major role in determining benchmark outcomes. At OpenMetal, our standard configurations use high-performance hardware, like Micron 7450 MAX NVMe drives for OSDs and low-latency, high-speed networking (up to 100 Gbps), which we have validated for Ceph performance. For best results, use SSDs for Ceph Monitor and Ceph Manager nodes, CephFS Metadata Server metadata pools, and Ceph Object Gateway (RGW) index pools.

Storage drives should be dedicated to specific tasks: one for the operating system, another for OSD data, and a separate one for BlueStore WAL+DB. This separation minimizes interference and ensures smoother performance.
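If you provision OSDs with ceph-volume, this separation can be expressed explicitly. The sketch below is illustrative only; the device paths are placeholders, so substitute the drives in your own nodes:

# Example layout: OSD data on one NVMe drive, BlueStore DB and WAL on partitions of another
ceph-volume lvm create --bluestore --data /dev/nvme1n1 \
  --block.db /dev/nvme0n1p1 --block.wal /dev/nvme0n1p2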

Install benchmarking tools such as FIO, rados bench, and COSBench or GOSBench on dedicated client machines. These tools should not run on Ceph cluster nodes to avoid resource conflicts.

Ensure your Ceph storage interfaces—whether block, file, or object—are correctly configured and accessible from your test clients. For more on the distinctions between storage types, you can check out our article on Block Storage vs. Object Storage. Finally, create an isolated test environment that matches your production hardware to validate these prerequisites.

Test Environment Configuration

Once you’ve met the hardware and software requirements, isolate your testing environment to achieve consistent results. Workload isolation is key to preventing production traffic from interfering with your benchmarks, ensuring that your data reflects true storage performance.

Set up dedicated test nodes that closely match your production hardware. This includes aligning CPU cores, memory capacity, network interfaces, and storage types.

Use a dedicated network for Ceph cluster traffic to reduce latency. Split client and recovery traffic across separate network interfaces to avoid bandwidth contention. A 10 Gbps network may struggle under heavy loads, while upgrading to 100 Gbps can significantly improve performance.
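One way to apply this split is through Ceph's centralized configuration. The subnets below are examples only; replace them with your actual public (client) and cluster (replication/recovery) networks:

# Example subnets: client traffic and OSD replication/recovery traffic on separate networks
ceph config set global public_network 10.0.0.0/24
ceph config set global cluster_network 10.0.1.0/24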

Your storage configuration should reflect the specifics of your production setup. Choose between erasure coding and replication based on your workload. Replication generally performs better in write-heavy scenarios, whereas erasure coding is better suited for read-heavy workloads.
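To compare the two under identical benchmarks, you can create one pool of each type and point your tests at each in turn. A minimal sketch, using placeholder pool names and an example 4+2 erasure-code profile:

# Replicated pool (uses the cluster's default replication factor, typically 3)
ceph osd pool create bench_rep 128 128 replicated

# Erasure-coded pool backed by an example 4+2 profile
ceph osd erasure-code-profile set bench_ec_profile k=4 m=2
ceph osd pool create bench_ec 128 128 erasure bench_ec_profile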

Adjust the number of placement groups per OSD to strike a balance between performance and resource usage. This tuning affects data distribution across your cluster and can influence your results.
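On recent Ceph releases the PG autoscaler can manage this for you, but for controlled benchmarks it helps to review its recommendations and, if needed, pin pg_num yourself. The pool name and count below are placeholders:

# Review current PG counts and autoscaler recommendations
ceph osd pool autoscale-status

# Pin the PG count for a test pool; recent releases adjust pgp_num to match automatically
ceph osd pool set bench_rep pg_num 128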

Set workload parameters that align with your actual use cases. Match I/O patterns, block sizes, queue depths, and concurrency levels to what your applications typically generate. Avoid relying on synthetic workloads that don’t represent real-world usage.

Benchmarking Tools and Selection Methods

Picking the right benchmarking tool is key to gathering reliable performance data from your Ceph storage cluster.

Available Benchmarking Tools

  • RADOS bench is Ceph’s built-in tool for testing the RADOS layer.
  • FIO (Flexible I/O Tester) is a versatile tool for simulating I/O patterns on both CephFS and Ceph Block Devices.
  • COSBench and GOSBench are tailored for benchmarking the Ceph Object Gateway (RGW).
  • s3cmd provides a simpler way to benchmark the Ceph Object Gateway by timing individual get and put requests (see the example after this list).
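For a quick s3cmd sanity check, timing a large put and get against a test bucket gives a rough throughput figure. This assumes s3cmd is already configured for your RGW endpoint (for example via s3cmd --configure); the bucket and object names are placeholders:

# Create a test bucket and a 1 GB test object (names are placeholders)
s3cmd mb s3://bench-bucket
dd if=/dev/urandom of=testobject bs=1M count=1024

# Time an upload and a download to approximate put/get throughput
time s3cmd put testobject s3://bench-bucket/testobject
time s3cmd get s3://bench-bucket/testobject testobject.down --force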

Choosing the Right Tool for Your Storage Type

  • For RADOS cluster testing, RADOS bench is your go-to option. It’s a native Ceph utility that provides a direct look at your cluster’s core performance. For a deeper dive, read our introduction to Ceph architecture.
  • For block storage benchmarking, use FIO for simulating complex I/O patterns that mimic real-world workloads.
  • For file storage performance with CephFS, FIO is the ideal tool.
  • For object storage benchmarking, tools like COSBench, GOSBench, or s3cmd are best.

How to Run Ceph Storage Benchmarks

Once your environment is set up, it’s time to start running benchmarks.

FIO Benchmark Testing

The FIO tool is ideal for testing Ceph Block Devices and CephFS. Start with a 4k block size and gradually increase it (e.g., 4k, 8k, 16k) to determine the best size for your workload.

To test random write performance, run the following command from a directory on the Ceph-backed filesystem you want to measure (by default FIO creates its test file in the current working directory; use --filename or --directory to point it elsewhere):

fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio --bs=4k --iodepth=32 --size=5G --runtime=60 --group_reporting
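To pair this with read testing and sweep block sizes as suggested above, a simple shell loop works. This is a sketch that assumes /mnt/cephfs is the mount point you want to test; adjust --directory to your environment:

# Sweep block sizes for random reads, saving each result as JSON
for bs in 4k 8k 16k; do
  fio --name=randread-$bs --rw=randread --direct=1 --ioengine=libaio \
      --bs=$bs --iodepth=32 --size=5G --runtime=60 --directory=/mnt/cephfs \
      --group_reporting --output=randread-$bs.json --output-format=json
done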

RADOS Performance Testing with rados bench

The rados bench tool measures the performance of the RADOS cluster itself. For write tests, use the --no-cleanup option to keep the test data for the read tests that follow. For example, to write 4 MB objects for 10 minutes with 16 concurrent operations:

rados bench -p your_pool 600 write -t 16 -b 4M --no-cleanup

Once the write test is complete, you can measure sequential read performance against the objects it left behind (no size option is needed, since the read test uses the objects created by the write test):

rados bench -p your_pool 60 seq -t 16
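You can also run a random-read pass and, once testing is finished, remove the benchmark objects that --no-cleanup left behind (your_pool remains a placeholder):

# Random reads against the objects written earlier
rados bench -p your_pool 60 rand -t 16

# Remove the benchmark objects when you are done
rados -p your_pool cleanup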

Object Storage Testing with COSBench and GOSBench

To benchmark Ceph’s object storage via the RADOS Gateway, tools like COSBench and GOSBench are commonly used. Both coordinate workers to perform operations like read, write, delete, and list on your object storage endpoints.
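As a rough sketch of the COSBench workflow: you describe the workload (buckets, object sizes, read/write mix, worker counts) in an XML file and submit it to the controller, which distributes the work across its drivers. The file name below is a placeholder for your own workload definition:

# From the COSBench installation directory, submit a workload definition to the controller
sh cli.sh submit conf/rgw-benchmark.xml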

Best Practices for Reliable Results

Getting accurate benchmarking results requires a controlled environment and a consistent approach.

Workload Isolation Techniques

Workload isolation is important to prevent interference from other applications or background tasks. Container-based isolation with Docker, for example, can provide a controlled, repeatable environment for benchmarking.
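One way to do this is to pin the benchmark client to specific CPU cores and a memory limit so other processes on the host cannot skew results. This is a sketch only; the image name and mount path are placeholders, and any image with FIO installed will do:

# Pin the benchmark to cores 0-7 and 16 GB of RAM; /mnt/cephfs and the image name are assumptions
docker run --rm --cpuset-cpus="0-7" --memory=16g \
  -v /mnt/cephfs:/bench your-fio-image \
  fio --name=isolated-randwrite --rw=randwrite --direct=1 --ioengine=libaio \
      --bs=4k --iodepth=32 --size=5G --runtime=60 --directory=/bench --group_reporting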

Creating Performance Baselines

Performance baselines are essential for evaluating whether configuration tweaks genuinely improve system performance. As Klara Systems notes, “Effective storage benchmarking requires a structured approach – defining scope, designing realistic tests, and ensuring repeatability.”

Running Multiple Test Iterations

A single test run isn’t enough. System performance can vary, so running multiple iterations helps account for this variability and identifies outliers that could distort your data.
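A straightforward approach is to repeat the same FIO job several times and keep every result for comparison. A sketch reusing the earlier random-write job (run count and output file names are arbitrary):

# Run the same job five times, saving each result as JSON for later comparison
for run in 1 2 3 4 5; do
  fio --name=randwrite --rw=randwrite --direct=1 --ioengine=libaio \
      --bs=4k --iodepth=32 --size=5G --runtime=60 --group_reporting \
      --output=randwrite-run$run.json --output-format=json
done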

Wrapping Up: Benchmarking Ceph Storage Performance

The benchmarking processes we’ve discussed are the same ones we use at OpenMetal to validate our own cloud infrastructure. This ensures that when you deploy an OpenMetal private cloud, you get a storage system that is already optimized for predictable, high performance, removing the complexity and guesswork for your team.

Regular benchmarking allows you to monitor the effects of configuration changes and ensures your Ceph storage continues to meet your data requirements. This process becomes even more important as your private cloud grows, helping you identify and address bottlenecks before they disrupt production workloads.

FAQs

What’s the difference between COSBench and GOSBench for testing Ceph object storage performance?

COSBench is a widely used, Java-based tool for assessing cloud object storage performance. GOSBench, written in Golang, is a more modern alternative that often delivers better performance and scalability in demanding scenarios.

How can I set up a benchmarking environment that accurately reflects my production setup?

To get reliable results, recreate your production environment as closely as possible. This includes using the same hardware configurations, network setup, and software versions. Simulating production workloads and including routine maintenance tasks will help ensure your testing environment reflects real-world conditions.

How can I isolate workloads during Ceph storage benchmarking?

Allocate specific hardware resources—like separate CPU cores and network interfaces—exclusively for the benchmarking process. Set up dedicated networks for distinct traffic types, such as cluster, public, and client traffic.
