ClickHouse is a powerful, free, and open source database management system (DBMS) that can run complex analytical queries over huge volumes of data at remarkable speed. Originally created by Yandex for their web analytics platform, it’s now a popular choice for businesses that need real-time insights and efficient data warehousing.

OpenMetal offers a unique approach to cloud infrastructure, providing high-powered, enterprise-level hosted bare metal servers with custom automation behind them. This gives you an unmatched combo of performance, control, flexibility, and speed, making it a perfect platform for self-hosting ClickHouse.

This guide will walk you through the entire process of setting up ClickHouse on an OpenMetal server. Whether you’re a seasoned developer or a system administrator, you’ll have the knowledge and tools to deploy the ideal ClickHouse instance for your company.

Why Choose OpenMetal for ClickHouse Hosting?

OpenMetal offers several advantages for self-hosting ClickHouse:

  • Bare Metal Performance: By accessing hardware directly, you avoid the overhead of virtualization. This allows ClickHouse to fully tap into the server’s resources for faster query processing.
  • Hardware Control: Choose from a range of servers and customize them with the CPU, RAM, and storage specs you need. This flexibility lets you exactly fit the hardware to your workload and optimize for cost and performance.
  • Cost-Effectiveness: OpenMetal’s transparent pricing and flexible configurations help keep infrastructure costs both reasonable and predictable. You can avoid the high and fluctuating costs of public cloud providers and pay only for the resources needed.
  • Private Cloud Flexibility: You have the option to combine bare metal servers with on-demand hosted private cloud for extra scalability and varying styles of resource management. You can easily scale your ClickHouse deployment as your data and query volume grow.
  • Open Source Ecosystem: OpenMetal’s commitment to open source technologies makes integrating with ClickHouse and other tools easy. Choosing open source also avoids vendor lock-in, letting you pick the best tools for your needs.

Why Self-Host ClickHouse?

While managed ClickHouse solutions exist, self-hosting has a number of benefits:

  • Cost Control: OpenMetal’s bare metal offerings are competitively and transparently priced for high-powered servers. You can also choose from monthly billing all the way up to a five year agreement, letting you lock in your price and avoid increases.
  • Customization: Tailor hardware and software configurations completely to your needs to maximize performance and efficiency.
  • Data Security: You have complete control over your data and infrastructure, ensuring compliance with internal security policies and industry regulations.
  • Performance Optimization: Fine-tune the system to achieve the best performance for your workloads and data access patterns.
  • High-Performance Analytics: ClickHouse is designed to address the need for high-performance analytics on large datasets, particularly for real-time applications.
  • Handle High-Cardinality, Wide Data: ClickHouse can efficiently store and query tables with large numbers of columns and wide events.
  • Versatile Use Cases: ClickHouse can be used for a range of applications, including log storage as an alternative to Elasticsearch, unbounded analytics for analyzing large volumes of unsampled tracing data, and as a data warehouse for in-house analysts.

High Availability Considerations

For production deployments, high availability is essential for maintaining data consistency and minimizing downtime. Consider these factors:

  • Redundancy and Failover: Build in redundancy by deploying ClickHouse as a cluster with multiple replicas. This ensures that if one node fails, the others can take over.
  • ClickHouse Keeper: ClickHouse Keeper plays a major role in fault tolerance and data consistency in ClickHouse clusters. It manages metadata, coordinates replication, and handles failover scenarios.
  • Upgrades: Upgrading ClickHouse clusters can be complex. Plan upgrades carefully and test thoroughly to minimize disruptions.
  • Write/Read Services: For high-scale applications, consider dedicated write/read services to handle streaming ingestion and efficiently expose the query engine.
  • Observability: Use monitoring and troubleshooting tools to ensure the health and performance of your ClickHouse deployment.

Prerequisites to Install ClickHouse on OpenMetal

Before you begin, ensure you have the following:

  • An active OpenMetal account. You can create one on the OpenMetal website.
  • An SSH key pair for accessing your OpenMetal server.
  • Basic familiarity with Linux command line and server administration.

Step 1: Choose Your OpenMetal Server

ClickHouse can be resource-intensive, so selecting the right server is a must for optimal performance. Think about these factors when choosing your OpenMetal server:

  • CPU: ClickHouse benefits from high core counts and clock speeds. For small deployments, a server with at least 8 cores is recommended. For enterprise-level deployments, consider servers with 32 or more cores.
  • RAM: A general guideline is to have a 1:100 to 1:130 memory-to-storage ratio for data warehousing use cases. For example, if you plan to store 10TB of data, aim for 100GB of RAM per replica. For customer-facing workloads with frequent access, a 1:30 to 1:50 ratio is recommended.
  • Storage: ClickHouse performs best with fast storage. NVMe SSDs are ideal for high-performance deployments. Currently, all of OpenMetal’s bare metal server options include NVMe SSDs.
  • Network: A 10Gbps network is recommended for optimal performance, especially for large deployments with frequent data transfers. Our servers include a generous allotment of bandwidth, so with all but our XS servers you’ll have 20Gbps internal private bandwidth available out of the gate.
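As a rough illustration, the memory-to-storage guideline above can be turned into a quick sizing calculation. This is only a sketch of the rule of thumb, not a hard requirement:

```python
def recommended_ram_gb(storage_tb: float, ratio: int = 100) -> float:
    """Estimate RAM per replica from stored data size.

    ratio=100 reflects the 1:100 guideline for data warehousing;
    use a ratio of 30-50 for frequently accessed, customer-facing workloads.
    """
    storage_gb = storage_tb * 1000  # decimal TB -> GB
    return storage_gb / ratio

# 10 TB of data at the 1:100 ratio -> 100 GB of RAM per replica
print(recommended_ram_gb(10))
```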

Refer to the OpenMetal Bare Metal Pricing page to explore available server options and choose the one that best suits your needs. We’re also happy to chat with you or meet with you to help you figure out the best choice.

Here are a few recommended server configurations for different deployment sizes:

Deployment Size | CPU (Cores) | RAM (GB) | Storage          | Network         | OpenMetal Server Options
Small           | 8+          | 32+      | NVMe SSD         | 1Gbps or 10Gbps | Small, Medium
Medium          | 16+         | 64+      | NVMe SSD         | 10Gbps          | Medium, Large
Enterprise      | 32+         | 128+     | NVMe SSD in RAID | 10Gbps          | XL, XXL

Step 2: Provision Your OpenMetal Server

Once you’ve chosen your server, follow these steps to provision it:

  1. Log in to your OpenMetal account.
  2. Navigate to the Bare Metal section.
  3. Select the desired server configuration and operating system. ClickHouse is compatible with various Linux distributions, including Ubuntu, CentOS, and Debian. We recommend using Ubuntu as it’s highly compatible with tools like Kubernetes and Docker, plus has solid security features and efficient resource utilization.
  4. Configure network settings, including IP addresses and VLANs.
  5. Assign your SSH key for server access.
  6. Review your order and confirm the provisioning.

OpenMetal’s automated platform will provision your server in minutes, providing you with a ready-to-use environment for ClickHouse installation.

Step 3: Install ClickHouse

After your server is provisioned, connect to it via SSH and run the following commands to install ClickHouse:

  1. Update the system:
    sudo apt update && sudo apt upgrade -y
  2. Install required packages:
    sudo apt-get install -y apt-transport-https ca-certificates curl gnupg
  3. Add the ClickHouse repository:
    curl -fsSL 'https://packages.clickhouse.com/rpm/lts/repodata/repomd.xml.key' | sudo gpg --dearmor -o /usr/share/keyrings/clickhouse-keyring.gpg
    ARCH=$(dpkg --print-architecture)
    echo "deb [signed-by=/usr/share/keyrings/clickhouse-keyring.gpg arch=${ARCH}] https://packages.clickhouse.com/deb stable main" | sudo tee /etc/apt/sources.list.d/clickhouse.list
    sudo apt-get update
  4. Install ClickHouse server and client:
    sudo apt-get install -y clickhouse-server clickhouse-client
  5. Install ClickHouse Keeper (for cluster deployments): If you are deploying ClickHouse in a cluster, you’ll need to install ClickHouse Keeper on dedicated nodes.
    sudo apt-get install -y clickhouse-keeper
  6. Enable and start the ClickHouse server:
    sudo systemctl enable clickhouse-server
    sudo systemctl start clickhouse-server
  7. Enable and start ClickHouse Keeper (for cluster deployments): If you installed ClickHouse Keeper, enable and start it using these commands.
    sudo systemctl enable clickhouse-keeper
    sudo systemctl start clickhouse-keeper
    sudo systemctl status clickhouse-keeper
  8. Connect to ClickHouse client:
    clickhouse-client

You should now have a running ClickHouse instance on your OpenMetal server!
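With the server up, a quick sanity check confirms everything is wired together. Run these on the server itself (they assume the service started successfully in step 6):

```shell
# Check that the server process is healthy
sudo systemctl status clickhouse-server --no-pager

# Run a trivial query through the native client
clickhouse-client --query "SELECT version()"
```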

ClickHouse Clients

ClickHouse provides clients for connecting to the database and executing queries:

  • Console Client (clickhouse-client): A command-line interface for interacting with ClickHouse.
  • HTTP API: ClickHouse exposes an HTTP API for interacting with the database programmatically.
  • Language Wrappers: ClickHouse provides client libraries for various programming languages, including Python, PHP, Node.js, Perl, Ruby, and R.

Choose the client that best fits your needs and workflow.
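As an example of the HTTP API, a query can be sent by URL-encoding it into the `query` parameter on the default HTTP port 8123. The helper below only builds the request URL; the host is a placeholder:

```python
from urllib.parse import urlencode

def clickhouse_http_url(host: str, query: str, port: int = 8123) -> str:
    """Build a request URL for ClickHouse's HTTP interface (default port 8123)."""
    return f"http://{host}:{port}/?{urlencode({'query': query})}"

url = clickhouse_http_url("127.0.0.1", "SELECT 1")
print(url)
# Against a live server you could then fetch it, e.g.:
#   import urllib.request
#   urllib.request.urlopen(url).read()
```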

ClickHouse Keeper

ClickHouse Keeper provides fault tolerance and data consistency in ClickHouse clusters. Some of its benefits are:

  • Metadata Management: ClickHouse Keeper stores metadata about the cluster, including information about tables, partitions, and replicas.
  • Replication Coordination: ClickHouse Keeper coordinates data replication between replicas, making sure that data is consistently copied across the cluster.
  • Failover Handling: In case of node failures, ClickHouse Keeper automatically handles failover, promoting a replica to become the new primary and ensuring continuous availability.

For production deployments, it’s recommended to run ClickHouse Keeper on dedicated servers like OpenMetal’s to ensure optimal performance and stability. Learn more about ClickHouse Keeper: https://clickhouse.com/clickhouse/keeper
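As a sketch, a three-node Keeper ensemble is configured through each node’s keeper_server section, roughly like the fragment below. The hostnames are placeholders, and server_id must be unique per node:

```xml
<clickhouse>
    <keeper_server>
        <tcp_port>9181</tcp_port>
        <server_id>1</server_id> <!-- unique on each node -->
        <log_storage_path>/var/lib/clickhouse/coordination/log</log_storage_path>
        <snapshot_storage_path>/var/lib/clickhouse/coordination/snapshots</snapshot_storage_path>
        <raft_configuration>
            <server><id>1</id><hostname>keeper-1.example.internal</hostname><port>9234</port></server>
            <server><id>2</id><hostname>keeper-2.example.internal</hostname><port>9234</port></server>
            <server><id>3</id><hostname>keeper-3.example.internal</hostname><port>9234</port></server>
        </raft_configuration>
    </keeper_server>
</clickhouse>
```

An odd number of Keeper nodes (usually three) is needed so the ensemble can maintain a quorum when one node fails.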

Step 4: Configure ClickHouse

ClickHouse offers a range of configuration options to fine-tune its performance and security. The main server configuration file is located at /etc/clickhouse-server/config.xml, while query-level settings such as max_memory_usage belong to user profiles in /etc/clickhouse-server/users.xml. Rather than editing these files directly, place overrides in /etc/clickhouse-server/config.d/ and users.d/ so they survive package upgrades.

Here are some main configuration settings to consider:

  • max_memory_usage: Sets the maximum memory limit for a query. Adjust this based on your server’s RAM and workload characteristics.
  • max_concurrent_queries: Limits the number of concurrently running queries.
  • max_threads: Controls the number of threads used for query processing.
  • listen_host: Specifies the IP address on which ClickHouse listens for connections.
  • tcp_port: Defines the port for client connections using the native protocol.
  • http_port: Sets the port for the HTTP interface.
  • https_port: Sets the port for the HTTPS interface.
  • tcp_port_secure: Sets the port for secure client connections using the native protocol.
  • Choosing the right MergeTree engine: ClickHouse offers different MergeTree engine variants, each optimized for different data management needs. Choose the appropriate engine based on your requirements:
    • ReplacingMergeTree: For managing data deduplication.
    • CollapsingMergeTree: For handling events with a lifecycle (like state changes).
    • SummingMergeTree: For data that benefits from pre-aggregation.
  • Managing data changes: ClickHouse is optimized for immutable data. If your application requires frequent updates or deletes, consider restructuring your tables or using techniques like partitioning by date so you can periodically drop or overwrite old data.
  • Partitioning: Use time-based or ID-based partitioning to enhance query performance and data management.
  • Asynchronous reads: ClickHouse supports asynchronous reads, which can improve throughput in environments with high I/O demands.
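To illustrate the MergeTree variants mentioned above, a deduplicating table using ReplacingMergeTree might look like the following. The schema, table, and column names are made up for illustration:

```sql
CREATE TABLE user_profiles
(
    user_id    UInt64,
    email      String,
    updated_at DateTime
)
ENGINE = ReplacingMergeTree(updated_at)  -- keeps the row with the latest updated_at
PARTITION BY toYYYYMM(updated_at)        -- time-based partitioning, as discussed above
ORDER BY user_id;
```

Note that deduplication in ReplacingMergeTree happens during background merges, so duplicates may still appear in queries until a merge runs (or unless you query with FINAL).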

For detailed information on ClickHouse configuration settings, refer to the official documentation: https://clickhouse.com/docs/en/operations/server-configuration/settings/
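For example, a server-level override placed in /etc/clickhouse-server/config.d/ might look like the fragment below. The filename and values are examples only; in particular, listening on 0.0.0.0 exposes ClickHouse on all interfaces, so pair it with firewall rules or restrict it to a private network address:

```xml
<clickhouse>
    <listen_host>0.0.0.0</listen_host>
    <tcp_port>9000</tcp_port>
    <http_port>8123</http_port>
    <max_concurrent_queries>100</max_concurrent_queries>
</clickhouse>
```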

Step 5: Optimize ClickHouse Performance and Security

For the best performance and security, consider these tips:

Performance Optimization

  • Batch Inserts: Insert data in batches to improve write performance.
  • Optimize Order By Granularity: Choose an appropriate granularity for the ORDER BY clause in your table definitions to balance merge performance and query performance.
  • Use Data Skipping Indices: Create data skipping indices to speed up queries that filter on high-cardinality columns.
  • Utilize Materialized Views: Create materialized views to pre-aggregate frequently accessed data and improve query performance.
  • Monitor System Performance: Use ClickHouse’s system tables and monitoring tools to identify performance bottlenecks and optimize resource utilization.
  • Keep Data in Wide Parts: Use the table-level settings min_rows_for_wide_part=0 and min_bytes_for_wide_part=0 to ensure ClickHouse keeps inserted data in the wide format.
  • Check Merge Levels: Observe merge levels to understand how many times data is re-merged within a part. Adjust the min_bytes_for_full_part_storage setting to optimize merge behavior.
  • Use Compact Column Types: Use the most compact column types possible to reduce storage requirements and improve query performance.
  • Denormalization: Consider denormalizing data to avoid slow joins on high-cardinality columns.
  • Dictionaries: Use dictionaries as an alternative to joins for frequently accessed data.
  • Approximate Count Distinct: For large datasets, use the uniq() and uniqCombined() functions for approximate distinct counts instead of COUNT DISTINCT.
  • Hash Long Strings: Hash long strings before grouping or sorting to improve efficiency.
  • Two-Pass or Multi-Pass Grouping: For heavy grouping operations, use two-pass or multi-pass grouping to improve performance.
  • Arrays: Use arrays to store nested data structures within a single column, reducing the need for complex joins.
  • Distributed Query Optimizations: Optimize distributed queries by using techniques like GLOBAL IN for efficient filtering.
  • Memory Management: Use perf top to monitor the time spent in the kernel for memory management. Run the SYSTEM JEMALLOC PURGE command to flush the memory cached by the memory allocator. Avoid using S3 or Kafka integrations on low-memory machines due to their high memory requirements.
  • CPU Management: Use the performance scaling governor for optimal CPU performance. Monitor CPU temperatures using dmesg and check for throttling due to overheating. Use turbostat to monitor CPU performance under load.
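As an illustration of the batch-insert advice above, a simple batching helper can group streaming rows so each INSERT creates one MergeTree part instead of many. This is a sketch; the client.insert call in the comment is hypothetical driver usage:

```python
def batches(rows, batch_size=100_000):
    """Group incoming rows into fixed-size batches; sending each batch
    as a single INSERT produces far fewer MergeTree parts."""
    batch = []
    for row in rows:
        batch.append(row)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        yield batch  # flush the remainder

# Hypothetical usage with a ClickHouse driver:
#   for chunk in batches(event_stream, 100_000):
#       client.insert("events", chunk)
print([len(chunk) for chunk in batches(range(250), 100)])
```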

Security Best Practices

  • Enable SSL Encryption: Secure client-server communication by enabling SSL encryption for all external-facing ports.
  • Configure Access Control: Set up user authentication and authorization to restrict access to sensitive data.
  • Regularly Update ClickHouse: Keep your ClickHouse installation up-to-date with the latest security patches and bug fixes.
  • Monitor Logs for Suspicious Activity: Regularly review ClickHouse logs to detect and prevent potential security breaches.
  • Backups: Use the clickhouse-backup tool to create regular backups of your ClickHouse data.
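For example, assuming the open source clickhouse-backup tool is installed on the server, a basic backup routine looks roughly like this (the backup name is a placeholder):

```shell
# Create a local backup (schema + data)
sudo clickhouse-backup create nightly_backup

# List available backups
sudo clickhouse-backup list

# Restore a backup by name
sudo clickhouse-backup restore nightly_backup
```

For real disaster recovery, also upload backups off the server (clickhouse-backup supports remote storage backends) rather than keeping them on local disk only.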

Step 6: Troubleshooting Common Issues

Here are some common issues you might run into when installing or running ClickHouse and how to troubleshoot them:

  • Connection refused: Ensure that ClickHouse is running and listening on the correct IP address and port. Check the listen_host and tcp_port settings in the configuration file.
  • Memory limit exceeded: Adjust the max_memory_usage setting to increase the memory limit for queries. Optimize queries to reduce memory consumption.
  • Too many parts: Insert data in batches to reduce the number of MergeTree parts. Adjust MergeTree settings to optimize merge behavior.
  • Slow query performance: Profile queries to identify bottlenecks. Optimize table schema, use appropriate indices, and consider materialized views.
  • Debugging: Use the following commands for debugging ClickHouse issues:
    • SYSTEM FLUSH LOGS: Ensures that logs are written to disk for debugging purposes.
    • clickhouse-server --config-file: Starts ClickHouse with a specific configuration file for debugging purposes.
    • journalctl -u clickhouse-server: Views ClickHouse server logs for debugging purposes.

For more detailed troubleshooting information, refer to the ClickHouse documentation: https://clickhouse.com/docs/en/guides/troubleshooting

Wrapping Up: Installing Self-Hosted ClickHouse on an OpenMetal Cloud

OpenMetal provides a highly capable and affordable solution for self-hosting ClickHouse. Your business will appreciate its high-performance analytics and data warehousing capabilities, and you’ll find great performance, flexibility, and cost efficiency for your ClickHouse deployment with OpenMetal.

This guide has hopefully made it simple for you to successfully install and configure ClickHouse on OpenMetal. If you have any feedback or issues you’ve run into while following this guide, contact us!

Ready to Self-Host ClickHouse on OpenMetal Cloud?

We’re available to answer questions, build you a customized quote based on your requirements, or schedule a consultation to assess your unique needs in depth. You can also reach our team at sales@openmetal.io

