In this article
- The Technical Reality of Double Overlay Networks
- The OpenMetal Difference: Layer 2 Access
- Who Should Make This Move
- Implementation Guide: Moving to Direct Routing
- For Advanced Users: BGP Mode
- The Economic Impact of Eliminating Overlay Networks
- Performance Benefits: Real-World Impact
- Troubleshooting Common Issues
- Wrapping Up: Running Cilium with eBPF on Bare Metal
If you are deploying high-traffic microservices today, you have likely already moved to (or are seriously considering) Cilium as your CNI. Cilium uses eBPF (extended Berkeley Packet Filter) to provide kernel-level network visibility, highly efficient load balancing, and security enforcement without the overhead of traditional sidecars.
But there is a performance ceiling you hit on public clouds that often goes unaddressed.
When you deploy a Kubernetes cluster on AWS or Google Cloud, you are almost always running an overlay network (Cilium’s VXLAN or Geneve) on top of an existing overlay network (the cloud provider’s VPC). This double-encapsulation creates a specific type of overhead that eats into your CPU cycles and adds latency, effectively neutralizing some of the efficiency gains you switched to eBPF for in the first place.
Here is the technical reality of why that happens, and the steps you can take on OpenMetal to architect a cleaner, faster network stack.
The Technical Reality of Double Overlay Networks
In a standard public cloud environment, the network you see is an abstraction. Under the hood, the provider encapsulates your packets in their own headers to route traffic across their datacenter fabric.
When you add Cilium in tunneling mode (the default for most easy installations), you are wrapping your Pod-to-Pod traffic in yet another header, usually VXLAN.
The lifecycle of a packet looks like this:
- Pod Egress: Packet leaves the container
- Cilium Encapsulation: CPU interrupts occur to wrap the packet in VXLAN
- Host Egress: Packet traverses the virtualized network adapter
- Cloud Encapsulation: The cloud hypervisor wraps it again for physical wire transit
- Wire Travel: Physical transit across datacenter network
- Double Decapsulation: The reverse process happens at the destination
For standard REST APIs, this overhead is negligible. But if you are an SRE optimizing for high-throughput workloads like real-time data ingestion, AI/ML training clusters, or high-frequency trading infrastructure, that overhead becomes measurable. It manifests as SoftIRQ spikes on your nodes and increased tail latency, with typical overhead ranging from 5-15% CPU utilization for packet processing alone.
Additionally, the standard Ethernet MTU of 1500 bytes gets reduced to approximately 1450 bytes to accommodate VXLAN headers (50 bytes of overhead), which increases packet fragmentation and further degrades performance for large data transfers.
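If you want to see this overhead on an existing overlay-mode node, a few standard Linux tools make it visible. The commands below are a rough sketch: they assume Cilium is using its default VXLAN port (8472) and that tcpdump and sysstat are installed on the host.
```bash
# Watch Cilium's VXLAN traffic on the node (default VXLAN UDP port 8472).
# On a public cloud, every packet seen here is encapsulated a second time by the provider's fabric.
sudo tcpdump -ni any -c 20 udp port 8472

# Watch NET_RX SoftIRQ counters climb under load; sustained growth on busy cores
# is the packet-processing overhead described above.
watch -n 1 'grep NET_RX /proc/softirqs'

# Per-CPU softirq time (the %soft column) via sysstat's mpstat
mpstat -P ALL 5 1
```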
The OpenMetal Difference: Layer 2 Access
This is where OpenMetal’s architecture differs fundamentally from a hyperscaler. We do not just rent you a VM. We provide dedicated bare metal infrastructure with true Layer 2 networking access.
When you provision bare metal servers with us, such as a cluster of Large V4 nodes, you are assigned a dedicated Layer 2 VLAN that isolates your traffic between physical servers. You have dual 10 Gbps private links per server (20 Gbps total aggregate bandwidth) that are completely unmetered for private traffic.
Because you have control over this Layer 2 domain, you are not forced to use overlay networking for your Kubernetes cluster. You can configure Cilium to run in Direct Routing or Native Routing mode, eliminating an entire layer of encapsulation. For more guidance on deploying Kubernetes on OpenMetal infrastructure, see our comprehensive Kubernetes deployment guides.
Public Cloud vs OpenMetal: Network Architecture Comparison
| Aspect | AWS/GCP | OpenMetal |
|---|---|---|
| Network Access | Virtualized VPC with cloud overlay | Direct Layer 2 access on dedicated VLAN |
| Encapsulation | Double overlay (VPC + CNI) | Single layer or no overlay needed |
| MTU | Reduced to ~1450 bytes | Full 9000 bytes (Jumbo Frames) |
| Private Traffic | Charged or limited | Unmetered at 20 Gbps per server |
| Egress Pricing | $0.08-0.12/GB starting from first byte | Generous allowances included, then 95th percentile billing |
| CPU Overhead | 5-15% for packet processing | Minimal – direct kernel routing |
Who Should Make This Move
This architecture is particularly valuable for:
- Real-time data ingestion platforms processing streaming data from IoT devices, application logs, or event streams
- AI/ML training clusters moving large datasets between nodes during distributed training
- Financial services and high-frequency trading where microsecond-level latency matters
- Gaming infrastructure requiring low-latency player connections and real-time state synchronization
- Video streaming and transcoding workloads with sustained high bandwidth requirements
- Big data processing with frameworks like Spark or Hadoop that shuffle large amounts of data between workers
- High-performance computing (HPC) workloads requiring maximum compute density and network performance
Implementation Guide: Moving to Direct Routing
When you disable tunneling and rely on the native routing capabilities of the underlying Linux kernel, packets are sent directly to the destination node's IP. Here is how you actually implement this on OpenMetal infrastructure.
1. The Network Prerequisites
On OpenMetal, your servers are connected via a bonded interface (usually bond0 for private traffic) on a private VLAN. This is a flat Layer 2 network, meaning every node in your cluster can reach every other node via L2 switching with no routers required between them.
This provides the foundation for direct routing: all nodes are on the same broadcast domain.
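You can confirm that two nodes really share the same broadcast domain before changing any Cilium settings. The snippet below is a quick sanity check, assuming bond0 carries your private VLAN, arping is installed, and 10.10.10.6 is a placeholder for a peer node's private IP.
```bash
# ARP directly on the private interface: a reply proves L2 adjacency (no router in between)
sudo arping -I bond0 -c 3 10.10.10.6

# The peer should also appear in the neighbor table with a resolved MAC address
ip neigh show dev bond0
```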
2. Configuring Cilium for Native Routing
Instead of the standard installation, you need to alter your Helm values to disable the overlay and enable direct routing.
The Strategy: You want Cilium to install routes into the Linux kernel routing table of each host machine so that the host knows exactly where every Pod CIDR lives.
Here is the relevant configuration logic for your Helm values:
```yaml
tunnel: disabled
autoDirectNodeRoutes: true
ipv4NativeRoutingCIDR: "10.0.0.0/8"  # Adjust to match your Cluster/Pod CIDR
loadBalancer:
  mode: dsr  # Direct Server Return (optional, see note below)
```
Why this matters:
- `tunnel: disabled`: This stops Cilium from wrapping packets in VXLAN. (Newer Cilium releases express the same setting as `routingMode: native`.)
- `autoDirectNodeRoutes: true`: Since all your OpenMetal nodes share a common L2 segment, Cilium can automatically inject routes into the local routing table. Node A learns that Pod-CIDR-B is reachable via Node-B-IP.
- `mode: dsr`: Direct Server Return preserves the client source IP and allows the backend to reply directly to the client, bypassing the load balancer on the return path. On bare metal, this can provide significant throughput improvements for specific workloads. Note: DSR mode is an advanced configuration option you can enable based on your application requirements; OpenMetal provides the infrastructure flexibility to support it, but implementation is user-managed.
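Applying these values is a standard Helm operation. The commands below are one way to do it, assuming you install from the official Cilium chart and have saved the values above to a file named values.yaml; adjust the value names to your Cilium release, since newer charts replace `tunnel: disabled` with `routingMode: native`.
```bash
helm repo add cilium https://helm.cilium.io/
helm repo update

# Install (or upgrade) Cilium with native routing enabled
helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  --values values.yaml

# Confirm the agent picked up the settings
kubectl -n kube-system exec ds/cilium -- cilium status | grep -i routing
```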
3. Tuning MTU for Jumbo Frames
One of the biggest headaches in cloud VPCs is MTU management. Because of the VXLAN headers, you usually have to lower your inner packet MTU from the standard 1500 bytes down to 1450 or 1400 to avoid fragmentation.
On OpenMetal’s private network, we support Jumbo Frames (9000 MTU). Since you are not using an overlay, you can increase your interface MTU significantly.
Below is an example Netplan configuration for Ubuntu. This sets up two physical interfaces (eno1 and eno2) into a bonded pair (bond0) specifically for your private network traffic with Jumbo Frames enabled.
/etc/netplan/01-netcfg.yaml
```yaml
network:
  version: 2
  renderer: networkd
  ethernets:
    eno1:
      dhcp4: false
      mtu: 9000
    eno2:
      dhcp4: false
      mtu: 9000
  bonds:
    bond0:
      interfaces: [eno1, eno2]
      mtu: 9000
      parameters:
        mode: 802.3ad
        mii-monitor-interval: 100
        transmit-hash-policy: layer3+4
      addresses:
        - 10.10.10.5/24  # Your Private IP assigned by OpenMetal
      routes:
        - to: 10.10.10.0/24
          via: 10.10.10.1
      nameservers:
        addresses: [1.1.1.1, 8.8.8.8]
```
Note: Ensure you update the interface names (eno1, eno2) and IP addresses to match your specific hardware assignment found in OpenMetal Central.
Once this is applied with sudo netplan apply, update your Cilium config to match:
```yaml
mtu: 9000
```
This drastically reduces the number of packets required to transfer large datasets, significantly lowering CPU load. For example, a 1GB file transfer requires:
- Standard MTU (1500 bytes): ~699,051 packets
- Jumbo Frames (9000 bytes): ~116,509 packets
That’s an 83% reduction in packet count, which translates directly to reduced CPU interrupts and improved throughput.
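Before relying on the 9000-byte MTU, it is worth proving that jumbo frames actually make it across the private network end to end. Here is one quick check, using a placeholder peer IP of 10.10.10.6: send a ping with fragmentation disallowed and a payload sized to fill a 9000-byte frame.
```bash
# -M do   = set Don't Fragment; the ping fails if any hop cannot pass a 9000-byte frame
# -s 8972 = 8972 bytes of payload + 28 bytes of IP/ICMP headers = 9000 bytes on the wire
ping -M do -s 8972 -c 3 10.10.10.6
```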
4. Verifying Direct Routing is Working
After deploying Cilium with direct routing, verify your configuration:
```bash
# Verify no VXLAN interfaces exist
ip link show type vxlan
# Should return nothing

# Check that Cilium installed direct routes
ip route show
# You should see routes like:
# 10.244.1.0/24 via 10.10.10.6 dev bond0

# Verify MTU is set correctly
ip link show bond0 | grep mtu
# Should show: mtu 9000

# Check Cilium status
cilium status
# Should show: Routing: Direct
```
5. Migration Path from Overlay to Direct Routing
Transitioning from overlay to direct routing can be done with minimal disruption:
- Deploy a new node pool with direct routing configuration
- Cordon old overlay nodes to prevent new pod scheduling
- Drain nodes gradually to migrate workloads to the new node pool
- Monitor connectivity between old and new nodes during migration
- Decommission overlay nodes once all workloads are migrated
This approach maintains service availability throughout the transition.
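As a rough illustration of steps 2 and 3, the cordon and drain commands look like this; the node name is a placeholder, and you should tune the grace period to your workloads.
```bash
# Stop new pods from landing on an overlay-mode node
kubectl cordon overlay-node-1

# Evict existing workloads so they reschedule onto the direct-routing pool
kubectl drain overlay-node-1 --ignore-daemonsets --delete-emptydir-data --grace-period=60

# Once the node is empty and traffic looks healthy, remove it from the cluster
kubectl delete node overlay-node-1
```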
For Advanced Users: BGP Mode
If you are running a very large cluster, or multiple clusters that need to communicate within the same OpenMetal private environment, direct routing might not scale indefinitely across complex network topologies.
This is where OpenMetal’s infrastructure control shines. You can run Cilium in BGP mode. By utilizing kube-router or Cilium’s built-in BGP control plane, your nodes can announce Pod CIDRs to upstream routers. Learn more about best practices for Kubernetes and OpenStack integration.
This allows for:
- Pod IP Routability: External legacy workloads (perhaps running on an older OpenStack VM in your private cloud) can reach Pod IPs directly without NAT
- ECMP Load Balancing: You can use Equal-Cost Multi-Path routing to spread traffic across multiple links for redundancy and bandwidth aggregation
- Multi-cluster networking: Route between multiple Kubernetes clusters on the same Layer 2 segment without complex overlay meshes
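As a sketch of what Cilium's built-in BGP control plane configuration can look like, the example below assumes the feature is enabled in your Helm values (`bgpControlPlane.enabled=true`), that an upstream router at 10.10.10.1 speaks BGP under ASN 65000, and that you have labeled the participating nodes; all ASNs, addresses, and labels are placeholders, and the exact resource name depends on your Cilium version.
```yaml
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: openmetal-private-vlan
spec:
  nodeSelector:
    matchLabels:
      bgp: enabled            # label the nodes that should speak BGP
  virtualRouters:
  - localASN: 65010
    exportPodCIDR: true       # announce each node's Pod CIDR upstream
    neighbors:
    - peerAddress: "10.10.10.1/32"
      peerASN: 65000
```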
The Economic Impact of Eliminating Overlay Networks
Technical efficiency correlates directly with cost efficiency on OpenMetal.
If you are running a high-bandwidth application on a public cloud, the overhead is not just CPU cycles; it is actual dollars. Public cloud egress fees are notorious, typically starting at $0.08-0.12 per GB from the first byte. Learn more about why transparent network architecture matters.
OpenMetal’s model is built for high-traffic workloads:
Private Traffic: Unmetered. Your Cilium nodes can communicate at 20 Gbps all day long on the private VLAN at no additional cost.
Public Egress: We include generous allowances with each server. For example, a 3-node XL V4 cluster includes 12 Gbps of egress bandwidth (4 Gbps per server). With a typical traffic pattern of 70% utilization over 8 hours per day, this supports approximately 1,850 TB of monthly egress traffic included. Calculate your specific egress needs with our egress pricing calculator.
Billing Logic: We use 95th percentile billing for overages. This effectively ignores the top 5% of your traffic bursts, which is approximately 36 hours per month. If you have a massive spike during a deployment or a scheduled batch job, you are not penalized for that momentary peak bandwidth.
Cost Comparison Example
Consider a workload with 500 TB monthly egress:
| Provider | Cost Calculation | Monthly Cost |
|---|---|---|
| AWS | 500 TB × $0.09/GB = 500,000 GB × $0.09 | $45,000 |
| GCP | 500 TB × $0.08/GB = 500,000 GB × $0.08 | $40,000 |
| OpenMetal (3x XL V4) | Included in bandwidth allotment | $0 |
For workloads exceeding the included allotment, OpenMetal’s 95th percentile billing ensures you only pay for sustained usage, not temporary bursts.
Performance Benefits: Real-World Impact
Eliminating the double overlay provides measurable improvements:
Latency Reduction: Removing one layer of encapsulation/decapsulation typically reduces pod-to-pod latency by 100-300 microseconds per hop. For applications making thousands of internal service calls per user request, this compounds significantly.
CPU Efficiency: The CPU cycles saved from avoiding VXLAN processing (typically 5-15% of total CPU) can be redirected to your actual workload, effectively increasing your cluster capacity without adding hardware.
Throughput: With Jumbo Frames enabled, large file transfers between pods can see 30-50% improvement in throughput due to reduced packet overhead and fewer CPU interrupts.
Predictability: Direct routing eliminates jitter introduced by the virtualization layer, providing more consistent performance for latency-sensitive applications.
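These figures vary with hardware and workload, so measure on your own cluster. One simple way is a pod-to-pod iperf3 run between two nodes; the commands below are a sketch, and the image name is a commonly used community image, not a requirement.
```bash
# Start an iperf3 server pod and note its IP
kubectl run iperf-server --image=networkstatic/iperf3 -- -s
kubectl get pod iperf-server -o wide

# From a pod scheduled on a different node, run the client against the server pod's IP
# (<server-pod-ip> is a placeholder; make sure the client lands on another node so
# traffic actually crosses the private network)
kubectl run iperf-client --rm -it --image=networkstatic/iperf3 -- -c <server-pod-ip> -t 30
```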
Troubleshooting Common Issues
Problem: Pods on different nodes cannot communicate after switching to direct routing
Solution: Verify that autoDirectNodeRoutes is enabled and that all nodes are on the same Layer 2 broadcast domain. Check ip route on each node to ensure routes to remote pod CIDRs are present.
Problem: MTU mismatch causing packet loss
Solution: Ensure MTU is consistently set across all interfaces (physical NICs, bond interface, Cilium CNI config). Use tracepath to detect MTU issues between nodes.
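For reference, tracepath reports the path MTU hop by hop; the peer address below is a placeholder.
```bash
# Reports "pmtu 9000" if jumbo frames survive the whole path; a lower value
# pinpoints the hop (or interface) where the MTU drops
tracepath -n 10.10.10.6
```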
Problem: Direct routing working but load balancing performance is suboptimal
Solution: If using DSR mode, ensure your application can handle responses going directly to clients. Some applications require symmetric routing. Standard load balancing mode may be more appropriate for your use case.
Wrapping Up: Running Cilium with eBPF on Bare Metal
Running eBPF on top of a virtualized VPC works, but it is like driving a sports car in stop-and-go traffic: you never get to use the performance you paid for.
By deploying on OpenMetal, you strip away the virtualization overhead. You gain:
- Direct L2 Access: No overlay encapsulation penalty
- Jumbo Frames: MTU 9000 for dramatically improved throughput
- Hardware Power: Modern Xeon processors and NVMe drives that can actually saturate the 20 Gbps links
- Cost Efficiency: Unmetered private traffic and generous public egress allowances with 95th percentile billing
You get the full performance that Cilium and eBPF were designed to deliver, without the cloud overhead tax on your CPU or your budget.
Ready to eliminate the overlay tax? Explore OpenMetal’s bare metal infrastructure or contact our team to discuss your high-performance networking requirements. For detailed hardware specifications, see our complete bare metal server catalog.