FreeSWITCH High Availability: Active-Active Cluster Setup

Configure FreeSWITCH active-active high availability using NAT traversal, shared SIP profile, PostgreSQL backend, and Kamailio load balancer for zero-downtime VoIP platforms.

Tumarm Engineering · 10 min read


FreeSWITCH does not ship with native clustering. It is designed as a single-node media server, and its internal state — active calls, channel variables, dialplan state — lives in process memory. Building high availability on top of FreeSWITCH means externalizing that state and routing around failures at the SIP proxy layer. This post covers an active-active architecture that handles node failures without dropping established calls and routes new calls away from unhealthy nodes within seconds.

Architecture Overview

                    ┌──────────────────────────────┐
   SIP Trunk ──────►│   Kamailio (load balancer)   │◄── SIP clients
                    └────────┬───────────┬─────────┘
                             │           │
              ┌──────────────▼──┐     ┌──▼──────────────┐
              │  FreeSWITCH-1   │     │  FreeSWITCH-2   │
              │  (active)       │     │  (active)       │
              └──────────┬──────┘     └──────┬──────────┘
                         │                   │
                    ┌────▼───────────────────▼─────┐
                    │   PostgreSQL (shared state)  │
                    │   + Redis (call registry)    │
                    └──────────────────────────────┘

Both FreeSWITCH nodes are active simultaneously. Kamailio distributes new calls across nodes using dispatcher. Active calls stay pinned to the node they started on — FreeSWITCH does not support live call migration between nodes. When a node fails, in-flight calls on that node drop (unavoidable without media server clustering), but new calls immediately route to the surviving node.

FreeSWITCH Node Configuration

Each node needs a unique rtp-ip and sip-ip binding but can share the same SIP profile structure:

<!-- /etc/freeswitch/sip_profiles/external.xml (Node 1) -->
<profile name="external">
  <settings>
    <param name="sip-ip" value="10.0.1.10"/>
    <param name="rtp-ip" value="10.0.1.10"/>
    <param name="ext-rtp-ip" value="203.0.113.10"/>
    <param name="ext-sip-ip" value="203.0.113.10"/>
    <param name="sip-port" value="5080"/>
    <param name="rtp-start-port" value="16384"/>
    <param name="rtp-end-port" value="32768"/>
    <param name="apply-nat-acl" value="rfc1918"/>
    <param name="manage-presence" value="false"/>
    <!-- Unique node identifier for call routing -->
    <param name="user-agent-string" value="FreeSWITCH/node-1"/>
  </settings>
</profile>

Node 2 mirrors this with 10.0.1.11 and 203.0.113.11. Keep RTP port ranges non-overlapping between nodes if they share any network segment.
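For clarity, Node 2's profile differs only in the bind addresses, the shifted RTP range, and the node tag. A sketch mirroring the Node 1 profile above (the 32769–49152 range is one example of a non-overlapping choice, not a requirement):

```xml
<!-- /etc/freeswitch/sip_profiles/external.xml (Node 2) -->
<profile name="external">
  <settings>
    <param name="sip-ip" value="10.0.1.11"/>
    <param name="rtp-ip" value="10.0.1.11"/>
    <param name="ext-rtp-ip" value="203.0.113.11"/>
    <param name="ext-sip-ip" value="203.0.113.11"/>
    <param name="sip-port" value="5080"/>
    <!-- Non-overlapping with Node 1's 16384-32768 range -->
    <param name="rtp-start-port" value="32769"/>
    <param name="rtp-end-port" value="49152"/>
    <param name="apply-nat-acl" value="rfc1918"/>
    <param name="manage-presence" value="false"/>
    <param name="user-agent-string" value="FreeSWITCH/node-2"/>
  </settings>
</profile>
```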

Kamailio Dispatcher Configuration

Kamailio acts as the SIP load balancer. Configure dispatcher to probe both FreeSWITCH nodes:

# /etc/kamailio/dispatcher.list
# setid  destination                    flags  priority
1        sip:10.0.1.10:5080             0      10
1        sip:10.0.1.11:5080             0      10
# kamailio.cfg — relevant dispatcher section
loadmodule "dispatcher.so"

modparam("dispatcher", "list_file", "/etc/kamailio/dispatcher.list")
modparam("dispatcher", "ds_probing_mode", 1)
modparam("dispatcher", "ds_ping_method", "OPTIONS")
modparam("dispatcher", "ds_ping_from", "sip:monitor@kamailio.example.com")
modparam("dispatcher", "ds_ping_interval", 10)
modparam("dispatcher", "ds_probing_threshold", 3)
modparam("dispatcher", "ds_inactive_threshold", 3)
modparam("dispatcher", "ds_timeout_after_inactive", 900)

request_route {
    if (is_method("INVITE") && !has_totag()) {
        # New call — load balance across active FS nodes
        if (!ds_select_dst(1, 4)) {
            send_reply("503", "Service Unavailable");
            exit;
        }
        t_on_failure("DISPATCH_FAILURE");
        # Arm the reply route that records which node answers this call
        t_on_reply("STORE_NODE");
    } else if (has_totag()) {
        # In-dialog request — route to same node
        if (!ds_is_from_list()) {
            # From client — forward to the FS node that owns this dialog
            route(ROUTE_TO_FS_NODE);
        }
    }
    t_relay();
}

failure_route[DISPATCH_FAILURE] {
    if (t_is_canceled()) exit;
    if (t_check_status("503") || t_branch_timeout()) {
        if (ds_next_dst()) {
            t_on_failure("DISPATCH_FAILURE");
            t_relay();
            exit;
        }
    }
    send_reply("503", "All media servers unavailable");
}

The ds_probing_threshold=3 means a node must fail 3 consecutive OPTIONS probes (30 seconds) before being marked inactive. Adjust down to 1 for faster failover detection at the cost of brief false-positives during network blips.
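The worst case is slightly longer than threshold × interval, because a node can die just after answering a probe. A quick sanity check in shell, using the values from the config above:

```shell
#!/bin/sh
# Worst-case failover detection: up to one full interval can pass before the
# first missed probe, then THRESHOLD probes must fail in a row.
INTERVAL=10   # ds_ping_interval, seconds
THRESHOLD=3   # ds_probing_threshold
echo "detection window: up to $(( (THRESHOLD + 1) * INTERVAL ))s"
```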

Call Pinning with Redis

In-dialog requests (re-INVITE, BYE, REFER) must reach the same FreeSWITCH node that answered the original INVITE. Store the call-to-node mapping in Redis:

# kamailio.cfg — store FS node on call answer
# Requires ndb_redis: loadmodule "ndb_redis.so" plus
# modparam("ndb_redis", "server", "name=srv1;addr=127.0.0.1;port=6379")
onreply_route[STORE_NODE] {
    if (t_check_status("200")) {
        # Key by Call-ID; value is the answering node's SIP address,
        # taken from the source of the 200 OK
        redis_cmd("srv1", "SET call:%s sip:%s:%s EX 7200", "$ci", "$si", "$sp", "r");
    }
}

route[ROUTE_TO_FS_NODE] {
    redis_cmd("srv1", "GET call:%s", "$ci", "r");
    if ($redis(r=>type) == 1) {
        # String reply: the owning node is known
        $du = $redis(r=>value);
        t_relay();
        exit;
    }
    # Dialog not in Redis — node may have failed
    send_reply("481", "Call Leg/Transaction Does Not Exist");
}

Set the Redis key TTL to your maximum call duration (7200 seconds = 2 hours). After TTL, Kamailio cleans up automatically without a separate cleanup job.

PostgreSQL Shared State

FreeSWITCH uses a local SQLite database by default. Switch to PostgreSQL for shared state between nodes:

<!-- /etc/freeswitch/autoload_configs/switch.conf.xml -->
<configuration name="switch.conf">
  <settings>
    <param name="core-db-name" value=""/>
    <param name="core-db-dsn" value="pgsql://host=db.example.com dbname=freeswitch user=freeswitch password=secret"/>
    <param name="auto-create-schemas" value="true"/>
    <param name="auto-clear-sql" value="true"/>
  </settings>
</configuration>

Also configure mod_voicemail and mod_sofia to use the shared database:

<!-- sofia.conf.xml -->
<param name="odbc-dsn" value="pgsql://host=db.example.com dbname=freeswitch user=freeswitch password=secret"/>

With shared PostgreSQL, SIP registrations written by Node 1 are visible to Node 2. A registered user can reach their endpoint even if the node they registered against goes down.
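To see which node currently holds each registration, you can query the registration table directly. A sketch, assuming mod_sofia's default sip_registrations table (column names may vary by FreeSWITCH version); it falls back to printing the query when psql is not installed:

```shell
#!/bin/sh
# The hostname column records which FreeSWITCH node wrote the registration.
REG_QUERY="SELECT sip_user, hostname, network_ip FROM sip_registrations;"
if command -v psql >/dev/null 2>&1; then
    psql "host=db.example.com dbname=freeswitch user=freeswitch" -c "$REG_QUERY" \
        || echo "connection failed; query was: $REG_QUERY"
else
    echo "psql not installed; query: $REG_QUERY"
fi
```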

Health Checks and Monitoring

FreeSWITCH exposes an ESL (Event Socket Layer) interface for health checks. A lightweight health check script:

#!/bin/bash
# /usr/local/bin/fs-healthcheck.sh
# Returns 0 if healthy, 1 if not — used by Kamailio OPTIONS response

FS_STATUS=$(fs_cli -x "status" 2>/dev/null | grep -c "is ready")
ACTIVE_CALLS=$(fs_cli -x "show calls count" 2>/dev/null | grep -oP '\d+(?= total)')

if [ "$FS_STATUS" -eq 0 ]; then
    echo "FreeSWITCH not ready"
    exit 1
fi

# Alert if calls exceed node capacity
if [ "${ACTIVE_CALLS:-0}" -gt 500 ]; then
    echo "Node at capacity: ${ACTIVE_CALLS} calls"
    exit 1
fi

echo "OK: ${ACTIVE_CALLS} active calls"
exit 0

Run this every 10 seconds from a systemd timer and expose the result through a lightweight HTTP endpoint for external monitoring. Kamailio itself marks a node inactive once OPTIONS responses stop; to make a failing health check trigger that, have the script stop the SIP profile (for example, fs_cli -x "sofia profile external stop") so the node stops answering OPTIONS.
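A minimal systemd pairing for the 10-second cadence (unit names and paths are illustrative; AccuracySec=1 is needed because the default timer accuracy would coalesce sub-minute intervals):

```ini
# /etc/systemd/system/fs-healthcheck.service
[Unit]
Description=FreeSWITCH health check

[Service]
Type=oneshot
ExecStart=/usr/local/bin/fs-healthcheck.sh

# /etc/systemd/system/fs-healthcheck.timer
[Unit]
Description=Run FreeSWITCH health check every 10 seconds

[Timer]
OnBootSec=30
OnUnitActiveSec=10
AccuracySec=1

[Install]
WantedBy=timers.target
```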

Graceful Drain Before Maintenance

Before taking a node down for maintenance, drain it rather than killing it:

# Tell Kamailio to stop sending new calls to this node
# (state "i" = inactive without probing, so OPTIONS success won't re-enable it;
#  arguments are: state, dispatcher set id, destination address)
kamcmd dispatcher.set_state i 1 sip:10.0.1.10:5080

# Wait for active calls to finish (check every 30 seconds)
while [ "$(fs_cli -x "show calls count" | grep -oP '\d+(?= total)')" -gt 0 ]; do
    echo "Waiting for calls to finish..."
    sleep 30
done

# Safe to restart now
systemctl restart freeswitch
kamcmd dispatcher.set_state a 1 sip:10.0.1.10:5080

This gives existing calls up to their natural duration to finish before the node goes offline. New calls route to the peer node during the drain window.

Capacity Planning

Metric                                  Per FreeSWITCH node   2-node cluster
Concurrent calls (audio only)           500                   1,000
Concurrent calls (HD video transcode)   50                    100
INVITE/sec burst                        50                    100
Memory per call                         ~2 MB
Recommended RAM                         16 GB                 16 GB × 2
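The per-call memory figure implies that most of the 16 GB recommendation is headroom for the OS, codec buffers, and traffic bursts rather than channel state. A quick check using the table's numbers:

```shell
#!/bin/sh
# 500 audio calls at ~2 MB of channel state each is only ~1 GB;
# the remaining RAM absorbs transcoding, spikes, and the OS itself.
CALLS=500
MB_PER_CALL=2
CALL_STATE_MB=$(( CALLS * MB_PER_CALL ))
echo "call state at capacity: ${CALL_STATE_MB} MB"
```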

Scale horizontally by adding nodes to the Kamailio dispatcher list. The Redis and PostgreSQL backends scale independently — use a managed cloud database service (RDS, Cloud SQL) to decouple their capacity from the media server tier.
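Adding a node is an edit to the dispatcher list plus an in-memory reload. A sketch as a reusable function (the 10.0.1.12 address is a hypothetical third node; kamcmd is only invoked when installed, so the function can be dry-run anywhere):

```shell
#!/bin/sh
# add_fs_node LIST_FILE SIP_URI
# Appends a dispatcher entry (set 1, flags 0, priority 10) and asks Kamailio
# to re-read the list without a restart.
add_fs_node() {
    printf '1        %s             0      10\n' "$2" >> "$1"
    command -v kamcmd >/dev/null 2>&1 && kamcmd dispatcher.reload
    return 0
}
```

Usage on the Kamailio host would look like: add_fs_node /etc/kamailio/dispatcher.list sip:10.0.1.12:5080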

Tags: freeswitch, high-availability, clustering, kamailio, postgresql, voip-architecture
