Connect agents to z4j over a private network

Run z4j in a homelab, on-prem cluster, or behind CGNAT, and let your workers reach it without exposing anything to the public internet. Pick the overlay that matches your stack.

Tailscale is the default recommendation. Six other options follow if you have a reason to choose differently.

when does this matter?

What problem are we actually solving?

z4j agents always make outbound connections. The agent runs inside your worker process, opens a WebSocket to z4j, and pushes events. There is never an inbound port on the agent side. So if your z4j brain has a public DNS name and sits behind a TLS-terminating reverse proxy (see the TLS setup guide), you do not need anything on this page.

You need a private network when z4j itself is not directly reachable from the agents. The common cases are:

z4j runs in a homelab on a residential connection behind CGNAT.
z4j runs on an on-prem VLAN that is not internet-facing, and the workers run somewhere else.
You have multi-site or multi-cloud workers that should never traverse the public internet to reach z4j.
Compliance requires that z4j has zero public ingress, and Cloudflare Tunnel is not acceptable.

The architecture is the same in every case: z4j and the agents both join an overlay network, and you set Z4J_BRAIN_URL to z4j's address inside that overlay. Pick whichever overlay your team already operates, or pick Tailscale if you have no preference.

Option	Best for	Setup time
Tailscale	Solo developers, small teams, homelabs, anyone who wants this working in 10 minutes	~10 minutes
Headscale	Teams that like the Tailscale UX but cannot accept a third-party control plane	~30 minutes (mostly DNS and TLS for the control plane)
WireGuard	Security-sensitive deployments, fixed peer counts, ops teams comfortable with manual key rotation	~30 minutes for the first link, less for each additional peer
Netbird	Teams that want Tailscale ergonomics with first-party self-hosting and open-source licensing	~20 minutes
Cloudflare Tunnel	Brains behind CGNAT or on hosts where you cannot open inbound ports	~10 minutes
ZeroTier	Mixed-OS fleets, IoT-style deployments, teams that want layer-2 semantics over the WAN	~15 minutes
OpenVPN	Shops standardized on OpenVPN, environments with audit requirements that already approved it	~45 minutes the first time

Option 1

Tailscale

WireGuard with automatic key exchange. MagicDNS gives z4j a stable hostname across every node.

Setup

Create a Tailscale account and a tailnet at tailscale.com.
Run tailscale up on the z4j host. Note its tailnet IP.
Run tailscale up on each worker host.
Set Z4J_BRAIN_URL to z4j's MagicDNS hostname or its tailnet IP.

Worker environment

bash

# On the worker host, after `tailscale up`:
export Z4J_BRAIN_URL="http://z4j:7700"         # MagicDNS resolves this
export Z4J_TOKEN="<bearer>"
export Z4J_HMAC_SECRET="<hmac>"

# Or use z4j's tailnet IP directly:
export Z4J_BRAIN_URL="http://100.64.0.5:7700"
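
Before starting the worker, it can help to confirm that all three variables are actually set. The following is a minimal sketch: `check_z4j_env` is a hypothetical helper name, not part of z4j, and it only checks that the variables are non-empty, not that their values are valid.

```bash
# Hypothetical preflight helper (not part of z4j): report any of the
# variables used on this page that are unset or empty.
check_z4j_env() {
  missing=""
  for name in Z4J_BRAIN_URL Z4J_TOKEN Z4J_HMAC_SECRET; do
    # Indirect lookup of the variable called $name (POSIX-compatible).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      missing="$missing $name"
    fi
  done
  if [ -n "$missing" ]; then
    echo "missing:$missing" >&2
    return 1
  fi
  echo "z4j environment looks complete"
}
```

Calling `check_z4j_env || exit 1` at the top of a worker start script stops the launch early instead of letting the agent fail later on a missing setting.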

Lock it down with ACLs (optional, recommended)

json

// In the Tailscale admin: ACL excerpt that lets agents reach
// z4j on port 7700 only, nothing else.
{
  "tagOwners": {
    "tag:z4j": ["autogroup:admin"],
    "tag:z4j-agent": ["autogroup:admin"]
  },
  "acls": [
    {
      "action": "accept",
      "src":    ["tag:z4j-agent"],
      "dst":    ["tag:z4j:7700"]
    }
  ]
}

Tag the z4j host and the agents, then write an ACL that lets agents reach only the z4j host, and only on port 7700. The rest of the tailnet stays invisible to the worker hosts.
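
The ACL only takes effect once the nodes actually carry the tags. One way to apply them is the `--advertise-tags` flag of `tailscale up`. The sketch below wraps it in a hypothetical `tag_node` helper with a dry-run switch; `tag_node`, `run`, and `DRY_RUN` are illustrative names, not Tailscale features.

```bash
# Print the command instead of running it when DRY_RUN is set.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Apply one of the tags declared under "tagOwners" to this node.
# $1 = tag name without the "tag:" prefix, e.g. z4j or z4j-agent.
tag_node() {
  run sudo tailscale up --advertise-tags="tag:$1"
}
```

Run `tag_node z4j` on the z4j host and `tag_node z4j-agent` on each worker, or set `DRY_RUN=1` first to preview the command. Depending on how the node was first brought up, `tailscale up` may ask you to repeat flags you set earlier.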

What you get

Free tier covers 100 devices, 3 users
MagicDNS gives the z4j host a stable hostname (z4j.<tailnet-name>.ts.net) automatically
Identity tied to Google / GitHub / Microsoft / Okta SSO
Tailnet ACLs let you restrict which agents can reach the z4j port

Best for

Solo developers, small teams, homelabs, anyone who wants this working in 10 minutes

License: Tailscale client BSD-3, control plane proprietary SaaS
Self-hosted: Optional via Headscale (separate option below)

Option 2

Headscale

Same Tailscale client, your own control plane. The right pick if you like Tailscale's UX but cannot accept a SaaS dependency.

Setup

bash

# Run Headscale on a small VM with a public DNS name.
docker run -d --name headscale \
  -v "$(pwd)/config:/etc/headscale" \
  -p 8080:8080 -p 9090:9090 \
  ghcr.io/juanfont/headscale:latest \
  serve

# Create the user / namespace and pre-auth key.
docker exec headscale headscale users create z4j
docker exec headscale headscale preauthkeys create --user z4j --reusable

# On every node (brain and agents):
tailscale up --login-server=https://hs.example.com --authkey=<key>

Headscale ships as a single binary. The Tailscale clients on your nodes are unchanged; they just point at your control server instead of login.tailscale.com.

What you get

Same Tailscale client on every node, just pointed at your control server
MagicDNS and ACLs work, along with the rest of the core feature set
No bundled identity provider; authenticate with pre-auth keys or your own OIDC provider
You own the control-plane uptime

License: Headscale BSD-3, Tailscale client BSD-3
Self-hosted: Yes, fully

Option 3

WireGuard

The raw protocol. No control plane, no third party. You own everything.

Brain (server peer)

ini

# /etc/wireguard/wg0.conf on the z4j host (the public peer).
[Interface]
Address    = 10.42.0.1/24
ListenPort = 51820
PrivateKey = <brain-private-key>

# One [Peer] block per agent.
[Peer]
PublicKey  = <agent-public-key>
AllowedIPs = 10.42.0.10/32

Agent (client peer)

ini

# /etc/wireguard/wg0.conf on the agent host.
[Interface]
Address    = 10.42.0.10/32
PrivateKey = <agent-private-key>

[Peer]
PublicKey  = <brain-public-key>
Endpoint   = brain.example.com:51820
AllowedIPs = 10.42.0.0/24
PersistentKeepalive = 25

bash

# Then on the worker:
export Z4J_BRAIN_URL="http://10.42.0.1:7700"

Generate keys with wg genkey | tee privatekey | wg pubkey > publickey on each peer. The brain's host needs a publicly reachable UDP port; agents do not.
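
Because the brain needs one [Peer] block per agent, adding agents by hand gets repetitive. As a rough sketch, a small function can print the block for you. `peer_block` is a hypothetical helper, and the 10.42.0.x addressing simply follows the example configs on this page.

```bash
# Print a [Peer] stanza for the brain's wg0.conf.
# $1 = agent public key, $2 = last octet of the agent's overlay address.
peer_block() {
  printf '[Peer]\nPublicKey  = %s\nAllowedIPs = 10.42.0.%s/32\n' "$1" "$2"
}
```

For example, `peer_block "$(cat publickey)" 11 | sudo tee -a /etc/wireguard/wg0.conf` appends a second agent. After editing the file, bring the interface up with `sudo wg-quick up wg0` (or restart it if it is already running) and confirm the handshake with `sudo wg show`.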

What you get

Zero dependencies beyond the WireGuard kernel module or wireguard-go
Static peer configuration; no auto-discovery
No NAT traversal helper; one peer needs a public endpoint
Key rotation is manual unless you script it

Skip this when

Your peer count is large or churns frequently. WireGuard's static config does not autoscale; Tailscale or Netbird will save you a lot of yak-shaving.

Option 4

Netbird

Open-source overlay built on WireGuard. Self-hosted as a first-class deployment, not a footnote.

Setup

bash

# On every node (brain and agents):
curl -fsSL https://pkgs.netbird.io/install.sh | sh
netbird up --setup-key <key>            # cloud control plane
# or
netbird up --management-url https://nb.example.com:33073 --setup-key <key>

# On the worker:
export Z4J_BRAIN_URL="http://100.92.0.1:7700"

Netbird is BSD-3 end to end. The same client connects to either Netbird's hosted control plane (free for small teams) or one you run yourself.

What you get

Cloud-managed control plane available; self-hosted is a standard deployment, not an afterthought
Peer-to-peer connections with STUN/TURN fallback
User-based ACLs and groups
Smaller community than Tailscale; documentation is thinner in spots

License: BSD-3
Self-hosted: Yes, designed for it

Option 5

Cloudflare Tunnel

Outbound-only tunnel from z4j to Cloudflare's edge. Agents reach a public hostname, but your host has no public ingress.

Setup

bash

# On the z4j host:
cloudflared tunnel login
cloudflared tunnel create z4j
cloudflared tunnel route dns z4j brain.example.com

cat > ~/.cloudflared/config.yml <<'EOF'
tunnel: z4j
credentials-file: /root/.cloudflared/<tunnel-uuid>.json
ingress:
  - hostname: brain.example.com
    service: http://localhost:7700
  - service: http_status:404
EOF

cloudflared tunnel run z4j

# On the agent:
export Z4J_BRAIN_URL="https://brain.example.com"

Cross-listed under TLS setup. Strictly speaking this is not an overlay network; it is a reverse tunnel. It belongs here because it solves the same problem (no public ports) without WireGuard.

What you get

No public ports open on your brain host
Cloudflare terminates TLS at its edge, tunnels HTTP to your brain
Agents connect to the public hostname, not a private overlay
Free tier works for small deployments

Skip this when

Compliance forbids routing traffic through a third party, or you want zero dependency on a SaaS edge for the agent connection.

Option 6

ZeroTier

Peer-to-peer overlay with virtual ethernet semantics.

Setup

bash

# Create a network at my.zerotier.com (or your self-hosted controller).
# Then on every node:
curl -s https://install.zerotier.com | sudo bash
sudo zerotier-cli join <network-id>

# Authorize each node in the network admin, then on the worker:
export Z4J_BRAIN_URL="http://10.147.18.1:7700"     # ZeroTier-assigned IP

ZeroTier excels at NAT traversal and gives every node a stable virtual IP that survives physical network changes. Worth a look for fleets with mobile or roaming workers.

What you get

Virtual ethernet model: every node gets a stable private IP
Strong NAT traversal across difficult networks
Commercial licensing kicks in above 25 nodes
Smaller user base than WireGuard-based options

License: ZeroTier BSL-1.1 (free for non-commercial; paid for commercial fleets above 25 nodes)
Self-hosted: Optional via ztncui or ZeroTier's own self-hosted controller

Option 7

OpenVPN

Legacy stack, still common in regulated enterprises.

Agent connection

bash

# Assuming a standard OpenVPN server at vpn.example.com:1194/udp.
sudo openvpn --config /etc/openvpn/client/z4j.ovpn

# Brain reachable on the VPN's internal IP:
export Z4J_BRAIN_URL="http://10.8.0.1:7700"

We are not going to walk through OpenVPN server provisioning here; that is well-trodden ground. The point is that if your shop already runs OpenVPN, the agent does not care: it just sees z4j on the VPN's internal IP.

What you get

TCP or UDP transport; works through hostile firewalls
Heavier than WireGuard but extremely well-understood
Active-active HA needs an external load balancer
Slower throughput than WireGuard-family options

Skip this when

You are setting up a new private network from scratch. WireGuard family options are simpler, faster, and lighter for the same threat model.

websocket

Verifying the agent connects (any overlay)

The z4j agent maintains a long-lived WebSocket to z4j at /ws/agent on port 7700. Heartbeat frames flow every 10 seconds and z4j's idle timeout is 90 seconds, which leaves nine heartbeat intervals of headroom before the socket is considered dead. If the socket drops, the agent reconnects automatically with exponential backoff.
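
To make the reconnect behavior concrete, here is a sketch of a capped exponential backoff schedule. The 1-second base and 60-second cap are assumptions chosen for illustration; this page does not state the agent's actual values, and `backoff_delay` is a hypothetical name.

```bash
# Delay in seconds before reconnect attempt N (0-based): base * 2^N, capped.
backoff_delay() {
  attempt=$1
  delay=${2:-1}   # assumed base delay in seconds
  cap=${3:-60}    # assumed upper bound in seconds
  i=0
  # Double the delay once per prior attempt, stopping early at the cap.
  while [ "$i" -lt "$attempt" ] && [ "$delay" -lt "$cap" ]; do
    delay=$((delay * 2))
    i=$((i + 1))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}
```

With these assumed values, successive attempts wait 1, 2, 4, 8, 16, 32, and then 60 seconds. Real clients usually add random jitter on top so that a whole fleet does not reconnect in lockstep.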

Every overlay covered above carries TCP transparently: the WireGuard family (Tailscale, Headscale, raw WireGuard, Netbird), ZeroTier at layer 2, and OpenVPN at layer 3 all pass HTTP Upgrade headers and WebSocket frames through without extra configuration. Cloudflare Tunnel has supported WebSocket by default since cloudflared v2022.x. There is nothing to enable; point Z4J_BRAIN_URL at the overlay address and the agent does the rest.
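
If you want to probe the WebSocket endpoint by hand, its URL can be derived from Z4J_BRAIN_URL by swapping the scheme and appending /ws/agent. The sketch below assumes that mapping (http to ws, https to wss), which matches the wscat examples on this page; `brain_ws_url` is a hypothetical helper, not something z4j ships.

```bash
# Map a Z4J_BRAIN_URL-style base URL to the agent WebSocket URL.
brain_ws_url() {
  base=${1%/}   # drop a single trailing slash, if present
  case "$base" in
    https://*) echo "wss://${base#https://}/ws/agent" ;;
    http://*)  echo "ws://${base#http://}/ws/agent" ;;
    *) echo "unsupported URL: $1" >&2; return 1 ;;
  esac
}
```

For instance, `brain_ws_url "$Z4J_BRAIN_URL"` turns an overlay address such as http://10.42.0.1:7700 into ws://10.42.0.1:7700/ws/agent, ready to pass to a WebSocket client.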

Three quick tests confirm the path is healthy before you point a production worker at it.

Generic check (works for every overlay)

bash

# Test 1: plain HTTP reachability (sanity check the overlay routes traffic).
curl -fsS http://<brain-overlay-ip>:7700/api/v1/health
# {"status":"ok","version":"1.6.7"}

# Test 2: WebSocket Upgrade headers survive end-to-end.
# wscat: https://github.com/websockets/wscat
wscat --connect "ws://<brain-overlay-ip>:7700/ws/agent" \
  --header "Authorization: Bearer <Z4J_TOKEN>" \
  --header "X-Z4J-Agent: ws-verify"
# < {"type":"hello_ack","heartbeat_interval_seconds":10,...}

# Test 3: heartbeat survives the overlay's idle assumptions.
# Just leave wscat connected; you should see heartbeat frames every 10s.
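
For scripted checks, Test 1 can be turned into a pass/fail gate. This sketch assumes the health endpoint returns the JSON shape shown in the Test 1 comment, and it uses a plain grep rather than a JSON parser to avoid extra dependencies. `health_ok` is a hypothetical name.

```bash
# Succeed only if the JSON on stdin contains "status":"ok".
health_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"ok"'
}
```

Usage: `curl -fsS http://<brain-overlay-ip>:7700/api/v1/health | health_ok && echo "overlay path is up"`. The function exits non-zero for any other status or for an empty response, so it can also drive a retry loop in a deploy script.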

Cloudflare Tunnel (wss:// over public hostname)

bash

# Cloudflare Tunnel exposes z4j over wss:// at the public hostname.
# WebSocket is supported by default; no ingress rule changes required.
wscat --connect "wss://brain.example.com/ws/agent" \
  --header "Authorization: Bearer <Z4J_TOKEN>"

# If you see a 502 or the connection hangs at handshake, check that
# cloudflared is >= 2022.x and that the z4j ingress entry uses an
# http:// service without TLS origin options such as noTLSVerify,
# which only apply to https:// backends. The stock config above
# (service: http://localhost:7700) is correct.

What "working" looks like

Worker boot logs show "z4j worker bootstrap: agent runtime started"
Brain dashboard shows the agent under Project > Agents within ~5 seconds
Tasks queued by the worker appear in the Tasks list within ~1 second

If the WebSocket fails to upgrade

HTTP works, WS hangs at 101 Switching Protocols. A reverse proxy in the path is stripping Upgrade / Connection headers. See the TLS setup guide for working nginx and Traefik configs that preserve them.
Connection drops every ~60 seconds. An intermediary with an idle timeout (nginx's default 60s proxy_read_timeout, some load balancers) is closing the socket between heartbeats. Raise proxy_read_timeout or its equivalent above 120s.
4429 close code. You hit the per-IP connect rate limit; z4j accepted the upgrade and immediately closed it. Stagger restarts of large worker fleets, or whitelist the agent's overlay IP via Z4J_TRUSTED_AGENT_IPS.
401 Unauthorized. Token mismatch. z4j prints the agent name and project ID it tried to authenticate; check the secret you set in Z4J_TOKEN.
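
For the 4429 case, one way to stagger fleet restarts is to give each host a deterministic delay derived from its hostname. This is only a sketch: the 30-second default window is an assumption (this page does not document the rate limit itself), and `stagger_delay` is a hypothetical helper.

```bash
# Print a delay in [0, window) seconds derived from a host name.
stagger_delay() {
  host=$1
  window=${2:-30}   # assumed spread in seconds
  # cksum prints "<crc> <byte-count>"; keep the CRC and reduce it modulo the window.
  crc=$(printf '%s' "$host" | cksum | cut -d ' ' -f 1)
  echo $((crc % window))
}
```

Putting `sleep "$(stagger_delay "$(hostname)")"` in front of the worker start command spreads reconnects across the window, and the same host always gets the same offset.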

how to pick

Decision matrix

If you...	Pick
Want it working in 10 minutes and have no preference	Tailscale
Like Tailscale but cannot use a SaaS control plane	Headscale
Want zero third-party dependency and only have a few peers	WireGuard
Want self-hosted by default and an open-source license everywhere	Netbird
Cannot open inbound ports and accept Cloudflare in the path	Cloudflare Tunnel
Have roaming workers across hostile networks	ZeroTier
Already run OpenVPN as the corporate-approved stack	OpenVPN

z4j stays where you put it

The point of every option above is that z4j never has to be on the public internet. Pick whichever overlay you trust and leave the front door closed.
