Connect agents to z4j over a private network
Run z4j in a homelab, on-prem cluster, or behind CGNAT, and let your workers reach it without exposing anything to the public internet. Pick the overlay that matches your stack.
Tailscale is the default recommendation. Six other options follow if you have a reason to choose differently.
What problem are we actually solving?
z4j agents always make outbound connections. The agent runs inside your worker process, opens a WebSocket to z4j, and pushes events. The agent side never listens on an inbound port. So if z4j has a public DNS name and a TLS-terminating reverse proxy (see the TLS setup guide), you do not need anything on this page.
You need a private network when z4j itself is not directly reachable from the agents. The common cases are:
- z4j runs in a homelab on a residential connection behind CGNAT.
- z4j runs on an on-prem VLAN that is not internet-facing, and the workers run somewhere else.
- You have multi-site or multi-cloud workers that should never traverse the public internet to reach z4j.
- Compliance requires that z4j has zero public ingress, and Cloudflare Tunnel is not acceptable.
The architecture is the same in every case: z4j and the agents both join an overlay network, and you set Z4J_BRAIN_URL to z4j's address inside that overlay. Pick whichever overlay your team already operates, or pick Tailscale if you have no preference.
| Option | Best for | Setup time |
|---|---|---|
| Tailscale | Solo developers, small teams, homelabs, anyone who wants this working in 10 minutes | ~10 minutes |
| Headscale | Teams that like the Tailscale UX but cannot accept a third-party control plane | ~30 minutes (mostly DNS and TLS for the control plane) |
| WireGuard | Security-sensitive deployments, fixed peer counts, ops teams comfortable with manual key rotation | ~30 minutes for the first link, less for each additional peer |
| Netbird | Teams that want Tailscale ergonomics with first-party self-hosting and open-source licensing | ~20 minutes |
| Cloudflare Tunnel | Brains behind CGNAT or on hosts where you cannot open inbound ports | ~10 minutes |
| ZeroTier | Mixed-OS fleets, IoT-style deployments, teams that want layer-2 semantics over the WAN | ~15 minutes |
| OpenVPN | Shops standardized on OpenVPN, environments with audit requirements that already approved it | ~45 minutes the first time |
Tailscale
WireGuard with automatic key exchange. MagicDNS gives z4j a stable hostname across every node.
Setup
- Create a Tailscale account and a tailnet at tailscale.com.
- Run tailscale up on the z4j host. Note its tailnet IP.
- Run tailscale up on each worker host.
- Set Z4J_BRAIN_URL to z4j's MagicDNS hostname or its tailnet IP.
Worker environment
# On the worker host, after `tailscale up`:
export Z4J_BRAIN_URL="http://z4j:7700" # MagicDNS resolves this
export Z4J_TOKEN="<bearer>"
export Z4J_HMAC_SECRET="<hmac>"
# Or use z4j's tailnet IP directly:
export Z4J_BRAIN_URL="http://100.64.0.5:7700"
Lock it down with ACLs (optional, recommended)
// In the Tailscale admin: ACL excerpt that lets agents reach
// z4j on port 7700 only, nothing else.
{
"tagOwners": {
"tag:z4j": ["autogroup:admin"],
"tag:z4j-agent": ["autogroup:admin"]
},
"acls": [
{
"action": "accept",
"src": ["tag:z4j-agent"],
"dst": ["tag:z4j:7700"]
}
]
}
Tag z4j and agents, then write an ACL that allows agents to reach only z4j on only port 7700. The rest of the tailnet stays invisible to the worker hosts.
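The ACL only takes effect once each node actually carries its tag. A minimal sketch of applying the tags from the excerpt above, assuming the nodes authenticate as a user listed in tagOwners. The function names are hypothetical and exist only so that nothing runs until you call one; --advertise-tags is the standard tailscale up flag for requesting a tag.

```shell
# Sketch only: wrapped in functions so nothing runs until you call one.
# The tag names must match the tagOwners entries in the ACL excerpt.

# Run on the z4j host.
tag_z4j_host() {
  sudo tailscale up --advertise-tags=tag:z4j
}

# Run on each worker host.
tag_worker_host() {
  sudo tailscale up --advertise-tags=tag:z4j-agent
}
```

Call tag_z4j_host once on the z4j host and tag_worker_host on every worker, then confirm the tags in the Tailscale admin console.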
What you get
- Free tier covers 100 devices, 3 users
- MagicDNS gives z4j a stable hostname automatically
- Identity tied to Google / GitHub / Microsoft / Okta SSO
- Tailnet ACLs let you restrict which agents can reach z4j's port
Best for: solo developers, small teams, homelabs, anyone who wants this working in 10 minutes
- License: Tailscale client BSD-3, control plane proprietary SaaS
- Self-hosted: optional via Headscale (separate option below)
Headscale
Same Tailscale client, your own control plane. The right pick if you like Tailscale's UX but cannot accept a SaaS dependency.
Setup
# Run Headscale on a small VM with a public DNS name.
docker run -d --name headscale \
-v "$(pwd)/config":/etc/headscale \
-p 8080:8080 -p 9090:9090 \
ghcr.io/juanfont/headscale:latest \
serve
# Create the user / namespace and pre-auth key.
docker exec headscale headscale users create z4j
docker exec headscale headscale --user z4j preauthkeys create --reusable
# On every node (brain and agents):
tailscale up --login-server=https://hs.example.com --authkey=<key>
Headscale ships as a single binary. The Tailscale clients on your nodes are unchanged; they just point at your control server instead of login.tailscale.com.
What you get
- Same Tailscale client on every node, just pointed at your control server
- MagicDNS works, ACLs work, the whole feature set works
- No built-in SSO; use pre-auth keys or an OIDC bridge
- You own the control-plane uptime
- License: Headscale BSD-3, Tailscale client BSD-3
- Self-hosted: yes, fully
WireGuard
The raw protocol. No control plane, no third party. You own everything.
Brain (server peer)
# /etc/wireguard/wg0.conf on the z4j host (the public peer).
[Interface]
Address = 10.42.0.1/24
ListenPort = 51820
PrivateKey = <brain-private-key>
# One [Peer] block per agent.
[Peer]
PublicKey = <agent-public-key>
AllowedIPs = 10.42.0.10/32
Agent (client peer)
# /etc/wireguard/wg0.conf on the agent host.
[Interface]
Address = 10.42.0.10/32
PrivateKey = <agent-private-key>
[Peer]
PublicKey = <brain-public-key>
Endpoint = brain.example.com:51820
AllowedIPs = 10.42.0.0/24
PersistentKeepalive = 25
# Then on the worker:
export Z4J_BRAIN_URL="http://10.42.0.1:7700"
Generate keys with wg genkey | tee privatekey | wg pubkey > publickey on each peer. The brain's host needs a publicly reachable UDP port; agents do not.
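Because every new agent needs its own [Peer] block on the brain, a small helper keeps the entries consistent. This is a minimal sketch, assuming the 10.42.0.0/24 addressing used in the configs above; render_agent_peer is a hypothetical name, not part of WireGuard.

```shell
# Print a [Peer] block for one agent, ready to append to the brain's wg0.conf.
# $1 = agent public key, $2 = last octet of the agent's 10.42.0.x address.
# Note: plain POSIX sh has no local variables, so pubkey/octet are global.
render_agent_peer() {
  pubkey=$1
  octet=$2
  # Reject empty or non-numeric octets.
  case $octet in
    ''|*[!0-9]*) echo "octet must be a number" >&2; return 1 ;;
  esac
  # .1 is the brain in this layout; .255 is the broadcast address.
  if [ "$octet" -lt 2 ] || [ "$octet" -gt 254 ]; then
    echo "octet must be between 2 and 254" >&2
    return 1
  fi
  printf '[Peer]\nPublicKey = %s\nAllowedIPs = 10.42.0.%s/32\n' "$pubkey" "$octet"
}
```

For example, render_agent_peer "$(cat publickey)" 11 prints a block for an agent at 10.42.0.11. Append it to the brain's wg0.conf and restart the interface for it to take effect.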
What you get
- Zero dependencies beyond the WireGuard kernel module or wireguard-go
- Static peer configuration; no auto-discovery
- No NAT traversal helper; one peer needs a public endpoint
- Key rotation is manual unless you script it
Skip WireGuard if your peer count is large or churns frequently. WireGuard's static config does not autoscale; Tailscale or Netbird will save you a lot of yak-shaving.
Netbird
Open-source overlay built on WireGuard. Self-hosted as a first-class deployment, not a footnote.
Setup
# On every node (brain and agents):
curl -fsSL https://pkgs.netbird.io/install.sh | sh
netbird up --setup-key <key> # cloud control plane
# or
netbird up --management-url https://nb.example.com:33073 --setup-key <key>
# On the worker:
export Z4J_BRAIN_URL="http://100.92.0.1:7700"
Netbird is BSD-3 end to end. The same client connects to either Netbird's hosted control plane (free for small teams) or one you run yourself.
What you get
- Cloud-managed control plane available; self-hosted is a standard deployment, not an afterthought
- Peer-to-peer connections with STUN/TURN fallback
- User-based ACLs and groups
- Smaller community than Tailscale; documentation is thinner in spots
- License: BSD-3
- Self-hosted: yes, designed for it
Cloudflare Tunnel
Outbound-only tunnel from z4j to Cloudflare's edge. Agents reach a public hostname, but your host has no public ingress.
Setup
# On the z4j host:
cloudflared tunnel login
cloudflared tunnel create z4j
cloudflared tunnel route dns z4j brain.example.com
cat > ~/.cloudflared/config.yml <<'EOF'
tunnel: z4j
credentials-file: /root/.cloudflared/<tunnel-uuid>.json
ingress:
  - hostname: brain.example.com
    service: http://localhost:7700
  - service: http_status:404
EOF
cloudflared tunnel run z4j
# On the agent:
export Z4J_BRAIN_URL="https://brain.example.com"
Cross-listed under TLS setup. Strictly speaking this is not an overlay network; it is a reverse tunnel. It belongs here because it solves the same problem (no public ports) without WireGuard.
What you get
- No public ports open on your brain host
- Cloudflare terminates TLS at its edge, tunnels HTTP to your brain
- Agents connect to the public hostname, not a private overlay
- Free tier works for small deployments
Skip Cloudflare Tunnel if compliance forbids routing traffic through a third party, or you want zero dependency on a SaaS edge for the agent connection.
ZeroTier
Peer-to-peer overlay with virtual ethernet semantics.
Setup
# Create a network at my.zerotier.com (or your self-hosted controller).
# Then on every node:
curl -s https://install.zerotier.com | sudo bash
sudo zerotier-cli join <network-id>
# Authorize each node in the network admin, then on the worker:
export Z4J_BRAIN_URL="http://10.147.18.1:7700" # ZeroTier-assigned IP
ZeroTier excels at NAT traversal and gives every node a stable virtual IP that survives physical network changes. Worth a look for fleets with mobile or roaming workers.
What you get
- Virtual ethernet model: every node gets a stable private IP
- Strong NAT traversal across difficult networks
- Commercial licensing kicks in above 25 nodes
- Smaller user base than WireGuard-based options
- License: ZeroTier BSL-1.1 (free for non-commercial; paid for commercial fleets above 25 nodes)
- Self-hosted: optional via ztncui or ZeroTier's own self-hosted controller
OpenVPN
Legacy stack, still common in regulated enterprises.
Agent connection
# Assuming a standard OpenVPN server at vpn.example.com:1194/udp.
sudo openvpn --config /etc/openvpn/client/z4j.ovpn
# Brain reachable on the VPN's internal IP:
export Z4J_BRAIN_URL="http://10.8.0.1:7700"
We are not going to walk through OpenVPN server provisioning here; that is well-trodden ground. The point is that if your shop already runs OpenVPN, the agent does not care; it just sees z4j on the VPN's internal IP.
What you get
- TCP or UDP transport; works through hostile firewalls
- Heavier than WireGuard but extremely well-understood
- Active-active HA needs an external load balancer
- Slower throughput than WireGuard-family options
Skip OpenVPN if you are setting up a new private network from scratch. WireGuard family options are simpler, faster, and lighter for the same threat model.
Verifying the agent connects (any overlay)
The z4j agent maintains a long-lived WebSocket to z4j at /ws/agent on port 7700. Heartbeat frames flow every 10 seconds, z4j's idle timeout is 90 seconds (about nine heartbeat intervals of headroom), and the agent reconnects automatically with exponential backoff if the socket drops.
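To make the reconnect behavior concrete, here is a minimal sketch of an exponential backoff schedule. The 1-second base and 60-second cap are assumptions for illustration; the agent's real parameters are not stated here, and production backoff usually adds random jitter on top.

```shell
# Delay in seconds before reconnect attempt N (zero-based).
# Assumed policy: start at 1s, double each attempt, never exceed 60s.
backoff_delay() {
  attempt=$1
  delay=1
  cap=60
  # Double once per prior attempt, stopping early once the cap is reached.
  while [ "$attempt" -gt 0 ] && [ "$delay" -lt "$cap" ]; do
    delay=$(( delay * 2 ))
    attempt=$(( attempt - 1 ))
  done
  if [ "$delay" -gt "$cap" ]; then
    delay=$cap
  fi
  echo "$delay"
}
```

Under these assumptions the first attempts wait 1, 2, 4, 8, 16, and 32 seconds, and every later attempt waits 60 seconds.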
Every overlay covered above carries TCP transparently: the WireGuard family (Tailscale, Headscale, raw WireGuard, Netbird), ZeroTier at layer 2, and OpenVPN at layer 3 all pass HTTP Upgrade headers and WebSocket frames without any extra config. Cloudflare Tunnel supports WebSocket by default since cloudflared v2022.x. There is nothing to enable: point Z4J_BRAIN_URL at the overlay address and the agent does the rest.
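When testing by hand it helps to derive the WebSocket URL from the same Z4J_BRAIN_URL the worker uses. The helper below is a sketch under the usual http-to-ws and https-to-wss mapping plus the /ws/agent path named above; agent_ws_url is a hypothetical name, and how the agent itself builds the URL is an assumption.

```shell
# Map an http(s) base URL to the ws(s) URL of the agent endpoint.
# $1 = value of Z4J_BRAIN_URL.
agent_ws_url() {
  base=${1%/}   # drop one trailing slash, if any
  case $base in
    https://*) echo "wss://${base#https://}/ws/agent" ;;
    http://*)  echo "ws://${base#http://}/ws/agent" ;;
    *) echo "unsupported URL scheme: $1" >&2; return 1 ;;
  esac
}
```

For example, agent_ws_url "$Z4J_BRAIN_URL" gives a value you can pass to a WebSocket client such as wscat.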
Three quick tests confirm the path is healthy before you point a production worker at it.
Generic check (works for every overlay)
# Test 1: plain HTTP reachability (sanity check the overlay routes traffic).
curl -fsS http://<brain-overlay-ip>:7700/api/v1/health
# {"status":"ok","version":"1.6.7"}
# Test 2: WebSocket Upgrade headers survive end-to-end.
# wscat: https://github.com/websockets/wscat
wscat --connect "ws://<brain-overlay-ip>:7700/ws/agent" \
--header "Authorization: Bearer <Z4J_TOKEN>" \
--header "X-Z4J-Agent: ws-verify"
# < {"type":"hello_ack","heartbeat_interval_seconds":10,...}
# Test 3: heartbeat survives the overlay's idle assumptions.
# Just leave wscat connected; you should see heartbeat frames every 10s.
Cloudflare Tunnel (wss:// over public hostname)
# Cloudflare Tunnel exposes z4j over wss:// at the public hostname.
# WebSocket is supported by default; no ingress rule changes required.
wscat --connect "wss://brain.example.com/ws/agent" \
--header "Authorization: Bearer <Z4J_TOKEN>"
# If you see a 502 or the connection hangs at handshake, the tunnel
# likely needs cloudflared >= 2022.x, and the z4j ingress entry must
# NOT set 'noTLSVerify: false' against an HTTP-only backend. The
# stock config above (service: http://localhost:7700) is correct.
What "working" looks like
- Worker boot logs include "z4j worker bootstrap: agent runtime started"
- Brain dashboard shows the agent under Project > Agents within ~5 seconds
- Tasks queued by the worker appear in the Tasks list within ~1 second
If it is not working, check these common failures:
- HTTP works, WS hangs at 101 Switching Protocols. A reverse proxy in the path is stripping Upgrade/Connection headers. See the TLS setup guide for working nginx and Traefik configs that preserve them.
- Connection drops every ~60 seconds. An intermediary with an idle timeout (older nginx default, some load balancers) is closing the socket between heartbeats. Raise proxy_read_timeout or its equivalent above 120s.
- 4429 close code. You hit the per-IP connect rate limit; z4j accepted the upgrade and immediately closed it. Stagger restarts of large worker fleets, or whitelist the agent's overlay IP via Z4J_TRUSTED_AGENT_IPS.
- 401 Unauthorized. Token mismatch. z4j prints the agent name and project ID it tried to authenticate; check the secret you set in Z4J_TOKEN.
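One way to stagger fleet restarts and stay under the connect rate limit is to give each host a fixed delay slot. This is a sketch, not a z4j feature: stagger_delay is a hypothetical helper, and the window size is something you would tune to your fleet.

```shell
# Deterministic per-host delay in the range [0, window) seconds, derived
# from a checksum of the host name so a host always lands in the same slot.
# $1 = host name, $2 = window in seconds.
stagger_delay() {
  sum=$(printf '%s' "$1" | cksum | cut -d ' ' -f 1)
  echo $(( sum % $2 ))
}
```

A worker start script could run sleep "$(stagger_delay "$(hostname)" 30)" before launching the worker, which spreads a simultaneous restart across roughly 30 seconds.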
Decision matrix
| If you... | Pick |
|---|---|
| Want it working in 10 minutes and have no preference | Tailscale |
| Like Tailscale but cannot use a SaaS control plane | Headscale |
| Want zero third-party dependency and only have a few peers | WireGuard |
| Want self-hosted by default and an open-source license everywhere | Netbird |
| Cannot open inbound ports and accept Cloudflare in the path | Cloudflare Tunnel |
| Have roaming workers across hostile networks | ZeroTier |
| Already run OpenVPN as the corporate-approved stack | OpenVPN |
z4j stays where you put it
The point of every option above is that z4j never has to be on the public internet. Pick whichever overlay you trust and leave the front door closed.