If your WebSocket keeps reconnecting every few seconds, the culprit is usually not your application code. In most production incidents, the root cause is an incomplete reverse-proxy configuration in Nginx (or in a proxy layer in front of it).
This guide gives you a copy-paste checklist to stabilize WebSocket connections fast.
1) Missing Upgrade/Connection headers (no 101 handshake)
WebSocket starts life as an HTTP/1.1 request that asks to upgrade the connection. If the proxy does not forward the Upgrade and Connection headers, the upstream sees plain HTTP and never completes the handshake. Note that the map block below belongs in the http context, while the location block goes inside your server block.
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

location /ws/ {
    proxy_pass http://127.0.0.1:8080;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection $connection_upgrade;
}
Verify:
curl -i -N --http1.1 \
  -H "Connection: Upgrade" \
  -H "Upgrade: websocket" \
  -H "Sec-WebSocket-Version: 13" \
  -H "Sec-WebSocket-Key: SGVsbG9Xb3JsZDEyMw==" \
  https://example.com/ws/

You should get HTTP/1.1 101 Switching Protocols back. The --http1.1 flag matters: curl may otherwise negotiate HTTP/2 with your edge, where this Upgrade mechanism does not apply.
2) Idle timeout is too short
A very common killer: Nginx's proxy_read_timeout defaults to 60s, which closes idle but otherwise healthy sockets.
location /ws/ {
    proxy_connect_timeout 15s;
    proxy_send_timeout    60s;
    proxy_read_timeout    3600s;
}
Also send ping/pong heartbeats every 20-30 seconds from the server or the client, keeping the interval comfortably below the shortest idle timeout in the chain.
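If heartbeats are not built into your app yet, here is a minimal server-side sketch using the Node.js ws package; the port and the 25-second interval are assumptions, so adjust them to your stack.

// heartbeat.ts: ping every client on an interval; terminate sockets that stop answering.
import { WebSocketServer, WebSocket } from 'ws';

const wss = new WebSocketServer({ port: 8080 }); // port is an assumption
const alive = new WeakMap<WebSocket, boolean>();

wss.on('connection', (ws) => {
  alive.set(ws, true);
  ws.on('pong', () => alive.set(ws, true)); // peer answered our last ping
});

setInterval(() => {
  for (const ws of wss.clients) {
    if (!alive.get(ws)) {
      ws.terminate(); // no pong since the last ping: assume the peer is gone
      continue;
    }
    alive.set(ws, false);
    ws.ping(); // browsers and the ws client answer with a pong automatically
  }
}, 25_000);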
3) Load-balanced upstream without session stickiness
The handshake lands on node A; after a reconnect the client lands on node B, where the in-memory session is unknown, so the server rejects or drops the socket.
upstream ws_backend {
    ip_hash;
    server 10.0.0.11:8080;
    server 10.0.0.12:8080;
}

Note that ip_hash keys on the client address: if Nginx itself sits behind another proxy, restore the real client IP first (realip module), or most connections will hash to the same node.
If possible, move session state to Redis and remove the sticky-session dependency entirely.
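As a sketch of what that can look like, here is per-connection state stored through the ioredis client; the key prefix, the TTL, and the idea that sessionId comes from an auth token are illustrative assumptions.

// session.ts: keep per-connection state in Redis so any node can serve a reconnect.
import Redis from 'ioredis';

const redis = new Redis(); // connects to 127.0.0.1:6379 by default

// sessionId would come from an auth token or cookie (hypothetical).
async function saveSession(sessionId: string, state: object): Promise<void> {
  await redis.set(`ws:session:${sessionId}`, JSON.stringify(state), 'EX', 3600);
}

async function loadSession(sessionId: string): Promise<object | null> {
  const raw = await redis.get(`ws:session:${sessionId}`);
  return raw ? JSON.parse(raw) : null;
}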
4) Another proxy layer closes first (CDN/LB/Ingress)
Even if Nginx is set to 1 hour, an outer ALB/CDN might cut the connection at 60 seconds (the AWS ALB default idle timeout, for instance).
Check every layer and align idle timeouts to a sane baseline (300s+ for many real-time apps).
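For example, if your edge is an AWS Application Load Balancer, the idle timeout is a load-balancer attribute you can raise from the CLI (the ARN below is a placeholder):

aws elbv2 modify-load-balancer-attributes \
  --load-balancer-arn <your-alb-arn> \
  --attributes Key=idle_timeout.timeout_seconds,Value=300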
5) Protocol mismatch (http2, ws, wss)
Public edge can run HTTP/2, but WebSocket upgrade semantics still rely on HTTP/1.1 proxy behavior.
Keep this consistent:
- Client uses wss://example.com/ws/
- Nginx upstream uses proxy_http_version 1.1
- TLS cert chain is complete; verify with:

openssl s_client -connect example.com:443 -servername example.com </dev/null
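A quick end-to-end check from the client side (a sketch using the Node.js ws package; example.com stands in for your host):

// wss-check.ts: confirm the wss handshake completes through every layer.
import WebSocket from 'ws';

const ws = new WebSocket('wss://example.com/ws/');
ws.on('open', () => { console.log('101 handshake OK'); ws.close(); });
ws.on('error', (err) => console.error('handshake failed:', err.message));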
6) Buffering/compression side effects
For WebSocket paths, avoid normal response buffering.
location /ws/ {
    proxy_buffering off;
    gzip off;
}
If messages arrive in bursts instead of real-time, buffering is often the culprit.
7) No observability = blind debugging
Add a focused log format and track handshake success and disconnect rates.
log_format ws '$remote_addr - $host [$time_local] '
              '"$request" $status $body_bytes_sent '
              'rt=$request_time urt=$upstream_response_time '
              'ua="$http_user_agent" up="$upstream_addr"';

access_log /var/log/nginx/ws_access.log ws;
error_log  /var/log/nginx/ws_error.log warn;
Track at least:
- active websocket connections
- disconnects per minute
- HTTP 101 success ratio (a quick check follows below)
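For the 101 ratio, a one-liner against the ws log is enough to start; the field position assumes the log_format above, where $status is the 9th whitespace-separated field.

# ratio of 101 responses to all requests in the WebSocket access log
awk '{ total++ } $9 == 101 { ok++ } END { if (total) printf "101 ratio: %.1f%%\n", 100 * ok / total }' /var/log/nginx/ws_access.log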
Production-ready baseline snippet
map $http_upgrade $connection_upgrade {
    default upgrade;
    ''      close;
}

upstream ws_backend {
    # least_conn assumes session state lives in Redis (step 3); use ip_hash if you still need stickiness
    least_conn;
    server 127.0.0.1:8080 max_fails=3 fail_timeout=30s;
}

server {
    listen 443 ssl http2;
    server_name example.com;

    # ssl_certificate / ssl_certificate_key directives omitted for brevity

    location /ws/ {
        proxy_pass http://ws_backend;
        proxy_http_version 1.1;

        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;

        # the two headers that make the 101 handshake work
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;

        proxy_connect_timeout 15s;
        proxy_send_timeout    60s;
        proxy_read_timeout    3600s;

        proxy_buffering off;
        gzip off;
    }
}
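Validate and reload after editing:

nginx -t && nginx -s reload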
Summary
Most WebSocket disconnect incidents are boring infrastructure issues, not mysterious app bugs.
Fix these first:
- Correct upgrade headers + HTTP/1.1 proxying
- Longer read timeout + heartbeat
- Aligned idle timeouts across all proxy layers
Do this and your reconnect storm usually disappears.