App Access Stops Working After Proxy Restart – Teleport

Type:

issue

Question/Problem:

Application access stops working after a Kubernetes-hosted Teleport Proxy restarts.

Symptoms:

Users who connected to an application through Teleport may find that it no longer works after a browser refresh if a Kubernetes-hosted Teleport Proxy has restarted in the interim. The failure appears as `Internal Error 500` messages in the Web UI, and users cannot access the application until the old cookies expire.

Logs:


2021-11-18T22:32:41Z ERRO [APP:WEB] "Error forwarding to /favicon.ico, err: Teleport proxy failed to connect to \"app\" agent \"@local-node\" over reverse tunnel:\n\n no tunnel connection found: no app reverse tunnel for f7261349-a79f-4163-96d5-a803b782822b.proxy.example.com found\n\nThis usually means that the agent is offline or has disconnected. Check the\nagent logs and, if the issue persists, try restarting it or re-registering it\nwith the cluster." forward/fwd.go:179
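When triaging proxy logs for this symptom, it can help to pull out which application hostname the proxy failed to find a tunnel for. The following is a minimal sketch, assuming the error text matches the log line above; the function name `find_missing_tunnels` and the abbreviated `SAMPLE` line are illustrative and not part of Teleport.

```python
import re

# Matches the reverse-tunnel error text seen in the proxy log above.
NO_TUNNEL_RE = re.compile(r"no app reverse tunnel for (?P<target>\S+) found")


def find_missing_tunnels(log_lines):
    """Return the targets named in 'no app reverse tunnel' errors, in order."""
    targets = []
    for line in log_lines:
        match = NO_TUNNEL_RE.search(line)
        if match:
            targets.append(match.group("target"))
    return targets


# Abbreviated copy of the log line from this article (trailing advice text cut).
SAMPLE = (
    '2021-11-18T22:32:41Z ERRO [APP:WEB] "Error forwarding to /favicon.ico, '
    'err: Teleport proxy failed to connect to \\"app\\" agent \\"@local-node\\" '
    'over reverse tunnel:\\n\\n no tunnel connection found: no app reverse '
    'tunnel for f7261349-a79f-4163-96d5-a803b782822b.proxy.example.com found" '
    'forward/fwd.go:179'
)

if __name__ == "__main__":
    for target in find_missing_tunnels([SAMPLE]):
        print(target)
```

Lines that do not contain the error are skipped, so the function can be fed an entire proxy log.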

Repro Steps:

Deploy a Teleport cluster in a Kubernetes environment.
Connect an application to the cluster with an external Kubernetes agent pod.
Navigate to the application and confirm that it works.
Delete the Teleport Proxy pod.
Refresh the application in the browser and try to launch it again via Teleport. Observe `Internal Error 500` messages in the browser and the log message noted above on the Teleport Proxy.

Solution:

This issue is related to cookies not expiring properly when an application session is not terminated gracefully. The log message above shows that the proxy could not find a reverse tunnel connection to the application agent. Users may see the same message outside of this specific scenario, for example when the agent is offline or has disconnected.

One solution is to delete all browser cookies associated with the application, which should restore normal functionality. An alternative is to use a load balancer to drain connections while a new Teleport Proxy node comes up, so that existing connections can age out and transfer gracefully.
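For scripted clients that keep cookies in a jar rather than in a browser, the cookie workaround might look like the sketch below. It uses only Python's standard `http.cookiejar` module; the domain `grafana.proxy.example.com`, the cookie name `session`, and the helper names are hypothetical placeholders, not Teleport's actual cookie names.

```python
from http.cookiejar import Cookie, CookieJar


def make_cookie(name, value, domain):
    """Build a minimal session cookie for the demo jar (illustrative only)."""
    return Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=True, domain_initial_dot=False,
        path="/", path_specified=True,
        secure=True, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def clear_app_cookies(jar, app_domain):
    """Remove every cookie stored for app_domain; return how many were removed."""
    # Collect first so the jar is not modified while it is being iterated.
    stale = [c for c in jar if c.domain == app_domain]
    for cookie in stale:
        jar.clear(cookie.domain, cookie.path, cookie.name)
    return len(stale)


if __name__ == "__main__":
    jar = CookieJar()
    # Hypothetical domains and cookie name, used only for this demonstration.
    jar.set_cookie(make_cookie("session", "stale", "grafana.proxy.example.com"))
    jar.set_cookie(make_cookie("session", "valid", "other.example.com"))
    removed = clear_app_cookies(jar, "grafana.proxy.example.com")
    print(f"removed {removed} cookie(s), {len(jar)} remaining")
```

Clearing by domain leaves cookies for other sites untouched, which mirrors deleting only the cookies associated with the affected application.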

Teleport is also tracking this problem in a GitHub issue here.