To enable diagnostics on the root cluster, start the teleport service with the following options (the port is configurable):
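A minimal sketch of the start command, assuming the --diag-addr flag available in recent Teleport releases (adjust the address and port to match your environment):

```shell
# Start teleport with the diagnostics endpoint exposed on local port 3000.
# If teleport runs under systemd, add this flag to the unit's ExecStart line instead.
teleport start --diag-addr=127.0.0.1:3000
```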
This will expose several endpoints that we can use to gather diagnostic info.
Once the diagnostic endpoint has been exposed, use the following commands to capture the relevant info periodically. For most users, we recommend capturing at least at the start of the day, during peak times, and at the end of the day; if the issue is being triggered more often, it may be prudent to capture every 30 minutes to 1 hour.
To gather the memory profile, run the following on the server/node:
curl -o heap.profile http://127.0.0.1:3000/debug/pprof/heap
To gather the CPU profile, run the following on the server/node:
curl -o cpu.profile http://127.0.0.1:3000/debug/pprof/profile
To gather the Goroutine profile, run the following on the server/node:
curl -o goroutine.profile http://127.0.0.1:3000/debug/pprof/goroutine
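Since the captures are meant to be repeated throughout the day, a small helper script can keep them from overwriting one another. This is a hypothetical sketch (the capture_profiles name is ours, and it assumes the diagnostic endpoint is at 127.0.0.1:3000; adjust as needed):

```shell
# Hypothetical helper: grab all three pprof profiles in one go,
# prefixing each file with a timestamp so repeated captures are kept.
capture_profiles() {
  local ts
  ts=$(date +%Y%m%d-%H%M%S)
  for p in heap profile goroutine; do
    # /debug/pprof/profile (the CPU profile) takes ~30s; the others return quickly
    curl -s -o "${ts}-${p}.profile" "http://127.0.0.1:3000/debug/pprof/${p}"
  done
}
```

Running capture_profiles from cron or a shell loop produces files like 20240101-090000-heap.profile, which you can collect at the end of the day.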
The CPU profile takes about 30 seconds to collect by default, and the command will look like it's hung while running; this is expected (the heap and goroutine snapshots normally return much faster). Let each command run until it finishes and writes out the corresponding ".profile" file.
The ".profile" dumps can be viewed only if Golang and Graphviz are installed on the machine you're using to analyze the dump. If both are installed, you can run the following command (one profile file at a time) to open an interactive pprof prompt, from which you can view the process tree in a browser:
go tool pprof heap.profile|cpu.profile|goroutine.profile
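Alternatively, newer Go toolchains let pprof serve its own web UI directly, skipping the interactive prompt (the port 8080 here is illustrative; pick any free local port):

```shell
# Serve the pprof web UI for an already-downloaded profile.
go tool pprof -http=127.0.0.1:8080 heap.profile
```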
Alternatively, you can view the info directly without first exporting to a file by pointing pprof at the diagnostic endpoint, if you have Golang and Graphviz installed on the cluster server/node:
go tool pprof http://127.0.0.1:3000/debug/pprof/heap
After pprof fetches the profile and drops you at its interactive prompt (fetching the CPU profile will appear hung for around 30 seconds, like the commands in Option 1 above), you can use the
web command to open up the profile in a browser and view the process tree.
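At the interactive (pprof) prompt, a typical session looks something like this (top works without Graphviz; web requires it):

```
$ go tool pprof heap.profile
(pprof) top       # list the functions consuming the most memory
(pprof) web       # render the call graph in a browser (needs Graphviz)
(pprof) quit
```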