Troubleshooting AST

  1. Useful Commands
    1. View Docker Container Status
    2. View Docker Container Logs
    3. Save Logs To A File
    4. Stop and Start A Container
  2. Grafana Not Reachable
  3. Dashboard Data Not Visible (All Dashboards)
    1. Check the OpenTelemetry Collector Dashboard
  4. Dashboard Data Not Visible (Specific Dashboards)
  5. GTM and DNS Metrics Not Loading

Useful Commands

View Docker Container Status

You can view the state of the Docker containers after they’ve been started with docker ps. The STATUS field is a good indicator of whether containers are running correctly.

This output shows everything looks good:

$ docker ps
CONTAINER ID   IMAGE                                                                      COMMAND                  CREATED              STATUS          PORTS                       NAMES
cb4cf8867390   grafana/grafana:11.2.0                                                     "/run.sh"                About a minute ago   Up 49 seconds   0.0.0.0:3000->3000/tcp      grafana
bb8891f2cd47   prom/prometheus:v2.54.1                                                    "/bin/prometheus --c…"   About a minute ago   Up 49 seconds   0.0.0.0:9090->9090/tcp      prometheus
df2739cd67cb   ghcr.io/f5devcentral/application-study-tool/otel_custom_collector:v0.6.0   "/otelcol-custom --c…"   About a minute ago   Up 49 seconds   4317/tcp, 55679-55680/tcp   application-study-tool-otel-collector-1

This output shows a problem for the application-study-tool-otel-collector-1 container (the otel collector), whose STATUS of Restarting indicates it is stuck in a restart loop:

$ docker ps
CONTAINER ID   IMAGE                                                                      COMMAND                  CREATED          STATUS                         PORTS                    NAMES
fdbde8a3ee16   ghcr.io/f5devcentral/application-study-tool/otel_custom_collector:v0.6.0   "/otelcol-custom --c…"   14 seconds ago   Restarting (1) 5 seconds ago                            application-study-tool-otel-collector-1
b7ef41accd46   grafana/grafana:11.2.0                                                     "/run.sh"                14 seconds ago   Up 13 seconds                  0.0.0.0:3000->3000/tcp   grafana
8edff3e8666e   prom/prometheus:v2.54.1                                                    "/bin/prometheus --c…"   14 seconds ago   Up 13 seconds                  0.0.0.0:9090->9090/tcp   prometheus
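
If the full table is too wide to read comfortably, docker ps also accepts a Go-template format string. A minimal sketch that prints just each container’s name and status (output mirrors the healthy example above):

$ docker ps --format '{{.Names}}: {{.Status}}'
grafana: Up 49 seconds
prometheus: Up 49 seconds
application-study-tool-otel-collector-1: Up 49 seconds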

View Docker Container Logs

When containers aren’t running correctly or other issues are present, the container logs are a good place to look. Once you have the CONTAINER ID or NAMES value from the output of docker ps, you can view the container logs using the docker logs <container id or name> command.

Logs for the broken container above (which show a parsing error with the config file) can be gathered as shown:

$ docker logs application-study-tool-otel-collector-1
Error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):

'receivers' expected a map, got 'string'
2024/10/17 15:53:44 collector server run finished with error: failed to get config: cannot unmarshal the configuration: decoding failed due to the following error(s):

'receivers' expected a map, got 'string'

Save Logs To A File

The output of the docker logs command for specific containers may be requested during troubleshooting. The logs can be gathered and saved to a file with a command similar to:

$ docker logs application-study-tool-otel-collector-1 >& otellogs.txt && gzip otellogs.txt
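
If logs from several containers are requested, a small shell loop can capture all of them at once. This is a sketch using only standard docker and shell commands; adjust the filenames to taste:

$ for c in $(docker ps -a --format '{{.Names}}'); do docker logs "$c" > "${c}.log" 2>&1; done
$ tar czf ast-logs.tar.gz ./*.log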

Stop and Start A Container

Individual containers can be stopped and started using the docker stop <container name or id> and docker start <container name or id> commands. Restarting a container this way picks up any modifications to its config files, and is an alternative to the docker-compose down and docker-compose up commands, which restart all containers:

$ docker stop fdbde8a3ee16
fdbde8a3ee16

$ docker start fdbde8a3ee16
fdbde8a3ee16
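
The stop/start pair can also be combined into a single command with docker restart:

$ docker restart fdbde8a3ee16
fdbde8a3ee16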

Grafana Not Reachable

Once containers are up and running, you should be able to view the Grafana instance by navigating to port 3000 on the host machine. If this isn’t working, things to check include:

  • Run docker ps and check the status of the Grafana container.
  • Ensure port 3000 is open between the client browser and the instance running AST.
  • Check the Grafana logs with docker logs -f grafana.
  • Check whether the prometheus endpoint (available on port 9090 of the AST instance) is reachable, as a second data point if none of the above show issues. (A curl sketch for both endpoints follows this list.)
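
If you have shell access to a machine that should be able to reach the AST host, curl can help separate ‘the service is down’ from ‘the port is blocked’. This is a sketch; replace <ast-host> with your instance’s address. Grafana serves an unauthenticated health endpoint at /api/health, and Prometheus serves one at /-/healthy:

$ curl -s http://<ast-host>:3000/api/health    # expect JSON including "database": "ok"
$ curl -s http://<ast-host>:9090/-/healthy     # expect "Prometheus Server is Healthy."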

Dashboard Data Not Visible (All Dashboards)

If Grafana is loading but dashboard data is not populating in any dashboards, there are a few things to check:

Check the OpenTelemetry Collector Dashboard

The OpenTelemetry Collector dashboard at the top level of the ‘Dashboards’ list in Grafana is a good starting point. The data in this dashboard is collected through a different mechanism from the rest of the dashboards.

  • If data is present in this dashboard (but no others), it indicates an issue with the collection of data from the BigIPs.
  • If no data is present in this dashboard, it indicates a problem with the connection between Prometheus and Grafana.

If Data Is Present

If data is present here, there’s very likely an issue with collection from the BigIPs themselves. Things to check:

  • Is the otel collector container running correctly (docker ps)?
  • Do the otel collector container logs show either of the following?
    1. Authentication errors (a manual credential check is sketched after this list):
      • 2024-10-17T16:23:50.658Z error scraperhelper/scrapercontroller.go:197 Error scraping metrics {"kind": "receiver", "name": "bigip/1", "data_type": "metrics", "error": "endpoint /mgmt/shared/authn/login returned code 401", "scraper": "bigip"}
    2. Network connection problems:
      • 2024-10-17T16:17:44.347Z error scraperhelper/scrapercontroller.go:197 Error scraping metrics {"kind": "receiver", "name": "bigip/1", "data_type": "metrics", "error": "failed to make http request: Post \"https://10.0.0.1/mgmt/shared/authn/login\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)", "scraper": "bigip"}
  • Do the Metrics Points Receive Rate and Metrics Points Export Rate in the dashboard show failures or ‘0’ (indicating nothing is being collected)?
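
For authentication errors specifically, you can test the credentials directly against the same iControl REST endpoint the collector uses (the /mgmt/shared/authn/login path visible in the log lines above). A sketch, using the device address from the example log and placeholder credentials; note that -k skips TLS verification and is only appropriate for self-signed lab certificates:

$ curl -sk https://10.0.0.1/mgmt/shared/authn/login \
    -H 'Content-Type: application/json' \
    -d '{"username": "admin", "password": "<password>", "loginProviderName": "tmos"}'

A 401 response here confirms the BigIP itself is rejecting the credentials, independent of the collector.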

If Data Is NOT Present

This indicates that there is an issue with the communication between Grafana and Prometheus. Check:

  • Is the prometheus container running (docker ps)?
  • Are there any error logs present in the prometheus container, whether or not it’s running (docker logs prometheus)?
  • Can you reach the prometheus interface on port 9090 of the AST host, assuming firewalls are open? (A direct query sketch follows this list.)
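
You can also query Prometheus directly, bypassing Grafana entirely, to confirm whether it holds any data. This sketch uses the standard Prometheus HTTP API and the built-in up metric, which is present in any Prometheus instance that is scraping targets; replace <ast-host> with your instance’s address:

$ curl -s 'http://<ast-host>:9090/api/v1/query?query=up'

A JSON response with a non-empty "result" array means Prometheus is collecting data, and the problem lies between Grafana and Prometheus (for example, the Grafana data source configuration).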

Dashboard Data Not Visible (Specific Dashboards)

If the dashboards are GTM or DNS profile dashboards, see GTM and DNS Metrics Not Loading below.

For other dashboards, ensure that:

  • The dashboard ‘time range’ in the upper right is set to a window where data is expected to exist.
  • Any device, virtual server, or other selectors in the upper left are set to valid values where data should exist.
  • Check the responses for a specific broken panel by clicking the ‘3 dot icon’ (upper right while hovering) and selecting ‘Inspect > Query’.
  • Check whether any queries to the BigIP are timing out or erroring: review the otel collector logs (see the grep sketch after this list) and the ‘BigIP Collector Stats’ dashboard at the top level of the Dashboards section in Grafana.
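
To quickly surface errors in the collector logs mentioned above, a grep over the log output works well. A minimal sketch:

$ docker logs application-study-tool-otel-collector-1 2>&1 | grep -i error | tail -20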

GTM and DNS Metrics Not Loading

Metrics for DNS and GTM are disabled by default. See Configuration > Configuration Helper (Recommended) > Configure DNS & GTM for instructions on enabling them.