Installation and administration guide
Troubleshooting

Database Problems (MongoDB/DocumentDB)

The most common database issue is a failing connection between the application and the database. In such cases, verify the following:

  • Connection string – Ensure that the connection string in the application settings is correct. When connecting to a self-hosted MongoDB or DocumentDB instance, use the standard mongodb:// scheme. If the MongoDB instance is hosted on MongoDB Atlas, use the mongodb+srv:// scheme to connect to the cluster (see the example after this list).
  • Network policy – A frequent issue is an incorrect or missing network policy for the application node. Verify the network policies on both the MongoDB server side and the cluster where the application is deployed.
  • User permissions – The application typically creates its own database and collections upon first connection to MongoDB. If the user credentials used by the application have insufficient permissions, the service will be unable to create the necessary resources, leading to failures. If organizational policies prevent granting such permissions, databases and collections must be created manually.
    In the case of DocumentDB, which has restrictions on multi-document transactions, manual creation is always required.
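
For reference, the two connection string formats differ only in the scheme. A minimal sketch of an application.yml fragment is shown below; it assumes a Spring-style spring.data.mongodb.uri property (the exact property path may differ for your application) and uses placeholders in angle brackets:

# Standard connection to a self-hosted MongoDB or DocumentDB instance
spring:
  data:
    mongodb:
      uri: mongodb://<user>:<password>@<host>:27017/<database>
      # For a MongoDB Atlas cluster, use the SRV scheme instead:
      # uri: mongodb+srv://<user>:<password>@<cluster-host>/<database>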

Problems with RabbitMQ

Each application in the SCDOM product uses the RabbitMQ message broker. Below are the most common issues along with possible solutions:

  • Network connection issues – If the RabbitMQ cluster is located in a different namespace, cluster, or on a separate machine, verify network policies and ensure that appropriate traffic rules allow communication between the application cluster and the RabbitMQ cluster (a sketch of such a policy is shown after this list).
  • Permission issues – Ensure that a dedicated user is created on the RabbitMQ cluster with appropriate read and write permissions. Verify that the correct credentials are used in the application configuration.
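
If the cluster enforces network policies, a minimal sketch of an egress rule allowing the application namespace to reach RabbitMQ on the default AMQP port is shown below; the namespace names and the port are placeholders that depend on your environment, and an analogous rule applies to MongoDB traffic:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-egress-to-rabbitmq           # illustrative name
  namespace: <application-namespace>
spec:
  podSelector: {}                          # applies to all Pods in the application namespace
  policyTypes:
    - Egress
  egress:
    - to:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: <rabbitmq-namespace>
      ports:
        - protocol: TCP
          port: 5672                       # AMQP; use 5671 if TLS is enabled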

Problems with Image Registry (ImagePullBackOff)

If an application encounters an ImagePullBackOff error, check whether a Secret object exists in the namespace containing credentials for the image registry.

  • A Secret can be referenced explicitly under the imagePullSecrets key in the Pod template of the Deployment.
  • Alternatively, it can be associated with the ServiceAccount used by the Pods (both approaches are sketched below).
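
Both approaches are sketched below; the Secret name registry-credentials and the ServiceAccount name are placeholders, and the Secret must be of type kubernetes.io/dockerconfigjson:

# Option 1: reference the Secret in the Pod template of the Deployment
spec:
  template:
    spec:
      imagePullSecrets:
        - name: registry-credentials
      containers:
        - name: application
          image: <registry>/<image>:<tag>

# Option 2: attach the Secret to the ServiceAccount used by the Pods
apiVersion: v1
kind: ServiceAccount
metadata:
  name: <service-account-name>
imagePullSecrets:
  - name: registry-credentials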

For more details, refer to the official Kubernetes documentation:
Kubernetes Docs: Pull an image from a private registry

Problems with Starting the Application

Liveness and Readiness Probes

Liveness and readiness probes are a common cause of application startup failures. If a Pod repeatedly restarts due to probe failures, verify the following:

  • Check the livenessProbe and readinessProbe configurations in the Deployment definition.
    By default, the expected paths and ports are:

    • Liveness probe: /application-name/actuator/health/liveness
    • Readiness probe: /application-name/actuator/health/readiness
      The default port is 8080.
  • If the application does not use the default configuration provided with the Helm chart, ensure that the customStartupProbe, customLivenessProbe, and customReadinessProbe sections are correctly defined, specifying the appropriate probe endpoints (see the values.yaml sketch below).
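
A minimal values.yaml sketch that overrides the probes is shown below; it follows the customLivenessProbe/customReadinessProbe convention used by the Helm chart, with the path, port, and timing values as placeholders to adjust for your application:

customLivenessProbe:
  httpGet:
    path: /<application-name>/actuator/health/liveness
    port: 8080
  initialDelaySeconds: 60
  periodSeconds: 10
customReadinessProbe:
  httpGet:
    path: /<application-name>/actuator/health/readiness
    port: 8080
  initialDelaySeconds: 30
  periodSeconds: 10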

Problems with Environment Variable Configuration

If the application fails to connect to the message broker or database, verify the correctness of the environment variables obtained from Secrets or ConfigMaps.

Possible causes:

  • The ConfigMap or Secret has been deleted or incorrectly linked to the Deployment.
  • An incorrect ConfigMap or Secret name is referenced in the volumes section.
  • The application expects its configuration file in a different location than /application-name/application.yml (see the Deployment sketch below).
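
As a sketch, a Deployment fragment that injects environment variables from a Secret and mounts a ConfigMap as the application.yml file might look as follows; the resource names are placeholders:

spec:
  template:
    spec:
      containers:
        - name: application
          envFrom:
            - secretRef:
                name: <application-secret>        # e.g. broker and database credentials
          volumeMounts:
            - name: config
              mountPath: /application-name        # directory expected to contain application.yml
      volumes:
        - name: config
          configMap:
            name: <application-configmap>         # must contain an application.yml key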

Useful commands for troubleshooting:

kubectl describe pod <pod-name>
kubectl describe deployment <deployment-name>

Memory Problems (OOMKilled)

If an application exceeds the allocated memory, Kubernetes may terminate the container due to an Out of Memory (OOMKilled) error. Symptoms include:

  • The container restarts repeatedly, and its status indicates OOMKilled.
  • JVM memory-related errors appear in the logs.

Diagnostic steps:

  • Check memory limits defined in the Deployment (located in the resources section of the container specification).
  • Analyze container logs using:
kubectl logs <pod-name>

Corrective actions:

  • Adjusting memory limits – If the application requires more memory, update the limits in the values.yaml file used with the Helm chart (see the sketch below).
  • Configuring JVM heap size – If the default limits are not suitable, adjust the JVM settings using:
-XX:MaxRAMPercentage=70
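
A combined values.yaml sketch covering both corrective actions is shown below; the extraEnvVars key and the JAVA_OPTS variable name are assumptions that depend on the Helm chart and the application image, and the memory values are placeholders:

resources:
  requests:
    memory: 512Mi
  limits:
    memory: 1Gi
extraEnvVars:
  - name: JAVA_OPTS                 # variable name depends on the application image
    value: "-XX:MaxRAMPercentage=70"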