- Agents behaviour
- Bad event numbers are inconsistent
- Data collection session 0 disk free space
- Deactivating HPA agents
- Error logs in agent files
- Extractor OOM
- JVM memory allocation
- Missing metrics
- Navigator names are too numerous
- Nginx
The following sets of files can be analysed and/or shared in case of an HPA issue (an example of collecting them is sketched after the table):
| Component | Path |
|---|---|
| Server | {{hpa_base_dir}}/hpa/modules/introscope/logs/em.log* |
| Server | {{hpa_base_dir}}/hpa/modules/hpa-extractor-*/logs/* |
| Server | {{hpa_base_dir}}/hpa/modules/nginx/logs/output.log |
| Agent | {{hpa_agent_dir}}/wily/logs/* |
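As a minimal sketch, assuming the placeholders above resolve to the actual install paths and the files sit on the same host, they can be gathered into a single archive before sharing:

```bash
# Bundle the HPA server and agent log files listed in the table above.
tar czf hpa-logs-$(date +%Y%m%d).tar.gz \
  {{hpa_base_dir}}/hpa/modules/introscope/logs/em.log* \
  {{hpa_base_dir}}/hpa/modules/hpa-extractor-*/logs/* \
  {{hpa_base_dir}}/hpa/modules/nginx/logs/output.log \
  {{hpa_agent_dir}}/wily/logs/*
```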
Agents behaviour
- In case of a violent crash of the JVM, the agent, which runs inside it, terminates abruptly as well
- For JVMs with a very short lifetime, such as STs, the agent can even miss its first send of data to the portal, in which case the ST execution may not appear in the portal at all
- In case of a JVM stop, the agent sends its data immediately
- Depending on the nature of the metrics captured by the agent, values collected close to the end of the JVM's life may not be fully accurate
- This is especially true for some ST metrics, as their JVM lifetime is short compared to back-end components
Agent behaviour can also depend on customizations, which are partially visible here
Bad event numbers are inconsistent
In the PS EVENTS tile of the Engines menu, if the number of bad events shown does not match the number displayed in the WebAdmin, it is most likely due to frequent restarts of the corresponding engine.
For instance, the Liquidation engine shows a high number of bad events (while the WebAdmin shows 0):

By examining the ENTRY POINTS tile and filtering on ‘EngineManager’, we can see that this engine manager has processed thousands of operations on the Liquidation Engine (see the ‘requests’ column). In essence, it indicates that the engine has been restarted numerous times:

Because the bad event count shown in WebAdmin corresponds only to the events recorded since the last restart, the value displayed in the PS EVENTS tile differs from what appears in WebAdmin.
Further details on the bad event metric and other JMX metrics are available on the Calypso Engine Server and Engine metrics page.
Data collection session 0 disk free space
- Minimum free space required for the agents shows 2.59T
  - A property is missing in the agent profile: `hpa.dumpDirectory.freeSpaceInMb=20000`
  - Details available here
- Disk free space for an agent at 0, for all agents of the same host
  - The user used to deploy the agent is probably not the correct one or does not have sufficient Linux privileges (a quick check is sketched after this list)
Data collection session monitor not running on a host
- The user used to deploy the agent is probably not the correct one or does not have sufficient Linux privileges
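A minimal sketch of such a check, assuming the agent is deployed under {{hpa_agent_dir}} and that `<agent user>` stands for the account running the Calypso JVM (both are assumptions to adapt to the deployment):

```bash
# Confirm who owns the agent directory, what free space that file system reports,
# and whether the expected user can actually write under the agent directory.
ls -ld {{hpa_agent_dir}}/wily
df -h {{hpa_agent_dir}}
sudo -u <agent user> touch {{hpa_agent_dir}}/wily/.hpa_write_test
```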
Deactivating HPA agents
Three ways are available to quickly deactivate the HPA agents:
- Select all the flags and services in the Agents Settings and deactivate them
  - Does not require any Calypso JVM restart
- Change the `ENABLE` value in `hpa_agent_config.sh` to `export ENABLE=0` (see the sketch after this list)
  - Requires a Calypso JVM restart
- Rename the HPA agent folder
  - Requires a Calypso JVM restart
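A minimal sketch of the second option; the location of `hpa_agent_config.sh` directly under {{hpa_agent_dir}} is an assumption, adapt it to where the file lives in your installation:

```bash
# Force ENABLE to 0 in hpa_agent_config.sh to deactivate the HPA agent.
sed -i 's/^export ENABLE=.*/export ENABLE=0/' {{hpa_agent_dir}}/hpa_agent_config.sh
# A Calypso JVM restart is then required for the change to take effect.
```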
Error logs in agent files
HPA agent log files can contain the following errors:
- A feature is deactivated to save overhead, but its error trace cannot be disabled; these errors occur during startup:
[ERROR] [IntroscopeAgent.Agent] Cannot create the lock file for persistence api in null
[ERROR] [IntroscopeAgent.ExtensionDeployer] Cannot create the lock file in <path>
- During startup, the agent registers itself on the EM; sometimes the first attempt is not successful, so it traces a log and retries automatically:
[ERROR] [IntroscopeAgent.Agent] The com.hexagon.wily.service.HpaAgentBeanService extension was unable to register the bean
- A feature is deactivated to save overhead, but when HPA settings are changed, class redefinition can occur and this log is then displayed in the agent logs:
[ERROR] [IntroscopeAgent.Agent] IntervalHeartbeat.execute threw executing: Introscope Dynamic Instrumentation Behavior
java.lang.NullPointerException: Cannot invoke "com.wily.util.task.AsynchExecutionQueue.IExecutionQueue_addExecutableItem(com.wily.util.task.IExecutableItem)" because "this.fAgeAllQueue" is null
at com.wily.introscope.agent.trace.hc2.WilyTransactionStructure.ageAllImmediately(WilyTransactionStructure.java:2981)
at com.wily.introscope.autoprobe.dynamic.InstrumentationManager.wrapUpRedefinition(InstrumentationManager.java:459)
at com.wily.introscope.autoprobe.dynamic.InstrumentationManager.processAllClasses(InstrumentationManager.java:412)
at com.wily.introscope.autoprobe.dynamic.InstrumentationManager.access$2(InstrumentationManager.java:360)
at com.wily.introscope.autoprobe.dynamic.InstrumentationManager$DirectivesSyncBehavior.ITimestampedRunnable_execute(InstrumentationManager.java:566)
at com.wily.util.heartbeat.IntervalHeartbeat$BehaviorNode.execute(IntervalHeartbeat.java:972)
at com.wily.util.heartbeat.IntervalHeartbeat.executeNextBehaviorAndCalculateSleepTime(IntervalHeartbeat.java:491)
at com.wily.util.heartbeat.IntervalHeartbeat.access$2(IntervalHeartbeat.java:445)
at com.wily.util.heartbeat.IntervalHeartbeat$HeartbeatRunnable.run(IntervalHeartbeat.java:668)
at java.base/java.lang.Thread.run(Thread.java:840)
Extractor OOM
- In case of an OOM (Out Of Memory) issue on the HPA extractor, memory must be increased:
- Edit the file `{{hpa_base_dir}}/hpa/versions/<version>/hpa-extractor-<extractor version>/module_controller.bash`
- Modify the line `HPA_MODULE_JOPTS="$HPA_MODULE_JOPTS -Xmx2G"` and consider the value to use (a consolidated sketch follows this list)
- Stop the extractor via the controller: `{{hpa_base_dir}}/hpa/./controller.sh stop hpa-extractor-<extractor version>`
- Start again (the controller starts only the stopped component(s)): `{{hpa_base_dir}}/hpa/./controller.sh start`
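Put together, the steps above could look like the following shell session; the new -Xmx value of 4G is only an example, and the `sed` edit assumes the `HPA_MODULE_JOPTS` line matches the one shown above:

```bash
# Raise the extractor heap from 2G to 4G (example value) in module_controller.bash.
sed -i 's/-Xmx2G/-Xmx4G/' {{hpa_base_dir}}/hpa/versions/<version>/hpa-extractor-<extractor version>/module_controller.bash

# Restart only the extractor: stop it explicitly, then "start" brings back the stopped components.
{{hpa_base_dir}}/hpa/./controller.sh stop hpa-extractor-<extractor version>
{{hpa_base_dir}}/hpa/./controller.sh start
```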
JVM memory allocation
The -Xmx runtime parameter sets the maximum amount of JVM heap memory for a JVM. An additional parameter, usually used in tandem with -Xmx, is -Xms, which controls the initial allocation of JVM heap memory.
For example, if the -Xms512m -Xmx3g combination of parameters were used to start a Java program, HPA would show 512 MB as the value of the GC Heap:Bytes Total metric. When GC Heap:Bytes In Use reaches a certain point near the 512 MB mark, the JVM increases the GC Heap:Bytes Total metric by a certain percentage (the exact number depends on the JVM provider and the GC algorithm used).
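As a minimal illustration of that combination, such a JVM could be started as follows (the jar name is purely a placeholder):

```bash
# Initial heap (-Xms) of 512 MB, maximum heap (-Xmx) of 3 GB:
# HPA reports GC Heap:Bytes Total starting around 512 MB and growing as needed, up to 3 GB.
java -Xms512m -Xmx3g -jar my-application.jar
```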
If the heap memory usage keeps growing (due to a potential memory leak or a JVM heap that is not sized for this particular application usage), the JVM will throw a java.lang.OutOfMemoryError, indicating that the maximum of 3 GB of heap allocated to this JVM was not enough to perform its activities.
To avoid false alerts when monitoring the memory usage of a JVM, it is recommended to set the threshold for GC Monitor:Percentage of Java Heap Used to 95% or higher.
Missing metrics
- Each agent profile defines a maximum number of metrics it can record, a metric being identified by its full name
- Beyond this value, new metrics are no longer recorded
- When the agent restarts, the counter goes back to 0
- For some metrics, their number can be reduced by removing some of their attributes
- Less useful metrics can also be deactivated via the agent settings, but a restart is required to reset the metrics recorded for the current agent session
Navigator names are too numerous
See this link
Nginx
- `nginx: [emerg] bind() to 0.0.0.0:443 failed (13: Permission denied)`
  - Visible in the nginx output.log file
  - If nginx is listening on a port lower than 1024, special rights must be allocated
  - Those rights can disappear when nginx is reinstalled or when security policy patches are deployed
  - The best action to correct this issue is to launch this command (a verification sketch follows this list):
    `setcap cap_net_bind_service=ep {{hpa_base_dir}}/hpa/modules/nginx-<nginx version>-bin/sbin/nginx`
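To confirm that the capability has been applied (and to reapply it after an nginx reinstallation), a quick check could look like the following; the binary path reuses the placeholder above, and root privileges are usually required for `setcap`:

```bash
# Verify that the nginx binary is allowed to bind to ports below 1024.
getcap {{hpa_base_dir}}/hpa/modules/nginx-<nginx version>-bin/sbin/nginx
# If nothing is printed, the capability is missing: reapply it.
setcap cap_net_bind_service=ep {{hpa_base_dir}}/hpa/modules/nginx-<nginx version>-bin/sbin/nginx
```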