
FERR guard in FunctionEnvironmentReloadInstrumentation skips identity-based storage on Functions Flex Consumption #4712

@ahmedmuhsin

Description


Summary

On Azure Functions Flex Consumption (Linux, Java) using identity-based storage (no AzureWebJobsStorage connection string, only AzureWebJobsStorage__accountName + AzureWebJobsStorage__credential), the Application Insights Java agent never runs its specialization initializer. As a result, custom logs are not correlated to operation_Id, and sampling stays at the cold default: worker spans are marked sampled=false and nothing is exported.

The agent's FunctionEnvironmentReloadInstrumentation aborts when the bare AzureWebJobsStorage env var is absent. On identity-based Flex apps that env var is intentionally not set, so the FERR (FunctionEnvironmentReload) signal that should trigger the agent's specialization is dropped.

Environment

  • Azure Functions Flex Consumption, Linux
  • Java 17 (also reproduces with other Java versions on Flex)
  • host.json: "telemetryMode": "OpenTelemetry", extension bundle [4.*, 5.0.0)
  • AI Java agent: 3.7.6 (image-bundled at /azure-functions-host/workers/java/agent/applicationinsights-agent.jar)
  • Storage: identity-based (AzureWebJobsStorage__accountName / AzureWebJobsStorage__credential, no bare AzureWebJobsStorage)

Observed symptoms

  1. Custom logs from function code arrive in App Insights traces with no operation_Id.
  2. Worker-side OpenTelemetry spans show sampled=false; downstream sampling decisions are stuck at 0.
  3. The agent's diagnostic log contains neither "Application Insights Java Agent specialized successfully" nor "Application Insights Java Agent disabled after FERR".
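To check symptom 3 without reading the whole diagnostic log, you can scan it for the two messages above. The sketch below assumes the log is available as lines of text and that the messages appear verbatim. The class, enum and method names are hypothetical. Only the two message strings come from this issue.

```java
import java.util.List;

// Hypothetical helper: classifies the agent's specialization state from its diagnostic log lines.
final class SpecializationLogCheck {

    enum State { SPECIALIZED, DISABLED_AFTER_FERR, NEVER_SPECIALIZED }

    // Message strings as quoted in this issue; the surrounding log format is an assumption.
    static final String SPECIALIZED_MSG = "Application Insights Java Agent specialized successfully";
    static final String DISABLED_MSG = "Application Insights Java Agent disabled after FERR";

    static State classify(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains(SPECIALIZED_MSG)) {
                return State.SPECIALIZED;
            }
            if (line.contains(DISABLED_MSG)) {
                return State.DISABLED_AFTER_FERR;
            }
        }
        // Neither message: the FERR advice never reached the lazy initializer (the symptom in this issue).
        return State.NEVER_SPECIALIZED;
    }
}
```

A result of NEVER_SPECIALIZED corresponds to the identity-based case described here. DISABLED_AFTER_FERR points at the opt-in gate instead.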

Background: how the agent specializes on Flex

The Java worker's worker.config.json selects an AppInsightsPlaceholder profile when env INITIALIZED_FROM_PLACEHOLDER=true. That profile loads -javaagent:applicationinsights-agent.jar with -DLazySetOptIn=false. The agent installs bytecode transformers at premain, but the telemetry pipeline (exporter, sampler, connection string, role name) cannot be configured at premain because no app is assigned yet. So configuration is deferred to specialization:

  1. SecondEntryPoint registers AzureFunctions.setup(hasConnectionString, AzureFunctionsInitializer).
  2. On FERR, FunctionEnvironmentReloadInstrumentation advice calls AzureFunctions.configureOnce(), which runs the AzureFunctionsInitializer lazily.
  3. AzureFunctionsInitializer.run() consults the opt-in gate (APPLICATIONINSIGHTS_ENABLE_AGENT env or -DLazySetOptIn sys prop). If enabled, it reads the connection string and applies the runtime config.
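The three steps above follow a "register now, configure once later" pattern. The sketch below is a simplified, hypothetical model of that flow, not the agent's code. The class and member names only mirror the roles of AzureFunctions.setup, configureOnce() and AzureFunctionsInitializer. It also assumes that the opt-in gate is a plain OR of the env var and the system property. The precedence the real agent uses is not stated in this issue.

```java
import java.util.Map;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical model of deferred specialization; names mirror roles, not real agent APIs.
final class LazySpecializationModel {

    private final AtomicBoolean configured = new AtomicBoolean(false);
    private final boolean hasConnectionStringAtPremain;
    private final Runnable initializer;

    // Step 1 (setup): remember whether premain already had a connection string,
    // and which initializer to run later.
    LazySpecializationModel(boolean hasConnectionStringAtPremain, Runnable initializer) {
        this.hasConnectionStringAtPremain = hasConnectionStringAtPremain;
        this.initializer = initializer;
    }

    // Step 2 (configureOnce, called from the FERR advice): skip if premain already configured
    // the pipeline; otherwise run the initializer at most once, even if FERR fires repeatedly.
    void configureOnce() {
        if (hasConnectionStringAtPremain) {
            return;
        }
        if (configured.compareAndSet(false, true)) {
            initializer.run();
        }
    }

    // Step 3 (opt-in gate): assumed here to be enabled if either source says "true".
    static boolean isAgentEnabled(Map<String, String> env, Map<String, String> sysProps) {
        return Boolean.parseBoolean(env.get("APPLICATIONINSIGHTS_ENABLE_AGENT"))
                || Boolean.parseBoolean(sysProps.get("LazySetOptIn"));
    }
}
```

With the Placeholder profile (-DLazySetOptIn=false) and no APPLICATIONINSIGHTS_ENABLE_AGENT, this model's gate evaluates to false. That matches the self-disable behavior described under Workaround A.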

Root cause

FunctionEnvironmentReloadInstrumentation has this early return:

if (System.getenv("AzureWebJobsStorage") == null) {
    return;
}

On identity-based Flex workers this env var is absent (storage is configured via AzureWebJobsStorage__accountName + AzureWebJobsStorage__credential), so the advice returns before invoking AzureFunctions.configureOnce(). The agent stays in its pre-specialization state for the life of the worker. The presence of AzureWebJobsStorage__accountName is not considered.
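The effect of the guard quoted above can be reproduced in isolation by passing the environment in as a Map instead of calling System.getenv(). This is a hypothetical demo class, not agent code. The account name value is a placeholder, and "managedidentity" is used as an example __credential value.

```java
import java.util.Map;

// Hypothetical reproduction of the current FERR guard with an injectable environment.
final class FerrGuardDemo {

    // Mirrors: if (System.getenv("AzureWebJobsStorage") == null) { return; }
    // Returns true when the advice would continue on to AzureFunctions.configureOnce().
    static boolean currentGuardPasses(Map<String, String> env) {
        return env.get("AzureWebJobsStorage") != null;
    }

    public static void main(String[] args) {
        // Identity-based Flex app: only the double-underscore settings are present.
        Map<String, String> identityBased = Map.of(
                "AzureWebJobsStorage__accountName", "mystorageaccount", // placeholder value
                "AzureWebJobsStorage__credential", "managedidentity");
        // Prints false: the advice returns early and configureOnce() is never reached.
        System.out.println(currentGuardPasses(identityBased));
    }
}
```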

This was empirically confirmed with an A/B on the same app:

| State of AzureWebJobsStorage | hasConnectionString after FERR | Worker span exports |
| --- | --- | --- |
| absent (identity-based default) | not consulted; agent never specializes | 0, sampled=false |
| present (bare connection string added) | true; agent specializes | full exports, sampled=true |

Reproduction

  1. Create a Flex Consumption Java function app with identity-based storage (do not set a bare AzureWebJobsStorage app setting).
  2. Set host.json -> "telemetryMode": "OpenTelemetry".
  3. Deploy an HTTP function that logs via context.getLogger() and emits a worker-side OpenTelemetry span.
  4. Invoke it after specialization (avoid forcing a cold start). Observe:
    • traces rows have no operation_Id.
    • Worker spans have sampled=false and no exports.
    • Agent diagnostic log lacks the specialization message.

Workarounds and drawbacks

Workaround A: Add a bare AzureWebJobsStorage connection string AND set APPLICATIONINSIGHTS_ENABLE_AGENT=true

Both app settings are required together. Adding only one is not sufficient.

  • AzureWebJobsStorage=<full connection string> makes the FERR guard pass so the agent's specialization runs.
  • APPLICATIONINSIGHTS_ENABLE_AGENT=true flips the agent's opt-in gate to true so specialization actually configures the telemetry pipeline (instead of self-disabling). This second setting is required because Flex Consumption does not auto-set it the way Linux Dedicated does. On a portal-created Linux Dedicated Java app, APPLICATIONINSIGHTS_ENABLE_AGENT=true is added to app settings automatically by the Functions / App Service provisioning layer when Application Insights is configured. On Flex, this auto-set does not happen, so users must set it explicitly. This is a Functions platform gap, not an agent bug, and is tracked separately.
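Because the two settings only work together, a small pre-deployment check can catch the half-applied case. The sketch below assumes you can read the app settings as a Map (for example, from an exported settings file). The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical check that both Workaround A settings are present together.
final class WorkaroundACheck {

    static List<String> missingSettings(Map<String, String> appSettings) {
        List<String> missing = new ArrayList<>();
        String storage = appSettings.get("AzureWebJobsStorage");
        if (storage == null || storage.isBlank()) {
            // Without it, the FERR guard returns early and specialization never runs.
            missing.add("AzureWebJobsStorage");
        }
        // equalsIgnoreCase(null) is false, so an absent setting is reported as missing.
        if (!"true".equalsIgnoreCase(appSettings.get("APPLICATIONINSIGHTS_ENABLE_AGENT"))) {
            // Without it, specialization runs but the opt-in gate makes the agent self-disable.
            missing.add("APPLICATIONINSIGHTS_ENABLE_AGENT=true");
        }
        return missing;
    }
}
```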

Drawbacks:

  • Storing a bare AzureWebJobsStorage connection string defeats the purpose of identity-based storage. The connection string is a long-lived account key on app settings, exactly what identity-based storage is intended to eliminate.
  • Two settings must be coordinated. Missing either silently degrades telemetry, and the failure mode is hard to diagnose without reading the agent code.

Workaround B: Force the worker to restart after the app is assigned (e.g. set languageWorkers__java__arguments to a harmless value)

The restarted worker still receives the Placeholder profile and loads the agent with -DLazySetOptIn=false, but because the app is already assigned by the time the new worker starts, APPLICATIONINSIGHTS_CONNECTION_STRING is present in the process env when premain runs. SecondEntryPoint configures the telemetry client at premain, hasConnectionString() returns true, and configureOnce() short-circuits at FERR. The opt-in gate is never consulted, so neither APPLICATIONINSIGHTS_ENABLE_AGENT nor LazySetOptIn matters. The FERR guard also no longer matters because the lazy path is bypassed.
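The three paths discussed so far can be summarized as one decision. The model below encodes this issue's reasoning about which input wins. It does not encode the agent's actual control flow, and the class, enum and parameter names are hypothetical.

```java
// Hypothetical decision model of the specialization paths described in this issue.
final class SpecializationOutcome {

    enum Outcome { CONFIGURED_AT_PREMAIN, CONFIGURED_AT_FERR, SELF_DISABLED_AT_FERR, NEVER_SPECIALIZED }

    static Outcome resolve(boolean connectionStringAtPremain,
                           boolean bareStorageSettingPresent,
                           boolean optInEnabled) {
        // Workaround B: the connection string is already in the env at premain, so configureOnce()
        // short-circuits at FERR and neither the guard nor the opt-in gate is consulted.
        if (connectionStringAtPremain) {
            return Outcome.CONFIGURED_AT_PREMAIN;
        }
        // Placeholder warm start: the FERR guard decides whether the lazy initializer runs at all.
        if (!bareStorageSettingPresent) {
            return Outcome.NEVER_SPECIALIZED;
        }
        // Guard passed: the opt-in gate decides between configuring and self-disabling.
        return optInEnabled ? Outcome.CONFIGURED_AT_FERR : Outcome.SELF_DISABLED_AT_FERR;
    }
}
```

The identity-based default corresponds to resolve(false, false, ...) and gives NEVER_SPECIALIZED. Workaround A corresponds to resolve(false, true, true). Workaround B corresponds to resolve(true, ...).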

Drawbacks:

  • Every worker recycle and scale-out pays a full Java cold start. For Java workloads this is the most expensive part of placeholder warm-start to lose.

Suggested fix

Relax the FERR guard in FunctionEnvironmentReloadInstrumentation so identity-based storage is also accepted. Options:

  • Also treat the worker as Functions-storage-configured when AzureWebJobsStorage__accountName is present.
  • Or remove the guard entirely. The FERR signal itself is the trigger that matters; the guard exists to filter out non-Functions reloads, but the FERR advice path is already Functions-specific.

The first option preserves the original intent of the guard while fixing the identity-based case. Either way, no agent-default change is needed; the existing APPLICATIONINSIGHTS_ENABLE_AGENT opt-in continues to work the same way it does on Linux Dedicated.
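A minimal sketch of the first option follows. It keeps the original null check and also accepts the identity-based account name. The helper name is hypothetical. How the check would be placed inside the agent's advice class is left open.

```java
import java.util.Map;

// Sketch of option 1: accept either a bare connection string or identity-based storage.
final class RelaxedFerrGuard {

    // Keeps the original "!= null" semantics for each setting.
    static boolean isFunctionsStorageConfigured(Map<String, String> env) {
        return env.get("AzureWebJobsStorage") != null
                || env.get("AzureWebJobsStorage__accountName") != null;
    }
}
```

In the advice this would take the place of the quoted check, for example `if (!RelaxedFerrGuard.isFunctionsStorageConfigured(System.getenv())) { return; }`. System.getenv() already returns a Map<String, String>.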

Related (not part of this issue)

The need to explicitly set APPLICATIONINSIGHTS_ENABLE_AGENT=true on Flex is a Functions platform gap: Linux Dedicated portal-created apps get this set for the user automatically, Flex does not. That is being tracked separately with the Functions Flex team and is not a bug in this agent. It is mentioned here only so that users following Workaround A have the full picture.

References (code paths)

  • agent/instrumentation/azure-functions/.../FunctionEnvironmentReloadInstrumentation.java (the guard)
  • agent/agent-bootstrap/.../AzureFunctions.java (configureOnce, hasConnectionString)
  • agent/agent-tooling/.../init/SecondEntryPoint.java (AzureFunctions.setup wiring)
  • agent/agent-tooling/.../init/AzureFunctionsInitializer.java (run, isAgentEnabled, initialize)
  • agent/instrumentation/azure-functions/.../InvocationInstrumentation.java (sampling override gated on hasConnectionString)
