[SPARK-56845][K8S] Truncate ConfigMap names that exceed DNS subdomain limit #55874
Open
TongWei1105 wants to merge 1 commit into
Conversation
… limit

When `spark.app.name` is very long (>229 chars), the derived `resourceNamePrefix` plus a fixed suffix (e.g. `-hadoop-config`, `-krb5-file`, `-driver-podspec-conf-map`) can exceed the Kubernetes DNS subdomain 253-char limit. The K8s API then rejects the ConfigMap with `must be no more than 253 characters`, failing driver submission.

Unify ConfigMap name construction through a single helper, `KubernetesClientUtils.configMapName(prefix, suffix)`, that mirrors the fallback strategy already used by `KubernetesConf.driverServiceName`: when the preferred name is too long, fall back to `spark-<uniqueID><suffix>`, which preserves uniqueness across concurrent applications and keeps the name within the limit.

The three call sites (`HadoopConfDriverFeatureStep`, `KerberosConfDriverFeatureStep`, `PodTemplateConfigMapStep`) are migrated to the helper, and the two `def newConfigMapName` fields are converted to `lazy val` so the fallback's `uniqueID()` is captured exactly once; otherwise the ConfigMap would be created with one name while the pod mounted another. `lazy val` (rather than `val`) avoids spuriously computing, and emitting a fallback warning for, a name that is never used (e.g. the step is constructed but no Hadoop/Kerberos conf is set).
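The capture semantics behind the `def` → `lazy val` conversion can be illustrated with a minimal sketch (the `uniqueID` stand-in and the name shape here are simplified illustrations, not the actual Spark code): with `def`, every access recomputes the random fallback name, so the created ConfigMap and the pod's volume mount could end up referencing different names; `lazy val` fixes the name on first access and computes nothing if the name is never needed.

```scala
object FallbackNameCapture {
  // Stand-in for KubernetesUtils.uniqueID(): a short random token.
  private def uniqueID(): String = java.util.UUID.randomUUID().toString.take(8)

  // With `def`, each access recomputes the name: two reads can differ, so
  // the ConfigMap could be created under one name while the pod mounts another.
  def nameViaDef: String = s"spark-${uniqueID()}-hadoop-config"

  // With `lazy val`, the first access fixes the name for all later uses,
  // and nothing is computed (or warned about) if the name is never read.
  lazy val nameViaLazyVal: String = s"spark-${uniqueID()}-hadoop-config"
}
```

Reading `nameViaDef` twice yields two different names, while `nameViaLazyVal` is stable across reads, which is exactly the property the volume-mount consistency depends on.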
What changes were proposed in this pull request?
Add a new overload `KubernetesClientUtils.configMapName(prefix, suffix)` that falls back to `spark-<uniqueID><suffix>` when `prefix + suffix` exceeds `KUBERNETES_DNS_SUBDOMAIN_NAME_MAX_LENGTH` (253), mirroring the fallback strategy already used by `KubernetesConf.driverServiceName`.

Migrate the three driver-side ConfigMap call sites to the new helper:

- `HadoopConfDriverFeatureStep` (suffix `-hadoop-config`)
- `KerberosConfDriverFeatureStep` (suffix `-krb5-file`)
- `PodTemplateConfigMapStep` (suffix `-driver-podspec-conf-map`)

The two `def newConfigMapName` fields are converted to `lazy val` so the fallback's random `uniqueID()` is captured exactly once; otherwise the ConfigMap would be created with one name while the pod's volume references another. `lazy val` (rather than `val`) avoids spuriously computing, and emitting a fallback warning for, a name that is never used (e.g. the step is constructed but no Hadoop/Kerberos conf is set).

Note: this also changes the truncation behavior of the existing single-arg `configMapName(prefix)` (`@Since("3.3.0")`), which now delegates to the new overload. Spark's own callers (`configMapNameDriver` / `configMapNameExecutor`) use short fixed prefixes (~22 chars) and never hit the fallback, so behavior for built-in callers is unchanged. External `@DeveloperApi` consumers passing very long prefixes will see different (but safer, collision-free) names than before.
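A minimal sketch of the fallback logic described above (the object name and the `uniqueID` helper are simplified stand-ins, not the actual Spark implementation):

```scala
object ConfigMapNameSketch {
  // Kubernetes DNS subdomain names (RFC 1123) are capped at 253 characters.
  val KubernetesDnsSubdomainNameMaxLength = 253

  // Stand-in for KubernetesUtils.uniqueID(): a short random token.
  private def uniqueID(): String = java.util.UUID.randomUUID().toString.take(8)

  // Prefer prefix + suffix; if that would exceed the limit, fall back to
  // "spark-<uniqueID><suffix>", which stays well under the limit and is
  // unique across concurrent applications.
  def configMapName(prefix: String, suffix: String): String = {
    val preferred = prefix + suffix
    if (preferred.length <= KubernetesDnsSubdomainNameMaxLength) preferred
    else s"spark-${uniqueID()}$suffix"
  }
}
```

For example, a 300-char prefix with suffix `-hadoop-config` would produce a name shaped like `spark-1a2b3c4d-hadoop-config`, keeping the meaningful suffix while discarding only the over-long prefix.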
Why are the changes needed?
When `spark.app.name` is very long (>229 chars), the derived `resourceNamePrefix` plus a fixed suffix exceeds the Kubernetes DNS subdomain 253-char limit. The K8s API then rejects the ConfigMap with `must be no more than 253 characters`, failing driver submission. This PR makes the three driver-side ConfigMap names robust to long app names.

Does this PR introduce any user-facing change?
Yes: driver submission with very long `spark.app.name` no longer fails. Submissions that previously failed will now succeed; the affected ConfigMaps will be created with names like `spark-<uniqueID>-hadoop-config` instead. A warning is logged when the fallback is used.

For users of the public `KubernetesClientUtils.configMapName(prefix)` API: the truncation strategy for over-long prefixes changed from "take the first N chars of the prefix" to "fall back to `spark-<uniqueID>-conf-map`". This avoids silent name collisions between applications that happened to share the first 244 chars of their prefix. Spark's own callers always use short prefixes and are unaffected.
How was this patch tested?
Added unit tests:

- `KubernetesClientUtilsSuite`: verifies the new helper returns `prefix + suffix` within the limit, falls back to `spark-<id><suffix>` when over the limit, and that the legacy single-arg overload still produces the `-conf-map` suffix.
- `HadoopConfDriverFeatureStepSuite`, `KerberosConfDriverFeatureStepSuite`, `PodTemplateConfigMapStepSuite`: each adds a "very long resourceNamePrefix" case asserting (a) the resulting ConfigMap name is within the limit, and (b) the pod's volume references the exact same name as the created ConfigMap (a regression guard for the `def` → `lazy val` change).

Was this patch authored or co-authored using generative AI tooling?
Yes, `Generated-by: Claude Code (Opus 4.7)`