Feat/use durable functions emulator image#8708
Conversation
…tead of building image using binary
# Conflicts: # samcli/local/docker/durable_functions_emulator_container.py
… fails to start up
| f"Durable Functions Emulator container failed to become ready within {timeout} seconds. " | ||
| "You may set the DURABLE_EXECUTIONS_EMULATOR_IMAGE_TAG env variable to a specific image " | ||
| "to ensure that you are using a compatible version. " | ||
| "Check https://gallery.ecr.aws/o4w4w0v6/aws-durable-execution-emulator. " |
There was a problem hiding this comment.
minor, but can you use self._get_emulator_image() here too?
maybe
f"Check https://${self._get_emulator_image().replace('public.ecr', 'gallery.ecr')}. "
?
or have another method that uses the same variables? Basically so if we change the location of the image, we have a consistent messaging too
There was a problem hiding this comment.
ah yeah, I can definitely do that
| "--store-path", | ||
| "/tmp/.durable-executions-local/durable-executions.db", |
There was a problem hiding this comment.
I don't see this being used before. Is this a new thing needed?
There was a problem hiding this comment.
Similar question about the other properties --log-level, --lambda-endpoint..
There was a problem hiding this comment.
yeah we removed a "layer" to the emulator, these were previously being set by default but not transparently to the SAM CLI code
There was a problem hiding this comment.
Note that this is the path within the container, not the path on the customers machine (but we create a docker "volume" to make it persistent). Maybe you could leave a comment explaining that?
| "--store-type", | ||
| self._get_emulator_store_type(), |
There was a problem hiding this comment.
In line 187 this is already added to the environment variables, so we probably don't need to add it here again.
"EXECUTION_STORE_TYPE": self._get_emulator_store_type(),
There was a problem hiding this comment.
ah, EXECUTION_STORE_TYPE is part of the emulator "layer" that was removed. So it's not being used anymore. I can remove it from the environment variables.
There was a problem hiding this comment.
conversely, I can also use the "new" environment variables instead:
There was a problem hiding this comment.
Ah. I see. I think both alternatives work, as long as it's clear. I think it makes sense to keep it here and remove it from the env vars.
There was a problem hiding this comment.
"HOST": "0.0.0.0", # ❌ UNUSED - hardcoded, passed via --host
"PORT": str(self.port), # ❌ UNUSED - passed via --port (command wins)
"LOG_LEVEL": "DEBUG", # ❌ UNUSED - hardcoded, passed via --log-level
"EXECUTION_STORE_TYPE": self._get_emulator_store_type(), # ❌ UNUSED - passed via --store-type (command wins)
"EXECUTION_TIME_SCALE": self._get_emulator_time_scale(), # ❌ UNUSED - not even a CLI arg!
so only
self.port - Can be overridden by user
self._get_emulator_store_type() - Can be overridden by user
so maybe:
command=[
"dex-local-runner", "start-server",
"--host", "0.0.0.0",
"--port", str(self.port),
"--log-level", "DEBUG",
"--lambda-endpoint", "http://host.docker.internal:3001",
"--store-type", self._get_emulator_store_type(),
"--store-path", "/tmp/.durable-executions-local/durable-executions.db",
],
environment={
"AWS_ACCESS_KEY_ID": "foo",
"AWS_SECRET_ACCESS_KEY": "bar",
"AWS_DEFAULT_REGION": "us-east-1",
}
| "start-server", | ||
| "--host", | ||
| "0.0.0.0", | ||
| "--port", |
There was a problem hiding this comment.
I guess not related to this PR, but do we have plan to assign this port dynamically. Currently it would just use 9014 no matter it's taken or not
There was a problem hiding this comment.
oh really, I thought the code already dynamically assigned the port according to the env variable.
There was a problem hiding this comment.
humm, which part set this envvar, I might have missed that
There was a problem hiding this comment.
I guess technically, the code doesn't dynamically detect if 9014 is currently used since it's the default but we have the ability to set it to something else.
There was a problem hiding this comment.
Yeah that's what I mean, probably we should consider the dynamic assign. Currently in the tests if I run 2 DAR test in parallel it would fail for port in use.
There was a problem hiding this comment.
This port is static intentionally because of the order in which the containers are created.
When we create an instance of the DurableLambdaContainer class, we have to tell the lambda container what port the emulator is listening on (by setting the AWS_ENDPOINT_URL_LAMBDA in the container environment to the emulator port first), then we start (or attach) the emulator. Since the lambda container port itself is set dynamically, we pass it through to the emulator as part of the execution payload and the emulator keeps track of it (this is also how we support concurrent executions).
However, if the emulator port was set dynamically, we'd have to update the lambda runtime container environment after the lambda container is created, which isn't possible.
| _RAPID_SOURCE_PATH = Path(__file__).parent.joinpath("..", "rapid").resolve() | ||
| _EMULATOR_IMAGE = "public.ecr.aws/ubuntu/ubuntu:24.04" | ||
| _EMULATOR_IMAGE_PREFIX = "samcli/durable-execution-emulator" | ||
| _EMULATOR_IMAGE = "public.ecr.aws/o4w4w0v6/aws-durable-execution-emulator:" |
There was a problem hiding this comment.
This is not the final URL right
| _RAPID_SOURCE_PATH = Path(__file__).parent.joinpath("..", "rapid").resolve() | ||
| _EMULATOR_IMAGE = "public.ecr.aws/ubuntu/ubuntu:24.04" | ||
| _EMULATOR_IMAGE_PREFIX = "samcli/durable-execution-emulator" | ||
| _EMULATOR_IMAGE = "public.ecr.aws/o4w4w0v6/aws-durable-execution-emulator:" |
There was a problem hiding this comment.
nit (since it needs a tag - and concatenate the : when we combine them):
| _EMULATOR_IMAGE = "public.ecr.aws/o4w4w0v6/aws-durable-execution-emulator:" | |
| _EMULATOR_IMAGE_PREFIX = "public.ecr.aws/o4w4w0v6/aws-durable-execution-emulator" |
|
|
||
| def _get_emulator_image(self): | ||
| """Get the full emulator image name with tag.""" | ||
| return self._EMULATOR_IMAGE + self._get_emulator_image_tag() |
There was a problem hiding this comment.
| return self._EMULATOR_IMAGE + self._get_emulator_image_tag() | |
| return f'{self._EMULATOR_IMAGE}:{self._get_emulator_image_tag()}' |
There was a problem hiding this comment.
ahh yeah this would be better, thanks
| """ | ||
| Allow pinning to a specific emulator image tag/version | ||
| """ | ||
| ENV_EMULATOR_IMAGE_TAG = "DURABLE_EXECUTIONS_EMULATOR_IMAGE_TAG" |
There was a problem hiding this comment.
sam local invoke has a --invoke-image parameter, where customers can pass the location of a specific image to be used as the execution base image instead of the default Lambda base image.
could explore using this established pattern rather than adding ENV_EMULATOR_IMAGE_TAG?
There was a problem hiding this comment.
I think that is a good idea however due to time constraints I won't have any more capacity to work on this for a while so I'd be happy to have this as a follow up item. I'm guessing I'll have to confer with SAM CLI team for the name of the parameter, updating the docs, etc.
I don't think having it as an environment variable makes it substantially harder to use in the meantime.
There was a problem hiding this comment.
Personally I don't think this should be exposed as a new parameter on the CLI. Our expectation is for the latest image to always be "stable", so it should only be overridden in unexpected cases. The way I see this, having this be overridable at all is like a backdoor, and we shouldn't expect customers to need to use this.
Do you agree?
|
I ran integration tests locally and they all passed |
valerena
left a comment
There was a problem hiding this comment.
LGTM. Waiting for the final official image location.
bchampp
left a comment
There was a problem hiding this comment.
Nice - I much prefer this approach, 1 blocking comment about the --lambda-endpoint and two minor suggestions.
| "--lambda-endpoint", | ||
| "http://host.docker.internal:3001", |
There was a problem hiding this comment.
This needs to be removed. We pass in the lambda endpoint dynamically as part of the payload to starting the execution. I think the emulator is already be falling back on 3001 (this is the sam local start-lambda port), but having this here is misleading since it's not actually used for the interactions between the emulator and lambda container.
| "--store-path", | ||
| "/tmp/.durable-executions-local/durable-executions.db", |
There was a problem hiding this comment.
Note that this is the path within the container, not the path on the customers machine (but we create a docker "volume" to make it persistent). Maybe you could leave a comment explaining that?
| f"Durable Functions Emulator container failed to become ready within {timeout} seconds. " | ||
| "You may set the DURABLE_EXECUTIONS_EMULATOR_IMAGE_TAG env variable to a specific image " | ||
| "to ensure that you are using a compatible version. " | ||
| f"Check https://${self._get_emulator_image().replace('public.ecr', 'gallery.ecr')}. " | ||
| "and https://github.com/aws/aws-durable-execution-sdk-python-testing/releases " | ||
| "for valid image tags. If the problems persist, you can try updating the SAM CLI version " | ||
| " in case of incompatibility." |
There was a problem hiding this comment.
If the emulator itself doesn't become ready, it likely indicates a problem with the emulator server. We have an environment variable that will capture the emulator logs and persist them: https://github.com/aws/aws-sam-cli/blob/develop/samcli/local/docker/durable_functions_emulator_container.py#L63
I wonder if we should add instructions on how to use this here?
Which issue(s) does this change fix?
#8488
Why is this change necessary?
This introduces a fix for the Homebrew linkage issue with the Durable Functions emulator. We are also introducing this so that SAM CLI can receive Durable Functions emulator updates uncoupled with SAM CLI release.
How does it address the issue?
We are automatically vending the emulator via an image published to ECR. See aws/aws-durable-execution-sdk-python-testing#196. Since the arm64 emulator is no longer included as an executable directly in the SAM CLI, this also fixed the issue where Homebrew users of SAM CLI could not emulate durable functions locally without doing some workaround.
What side effects does this change have?
As a bonus, this reduces the SAM CLI size.
Mandatory Checklist
PRs will only be reviewed after checklist is complete
make prpassesmake update-reproducible-reqsif dependencies were changedBy submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.