Serving Static Websites Using Lambda@Edge

📅 20 March 2021, updated 4 April 2021 🔖 cloud-software ⏲️ 3 minutes to read

⚠️ This post was last updated in 2021, meaning its contents may be outdated.

A typical approach to static websites on AWS involves a CloudFront distribution pointed at an S3 bucket.

One drawback to this approach is that in the event of a cache miss, CloudFront must retrieve the content from S3. If the bucket is in Ireland and the edge cache is in Australia, this will mean a round trip between those points.

In reality this latency isn't really a problem since it's only evident on a cache miss, but (more as an experiment than anything else) I decided to write a proof of concept to serve static content entirely from the edge cache.

This meant baking an entire website into a Lambda@Edge package. This blog is now served that way: the whole of alanedwardes.com is baked into a Lambda@Edge function.

Describing Static Content

To serve content from our Lambda@Edge package, we'll write files with the response format expected by CloudFront. This example is under content/robots.txt in the package zip:

{
    "status": 200,
    "headers": {
        "content-type": [
            {
                "key": "Content-Type",
                "value": "text/plain"
            }
        ]
    },
    "body": "VXNlci1hZ2VudDogKgpBbGxvdzogLw==",
    "bodyEncoding": "base64"
}

This format allows returning whatever will fit in the package, so even binary files like images are possible. It also allows customising the status code and response headers.
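As a rough illustration of how such a file could be produced, here is a minimal sketch that wraps raw bytes in the same response shape. The function name toEdgeResponse is hypothetical and not part of the original project. It writes the header key in lowercase, which is how the CloudFront documentation describes the headers object.

```javascript
// Hypothetical helper (not from the original project): wrap a string or
// Buffer in the response shape that CloudFront expects from Lambda@Edge.
function toEdgeResponse(body, contentType, status = 200) {
    return {
        status: status,
        headers: {
            // Keys of the headers object are lowercase header names
            'content-type': [{ key: 'Content-Type', value: contentType }]
        },
        // Base64 keeps binary content (such as images) intact inside JSON
        body: Buffer.from(body).toString('base64'),
        bodyEncoding: 'base64'
    };
}

// Reproduces the robots.txt example: "User-agent: *" and "Allow: /"
const robots = toEdgeResponse('User-agent: *\nAllow: /', 'text/plain');
console.log(JSON.stringify(robots, null, 4));
```

Writing the JSON.stringify output to content/robots.txt in the package would give a file equivalent to the example above.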

Lambda@Edge Function

This Lambda function operates on the Origin Request event, since that allows it to fake an origin response (bypassing the origin entirely). The code below targets the Node.js 12 runtime environment.

The full code can be found here on GitHub.

exports.handler = async (event, context) => {
    const request = event.Records[0].cf.request;

    try {
        // Try to get the content using this URI
        return getContent(getContentPath(request.uri));
    }
    catch (err) {
        // Not found, try the compensation workflow
    }

    try {
        const adjustedUri = request.uri + '/';

        // Try to get the same content with a trailing slash
        getContent(getContentPath(adjustedUri));

        // It exists, redirect to it
        return generateRedirect(adjustedUri);
    }
    catch (err) {
        // Not found, serve a 404 response
    }

    return getContent('content/errors/404');
};

An overview of the logic:

Try to get the file matching the URI from the current package
If it's found, return it to CloudFront
If it's not found, try adding a trailing slash
If that is found, return a redirect response
If it's still not found, return the 404 response
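The handler relies on three helpers (getContentPath, getContent and generateRedirect) that aren't shown in this post. The following is only a sketch of what they might look like, not the code from GitHub. It assumes that content files sit under content/ next to the handler, that URIs ending in a slash map to a file named index, and that a 301 is the desired redirect status.

```javascript
const fs = require('fs');
const path = require('path');

// Assumption: "/robots.txt" maps to "content/robots.txt", and a
// directory-style URI such as "/blog/" maps to "content/blog/index".
function getContentPath(uri) {
    const target = uri.endsWith('/') ? uri + 'index' : uri;
    // Normalising against the root stops ".." segments escaping content/
    return path.posix.join('content', path.posix.normalize('/' + target));
}

// Reads and parses a pre-built response file from the package.
// fs.readFileSync throws if the file is missing, which is what the
// try/catch blocks in the handler depend on.
function getContent(contentPath) {
    const json = fs.readFileSync(path.join(__dirname, contentPath), 'utf8');
    return JSON.parse(json);
}

// Builds a redirect response; the status code is an assumption.
function generateRedirect(uri) {
    return {
        status: '301',
        headers: {
            location: [{ key: 'Location', value: uri }]
        }
    };
}
```

Because getContent is synchronous here, a missing file throws inside the handler's try block rather than producing a rejected promise that would slip past the catch.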

The static content does have to be processed by the function, but from my testing even when loading and running JSON.parse() on image responses spanning a few kilobytes, the overhead was under 10 milliseconds.
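For anyone wanting to repeat that kind of measurement, here is a minimal sketch that times only the JSON.parse() step on a synthetic content file. The 8 KB payload size is an arbitrary assumption, and file loading is not included, so the number it prints is not directly comparable to the figure above.

```javascript
const { performance } = require('perf_hooks');

// Build a synthetic content file with an 8 KB binary body (size is arbitrary)
const payload = Buffer.alloc(8 * 1024, 0xff).toString('base64');
const file = JSON.stringify({
    status: 200,
    headers: {},
    body: payload,
    bodyEncoding: 'base64'
});

// Time a single parse, as the function would do once per cache miss
const start = performance.now();
const parsed = JSON.parse(file);
const elapsedMs = performance.now() - start;

console.log(`Parsed ${file.length} characters in ${elapsedMs.toFixed(3)} ms`);
```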

Gathering Static Content

To gather static content automatically for this website, I wrote a C# library which does the following:

Recursively gathers content from a website using href and src attributes
Writes the content to a ZIP archive
Updates the Lambda@Edge function code, publishing a new version
Updates the CloudFront distribution to point at the new Lambda version
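To illustrate the first step, here is a small sketch of collecting same-origin URLs from href and src attributes. It is written in JavaScript for consistency with the function above and is not how the C# library is implemented. The function name extractLinks is hypothetical, and a real crawler would be better served by a proper HTML parser than a regular expression.

```javascript
// Hypothetical sketch: find URLs referenced by href="..." or src="..."
// attributes, resolve them against the base address, and keep only
// those on the same origin as the site being gathered.
function extractLinks(html, baseAddress) {
    const base = new URL(baseAddress);
    const links = new Set();
    const pattern = /\b(?:href|src)\s*=\s*["']([^"']+)["']/gi;

    for (const match of html.matchAll(pattern)) {
        let url;
        try {
            url = new URL(match[1], base);
        } catch (err) {
            continue; // Skip values that are not valid URLs
        }
        if (url.origin === base.origin) {
            url.hash = ''; // Fragments point at the same resource
            links.add(url.href);
        }
    }
    return [...links];
}

const sample = '<a href="/about/">About</a><img src="img/a.png">' +
    '<a href="https://other.example/">Elsewhere</a><a href="/about/#top">Top</a>';
console.log(extractLinks(sample, 'https://example.com/blog/'));
```

Running the same function on each newly discovered page, while tracking which URLs have already been visited, gives the recursive gathering described above.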

The library can be used like this:

var services = new ServiceCollection();

services.AddLogging(x => x.AddConsole());
services.AddHttpClient();
services.AddFreezer();

var crawler = services.BuildServiceProvider().GetRequiredService<IFreezer>();

crawler.Freeze(new FreezerConfiguration
{
    BaseAddress = new Uri("https://uncached.alanedwardes.com"),
    ResourceWriter = x => new AmazonLambdaAtEdgeResourceWriter(new AmazonLambdaAtEdgeResourceWriterConfiguration
    {
        LambdaName = "AeBlogEdgeResponder",
        DistributionId = "E295SAMVLG12SQ"
    }, new AmazonLambdaClient(RegionEndpoint.USEast1), new AmazonCloudFrontClient())
}, CancellationToken.None).GetAwaiter().GetResult();

I baked the above into this website's admin panel, so a new version of the website can be gathered, pushed into a Lambda@Edge function package and deployed at the push of a button.

There's also an implementation for pushing content to Amazon S3, available via the AmazonS3WebsiteResourceWriter.

Limitations

There are some stringent limitations that make this approach only applicable to small, static websites:

Lambda functions which are invoked using the Origin Request event are limited to a 1MB response size
The maximum size of the Lambda function deployment package is 50MB (compressed) for origin-facing events
It doesn't seem possible to invalidate the CloudFront cache when a new Lambda is deployed (they're two separate operations which will finish at different times)
The maximum number of requests per second is 10,000 (see below)
The maximum number of concurrent executions is 1,000 (see below)
The cost is $0.60 per 1M requests, plus $0.00005001 for every GB-second (see below)

The last three points matter much less if you're also using caching in CloudFront, since an edge cache will then only invoke the function on a cache miss rather than on every request.
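To put the pricing and the effect of caching into numbers, here is a rough estimate using the two prices listed above. The function name and the example inputs (cache hit ratio, 10 ms average duration, 128 MB of memory) are assumptions for illustration, and the sketch ignores any rounding AWS applies to billed duration.

```javascript
// Prices as listed in the limitations above
const PRICE_PER_REQUEST = 0.60 / 1e6;
const PRICE_PER_GB_SECOND = 0.00005001;

// Hypothetical estimator: only cache misses invoke the function
function estimateCost(requests, cacheHitRatio, averageDurationMs, memoryMb) {
    const invocations = requests * (1 - cacheHitRatio);
    const gbSeconds = invocations * (averageDurationMs / 1000) * (memoryMb / 1024);
    return invocations * PRICE_PER_REQUEST + gbSeconds * PRICE_PER_GB_SECOND;
}

// 10M requests, 90% served from the edge cache, 10 ms per invocation, 128 MB
const cached = estimateCost(10e6, 0.9, 10, 128);
// The same traffic with no caching at all
const uncached = estimateCost(10e6, 0, 10, 128);

console.log(`With caching: $${cached.toFixed(2)}, without: $${uncached.toFixed(2)}`);
```

Under these assumptions the cached case comes to roughly $0.66 against roughly $6.63 uncached, so the cache hit ratio dominates the bill.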


Alan Edwardes