Monday, 19. April 2021 · 10 minutes

Today we faced an interesting problem with the deployment of our new website to Webflow. Currently, we have lots of SEO-relevant aggregation pages and public profiles we’re not immediately ready to move to the new application. These pages are generated automatically by our old application server, receive serious traffic (40k views per day), and cannot be replicated in Webflow, our new website provider.
Unfortunately, those pages all live on the root domain (matchory.com/offending-pages), and by “lots”, I mean several hundred of them. This means we can’t simply add redirects in Webflow, but have to keep pointing the DNS for matchory.com to our own web servers. There, we turn things around and redirect the traffic for all non-SEO pages to Webflow. But how (TL;DR: jump to the final config)?


Luckily, we use nginx on our web servers, so we could use the proxy module to forward requests to another upstream server. Normally this is used to pass requests to an internal application server that should not be exposed to the internet directly, but of course nothing is stopping you from proxying to a public host instead! We turned this to our advantage by forwarding requests for our website to the main Webflow host at https://proxy-ssl.webflow.com if they match certain conditions.

How did we determine the target host?
Webflow’s instructions for using a custom domain include adding a CNAME record that points to this host:

www.matchory.com. IN CNAME proxy-ssl.webflow.com. 3600

This means whatever server sits behind this SSL proxy at Webflow probably knows how to terminate SSL for our custom domain. If it can do that, we should equally be able to perform HTTP requests against it with a matching Host header!
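We can check this assumption from a shell before touching any nginx configuration. Here is a quick sketch using curl’s --connect-to option: it keeps www.matchory.com as the URL (and therefore as the SNI name and Host header) while actually connecting to Webflow’s proxy host:

# Connect to proxy-ssl.webflow.com:443, but present www.matchory.com as
# both the SNI name and the Host header
curl -svI https://www.matchory.com/ \
    --connect-to www.matchory.com:443:proxy-ssl.webflow.com:443

If this returns a sensible response for our site, the assumption holds.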

Configuring the upstream

We start by defining an upstream we want our requests to proxy to:

upstream webflow {
    server proxy-ssl.webflow.com:443;
}

We only add a single server here; the upstream simply gives us a single, named target to route to in the rest of the config file. Note the port at the end of the host name: the default is port 80, even if we use the https protocol later on.
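To make the pitfall explicit, this is the variant that would not work as intended:

# Without an explicit port, nginx contacts port 80, even though we later
# proxy_pass with the https scheme
upstream webflow {
    server proxy-ssl.webflow.com;
}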

Adding a resolver

To make sure nginx is able to resolve the hostname in the upstream block, we need to configure a resolver. If you haven’t already, you can put it into the http block in the main config file (probably /etc/nginx/nginx.conf), but it may equally well be added to your site config or server block.

resolver 8.8.8.8 8.8.4.4;

This uses the public Google resolvers, but you can use whichever public DNS you fancy.
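For reference, a minimal sketch of where the directive could live in the main config file:

http {
    # Resolve upstream hostnames such as proxy-ssl.webflow.com via public DNS
    resolver 8.8.8.8 8.8.4.4;

    # ... the rest of the http configuration
}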

Routing a single location to Webflow

Now that we have a working upstream, we can forward locations to it:

location = /foo {
    proxy_pass https://webflow;
}

This would proxy any request to /foo to the Webflow server, using the https protocol. At least it should: we’re still missing lots of configuration flags, so the above would fail miserably. To make it work, we’ll have to understand what actually happens in the background when a request is forwarded!

Excursus: Request proxying, SNI, and SSL

Imagine executing a simple HTTP request against our website. It will look like this:

GET / HTTP/1.1
Host: www.matchory.com

This instructs whoever receives it to deliver the index page of the host www.matchory.com.

Now suppose we set up nginx to proxy /foo to another server, say foo.matchory.net:

location = /foo {
    proxy_pass http://foo.matchory.net;
}

In that case, nginx would pick up our request and perform another HTTP request on its own (think of running curl on the server). This request will look something like this:

GET /foo HTTP/1.1
Host: foo.matchory.net
X-Real-Ip: 1.2.3.4
X-Forwarded-Proto: http
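
Expressed as a curl command run from the server, that request would look roughly like this (the client IP is made up):

# curl derives the Host header (foo.matchory.net) from the URL
curl -s http://foo.matchory.net/foo \
    -H 'X-Real-Ip: 1.2.3.4' \
    -H 'X-Forwarded-Proto: http'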

As you can see, the Host header is being set to the name of the upstream server. This actually makes sense, because we’re performing a plain old HTTP request to the upstream here! Then again, most of the time the application running on the upstream server identifies itself by the public hostname: our hypothetical foo server probably thinks its name is www.matchory.com. This is important so it can create fully qualified links to other resources it provides, for example.
So to convey the original hostname to the upstream, we can instruct nginx to set the Host header explicitly:

location = /foo {
    proxy_pass http://foo.matchory.net;
    proxy_set_header Host $host;
}

This ensures the Host header received by nginx is passed on to the upstream, which can now successfully respond to the request. Nice!

What happens, though, if we throw HTTPS into the mix?


Nothing extraordinary. Instead of sending a plain HTTP request to the upstream server, nginx will re-encrypt the request it received before passing it on. While HTTPS requests are encrypted using keys tied to the server’s certificate, they do in fact expose the target hostname. Otherwise, we couldn’t host multiple domains with different certificates on the same server, because the web server simply wouldn’t know which certificate and private key to use for the connection.
Therefore, web servers support a procedure called Server Name Indication, or SNI for short. If enabled, a client will send the hostname it attempts to connect to during the SSL handshake, in plain text. This allows the server to retrieve the appropriate encryption configuration before starting the actual handshake (note that I’m grossly over-simplifying, but the details really aren’t that important here). After the secure connection has been negotiated successfully, the client sends the actual, encrypted request.
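You can watch SNI in action from a shell. The following sketch uses openssl’s s_client against Webflow’s proxy host, announces our custom domain as the server name, and prints the subject of the certificate the server presents:

# Announce www.matchory.com via SNI while connecting to Webflow's proxy host,
# then print the subject of the certificate it returns
openssl s_client -connect proxy-ssl.webflow.com:443 \
    -servername www.matchory.com </dev/null 2>/dev/null \
    | openssl x509 -noout -subject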

If you’ve been following closely, you might have noticed that there’s a difference between the Host header we used in the previous example and the SNI name sent during the SSL handshake: the header isn’t even sent until the connection has been established! If you simply swapped “http” for “https” in the above example, your request would fail, because nginx would announce foo.matchory.net (if anything) during the handshake instead of www.matchory.com, before sending the request with the proper hostname.
This might work for a backend application you control, but not for a SaaS platform like Webflow, which uses a web server that resolves hosts dynamically.
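To make that concrete, here is a sketch of the naive variant that would fail against such a platform (using the hypothetical upstream from above):

location = /foo {
    # Re-encrypts the request, but during the handshake nginx either sends no
    # SNI name at all (the default) or the proxy_pass hostname
    # (foo.matchory.net), never www.matchory.com
    proxy_pass       https://foo.matchory.net;
    proxy_set_header Host $host;
}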

Setting the SNI name to proxy to

Every time I wonder whether nginx can do something, I learn that of course there’s a flag for that; you just need to know which one. In this case, it’s two of them: proxy_ssl_server_name and proxy_ssl_name. The former enables SNI for the upstream connection, the latter sets the name to use during SNI. This yields the following location:

location = /foo {
    proxy_pass              https://webflow;    # Pass the request to the webflow upstream
    proxy_ssl_server_name   on;                 # Enable SNI
    proxy_ssl_name          $host;              # Set the SNI name to the current host
    proxy_set_header        Host $host;         # Set the host header in the request
}

This ensures the request is directed at the correct server by properly conveying the target host both during the SSL handshake (via SNI) and in the HTTP request (via the Host header).

Performance optimizations

The last part missing here is upstream certificate validation: nginx can verify the certificate of the upstream server, which makes sense when we control the upstream and want to make sure it actually carries one of our certificates. As we’re connecting to an externally managed service, this isn’t necessary, and we explicitly turn validation off. Additionally, using the proxy_ssl_session_reuse flag, we configure nginx to reuse SSL sessions, so it doesn’t have to perform a full SSL handshake on every request. Update the location block to its full form:

location = /foo {
    proxy_pass              https://webflow;
    proxy_ssl_verify        off;
    proxy_ssl_session_reuse on;
    proxy_ssl_server_name   on;
    proxy_ssl_name          $host;

    proxy_set_header Host               $host;
    proxy_set_header X-Real-Ip          $remote_addr;
    proxy_set_header X-Forwarded-For    $remote_addr;
    proxy_set_header X-Forwarded-Proto  https;
}

Routing multiple locations to Webflow

If you wanted to proxy multiple locations with different URIs to Webflow, copy-pasting this would soon become tedious. By using another cool nginx feature, we can keep our configuration file short and concise: named locations. Named locations are location blocks which cannot be reached by a request directly, but are instead referred to from other directives. Specifically, you may only use them in error_page, post_action or try_files, which is exactly what we are going to do. Using try_files, we can direct nginx to test multiple resources and use the first one that exists. This is often useful to attempt serving static files before falling back to a web app:

location / {
    try_files $uri $uri/ /index.php;
}

Instead of a PHP file, we can also use a named location as the fallback:

location @fallback {
    # ...
}

location / {
    try_files $uri $uri/ @fallback;
}

As we don’t want to serve any local files for these routes, we just need a dummy argument to try before passing the request to our named location (nginx requires at least two arguments to try_files). This actually took me a while, until I stumbled upon this post on ServerFault which suggests using /dev/null as the first argument. I agree with the author that this is a terrible, terrible, but also beautifully simple workaround! It allows the following construct, with no performance hit:

location @fallback {
    # ...
}

location / {
    try_files /dev/null @fallback;
}

This will send all requests to the fallback location immediately. Applying this to our situation, we end up with the following (shortened for clarity):

upstream webflow {
    server proxy-ssl.webflow.com:443;
}

server {
    location @webflow {
        proxy_pass https://webflow;
        # ...
    }

    location = / {
        try_files /dev/null @webflow;
    }

    location ~ ^/([a-z]{2})/.*$ {
        try_files /dev/null @webflow;
    }
}

With just a single line per location, we can proxy select URI patterns to our Webflow site, without any changes to the existing site, the SSL certificates, or the Webflow settings. Any performance hit from the doubled requests is levelled out by Cloudflare caching in front of nginx.
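Once the configuration is in place, a quick sanity check from a shell could look like this (the language-prefixed path is just an example):

# Validate and reload the configuration
sudo nginx -t && sudo nginx -s reload

# The homepage and language-prefixed pages should now come from Webflow ...
curl -I https://www.matchory.com/
curl -I https://www.matchory.com/de/

# ... while everything else is still served by the legacy application
curl -I https://www.matchory.com/offending-pages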

As is often the case, this would have taken me considerably longer if it weren’t for the excellent people posting on StackOverflow and ServerFault. Thank you all!

Full nginx config

The final nginx configuration looks like the following:

# Start by defining an upstream to send requests to. This helps organization.
upstream webflow {

    # Make sure to add the port after the server name, because nginx won't do this
    # automatically, even when proxying to an https target.
    server proxy-ssl.webflow.com:443;
}

# Make sure nginx can resolve the webflow domain correctly
resolver 8.8.8.8 8.8.4.4;

server {

    # The server name of our primary domain. Note that this configuration will be used for
    # both the old _and_ the new website.
    server_name www.domain.com;

    # This named location is only routed internally, and will proxy any requests to the 
    # webflow upstream with the proper configuration set.
    location @webflow {

        # Note the "https" here, making sure the request is re-encrypted
        proxy_pass                          https://webflow;

        # Set the host header of our current request (probably www.domain.com). We want to
        # forward that to the upstream, so they can properly route the request.
        proxy_set_header Host               $host;

        # Setting these is good practice
        proxy_set_header X-Forwarded-For    $remote_addr;
        proxy_set_header X-Forwarded-Proto  https;

        # Disable SSL verification: As we don't own the Webflow upstream, we can't keep
        # track of their certificate. We trust their infrastructure anyway; if they went
        # rogue, there wouldn't be much we could do about it either way.
        proxy_ssl_verify        off;

        # Reuse SSL sessions: This avoids performing a full SSL handshake for every
        # single request by reusing previously negotiated session parameters.

        # SSL server name: This is the crucial bit. During SNI (Server Name Indication),
        # nginx uses the name of the upstream by default, which would be Webflow's
        # proxy-ssl.webflow.com in our case. Here, we enable overriding that name manually.
        proxy_ssl_server_name   on;

        # Set the actual SNI name to send: Here, we use our Host header value instead, so 
        # the request can be identified on the Webflow origin server.
        proxy_ssl_name          $host;
    }

    # This block matches any URLs that start with a language code, and forwards them to the
    # webflow named location.
    location ~ ^/([a-z]{2})/.*$ {
        try_files /dev/null @webflow;
    }

    # This block matches all requests for the homepage (i.e. a single "/") and forwards them
    # to the webflow named location.
    location = / {
        try_files /dev/null @webflow;
    }

    # This is the default location for all other requests, routed to our legacy application.
    location / {
        try_files $uri $uri/ @app;
    }

    # ...
}