Memoirs of an HTTP Request

Ashi Agrawal
Published in Affirm Tech Blog · 7 min read · Oct 23, 2020

At Affirm, we’re building a financial network that improves transactions across the entire retail ecosystem. In accordance with our core values, we provide our partners with simple, transparent ways — lightweight APIs and intuitive web interfaces — to integrate and to work with us. We’ve begun moving to a microservice-based architecture, complicating how we serve these API requests. This post outlines our routing philosophy and discusses the roles that the content delivery network, load balancer, and reverse proxy layers play in our routing architecture, with a focus on correctness, performance, and security.

Routing Philosophy

Our current routing architecture was built in accordance with the following principles:

Correctness: Enforce clear separation between services, prevent routing collisions, and defend against misconfigured routing by establishing and replicating routing boundaries.

Performance: Follow a two-pronged approach of defensive load minimization across all services and offensive optimizations for particular services.

Security: Malicious or fraudulent traffic should be stopped as early as possible. Security measures and levers should be available at all levels of our architecture.

Guided by this philosophy, we built out three layers. Once a request's hostname is resolved via DNS (a fascinating topic that's out of scope here), the request is routed to our content delivery network. After filtering, the CDN forwards requests to our load balancer, which routes each request to the appropriate service and node. Each node houses an NGINX process that forwards directly to an application server. Let's dig into the specifics!

Content Delivery Network

A content delivery network (CDN) provider hosts globally distributed datacenters, or points of presence (POPs), pushing caching geographically closer to the user and thereby reducing request latency. A CDN also offers last-mile performance optimizations and, via a web application firewall (WAF), traffic filtering to flag and block malicious requests.

Performance

To maintain the overall health of our system, we minimize the load it receives by aggressively caching responses, both for static assets and for non-state-changing API requests. Our promos service, which powers the As Low As messaging across our and our partners' sites, receives roughly 80 million requests a day at the CDN, accounting for more than a quarter of our API traffic. This messaging evolves with our financing programs and is personalized when possible but is otherwise relatively static. Computing each response is intensive, involving a database, other services, our custom events framework, and its own application-level caching. By caching at our CDN, we cut requests to our backend by over half, minimizing our infrastructure costs and protecting our infrastructure from spikes in load.
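The cache rules themselves live with our CDN provider, but the cue for this kind of caching is ordinary HTTP caching headers set at the origin. As a minimal, hypothetical sketch in NGINX terms (the path and TTL below are illustrative, not our production values):

```
# Hypothetical origin-side NGINX config: mark promos responses as
# cacheable so the CDN can serve them without touching our backend.
location /api/promos/ {
    proxy_pass http://127.0.0.1:8000;

    # Let shared caches (the CDN) reuse this response for five minutes.
    proxy_hide_header Cache-Control;
    add_header Cache-Control "public, max-age=300";
}
```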

We’re working on improving our cache hit rate, partly by migrating to a multi-tier POP cache architecture. In traditional single-layer POP networks, cache updates to one edge POP don’t propagate to other edge POPs. Once edge caches expire, they all incur the latency penalty to the origin server to update their caches. In a multi-tier architecture, there’s an additional shield cache between the edge caches and the origin server. When retrieving a new response from origin, both the shield and edge caches are updated, meaning that other stale edge caches need only to retrieve an updated response from the shield cache instead of going to the origin.
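Shielding is configured with the CDN provider rather than by us, but the topology is easy to sketch in NGINX terms. In this hypothetical two-tier setup, edge caches fill from a shared shield cache, and only the shield ever talks to the origin; all hostnames and cache paths are made up for illustration:

```
# Two-tier cache sketch (the real shield lives at our CDN provider).
proxy_cache_path /var/cache/edge   keys_zone=edge_cache:10m;
proxy_cache_path /var/cache/shield keys_zone=shield_cache:10m;

server {                                  # edge tier, one per POP
    listen 80;
    location / {
        proxy_cache edge_cache;
        proxy_pass http://shield.internal:8080;  # a miss goes to the shield
    }
}

server {                                  # shield tier, shared by all edges
    listen 8080;
    location / {
        proxy_cache shield_cache;
        proxy_pass https://origin.example.com;   # only tier that hits origin
    }
}
```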

CDNs also offer a suite of last-mile performance optimizations; we'll cover just one tuning experiment that we ran. For video streams, our provider suggests raising the initial congestion window (CWND), which controls how many bytes a server sends over TCP before waiting for an acknowledgment. Though we host little to no streaming, we thought reducing the number of round trips needed to download content by raising the CWND to 30 (from 10) might be valuable. On Affirm JS, a high-volume endpoint used across our merchant integrations, this change decreased cold cache download time from 100 ms to under 70 ms at P50. As a result, we saw a drop in time to last byte (TTLB), enabling our merchant integrations to be more responsive.

Security

Our WAF filters requests using lists of known vulnerabilities and attacker attributes, such as request headers associated with known malicious bots, and is tailored to match on the rules most relevant to our stack. Adding a WAF catches requests from bad actors early, but it doesn't have the requisite knowledge of our application to finely target malicious patterns. As a financial services company, when our endpoints are targeted by fraud rings or other hackers, we rely both on robust application-layer fraud detection and on alerting and manual pattern matching to ascertain the shape of the attack. Once we've identified patterns, such as requests originating from certain IPs or using certain User-Agents, we can create custom WAF rules or use an access control list (ACL) at the edge to block based on those patterns.

Load Balancer

Our load balancer handles the split of traffic to services, enforcing clear separation while minimizing the amount of infrastructure that service owners must manage. In addition, the load balancer is responsible for assigning each service's traffic to one of its nodes.

Routing

Requests are matched to a service using path-based routing at a layer 7 load balancer, which assigns traffic using HTTP-level information such as host name and path. As we split into microservices, namespacing APIs by service and version allows us to layer in percentage-based weighting to cut traffic over to a new service gradually. In addition to establishing clear separation between services, this branching establishes boundaries in routing ownership: our infrastructure team owns routing to each service, and service teams own routing thereafter to the appropriate endpoint and logic.
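Our load balancer is a separate layer 7 product, not NGINX, but the same idea is easy to sketch in NGINX terms. In this hypothetical snippet, split_clients sends 5% of a namespaced path to a new service while the rest stays on the existing one; every name, path, and address is made up for illustration:

```
# Hypothetical sketch of path-based routing with a percentage-based
# cutover, expressed in NGINX terms.
split_clients "${remote_addr}${request_id}" $checkout_backend {
    5%  checkout_v2;    # cut 5% of traffic over to the new service
    *   checkout_v1;    # everything else stays on the existing path
}

upstream checkout_v1 { server 10.0.1.10:8000; }
upstream checkout_v2 { server 10.0.2.10:8000; }

server {
    listen 80;

    # The /api/v1/checkout/ namespace maps to the checkout service;
    # the weighted variable then picks the old or new backend.
    location /api/v1/checkout/ {
        proxy_pass http://$checkout_backend;
    }
}
```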

Once a request is routed to a node group, we use a round robin approach to pick which node the request actually goes to. Round robin offers a simple and efficient approach that has served us well thus far. As we grow, we may explore other options, but for now we find tuning latency closer to the application layer more effective.
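Again sketched in NGINX terms rather than our actual load balancer: round robin is simply the default method for an upstream group, so with no balancing directive named, successive requests rotate through the listed nodes. Addresses here are illustrative:

```
# Round robin is the default when no balancing method (e.g. least_conn)
# is specified: request 1 goes to .11, request 2 to .12, and so on.
upstream promos_nodes {
    server 10.0.3.11:8000;
    server 10.0.3.12:8000;
    server 10.0.3.13:8000;
}
```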

Reverse Proxy

While NGINX is often used as a load balancer, API gateway, and more, we use it as a reverse proxy and content cache. As a lightweight, low-level piece of infrastructure, NGINX replicates our path-based routing rules as an additional safety check. Since we run each NGINX process alongside its application server on each node, it's the last layer to interact with requests, making it ideal for security measures, and the first to interact with our responses, making it ideal for response compression.

Routing

Replicating our service routing rules at NGINX allows us to fail quickly on any misconfiguration at the load balancer. During one such incident, canary traffic for one of our services went to nodes for another service, but it failed quickly and was never routed to application logic thanks to the replicated path match. We also use NGINX to proxy requests for our static site and React apps to an S3 bucket backend. While these requests originally went through application code, removing that layer for our sign-in and sign-up endpoints dropped time to first byte (TTFB) from roughly 150 ms to under 100 ms at P50. Low latency on these React apps is especially important since they include high-traffic sites such as our user portal; we'd like to continue to pull these requests higher up the stack to reduce latency even further.
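A hypothetical node-level configuration along these lines might look like the following, with the path match replicated from the load balancer and a catch-all that fails fast. The paths, port, and bucket name are invented for illustration:

```
# Only the paths owned by this node's service reach the app server;
# anything else fails immediately rather than hitting application logic.
location /api/v1/promos/ {
    proxy_pass http://127.0.0.1:8000;            # local application server
}

# Static site and React apps proxied straight to S3, skipping app code.
location /sign-in/ {
    proxy_set_header Host static-assets.s3.amazonaws.com;
    proxy_pass https://static-assets.s3.amazonaws.com;
}

location / {
    return 404;                                  # misrouted traffic fails fast
}
```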

Performance

Similar to how the CDN minimizes the request load on our infrastructure, NGINX minimizes the size of each response as it travels back to the user, reducing latency. Along with gzip across all endpoints, we've implemented Brotli compression for Affirm JS. Since this endpoint is heavily cached, we raised the compression level to eleven (Brotli's maximum), trading a higher one-time compression cost for a smaller compressed result and lower transmission latency. After turning compression up, we saw TTLB at P95 drop below 1.5 seconds.
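In NGINX configuration, that combination might look roughly like this. Brotli support comes from the third-party ngx_brotli module, and the endpoint path here is illustrative:

```
# gzip for all endpoints; Brotli at its maximum level for the heavily
# cached Affirm JS endpoint (requires the ngx_brotli module).
gzip on;
gzip_types application/json application/javascript text/css;

location /js/v2/affirm.js {               # illustrative path
    brotli on;
    brotli_comp_level 11;                 # slow to compress once, cheap to send
    brotli_types application/javascript;
    proxy_pass http://127.0.0.1:8000;
}
```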

NGINX also holds a content cache, used primarily for a couple of flows that bypass our CDN. To optimize our caching further, we tried implementing the stale-while-revalidate pattern with ETags, but found it inefficient: we have so many nodes per service that requests were spread too thinly across their caches. Similar to how a single-layer cache architecture at the CDN suffers from needing to go to the origin server each time, stale-while-revalidate with decentralized NGINX caches suffers because updates to one NGINX cache don't propagate to the others.
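For reference, NGINX can approximate stale-while-revalidate in its content cache with background updates. A minimal sketch, with the cache name, path, and TTLs all illustrative:

```
# Serve the stale cached copy while a single request refreshes it in the
# background, instead of making clients wait on revalidation.
proxy_cache_path /var/cache/app keys_zone=app_cache:10m inactive=10m;

location /api/v1/terms/ {                      # illustrative path
    proxy_cache app_cache;
    proxy_cache_valid 200 1m;
    proxy_cache_use_stale updating error timeout;
    proxy_cache_background_update on;          # refresh off the hot path
    proxy_pass http://127.0.0.1:8000;
}
```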

Security

Since NGINX lives on each node, it is ideal for security features that require per-service tuning, such as request origin validation and IP-based blocking. Request origin validation checks that state-changing requests with cookies originate from our site, preventing bad actors from mounting a cross-site request forgery (CSRF) attack. Since we are an integration-heavy company, we do have a handful of endpoints that we expect to be called from other origins, so we provide an easy bypass mechanism for those. While origin validation is an opt-out measure, we also provide opt-in functionality for IP-based blocking via an NGINX configuration snippet. For internal tools, we need a reliable way to restrict access to internal VPN users as a first line of defense before authentication; we maintain that blocklist of non-VPN IPs centrally in NGINX.
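Sketched in NGINX configuration, with the trusted origin, VPN address range, and paths all hypothetical:

```
# Opt-out origin validation: block state-changing requests whose Origin
# header doesn't match ours. The first matching regex wins.
map "$request_method:$http_origin" $csrf_block {
    default                                                0;
    "~^(POST|PUT|PATCH|DELETE):https://www\.affirm\.com$"  0;  # trusted origin
    "~^(POST|PUT|PATCH|DELETE):"                           1;  # everything else
}

server {
    listen 80;

    location /api/ {
        if ($csrf_block) { return 403; }
        proxy_pass http://127.0.0.1:8000;
    }

    # Opt-in IP-based blocking: internal tools only answer to VPN IPs.
    location /internal/ {
        allow 10.8.0.0/16;                 # hypothetical VPN range
        deny  all;
        proxy_pass http://127.0.0.1:8000;
    }
}
```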

Summary

This post traced the route of an HTTP request through our infrastructure, highlighting caching, compression, and more. We're in the middle of rethinking our infrastructure by building on top of Kubernetes and evaluating different ingress and routing ideas, from service meshes to API gateways. If you enjoyed this post, we'd love to hear from you; we're hiring engineers across the company. Reach me with thoughts at ashi.agrawal@affirm.com.

Thank you to Jordan Jeffries, Sally Yen, Sophia Li, Srikanth Raju, and Zubin Joseph for their feedback on drafts of this post.
