Custom CDN

May 2023

An efficient, geographically distributed CDN that builds and deploys a DNS server and a set of HTTP edge servers to cache and deliver content from a single origin server.

Repo: https://github.com/alex-w-99/Custom-CDN

Demo video: https://youtu.be/wZDAvp1cLME

The problem

Imagine you're Netflix. You've got users all around the world, but your actual video files live on a small set of origin servers. You could try to stand up thousands of origin servers around the world, but that quickly falls apart: origin servers need to stay perfectly consistent, they're expensive to operate, and they become bottlenecks when too many users hit them directly.

If every video stream had to come from those few origins, people far from them would see high latency, the origins would get crushed under load, and performance would tank. Content Delivery Networks (CDNs) solve this by pushing cached copies of content out to edge servers worldwide so users are served locally, not from a handful of overloaded origins.

Building a CDN from scratch

This project implements a complete CDN infrastructure with minimal dependencies, demonstrating the core principles that power modern content delivery.

The architecture

One DNS server: Routes incoming requests to the optimal edge server
Seven HTTP servers: Geographically distributed worldwide to serve content
One origin server: Holds all original content (provided for this project)

When a client requests content, it first queries the DNS server, which estimates the client's location from its IP address and answers with the address of the nearest HTTP server. That edge server either serves the content straight from its cache or fetches it from the origin and caches it for future requests.

The constraints

To simulate real-world tradeoffs, strict limits were imposed:

20 MB total cache per server (disk + memory combined)
Content popularity follows a Zipfian distribution (a few items are very popular, most are rarely accessed)
Goal: Minimize average download time across all requests
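
To see why a Zipfian workload rewards caching even under a tight budget, here is a small worked sketch. It assumes a hypothetical catalog of 10,000 equally sized items and a Zipf exponent of 1.0; neither number comes from the project, and real content sizes vary.

```python
def zipf_coverage(num_items: int, cached_items: int, exponent: float = 1.0) -> float:
    """Fraction of requests served from cache when the top `cached_items`
    most popular items are cached, assuming request probability is
    proportional to 1 / rank**exponent (a Zipfian popularity model)."""
    weights = [1.0 / (rank ** exponent) for rank in range(1, num_items + 1)]
    return sum(weights[:cached_items]) / sum(weights)


if __name__ == "__main__":
    # Hypothetical catalog size; shows how coverage grows with cache size.
    for cached in (10, 100, 1000):
        print(cached, round(zipf_coverage(10_000, cached), 3))
```

Under these assumptions, caching only the top 1% of items already covers a bit over half of all requests, which is why choosing what to keep matters more than how much can be kept.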

The implementation

The solution employs a multi-tiered caching strategy:

13 MB of popular content is pre-deployed to each edge server's disk during setup
An additional 18.5 MB in-memory cache holds frequently accessed items for instant retrieval
7 MB of additional content is downloaded by each server at startup to round out the cache
Geolocation-based routing uses MaxMind's GeoIP service to map clients to optimal servers
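
The geolocation-based routing in the last item can be sketched roughly as follows. This is an illustration under stated assumptions, not the project's code: the server names and coordinates are hypothetical, and a small in-memory dictionary stands in for the MaxMind lookup so the example runs without a database file.

```python
import functools
import math

# Hypothetical edge sites (name -> (latitude, longitude)); not the real deployment.
EDGE_SERVERS = {
    "edge-a": (42.36, -71.06),
    "edge-b": (35.68, 139.69),
    "edge-c": (51.51, -0.13),
}

# Stand-in for a GeoIP database: client IP -> (latitude, longitude).
_FAKE_GEO_DB = {
    "203.0.113.7": (34.69, 135.50),
    "198.51.100.9": (40.71, -74.01),
}


def haversine_km(a, b):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_edge(client_coords, servers=EDGE_SERVERS):
    """Name of the edge server closest to the client's coordinates."""
    return min(servers, key=lambda name: haversine_km(client_coords, servers[name]))


@functools.lru_cache(maxsize=4096)
def locate(ip):
    """Memoized location lookup; returns None for unknown addresses."""
    return _FAKE_GEO_DB.get(ip)


def route(ip, default="edge-a"):
    """Pick an edge server for a client IP, falling back to a default."""
    coords = locate(ip)
    return nearest_edge(coords) if coords else default
```

Great-circle distance is only a proxy for network latency; a production system might also weigh measured round-trip times or server load.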

The DNS server is multi-threaded and caches location lookups to minimize latency. Each HTTP server implements a two-tier cache: check memory first, fall back to disk, and only then fetch from origin.
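
The memory, then disk, then origin lookup order could look something like the sketch below. The class name, the `fetch_origin` callable, and the file-naming scheme are all hypothetical, and the memory tier uses plain least-recently-used eviction as a stand-in; the project's actual eviction policy may differ.

```python
import os
from collections import OrderedDict
from urllib.parse import quote


class TwoTierCache:
    """Check memory first, then disk, and only then fetch from the origin."""

    def __init__(self, disk_dir, fetch_origin, memory_budget):
        self.disk_dir = disk_dir          # directory of pre-deployed files
        self.fetch_origin = fetch_origin  # hypothetical callable: key -> bytes
        self.memory_budget = memory_budget  # max bytes held in memory
        self.memory = OrderedDict()       # key -> body, oldest entry first
        self.memory_used = 0

    def _disk_path(self, key):
        # Percent-encode the key so it is safe to use as a single file name.
        return os.path.join(self.disk_dir, quote(key, safe=""))

    def _remember(self, key, body):
        """Store a body in memory, evicting least recently used entries."""
        if len(body) > self.memory_budget:
            return  # too large to ever fit; skip the memory tier
        if key in self.memory:
            self.memory_used -= len(self.memory.pop(key))
        while self.memory_used + len(body) > self.memory_budget:
            _, evicted = self.memory.popitem(last=False)
            self.memory_used -= len(evicted)
        self.memory[key] = body
        self.memory_used += len(body)

    def get(self, key):
        """Return (body, tier), where tier names where the body was found."""
        if key in self.memory:
            self.memory.move_to_end(key)  # mark as recently used
            return self.memory[key], "memory"
        path = self._disk_path(key)
        if os.path.exists(path):
            with open(path, "rb") as f:
                body = f.read()
            self._remember(key, body)
            return body, "disk"
        body = self.fetch_origin(key)
        self._remember(key, body)
        return body, "origin"
```

Tracking the byte total explicitly, rather than counting entries, is what lets a cache like this respect a fixed size budget.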

Why geographic distribution matters

A user in Tokyo gets routed to the Tokyo edge server. A user in Boston gets routed to the Boston server. Routing each client to a nearby server shortens the network path, which cuts round-trip latency. (Even at the speed of light in optical fiber, roughly two-thirds of its speed in a vacuum, intercontinental distances add noticeable delay.)
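
As a rough back-of-the-envelope check on that claim, the sketch below computes a lower bound on round-trip time from distance alone. It assumes signals travel through fiber at about two-thirds of the vacuum speed of light and ignores routing detours, queuing, and processing, so real latencies are higher; the distances used are approximate.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in a vacuum
FIBER_FACTOR = 2 / 3               # approximate slowdown in optical fiber


def min_rtt_ms(distance_km: float) -> float:
    """Lower bound on round-trip time in milliseconds over a fiber path."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_FACTOR)
    return 2 * one_way_s * 1000


if __name__ == "__main__":
    # Roughly the Boston-to-Tokyo great-circle distance versus a nearby edge.
    print(round(min_rtt_ms(10_800), 1))
    print(round(min_rtt_ms(50), 2))
```

With these numbers, a request crossing about 10,800 km cannot complete a round trip in much less than 108 ms, while a server 50 km away has a floor of about half a millisecond. Every extra round trip in a connection handshake multiplies that gap.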

The engineering challenge

Building this system required juggling:

DNS protocol implementation and query handling
HTTP server design with intelligent caching policies
Deployment automation across distributed cloud infrastructure
Performance optimization under strict memory constraints
Cache eviction strategies based on content popularity
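
To give a flavor of the DNS protocol work in the first item, here is a minimal sketch that parses a query and builds an A-record answer by hand. It is not the project's implementation: the function names are hypothetical, and it assumes a single question with no name compression in the query.

```python
import socket
import struct


def parse_question(packet: bytes):
    """Return (query_id, flags, qname, qtype, end_offset) for the first question."""
    query_id, flags, _qd, _an, _ns, _ar = struct.unpack("!6H", packet[:12])
    offset = 12  # the fixed DNS header is 12 bytes
    labels = []
    while packet[offset] != 0:
        length = packet[offset]
        labels.append(packet[offset + 1:offset + 1 + length].decode("ascii"))
        offset += 1 + length
    offset += 1  # skip the zero-length root label
    qtype, _qclass = struct.unpack("!2H", packet[offset:offset + 4])
    return query_id, flags, ".".join(labels), qtype, offset + 4


def build_a_response(packet: bytes, ip: str, ttl: int = 60) -> bytes:
    """Answer the query's question with one IPv4 address."""
    query_id, flags, _qname, _qtype, end = parse_question(packet)
    # QR=1 (response), AA=1 (authoritative), and copy the client's RD bit.
    response_flags = 0x8400 | (flags & 0x0100)
    header = struct.pack("!6H", query_id, response_flags, 1, 1, 0, 0)
    question = packet[12:end]  # echo the question section unchanged
    # 0xC00C is a compression pointer to the name at byte offset 12.
    answer = struct.pack("!HHHIH", 0xC00C, 1, 1, ttl, 4) + socket.inet_aton(ip)
    return header + question + answer
```

In a design like the one described here, the `ip` argument would be the address of whichever edge server the routing step selected for that client.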

The result is a working CDN that demonstrates how the services we use daily (streaming video, downloading software updates, browsing the web) are delivered efficiently across a global infrastructure.