May 2023
Custom CDN
Efficient, geographically distributed CDN that builds and deploys a DNS server and HTTP servers for content caching and delivery from a single origin server.
Repo: https://github.com/alex-w-99/Custom-CDN
Demo video: https://youtu.be/wZDAvp1cLME
The problem
Imagine you're Netflix. You've got users all around the world, but your actual video files live on a small set of origin servers. You could try to stand up thousands of origin servers around the world, but that quickly falls apart: origin servers need to stay perfectly consistent, they're expensive to operate, and they become bottlenecks when too many users hit them directly.
If every video stream had to come from those few origins, people far from them would see high latency, the origins would get crushed under load, and performance would tank. Content Delivery Networks (CDNs) solve this by pushing cached copies of content out to edge servers worldwide so users are served locally, not from a handful of overloaded origins.
Building a CDN from scratch
This project implements a complete CDN infrastructure with minimal dependencies, demonstrating the core principles that power modern content delivery.
The architecture
- One DNS server: Routes incoming requests to the optimal edge server
- Seven HTTP servers: Geographically distributed worldwide to serve content
- One origin server: Holds all original content (provided for this project)
When a client requests content, it first queries the DNS server, which estimates the client's location and answers with the address of the nearest HTTP server. That edge server either serves cached content immediately or fetches it from the origin and caches it for future requests.
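As a rough illustration of the routing step, the sketch below picks the closest edge server by great-circle distance. The server names and coordinates are hypothetical placeholders, and the client's latitude and longitude are assumed to have already been resolved from its IP address; the project's actual selection logic may differ.

```python
import math

# Hypothetical edge locations (name -> (latitude, longitude) in degrees).
# These are illustrative only, not the project's real servers.
EDGE_SERVERS = {
    "edge-tokyo": (35.68, 139.69),
    "edge-boston": (42.36, -71.06),
    "edge-london": (51.51, -0.13),
}

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearest_edge(client_location, servers=EDGE_SERVERS):
    """Return the name of the server closest to the client's (lat, lon)."""
    return min(servers, key=lambda name: haversine_km(client_location, servers[name]))
```

With these placeholder coordinates, a client located in Osaka (34.69, 135.50) maps to "edge-tokyo". Physical distance is only a proxy for network latency, so a production system might combine it with measured round-trip times.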
The constraints
To simulate real-world tradeoffs, strict limits were imposed:
- 20 MB total cache per server (disk + memory combined)
- Content popularity follows a Zipfian distribution (a few items are very popular, most are rarely accessed)
- Goal: Minimize average download time across all requests
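To see why a small cache can still absorb a large share of traffic under a Zipfian workload, the sketch below computes the fraction of requests that fall on the k most popular items. The catalog size and the exponent of 1.0 are illustrative assumptions, not figures from the project.

```python
def zipf_coverage(top_k, catalog_size, exponent=1.0):
    """Fraction of requests that hit the top_k most popular items when the
    item at rank r is requested with probability proportional to 1 / r**exponent."""
    if not 0 <= top_k <= catalog_size:
        raise ValueError("top_k must be between 0 and catalog_size")
    weights = [1.0 / rank ** exponent for rank in range(1, catalog_size + 1)]
    return sum(weights[:top_k]) / sum(weights)
```

For example, with an assumed catalog of 10,000 items and exponent 1.0, zipf_coverage(10, 10_000) is roughly 0.30: the ten most popular items alone account for about 30% of requests. This is the property that makes caching the head of the distribution so effective.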
The implementation
The solution employs a multi-tiered caching strategy:
- 13 MB of popular content is pre-deployed to each edge server's disk during setup
- An additional 18.5 MB in-memory cache holds frequently accessed items for instant retrieval
- 7 MB of additional content is downloaded by each server at startup to round out the cache
- Geolocation-based routing uses MaxMind's GeoIP service to map clients to optimal servers
The DNS server is multi-threaded and caches location lookups to minimize latency. Each HTTP server implements a two-tier cache: check memory first, fall back to disk, and only then fetch from origin.
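The lookup order described above (memory, then disk, then origin) can be sketched as follows. This is a minimal illustration, not the project's code: the class name, the hashed on-disk file naming, and the fetch_from_origin callable are all assumptions, and the sketch does not enforce any size budget.

```python
import hashlib
from pathlib import Path
from typing import Callable, Dict, Tuple


class TwoTierCache:
    """Look up content in memory first, then on disk, then at the origin."""

    def __init__(self, disk_dir: Path, fetch_from_origin: Callable[[str], bytes]):
        self.memory: Dict[str, bytes] = {}
        self.disk_dir = Path(disk_dir)
        # Hypothetical callable that takes a URL path and returns the body.
        self.fetch_from_origin = fetch_from_origin

    def _disk_path(self, path: str) -> Path:
        # Hash the URL path so any path maps to one safe, flat filename.
        return self.disk_dir / hashlib.sha256(path.encode("utf-8")).hexdigest()

    def get(self, path: str) -> Tuple[bytes, str]:
        """Return (body, tier), where tier names the level that served it."""
        body = self.memory.get(path)
        if body is not None:
            return body, "memory"

        disk_file = self._disk_path(path)
        if disk_file.is_file():
            return disk_file.read_bytes(), "disk"

        body = self.fetch_from_origin(path)
        # No size limit here; a real server must stay within its cache budget.
        self.memory[path] = body
        return body, "origin"
```

Returning the tier alongside the body is a convenience for logging hit rates per level; a server that only needs the content could drop it.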
Why geographic distribution matters
A user in Tokyo gets routed to the Tokyo edge server. A user in Boston gets routed to the Boston server. This geographic optimization dramatically reduces latency. (Signals in fiber optic cables travel at roughly two-thirds the speed of light in a vacuum, so intercontinental distances alone create noticeable delays.)
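A back-of-the-envelope estimate shows how much delay distance alone contributes. The sketch below assumes signals in fiber propagate at about two-thirds of the vacuum speed of light and that the cable follows the straight-line path; real routes are longer, so this is a lower bound.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458   # speed of light in a vacuum
FIBER_VELOCITY_FACTOR = 2 / 3       # approximate value for optical fiber (assumption)


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber for a given path length."""
    one_way_seconds = distance_km / (SPEED_OF_LIGHT_KM_S * FIBER_VELOCITY_FACTOR)
    return 2 * one_way_seconds * 1000
```

Under these assumptions, a 10,000 km path costs about 100 ms per round trip before any routing detours, queuing, or connection handshakes are counted. Serving the same request from an edge server a few hundred kilometres away brings that floor down to a few milliseconds.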
The engineering challenge
Building this system required juggling:
- DNS protocol implementation and query handling
- HTTP server design with intelligent caching policies
- Deployment automation across distributed cloud infrastructure
- Performance optimization under strict memory constraints
- Cache eviction strategies based on content popularity
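One way to approach the last item is a byte-budgeted cache that evicts the least-requested entries first. The sketch below is a simple frequency-based (LFU-style) policy chosen for illustration; the text does not specify the project's actual eviction policy, and the class and method names are hypothetical.

```python
from typing import Dict, Optional


class PopularityCache:
    """Byte-budgeted cache that evicts the least-requested entries first."""

    def __init__(self, capacity_bytes: int):
        self.capacity_bytes = capacity_bytes
        self.used_bytes = 0
        self.bodies: Dict[str, bytes] = {}
        self.hits: Dict[str, int] = {}  # request counts, kept for uncached paths too

    def get(self, path: str) -> Optional[bytes]:
        """Record a request for path and return the cached body, if any."""
        self.hits[path] = self.hits.get(path, 0) + 1
        return self.bodies.get(path)

    def put(self, path: str, body: bytes) -> bool:
        """Try to admit body; return True if it was cached."""
        if path in self.bodies or len(body) > self.capacity_bytes:
            return False
        # Plan evictions first so nothing is dropped unless the new item gets in.
        needed = self.used_bytes + len(body) - self.capacity_bytes
        victims = []
        for candidate in sorted(self.bodies, key=lambda p: self.hits.get(p, 0)):
            if needed <= 0:
                break
            if self.hits.get(candidate, 0) >= self.hits.get(path, 0):
                return False  # everything left is at least as popular as the new item
            victims.append(candidate)
            needed -= len(self.bodies[candidate])
        for victim in victims:
            self.used_bytes -= len(self.bodies.pop(victim))
        self.bodies[path] = body
        self.used_bytes += len(body)
        return True
```

Refusing to admit an item that is less popular than everything it would displace keeps rarely requested content from flushing the popular head of a Zipfian workload out of the cache.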
The result is a working CDN that demonstrates how services we use daily (streaming video, downloading software updates, browsing the web) are delivered efficiently across a global infrastructure.