March 2023
Raw socket HTTP downloader
A robust, wget-like file downloader that implements IPv4 and TCP by hand: packet creation and management, congestion and flow control, and connection setup and teardown.
Repo: https://github.com/alex-w-99/Raw-Socket-HTTP-Downloader
Demo video: https://youtu.be/xw9mefBVJo4
The challenge
When you download a file using wget or your browser, countless abstractions handle the complexity of network communication for you. Those abstractions nest like matryoshka dolls: each layer hides the machinery inside the one below it, until the actual packets on the wire are several layers deep. There is, however, a clear didactic value in stripping those layers away and seeing what actually happens on the wire.
This project is a wget-style HTTP downloader that bypasses the operating system's TCP/IP implementation, constructing and managing every IPv4 and TCP packet by hand using raw sockets.
Building the internet protocol stack
Most applications rely on high-level APIs that hide the intricacies of network communication. This program takes the opposite approach: it operates at the network layer and handles the following by hand:
- Crafting IPv4 and TCP headers for every outgoing packet
- Parsing incoming packets bit by bit to extract data
- Managing TCP handshakes (SYN, SYN-ACK, ACK connection setup and FIN-ACK teardown)
- Implementing congestion control to handle network conditions
- Handling flow control to prevent overwhelming the receiver
- Retransmitting dropped packets when necessary
Think of it as building your own tiny operating system network stack, tailored specifically for HTTP downloads.
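The header crafting and parsing listed above can be sketched in Python with the standard struct module. This is a minimal illustration rather than the project's actual code: the names build_tcp_header and parse_tcp_header are hypothetical, and the sketch assumes a 20-byte header with no TCP options when building.

```python
import struct

# TCP control flag bits, as laid out in the TCP header.
FIN, SYN, RST, PSH, ACK = 0x01, 0x02, 0x04, 0x08, 0x10

# Network byte order: ports, seq, ack, offset+flags, window, checksum, urgent.
TCP_HEADER_FORMAT = "!HHIIHHHH"


def build_tcp_header(src_port, dst_port, seq, ack, flags, window, checksum=0):
    """Pack a 20-byte TCP header (hypothetical helper, no options)."""
    data_offset = 5  # header length in 32-bit words (5 * 4 = 20 bytes)
    offset_and_flags = (data_offset << 12) | flags
    return struct.pack(
        TCP_HEADER_FORMAT,
        src_port, dst_port, seq, ack,
        offset_and_flags, window, checksum, 0,  # urgent pointer unused
    )


def parse_tcp_header(segment):
    """Split a raw TCP segment into header fields and payload."""
    (src_port, dst_port, seq, ack,
     offset_and_flags, window, checksum, _urgent) = struct.unpack(
        TCP_HEADER_FORMAT, segment[:20]
    )
    header_len = (offset_and_flags >> 12) * 4  # skips any options present
    return {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "flags": offset_and_flags & 0xFF,
        "window": window,
        "checksum": checksum,
        "payload": segment[header_len:],
    }


if __name__ == "__main__":
    syn = build_tcp_header(50000, 80, seq=1000, ack=0, flags=SYN, window=65535)
    fields = parse_tcp_header(syn)
    print(len(syn), bool(fields["flags"] & SYN))
```

The checksum field is left at zero here; in a real segment it would be filled in after the header and payload are assembled.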
Technical details
The implementation required careful attention to low-level details:
- Using SOCK_RAW/IPPROTO_RAW sockets for direct packet access
- Kernel bypass techniques (disabling GRO) to prevent automatic packet coalescing
- Configuring iptables to prevent the kernel from interfering with our custom TCP implementation
- Properly calculating checksums and sequence numbers
- Managing receive windows and acknowledgments
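Checksums are the detail most likely to fail silently, because a receiver discards any segment whose checksum does not verify. Below is a minimal sketch of the standard Internet checksum as applied to TCP over IPv4. The function names internet_checksum and tcp_checksum are hypothetical and are not taken from the project's code.

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit ones' complement of the ones' complement sum of the data."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the IPv4 pseudo-header plus the TCP header and payload.

    The segment's own checksum field must be zero when computing the value
    to send. Running this over a received segment with its checksum field
    intact yields 0 if the segment is undamaged.
    """
    pseudo_header = struct.pack(
        "!4s4sBBH",
        socket.inet_aton(src_ip),
        socket.inet_aton(dst_ip),
        0,                     # reserved byte
        socket.IPPROTO_TCP,    # protocol number 6
        len(segment),          # TCP length: header plus payload
    )
    return internet_checksum(pseudo_header + segment)
```

The same internet_checksum routine also covers the IPv4 header checksum, which is computed over the IP header alone with no pseudo-header.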
Why this matters
While no one would use this in production (the OS networking stack exists for good reason!), building a custom implementation from scratch provides invaluable insight into:
- How the Internet works beneath the abstractions
- The elegance and complexity of TCP's reliability mechanisms
- The challenges of managing stateful connections
- Performance considerations in network I/O
In the demo video (linked above), you can watch Wireshark capture the entire conversation: the three-way handshake establishing the connection, the steady stream of data packets and acknowledgments, and the graceful teardown when complete.