TL;DR — The One-Minute Version
You know the name: google.com. DNS knows the number: 142.250.195.68. Without DNS, you'd have to memorize IP addresses for every website you visit, like memorizing phone numbers before contacts existed on your phone.
Think about how you use your phone's contact list. You tap "Mom" — you don't dial 555-012-3456 from memory. DNS does the same thing for the entire internet. When you type google.com into your browser, DNS figures out which IP address that name points to, so your browser knows where to send its request. (An IP — Internet Protocol — address is a unique number that identifies every device on the internet, like a street address for computers. IPv4 example: `142.250.195.68`. IPv6 example: `2607:f8b0:4004:800::200e`. You can find any website's IP with `dig google.com +short`.)
But this isn't some abstract concept — it's a real thing you can watch happen. Open a terminal right now and type `dig google.com +short`. You'll see an IP address come back. That's DNS working in real time. Try a few more: `dig github.com +short`, `dig wikipedia.org +short`.
The genius of DNS is that no single server holds all the answers. Instead, responsibility is split across a hierarchy: root servers at the top (13 clusters, run by organizations like Verisign, NASA, and ICANN), then servers for .com, .org, .io, and so on, then servers for individual domains. Each layer only knows about the layer directly below it. And at every step, results get cached — a copy of the answer is stored closer to you so the full lookup doesn't repeat every time. Your browser caches DNS results (~1 min). Your OS caches them (minutes to hours). Your ISP caches them too. You can see your OS cache right now: on Windows run `ipconfig /displaydns`, or check Chrome's cache at `chrome://net-internals/#dns`. The result: most lookups never travel the full chain.
You can watch the whole chain yourself with `dig google.com +trace`.
The Scenario — October 21, 2016: The Day the Internet "Went Down"
October 21, 2016. A Friday morning. People across the US East Coast start noticing something weird: Twitter won't load. Netflix is dead. Reddit — gone. GitHub, Spotify, PayPal, the New York Times, CNN — all down. Within minutes, social media (on the platforms that still work) erupts: "Is the entire internet down?"
But here's the twist. None of those companies' servers actually crashed. Twitter's servers were running fine. Netflix's CDN was healthy. (A CDN — Content Delivery Network — is a network of servers spread worldwide that cache and deliver content like videos, images, and web pages from locations close to users. Netflix runs its own CDN, called Open Connect, with servers in ISP facilities globally. It was working fine during the Dyn attack — but nobody could reach it, because DNS was down.) GitHub's code repositories were untouched. The problem was something far more subtle: nobody could find them.
A company called Dyn — a DNS provider based in Manchester, New Hampshire — was under attack. (Dyn, pronounced "dine," hosted the authoritative DNS for hundreds of major websites, meaning it was the final source of truth for translating those domain names into IP addresses. Oracle acquired Dyn in 2016, shortly after this incident.) The Mirai botnet — a network of over 100,000 hijacked IoT devices: security cameras, DVRs, home routers still using their default passwords — was flooding Dyn with traffic. (Mirai was malware that infected IoT devices by trying default passwords like "admin/admin". Its source code was publicly released in September 2016, a month before the Dyn attack, and three college students later pled guilty to creating it.) Not a trickle. 1.2 terabits per second. That's roughly 150 gigabytes — about 30 DVDs' worth of data — every single second, all aimed at one target.
Dyn wasn't some obscure service. It was the authoritative DNS provider for hundreds of major websites. (An authoritative DNS server is the "final answer" server for a domain: when someone asks "what's the IP for twitter.com?", the chain of DNS lookups eventually reaches the authoritative server — the one that owns the definitive answer. Hosting authoritative DNS for hundreds of major sites made Dyn a single point of failure.) That means when you typed twitter.com, the DNS lookup chain eventually reached Dyn's servers to get the final answer. With Dyn overwhelmed, that answer never came back. Your browser would sit there, spinning, and eventually show ERR_NAME_NOT_RESOLVED.
Meanwhile, anyone who already knew Twitter's IP address — or had it cached — could reach Twitter just fine. The attack didn't break any website. It broke the phone book. And without the phone book, most of the internet became invisible.
If DNS is just a lookup service — translating names to numbers — why did its failure take down Twitter, Netflix, and GitHub? These are companies with completely separate infrastructure, different data centers, different teams. How could one service being overwhelmed make ALL of them unreachable?
Think about what step comes before "connecting to a server" in your browser.
The First Attempt — One File to Rule Them All
Before DNS existed, the internet ran on a single text file. Literally. One file, maintained by one person, at one university. And you can still find that file on your computer right now.
On Linux or Mac, it's at /etc/hosts. On Windows, it's at C:\Windows\System32\drivers\etc\hosts. Open a terminal and look:
```
$ cat /etc/hosts
# Host Database
# localhost is used to configure the loopback interface
127.0.0.1       localhost
255.255.255.255 broadcasthost
::1             localhost
```

And on Windows:

```
> type C:\Windows\System32\drivers\etc\hosts
# Copyright (c) 1993-2009 Microsoft Corp.
# This is a sample HOSTS file
127.0.0.1       localhost
::1             localhost
```
That file on your machine is a living fossil — the direct descendant of how the entire internet used to work.
In the 1970s, a researcher named Elizabeth Feinler at Stanford's Network Information Center (NIC) maintained a single master file called HOSTS.TXT. (The NIC at Stanford Research Institute was the administrative hub of ARPANET. Feinler and her team managed the host name registry, assigned network addresses, and maintained the HOSTS.TXT master file — she essentially ran the internet's "phone book" by hand for over a decade, making her one of the unsung pioneers of the internet.) This file listed every computer on the ARPANET and its corresponding address. (ARPANET — the Advanced Research Projects Agency Network — was the US Department of Defense-funded precursor to the modern internet. It started in 1969 connecting four universities: UCLA, Stanford, UC Santa Barbara, and the University of Utah; by 1983 it had grown to 562 hosts, and it was decommissioned in 1990.) Every other computer on the network would periodically download this file via FTP — File Transfer Protocol, one of the oldest internet protocols (1971) — from Stanford's server. Computers would literally dial up Stanford's FTP server, download the latest HOSTS.TXT, and replace their local copy; imagine 500+ computers all downloading the same file every day from one server. It was as fragile as it sounds. Want to know if a new computer joined the network? Download the latest HOSTS.TXT.
Here's what the actual HOSTS.TXT format looked like (this is the real format from the ARPANET era):
```
; HOSTS.TXT — The Network Information Center
; SRI International, Menlo Park, California
; Last updated: October 1983
;
HOST : 10.0.0.73 : SRI-NIC,SRI-NIC   : DEC-2060       : TOPS20 : TCP/TELNET,FTP
HOST : 10.1.0.13 : MIT-AI,MIT-AI     : DEC-KL10       : ITS    : TCP/TELNET,FTP
HOST : 10.0.0.4  : UTAH-CS,UTAH-CS   : VAX-11/750     : UNIX   : TCP/TELNET,FTP
HOST : 10.2.0.11 : UCLA-CCN,UCLA-CCN : IBM-370/3032   : OS/MVT : TCP/TELNET
HOST : 10.3.0.20 : BBN-UNIX,BBN-UNIX : VAX-11/780     : UNIX   : TCP/TELNET,FTP
HOST : 26.0.0.73 : SRI-KL,SRI-KL     : DEC-KL10       : TOPS20 : TCP/TELNET,FTP
; ... 562 entries total
```
The process to add a new computer to the internet went like this: you'd call Elizabeth Feinler on the phone (or email her), she'd manually add your hostname and IP address to the file, and then every other computer on the network would eventually download the updated copy. If she was on vacation or if Stanford's server was down, no updates happened. The entire internet's name system depended on one person and one machine.
The hosts file isn't just a fossil; it still works today. Add a line like `127.0.0.1 myapp.local` and your browser thinks myapp.local is a real website running on your machine. You can even use it to block websites: add `127.0.0.1 facebook.com` and Facebook disappears from your computer. Try it — then undo it when you're done.
The ARPANET had 562 hosts in 1983 and the hosts file was already breaking down. Today there are over 350 million registered domain names. What specific problems would happen if we tried to scale the hosts file approach to today's internet?
Think about file size, update frequency, naming conflicts, and who gets to decide what name belongs to whom.
Where It Breaks — The Math That Killed the Hosts File
By 1983, the ARPANET had 562 hosts and the system was already creaking. But let's not just say "it didn't scale" — let's do the actual math and see exactly why a centralized file was doomed.
Today there are roughly 350 million registered domain names. Let's see what would happen if we tried to stuff them all into one file:
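The arithmetic is easy to sketch. In the snippet below, the 60-bytes-per-entry average and the one-billion-device count are illustrative assumptions, not figures from this article:

```python
# Back-of-the-envelope: how big would a modern HOSTS.TXT be, and how
# much traffic would daily re-downloads generate?

DOMAINS = 350_000_000        # ~350 million registered domains
BYTES_PER_ENTRY = 60         # assumed average: IP + name + newline

file_size_gb = DOMAINS * BYTES_PER_ENTRY / 1e9
print(f"File size: ~{file_size_gb:.0f} GB")      # roughly 21 GB

# If, say, 1 billion devices re-downloaded it once a day (assumption):
DEVICES = 1_000_000_000
daily_exabytes = file_size_gb * DEVICES / 1e9
print(f"Daily transfer: ~{daily_exabytes:.0f} EB/day")
```

Even before we get to the authority problem, the raw bandwidth alone sinks the idea.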
But the file size is just the start. There are four fundamental problems, and every single one traces back to the same root cause: centralization.
The math is brutal. Even if we solved the bandwidth problem (maybe a diff-based download?), we'd still have the authority problem. Who gets to decide that google.com belongs to Google? Should the government of India control .in domains? Should a university in Japan manage .jp? In a single-file world, one organization makes all these decisions. That doesn't scale politically, organizationally, or technically.
By 1983, the ARPANET community knew the hosts file was doomed. The question wasn't whether to replace it, but how. The answer came from a researcher at USC's Information Sciences Institute — and his solution was so elegant that it's still running, nearly unchanged, 40+ years later. (ISI is a research lab at the University of Southern California; Paul Mockapetris worked there when he invented DNS. Fun fact: ISI still operates b.root-servers.net, one of the 13 root DNS server clusters — so the place where DNS was invented still runs part of the DNS infrastructure today.)
The Breakthrough — A Hierarchy of Servers, Each Managing Its Own Slice
In November 1983, Paul Mockapetris published RFC 882 and RFC 883, and the Domain Name System was born. (RFCs — Requests for Comments — are the official documents that define internet protocols. RFC 882 and 883 defined the original DNS specification; they were later superseded by RFC 1034 (concepts) and RFC 1035 (implementation) in 1987, which are still the foundation of DNS today. You can read them yourself at tools.ietf.org/html/rfc1034.) The key insight was deceptively simple: don't store everything in one place — split the database into a hierarchy, and let each level manage itself. Root servers know TLDs. TLD servers know domains. Domain servers know subdomains. Nobody has to know everything.
Think about how a library organizes books. You don't have one massive alphabetical list of every book ever written. Instead, there are sections (Science, History, Fiction), then shelves within sections, then individual books on shelves. The librarian in the Science section doesn't need to know anything about the Fiction shelves. Each section manages itself.
DNS works the same way. Every domain name is actually a hierarchy, read right to left with dots as separators. Take maps.google.com:
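The right-to-left reading can be sketched in a few lines of Python (a toy illustration, not part of any DNS library):

```python
def hierarchy(domain: str) -> list[str]:
    """Read a domain right to left, listing each zone from the root down."""
    labels = domain.rstrip(".").split(".")   # ["maps", "google", "com"]
    zones = ["."]                            # the (hidden) root zone
    for i in range(len(labels) - 1, -1, -1):
        zones.append(".".join(labels[i:]) + ".")
    return zones

print(hierarchy("maps.google.com"))
# ['.', 'com.', 'google.com.', 'maps.google.com.']
```

Each entry in that list is a zone managed by a different party: the root, then the .com operator, then Google, then whatever Google decides maps.google.com should be.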
There are 350 million registered domain names. If you split them by TLD, .com alone has about 160 million. That is one server zone handling 160 million records. If every DNS lookup for a .com domain had to hit that one zone, how many queries per second would it need to handle? (Google alone gets ~100,000 searches/sec, each triggering at least one DNS lookup.) What design trick keeps this from collapsing?
Think about caching TTLs and how many queries actually reach the TLD server vs. being answered from a local cache.
That trailing dot — the one your browser hides — is the root zone, the very top of the DNS tree. (There are 13 root server clusters worldwide: a.root-servers.net (Verisign), b.root-servers.net (USC-ISI), c.root-servers.net (Cogent), d.root-servers.net (UMD), e.root-servers.net (NASA), f.root-servers.net (ISC), and so on. Despite being only 13 logical addresses, they're backed by 1,700+ physical servers via anycast. Query one yourself: `dig @a.root-servers.net . NS`.) And here's the magic: each level of this tree is managed by a different organization, and each only needs to know about the level directly below it.
The root doesn't know what google.com's IP is. It only knows: "for anything ending in .com, ask the .com servers." The .com servers don't know Google's IP either. They only know: "for anything under google.com, ask Google's nameservers." Google's nameservers finally have the answer: "maps.google.com is at 142.250.195.68."
This hierarchy solved every problem from the hosts file era in one stroke:
- No single point of failure — thousands of servers share the load. The root level alone has 1,700+ physical servers spread worldwide via anycast, a routing technique where the same IP address is announced from multiple physical locations, so a query to a.root-servers.net reaches the nearest instance — which might be in Mumbai, Tokyo, or New York.
- No name collisions — `mail.google.com` and `mail.yahoo.com` coexist perfectly because they live in different branches of the tree.
- No bandwidth problem — you only ask for the one name you need, not 350 million entries.
- Distributed authority — Google controls everything under `google.com`. India's NIXI controls `.in`. Germany's DENIC controls `.de`. No central gatekeeper needed.
Let's make this concrete with real server names you can actually query. The root level has exactly 13 logical server addresses — named a.root-servers.net through m.root-servers.net — each operated by a different organization: Verisign (A and J), USC-ISI (B), Cogent (C), NASA (E), ICANN (L), and Japan's WIDE Project (M), among others.
Below the root are the Top-Level Domains (TLDs) — the .com, .org, .in, .uk suffixes. (TLDs come in a few flavors: generic TLDs like .com, .org, and .net, managed by companies such as Verisign; country-code TLDs like .in, .uk, .de, and .jp; and newer TLDs like .io, .dev, and .app. There are 1,500+ TLDs today — the full list is at iana.org/domains/root/db.) Each is managed by a specific organization: .com is run by Verisign (servers named a.gtld-servers.net through m.gtld-servers.net). .org is run by PIR (Public Interest Registry). .in is run by NIXI (National Internet Exchange of India). .uk by Nominet. .de by DENIC in Germany. Countries control their own corner of the internet.
Below that are the domains you recognize — google, github, wikipedia — each managed by whoever registered them. And below those are subdomains like www, mail, maps, and api, which the domain owner creates freely, no permission needed. (Google has www.google.com, mail.google.com, maps.google.com, cloud.google.com, and thousands more — once you own google.com, you can make anything.google.com.)
You can watch this entire hierarchy in action with one command. Open your terminal and run:
```
$ dig google.com +trace

; <<>> DiG 9.18.18 <<>> google.com +trace
;; global options: +cmd
.            518400  IN  NS  a.root-servers.net.   ← Step 1: Root
.            518400  IN  NS  b.root-servers.net.
.            518400  IN  NS  c.root-servers.net.
; ... (all 13 root servers)

com.         172800  IN  NS  a.gtld-servers.net.   ← Step 2: .com TLD
com.         172800  IN  NS  b.gtld-servers.net.
; ... (Verisign's .com servers)

google.com.  172800  IN  NS  ns1.google.com.       ← Step 3: Google's nameservers
google.com.  172800  IN  NS  ns2.google.com.
google.com.  172800  IN  NS  ns3.google.com.
google.com.  172800  IN  NS  ns4.google.com.

google.com.  300     IN  A   142.250.195.68        ← Step 4: The answer!
```
That output tells the whole story. Your computer starts at the root, gets pointed to .com, gets pointed to google.com's nameservers, and finally gets the IP address. Each level only had to know about the level below it. Nobody had to know everything.
A quick command cheat-sheet:

- `dig google.com +short` — just the IP, nothing else
- `dig @8.8.8.8 google.com` — ask Google's public resolver specifically
- `dig @1.1.1.1 google.com` — ask Cloudflare's resolver
- `nslookup -type=MX gmail.com` — find Gmail's mail servers
- `nslookup -type=NS github.com` — find who runs GitHub's DNS
- `dig google.com +stats` — see the query time (usually < 50 ms)
- Windows: `ipconfig /displaydns` — see your OS-level DNS cache
- Chrome: `chrome://net-internals/#dns` — see your browser's DNS cache
You type maps.google.com into your browser. Your computer doesn't know the IP. Using the hierarchy we just learned — root servers, then TLD servers, then domain nameservers — walk through the exact steps your computer takes to find the answer. Which servers would be contacted, and in what order? And here's a follow-up: do you think this full chain happens every single time you visit a website?
(Hint: think about what "caching" might mean in this context.)
How It Works — The Full Resolution Journey
You now know DNS is a hierarchy. But what actually happens, step by step, when you type maps.google.com into your browser and press Enter? Let's trace the entire journey — every cache checked, every server contacted, every millisecond accounted for. This is the core of how DNS works in practice, and you can verify every single step on your own machine.
The journey has up to six stops. Most requests never make it past stop two (because of caching). But when the full chain fires — for a domain nobody nearby has visited recently — here's exactly what happens:
Let's walk through each stop in detail. Open a terminal and follow along — you can run these commands yourself as you read.
Stop 1: Your Browser Cache
The very first place your browser checks is its own memory. Chrome, Firefox, Safari — they all keep a small cache of recent DNS lookups so they don't have to ask the operating system for domains you visited seconds ago.
You can see this cache right now. In Chrome, open a new tab and type chrome://net-internals/#dns. You'll see a list of every domain your browser has resolved recently, along with how long the cached answer is still valid. Firefox has something similar at about:networking#dns.
Chrome's cache is deliberately short-lived — typically 60 seconds, or whatever TTL the DNS server specified, whichever is shorter. (TTL — Time To Live — is a number in seconds attached to every DNS answer that says "cache this for X seconds, then throw it away and ask again." Google uses TTL=300, i.e. 5 minutes. Facebook uses TTL=60 — 1 minute, because they need fast failover. You can see the TTL on any domain with `dig google.com`: it's the number between the domain name and "IN A".) If the domain is in the cache and the TTL hasn't expired, you get your answer in 0 milliseconds. No network request at all.
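The TTL rule is simple enough to model in a few lines. This is a toy sketch of a TTL-respecting cache (the class and method names are made up; real browser caches are far more involved):

```python
import time

class DnsCache:
    """Tiny TTL-respecting cache: keeps answers only until they expire."""
    def __init__(self):
        self._store = {}                    # name -> (ip, expires_at)

    def put(self, name, ip, ttl_seconds):
        self._store[name] = (ip, time.monotonic() + ttl_seconds)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None                     # never seen: cache miss
        ip, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[name]           # TTL expired: evict, miss
            return None
        return ip                           # still fresh: instant answer

cache = DnsCache()
cache.put("google.com", "142.250.195.68", ttl_seconds=300)
print(cache.get("google.com"))   # hit while the TTL is still live
```

The same logic repeats at every caching layer — browser, OS, resolver — just with different stores and different observers of the same TTL.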
If the browser cache misses — either you haven't visited the domain recently, or the TTL expired — it asks the operating system.
Stop 2: Your OS Resolver Cache
Your operating system maintains its own DNS cache, separate from the browser. This cache is shared across all applications — so if Slack just resolved google.com, and Chrome asks for it a moment later, the OS already knows the answer.
You can inspect this cache right now:
On Windows:

```
> ipconfig /displaydns
    Record Name . . . . : google.com
    Record Type . . . . : 1 (A record = IPv4)
    Time To Live  . . . : 237 (seconds remaining)
    Data Length . . . . : 4
    A (Host) Record . . : 142.250.195.68

> ipconfig /flushdns      <-- clears the cache
```

On Linux (systemd-resolved):

```
$ resolvectl statistics           # systemd-resolved stats
$ resolvectl query google.com     # resolve + show cache status
$ cat /etc/nsswitch.conf          # shows lookup order: files dns
# "files" = check /etc/hosts first, "dns" = then ask DNS
```

On macOS:

```
$ sudo dscacheutil -flushcache        # clear cache
$ sudo killall -HUP mDNSResponder     # restart DNS daemon
$ scutil --dns | head -30             # show resolver config
```
The OS also checks your /etc/hosts file (or C:\Windows\System32\drivers\etc\hosts on Windows) — remember that ancient file from Section 3? It still gets priority over DNS. If you've added 127.0.0.1 maps.google.com to your hosts file, the OS returns that and DNS is never consulted at all.
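That priority order is worth making explicit. A toy model of the OS lookup path (hypothetical dictionaries standing in for the hosts file and DNS; the real order comes from `/etc/nsswitch.conf`):

```python
def resolve(name, hosts, dns):
    """Mimic the OS lookup order: the hosts file wins over DNS."""
    if name in hosts:
        return hosts[name]        # hosts-file entry: DNS is never consulted
    return dns.get(name)          # otherwise fall through to DNS

hosts = {"maps.google.com": "127.0.0.1"}       # a hosts-file override
dns   = {"maps.google.com": "142.250.195.68"}  # what DNS would say

print(resolve("maps.google.com", hosts, dns))  # 127.0.0.1 -- hosts wins
```

This is exactly why a stray line left in your hosts file can make a site unreachable while everyone else sees it fine.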
If the OS cache also misses, it's time to leave your machine and talk to the network.
Stop 3: Your ISP's Recursive Resolver (The Workhorse)
This is where the real work happens. Your computer sends the query to a recursive resolver — a server whose entire job is to chase down DNS answers for you. (Instead of you querying root → TLD → authoritative yourself, the resolver does it on your behalf. It's "recursive" because it follows each referral — the root says "ask .com," .com says "ask google" — until it gets the final answer. Your ISP runs one; Google (8.8.8.8) and Cloudflare (1.1.1.1) run public ones.)
By default, this is your ISP's resolver. If you're on ACT Fibernet in India, it might be 49.207.47.136. On Jio, something like 49.44.121.24. But millions of people choose a public resolver instead — for speed, privacy, or reliability:

- Google Public DNS: `8.8.8.8` and `8.8.4.4` — the most well-known public resolver
- Cloudflare: `1.1.1.1` and `1.0.0.1` — focused on privacy, averages 11 ms response time
- Quad9: `9.9.9.9` — blocks known malicious domains automatically
The resolver has its own cache — and this one is shared across all users. If anyone on your ISP looked up google.com in the last 5 minutes (Google's TTL is 300 seconds), the resolver already knows the answer. For popular domains, the resolver almost never has to do the full lookup.
```
$ dig @8.8.8.8 maps.google.com +stats    # Google's resolver
;; Query time: 12 msec

$ dig @1.1.1.1 maps.google.com +stats    # Cloudflare's resolver
;; Query time: 8 msec

$ dig @9.9.9.9 maps.google.com +stats    # Quad9's resolver
;; Query time: 14 msec
```
If the resolver's cache misses, it begins the iterative process: ask the root, then the TLD, then the authoritative server. The resolver does all this work on your behalf so your computer only has to make one request and wait for one answer.
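That iterative walk can be simulated in miniature. All the data below is hard-coded for illustration (a real resolver sends UDP queries to real servers and parses real responses):

```python
# Each "server" maps a question to either a final answer ("A") or a
# referral ("NS") pointing at the next server down the hierarchy.
ROOT    = {"com.": ("NS", "a.gtld-servers.net")}
TLD_COM = {"google.com.": ("NS", "ns1.google.com")}
GOOGLE  = {"maps.google.com.": ("A", "142.250.195.68")}

SERVERS = {
    "a.root-servers.net": ROOT,
    "a.gtld-servers.net": TLD_COM,
    "ns1.google.com":     GOOGLE,
}

def resolve_iteratively(name):
    """Follow referrals from the root until an A record comes back."""
    server = "a.root-servers.net"       # every cold lookup starts at the root
    path = []
    while True:
        zone_db = SERVERS[server]
        # Find the suffix of `name` this server knows about.
        match = next(z for z in zone_db if name.endswith(z))
        rtype, value = zone_db[match]
        path.append((server, match, rtype, value))
        if rtype == "A":
            return value, path          # final answer reached
        server = value                  # referral: ask the next server down

ip, path = resolve_iteratively("maps.google.com.")
for server, zone, rtype, value in path:
    print(f"{server:22} {zone:18} {rtype} -> {value}")
print("Answer:", ip)
```

Three hops — root, TLD, authoritative — exactly mirroring the `dig +trace` output from earlier, with the resolver doing all the hopping so your computer only asks once.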
Stop 4: Root Servers — The Top of the Hierarchy
The resolver's first question goes to a root server: "I need to find maps.google.com. Where do I start?" (Root servers sit at the very top of the DNS hierarchy. There are 13 logical addresses — a.root-servers.net through m.root-servers.net — but thanks to anycast, 1,700+ physical servers spread across the world; your query reaches the nearest physical instance. Root servers don't know individual domain IPs — only which servers are responsible for each TLD.)
The root server doesn't know the answer. It doesn't know anything about google.com specifically. What it does know is which servers are responsible for the .com top-level domain. So it responds with a referral: "I don't know google.com, but everything ending in .com is handled by a.gtld-servers.net through m.gtld-servers.net. Ask them." (A referral is a DNS response that says "I don't have the answer, but here's who does" — like asking a librarian where a book is and hearing "try the Science section on the third floor." Root servers always refer to TLD servers; TLD servers refer to domain nameservers.)
There are 13 root server clusters, operated by 12 different organizations (Verisign runs two: A and J). Despite having only 13 logical IP addresses, there are over 1,700 physical servers worldwide thanks to anycast — the same IP address is advertised from data centers worldwide via BGP, the internet's routing protocol, and the routing system automatically sends your query to the nearest instance. From India, you might hit a root server in Mumbai or Singapore; from the US, one in your region. If an instance goes down, traffic reroutes to the next nearest one.
```
$ dig @a.root-servers.net com NS

;; AUTHORITY SECTION:
com.    172800  IN  NS  a.gtld-servers.net.
com.    172800  IN  NS  b.gtld-servers.net.
com.    172800  IN  NS  c.gtld-servers.net.
...

;; Query time: 22 msec
```
Notice the TTL: 172800 seconds = 48 hours. TLD assignments almost never change, so root servers tell resolvers to cache this answer for two full days.
The Bootstrap Problem — How Does the Resolver Find Root Servers?
Stop and think about this for a moment. DNS resolves names to IP addresses. But root servers have names like a.root-servers.net. To resolve that name... you'd need DNS. But DNS can't start without knowing the root servers. It's a chicken-and-egg problem. How does the very first DNS query ever work?
If DNS resolves names to IPs, and root servers are identified by names (a.root-servers.net), how does a brand-new resolver that has never made a DNS query find the root servers? It can't use DNS — DNS hasn't started yet.
The answer is beautifully simple: cheating. The root server IPs aren't looked up — they're written down in advance, in a small, unglamorous text file called the root hints file. Every DNS resolver on Earth — from Google's 8.8.8.8 handling billions of queries per day to the tiny DNS cache on your home router — ships with this file pre-installed. It contains the IP addresses (both IPv4 and IPv6) of all 13 root servers. No DNS lookup required. The addresses are just... there.
The file goes by different names depending on the software:
| DNS Software | File Location | Used By |
|---|---|---|
| BIND (most popular) | /etc/bind/db.root or /var/named/named.ca | Most Linux DNS servers, ISPs |
| Unbound | /etc/unbound/root.hints | Cloudflare 1.1.1.1, many resolvers |
| Windows DNS Server | C:\Windows\System32\dns\cache.dns | Active Directory environments |
| macOS | Built into mDNSResponder binary | Every Mac and iPhone |
| systemd-resolved | Compiled into the binary | Modern Ubuntu, Fedora |
You can download the official, canonical version maintained by IANA (the Internet Assigned Numbers Authority) right now:
```
# Download the official root hints file from IANA:
$ curl -s https://www.internic.net/domain/named.root
```

Here's what you'll see (the complete file — it's tiny):

```
; This file holds the information on root name servers needed to
; initialize cache of Internet domain name servers.
; (Related file: named.cache)
;
; last update: December 01, 2023
; related version of root zone: 2023120101
;
.                        3600000      NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
A.ROOT-SERVERS.NET.      3600000      AAAA  2001:503:ba3e::2:30
;
; FORMERLY NS.INTERNIC.NET - operated by Verisign, Inc.
;
.                        3600000      NS    B.ROOT-SERVERS.NET.
B.ROOT-SERVERS.NET.      3600000      A     170.247.170.2
B.ROOT-SERVERS.NET.      3600000      AAAA  2001:500:200::b
;
; FORMERLY NS1.ISI.EDU - operated by USC-ISI
;
.                        3600000      NS    C.ROOT-SERVERS.NET.
C.ROOT-SERVERS.NET.      3600000      A     192.33.4.12
C.ROOT-SERVERS.NET.      3600000      AAAA  2001:500:2::c
;
; FORMERLY C.PSI.NET - operated by Cogent Communications
;
; ... (D through M follow the same pattern)
;
.                        3600000      NS    M.ROOT-SERVERS.NET.
M.ROOT-SERVERS.NET.      3600000      A     202.12.27.33
M.ROOT-SERVERS.NET.      3600000      AAAA  2001:dc3::35
;
; OPERATED BY WIDE Project (Japan)
```
That's the entire file. 13 entries, each with an IPv4 address (A record) and an IPv6 address (AAAA record). About 3 KB of text. This tiny file is the foundation that the entire internet's naming system bootstraps from.
Notice a few things about this file:
- The TTL is 3,600,000 seconds — that's 41.6 days. The file is saying "these addresses are valid for over a month." Root server IPs almost never change.
- Both IPv4 and IPv6 addresses are listed — so the resolver works on both protocols.
- The comments show the operator — "operated by Verisign," "operated by USC-ISI," etc. These are the 12 organizations trusted with running the internet's root.
- The last update date is in the file — the file is updated roughly once a year, sometimes less frequently. The 2023 version is still current in 2026.
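The format is simple enough that you can pull the bootstrap addresses out with a few lines of parsing. A sketch (the sample text is abridged from the file above):

```python
def parse_root_hints(text):
    """Extract {server: [addresses]} from a named.root-style hints file."""
    servers = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(";"):
            continue                          # skip comments and blanks
        fields = line.split()
        # Data lines look like: NAME  TTL  TYPE  VALUE
        if len(fields) == 4 and fields[2] in ("A", "AAAA"):
            servers.setdefault(fields[0], []).append(fields[3])
    return servers

sample = """\
.                        3600000      NS    A.ROOT-SERVERS.NET.
A.ROOT-SERVERS.NET.      3600000      A     198.41.0.4
A.ROOT-SERVERS.NET.      3600000      AAAA  2001:503:ba3e::2:30
"""
print(parse_root_hints(sample))
# {'A.ROOT-SERVERS.NET.': ['198.41.0.4', '2001:503:ba3e::2:30']}
```

Run it against the full file and you get 13 servers, each with one IPv4 and one IPv6 address — the 26 numbers the whole naming system bootstraps from.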
What Happens When a Resolver Boots Up: The Priming Query
When a DNS resolver starts for the first time (or restarts), it doesn't just blindly trust the root hints file forever. It immediately performs what's called a priming query — it picks one of the 13 root server IPs from the hints file and asks: "Hey, what are the CURRENT root server addresses?"
```
# This is essentially what the resolver does internally:
$ dig @198.41.0.4 . NS

# Response: the CURRENT list of all root servers
.                    518400  IN  NS    a.root-servers.net.
.                    518400  IN  NS    b.root-servers.net.
.                    518400  IN  NS    c.root-servers.net.
# ... all 13 listed ...

# PLUS the "glue records" — their actual IP addresses:
a.root-servers.net.  518400  IN  A     198.41.0.4
a.root-servers.net.  518400  IN  AAAA  2001:503:ba3e::2:30
b.root-servers.net.  518400  IN  A     170.247.170.2
# ... all 26 addresses (13 IPv4 + 13 IPv6) ...
```
This is the clever part: even if the root hints file is outdated (say, B.ROOT-SERVERS.NET changed its IP address since the file was last updated), the priming query returns the current, live list. As long as at least ONE of the 13 IPs in your hints file still works, the resolver gets the updated list and caches it. The hints file is just the seed — the priming query is the self-correcting mechanism.
This is why root server IP changes are so rare and so carefully coordinated. B.ROOT-SERVERS.NET has changed address three times — from 128.9.0.107 to 192.228.79.201 in 2004, to 199.9.14.201 in 2017, and to 170.247.170.2 in 2023 — and each move was announced months in advance, with the old IP continuing to work for years alongside the new one. Even if a resolver had a 10-year-old hints file, the priming query to any of the other 12 servers would return the updated address.
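The self-correcting loop is simple enough to sketch. This toy uses hard-coded data (a real resolver sends a `. NS` query over UDP and parses the response):

```python
def prime(seed_ips, live_root_ips):
    """Try stale seed IPs until one answers; adopt the live list it returns.

    seed_ips:      addresses from a possibly years-old hints file
    live_root_ips: what the real root servers would answer today
    """
    for ip in seed_ips:
        if ip in live_root_ips:             # this seed still reaches a root
            return sorted(live_root_ips)    # priming response: current list
    raise RuntimeError("every seed address is dead; cannot bootstrap")

# One stale entry (an old B-root address) plus one that still works:
stale_hints = ["128.9.0.107", "198.41.0.4"]
current = {"198.41.0.4", "170.247.170.2", "192.33.4.12"}

print(prime(stale_hints, current))
# ['170.247.170.2', '192.33.4.12', '198.41.0.4']
```

One live seed out of 13 is all it takes; the resolver then caches the fresh list and never touches the hints file again until its next restart.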
Glue Records — Breaking the Other Chicken-and-Egg Problem
There's a second circular dependency hiding in DNS. The root server says: "The .com TLD is handled by a.gtld-servers.net." But wait — a.gtld-servers.net is itself a domain name. To connect to it, the resolver needs its IP address. To get its IP address... it would need to do a DNS lookup. But we're in the MIDDLE of a DNS lookup. Circular again.
The solution is glue records — when a root server refers you to a.gtld-servers.net, it includes the IP address right there in the same response, in a section called "additional records". (Glue records are IP addresses included in a DNS referral alongside the nameserver names: they "glue" the chain together by providing the IP you need to contact the next server, without requiring a separate DNS lookup. Without them, resolution would get stuck in circular dependencies whenever a nameserver's name falls within the zone it serves.)
```
$ dig @a.root-servers.net google.com

;; AUTHORITY SECTION: (referral — "ask these servers instead")
com.                 172800  IN  NS    a.gtld-servers.net.
com.                 172800  IN  NS    b.gtld-servers.net.
...

;; ADDITIONAL SECTION: (glue — "and here are their IPs so you don't get stuck")
a.gtld-servers.net.  172800  IN  A     192.5.6.30
a.gtld-servers.net.  172800  IN  AAAA  2001:503:a83e::2:30
b.gtld-servers.net.  172800  IN  A     192.33.14.30
...
```
Without glue records, DNS resolution would deadlock. The root says "ask a.gtld-servers.net" but you can't find a.gtld-servers.net without completing a DNS query first. Glue records break the cycle by providing the IP inline — no separate lookup needed. This happens at every referral step in the chain: root → TLD and TLD → authoritative, whenever the nameserver's name falls within the zone it serves.
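To make the referral chain concrete, here is a toy iterative resolver in Python. The hard-coded zone data stands in for real responses; the point it demonstrates is that each referral carries the next server's IP inline (the glue), so no step needs a separate lookup:

```python
# Each "server" maps a query name to either a final answer ("A") or a
# referral ("NS" name plus its glue IP). All data here is illustrative.
SERVERS = {
    "198.41.0.4": {   # a root server
        "google.com": ("NS", "a.gtld-servers.net.", "192.5.6.30"),   # glue!
    },
    "192.5.6.30": {   # a .com TLD server
        "google.com": ("NS", "ns1.google.com.", "216.239.32.10"),    # glue!
    },
    "216.239.32.10": {  # Google's authoritative server
        "google.com": ("A", "142.250.195.68"),
    },
}

def resolve(name, server_ip="198.41.0.4"):
    """Follow referrals from the root until an A record comes back."""
    rtype, *data = SERVERS[server_ip][name]
    if rtype == "A":
        return data[0]                 # authoritative answer: done
    _ns_name, glue_ip = data           # referral: next server's IP is right there
    return resolve(name, glue_ip)      # no separate lookup needed to continue

print(resolve("google.com"))
```

Delete the glue IPs from the referral tuples and the chain has nowhere to go: that is the deadlock glue records prevent.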
Stop 5: TLD Servers — The .com Level
The resolver now asks the .com TLD server: "Who is responsible for google.com?"
The .com TLD is operated by Verisign. Their servers (a.gtld-servers.net through m.gtld-servers.net) hold the zone fileA zone file is essentially a database of DNS records for a particular zone (level of the hierarchy). The .com zone file is massive — over 14 GB — because it contains nameserver (NS) records for all 160+ million .com domains. Verisign updates this file twice a day. The zone file doesn't contain individual IP addresses for domains — it only contains which nameservers are authoritative for each domain. for all 160+ million .com domains. But it doesn't know Google's IP address — it only knows which nameservers are authoritative for google.com.
Different TLDs are operated by different organizations: .org by PIR (Public Interest Registry), .net by Verisign, .in by NIXI (India), .io by Internet Computer Bureau. Each country has its own operator for its country-code TLD.
$ dig @a.gtld-servers.net google.com NS
;; AUTHORITY SECTION:
google.com. 172800 IN NS ns1.google.com.
google.com. 172800 IN NS ns2.google.com.
google.com. 172800 IN NS ns3.google.com.
google.com. 172800 IN NS ns4.google.com.
;; ADDITIONAL SECTION (glue records):
ns1.google.com. 172800 IN A 216.239.32.10
ns2.google.com. 172800 IN A 216.239.34.10
;; Query time: 35 msec
The "ADDITIONAL SECTION" at the bottom is called glue recordsGlue records solve a chicken-and-egg problem. The TLD says "google.com's nameserver is ns1.google.com." But wait — to reach ns1.google.com, you need to look up google.com first! It's circular. Glue records break the loop by including the IP addresses of the nameservers directly in the referral response. Without glue records, DNS would get stuck in an infinite loop for any domain whose nameservers are under the same domain. — and they solve a clever chicken-and-egg problem. The TLD says "ask ns1.google.com" — but to reach ns1.google.com, you'd need to look up google.com first! Glue records break this loop by including the nameserver IPs directly in the referral.
Stop 6: The Authoritative Nameserver — The Final Answer
The resolver finally reaches Google's own nameservers — ns1.google.com (216.239.32.10), ns2 (216.239.34.10), ns3 (216.239.36.10), or ns4 (216.239.38.10). This is the authoritative server — the one that actually owns the answer. It doesn't refer you anywhere else. It has the record.
$ dig @ns1.google.com maps.google.com
;; ANSWER SECTION:
maps.google.com. 300 IN A 142.250.195.68
;; flags: qr aa rd; QUERY: 1, ANSWER: 1
;; Query time: 18 msec
See the aa flag in the response? That stands for Authoritative Answer — this server isn't relaying a cached copy from somewhere else. This is the definitive answer, straight from the source. The IP you get might vary by your location — Google uses GeoDNSGeoDNS returns different IP addresses depending on where the query comes from. If you're in India, Google's nameserver returns the IP of a nearby Google data center (maybe Mumbai). If you're in the US, you get a US data center. Same domain, different IPs, so everyone gets routed to the nearest server. You can test this: use a VPN to change your location and run dig google.com — you'll get different IPs from different countries. to point you to the nearest data center.
The TTL here is 300 seconds (5 minutes). After that, the resolver's cached copy expires, and the next query for maps.google.com will go through the authoritative lookup again. (Though the root and TLD results stay cached for 48 hours, so those steps are skipped.)
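A resolver's cache behavior is easy to model: store the answer with an expiry timestamp, and treat it as a miss once that passes. This is a minimal sketch of the idea, not how BIND or Unbound are actually implemented:

```python
import time

class TTLCache:
    """Minimal DNS-style cache: each entry expires after its TTL."""
    def __init__(self):
        self._store = {}  # name -> (answer, expires_at)

    def put(self, name, answer, ttl):
        self._store[name] = (answer, time.monotonic() + ttl)

    def get(self, name):
        entry = self._store.get(name)
        if entry is None:
            return None                # never cached: full lookup needed
        answer, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._store[name]      # TTL elapsed: treat as a miss
            return None
        return answer                  # cache hit: no network traffic at all

cache = TTLCache()
cache.put("maps.google.com", "142.250.195.68", ttl=300)
print(cache.get("maps.google.com"))    # hit, for the next 5 minutes
```

After 300 seconds the entry evaporates and the next lookup walks the chain again, exactly as described above.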
We just walked through 6 stops — browser cache, OS cache, recursive resolver, root server, TLD server, authoritative nameserver. In practice, most DNS queries resolve in under 1 millisecond. How is that possible if the full chain involves talking to servers potentially on different continents? What percentage of queries do you think actually reach the root servers?
(Hint: think about what caching at each level means for a domain that billions of people visit daily, like google.com.)
Going Deeper — Record Types, TTL Strategy, and GeoDNS
So far we've been looking at one type of DNS answer: "this domain maps to this IP address." But DNS is actually a general-purpose distributed databaseDNS stores more than just IP addresses. It stores mail server addresses (MX records), text verification strings (TXT records), aliases (CNAME records), IPv6 addresses (AAAA records), and more. Each piece of data is called a "record," and each record has a type. Think of DNS as a key-value store where the key is the domain name + record type, and the value is the data. that stores many kinds of information about domains — not just IP addresses. Let's explore the details that matter for system design.
When you run dig, the answer section always shows a record type like A, AAAA, or MX. Each type serves a different purpose. Here are the ones that actually matter — with real examples you can verify yourself:
The A record is by far the most common — it's the basic "domain name → IP address" mapping. The AAAA record is the same thing but for the newer IPv6 addresses (the weird-looking ones with colons like 2607:f8b0:4004:800::200e). Most modern websites have both, and your device picks whichever protocol it supports.
The MX record is how email works. When someone sends mail to you@google.com, the sender's mail server looks up the MX record for google.com to find out which server accepts incoming email. The number before the server name (like 10) is the priority — lower number = try first.
The CNAME is an alias. www.github.com is just a pointer to github.com. There's one important restriction: you can't put a CNAME at the zone apexThe zone apex (also called the "naked domain") is the domain without any subdomain — like google.com rather than www.google.com or mail.google.com. You can't use a CNAME here because the DNS spec says CNAME records can't coexist with other record types, and the apex MUST have SOA and NS records. Some DNS providers (like Cloudflare) work around this with a non-standard feature called "CNAME flattening." (the bare domain like github.com without www) because CNAME can't coexist with the SOA and NS records that are required there.
The TXT record is the Swiss Army knife. It stores arbitrary text, and it's used for email security (SPF tells receiving mail servers which IPs are allowed to send email for your domain), domain verification (Google Search Console, Let's Encrypt SSL certificates), and more. Run dig _dmarc.google.com TXT to see Google's DMARC email authentication policy.
Every DNS answer comes with a TTL (Time To Live) — a number in seconds that says "keep this answer cached for this long, then throw it away and ask again." TTL is the single most important tuning knob in DNS, and different companies set it very differently based on their priorities:
Why does Facebook use 60 seconds while Google uses 300? Failover speed. If a Facebook data center goes down, they need DNS to redirect traffic to a healthy data center within 60 seconds. Google is comfortable with a 5-minute window — their infrastructure has enough redundancy that a single data center failure doesn't require instant DNS changes.
Let's do the math on how much caching saves. Google gets roughly 8.5 billion searches per day, yet with a TTL of 300 seconds, each of the world's estimated 10 million DNS resolvers needs to refresh the answer at most 288 times per day (86,400 seconds / 300). Every other lookup is served straight from cache. A busy resolver might field 100,000 google.com lookups a day and forward only those 288 upstream, which means roughly 99.7% of its lookups never reach Google's authoritative servers. The caching hierarchy absorbs almost everything.
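Here is that arithmetic spelled out. The 100,000-lookups-per-day resolver is an assumed figure for illustration; the TTL and seconds-per-day are from the text:

```python
TTL = 300                        # seconds Google sets on its A record
SECONDS_PER_DAY = 86_400

# Worst case: the cached entry expires and is refilled back-to-back all day.
authoritative_queries = SECONDS_PER_DAY // TTL
print(authoritative_queries)     # upstream queries per resolver per day

# Assume one busy resolver fields 100,000 google.com lookups a day.
lookups = 100_000
absorbed = 1 - authoritative_queries / lookups
print(f"{absorbed:.1%}")         # fraction served from cache
```

The absorbed fraction only grows as a resolver gets busier: the upstream cost is capped by the TTL, not by the number of clients.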
$ dig google.com # First query — TTL = 300
;; ANSWER SECTION:
google.com. 300 IN A 142.250.195.68
$ sleep 10 && dig google.com # 10 seconds later — TTL = 290
;; ANSWER SECTION:
google.com. 290 IN A 142.250.195.68
# The TTL is counting down. At 0, the resolver re-fetches.
Here's something that surprises most people: google.com doesn't have one IP address. It has many, and the one you get depends on where you are. If you're in India, Google's nameserver returns the IP of a data center near Mumbai. If you're in California, you get a California data center. Same domain, different answers, different servers — and it all happens through DNS.
There are two techniques at play here. GeoDNS returns different IP addresses based on the requester's geographic location — the nameserver looks at where the query came from and picks the nearest data center's IP. Anycast is a lower-level routing trick: the same IP address is announced from many locations via BGPBorder Gateway Protocol — the routing protocol that the internet uses to figure out how to get data from point A to point B. Every ISP and data center announces which IP ranges it can reach. When Cloudflare announces 1.1.1.1 from 300+ locations, every router on the internet picks the shortest path to the nearest Cloudflare server. This happens at the network level, below DNS., and routers automatically send your traffic to the nearest physical server.
AWS Route 53 offers several flavors of this: weighted routing (send 70% of traffic to us-east-1, 30% to eu-west-1), latency-based routing (measure network latency and pick the lowest), geolocation routing (users in India always go to ap-south-1), and failover routing (primary in us-east-1, backup in us-west-2, switch automatically if primary fails its health check).
If you've ever changed a website's DNS records — maybe moving to a new hosting provider or switching to Cloudflare — you've probably seen the warning: "DNS propagation may take up to 24-48 hours." That sounds mysterious, like DNS changes are slowly rippling across the planet. But the truth is simpler (and more frustrating).
There's no magic "propagation" mechanism. DNS changes take effect on your authoritative server instantly. The delay is entirely because of caching. Every resolver in the world has a copy of your old record, and they won't ask for a fresh copy until the TTL expires. If your old TTL was 86400 seconds (24 hours), then the worst case is that some resolver cached it right before you made the change, and won't check again for a full 24 hours.
The pro move: lower your TTL a day in advance. Change it from 86400 to 60 seconds — but don't change the actual IP yet. Wait 24 hours for the old 86400-second TTL to expire everywhere. Now every resolver has your record cached with a 60-second TTL. When you finally change the IP, the whole world sees the update within 60 seconds. After the migration is complete, raise the TTL back up to reduce ongoing query load on your nameservers.
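That migration plan reduces to simple arithmetic: the worst-case wait before each step is the TTL that was live before it. A sketch, using the values from the text:

```python
OLD_TTL = 86_400   # 24 hours: the TTL everyone has cached today
NEW_TTL = 60       # the low TTL we switch to before migrating

# Step 1: lower the TTL but keep the old IP. Worst case, some resolver
# cached the record moments ago with OLD_TTL, so wait that long.
wait_before_ip_change = OLD_TTL

# Step 2: change the IP. Every cached copy now carries NEW_TTL, so the
# whole world converges within NEW_TTL seconds.
worst_case_staleness = NEW_TTL

print(wait_before_ip_change // 3600, "hours of waiting, then",
      worst_case_staleness, "seconds to converge")
```

Skipping step 1 and changing the IP directly would leave you with the full 24-hour worst case instead of 60 seconds.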
$ dig google.com +trace
# Shows TTL at every level:
# root → .com referral: 172800 (48 hours)
# .com → google.com NS: 172800 (48 hours)
# google.com A record: 300 (5 minutes)
Variations — Public DNS, Encrypted DNS, Split-Horizon, and Load Balancing
The basic DNS resolution we covered in Section 6 is the standard flow. But the real world has variations — different ways DNS is configured, secured, or used depending on the scenario. Let's look at the four most important ones.
Public vs Private DNS — Why Use 1.1.1.1 Instead of Your ISP?
By default, your computer uses your ISP's DNS resolver. When you connect to your home Wi-Fi, your router hands your device the ISP's resolver IP via DHCPDynamic Host Configuration Protocol — the system that automatically assigns your device an IP address, subnet mask, default gateway, and DNS server address when you connect to a network. When you join your home Wi-Fi, DHCP is what gives your phone its local IP (like 192.168.1.5) and tells it to use your ISP's DNS resolver.. Most people never change this. But there are good reasons to switch to a public resolver like 1.1.1.1 (Cloudflare), 8.8.8.8 (Google), or 9.9.9.9 (Quad9).
The catch: some ISPs force you back to their DNS even if you change your settings. They intercept all traffic on port 53 (the standard DNS port) and redirect it to their own resolver. The solution? Encrypted DNS — which brings us to the next variation.
DNS-over-HTTPS (DoH) / DNS-over-TLS (DoT) — Encrypted DNS
Traditional DNS uses plaintext UDP on port 53. That means every domain you visit is visible to your ISP, your network admin, and anyone sniffing your Wi-Fi. If you switch to DNS-over-HTTPS (DoH), your queries are encrypted inside HTTPS traffic on port 443 — the same port as regular web browsing. Your ISP cannot distinguish a DNS query from a Netflix stream. What are the downsides? Think about who loses visibility when DNS goes encrypted.
Consider: enterprise network admins, parental controls, malware filtering, and government censorship systems.
Traditional DNS has a huge privacy problem: it's plaintext UDP. Every DNS query you make — every domain you visit — is sent unencrypted over the network. Your ISP can see it. Anyone monitoring the network can see it. It's like shouting the name of every website you visit in a crowded room.
DNS-over-HTTPS (DoH) wraps DNS queries inside regular HTTPS requests on port 443. To your ISP, it looks like normal web browsing — they can't distinguish your DNS queries from any other HTTPS traffic. Firefox uses DoH by default with Cloudflare as the resolver.
DNS-over-TLS (DoT) wraps DNS in TLS encryption on a dedicated port (853). It's slightly less stealthy — an ISP can see you're using port 853 and know it's encrypted DNS — but it's cleaner from a protocol perspective. Android 9+ supports DoT natively (look for "Private DNS" in your phone's network settings — set it to dns.google or 1dot1dot1dot1.cloudflare-dns.com).
Split-Horizon DNS — Different Answers for Different Networks
Imagine you work at a company. When you're in the office and open api.company.com, it should route to an internal server at 10.0.1.5 (fast, direct, inside the firewall). When you're at home and open the same URL, it should route to the public IP at 203.0.113.50 (goes through the load balancer and firewall).
Same domain name, two completely different answers — depending on which network you're on. That's split-horizon DNS (also called "split-brain DNS" or "DNS views").
Almost every large company uses this. The DNS server maintains two "views" — one for requests coming from internal IP ranges (like 10.0.0.0/8), and one for everyone else. It's how employees get fast, direct access to internal services while external users go through the proper security layers.
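The view-selection logic is just "which network does the client's source IP fall into?". Python's ipaddress module expresses that directly; the networks and addresses below are the examples from the text, and the structure is a simplified sketch rather than any real DNS server's config format:

```python
import ipaddress

INTERNAL_NETS = [ipaddress.ip_network("10.0.0.0/8")]   # office ranges
RECORDS = {
    "api.company.com": {
        "internal": "10.0.1.5",       # fast, direct, inside the firewall
        "external": "203.0.113.50",   # via the load balancer and firewall
    },
}

def answer(name, client_ip):
    """Return the A record for `name`, chosen by the client's network."""
    ip = ipaddress.ip_address(client_ip)
    view = "internal" if any(ip in net for net in INTERNAL_NETS) else "external"
    return RECORDS[name][view]

print(answer("api.company.com", "10.0.42.7"))     # office laptop
print(answer("api.company.com", "198.51.100.9"))  # home connection
```

Same name, two answers, decided purely by where the query came from.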
DNS Load Balancing — Spreading Traffic Across Servers
DNS can do basic load balancingLoad balancing means distributing incoming requests across multiple servers so no single server gets overwhelmed. DNS load balancing does this by returning different IP addresses for the same domain. The simplest form is round-robin: return a list of IPs and rotate the order. More sophisticated forms use health checks and geographic awareness. by returning multiple IP addresses for the same domain and rotating which one appears first. When a client gets multiple IPs, it typically connects to the first one in the list. By changing the order, DNS spreads traffic across servers.
There are several flavors:
- Round-robin: Return all IPs, rotate the order. Simple but no health checking — if a server dies, DNS still sends traffic to it until someone manually removes the record.
- Weighted: Return IPs with different probabilities. Send 70% of traffic to the beefy server, 30% to the smaller one.
- Health-checked: Only return IPs for servers that are currently healthy. If a server fails its health check, its IP is automatically removed from DNS responses.
AWS Route 53 offers all three plus latency-based routing (pick the server with lowest measured network latency to the user), geolocation routing (users in India always go to ap-south-1), and failover routing (primary/backup with automatic switchover). Route 53 checks server health every 10-30 seconds and updates DNS answers accordingly.
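Round-robin and weighted selection can both be sketched in a few lines. This is a toy model of what a nameserver does when building its answer, not any particular implementation; all IPs are illustrative:

```python
import itertools
import random

IPS = ["203.0.113.1", "203.0.113.2", "203.0.113.3"]

# Round-robin: rotate which IP leads the answer list on each query.
_counter = itertools.count()
def round_robin():
    i = next(_counter) % len(IPS)
    return IPS[i:] + IPS[:i]          # full list, rotated one step per query

# Weighted: pick proportionally, e.g. 70% to the beefy server, 30% to the small one.
def weighted(rng=random):
    return rng.choices(["203.0.113.10", "203.0.113.20"], weights=[70, 30])[0]

print(round_robin())   # first query: list in original order
print(round_robin())   # second query: rotated, so a different IP leads
```

Note what is missing: nothing here checks whether a server is alive, which is exactly the round-robin weakness described above.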
At Scale — Real Stories from the Biggest DNS Operations
DNS might seem like a simple lookup service, but at scale, it becomes one of the most demanding pieces of infrastructure on the internet. Let's look at the real numbers behind the organizations that run DNS for billions of users — and the attack that nearly broke it all.
Cloudflare 1.1.1.1 — The World's Fastest DNS Resolver
Cloudflare launched their public DNS resolver on April 1, 2018 — and people genuinely thought it was an April Fools' joke. A free, privacy-focused DNS resolver at the most memorable IP address on the internet? Surely too good to be true.
But it was real, and the numbers are staggering:
- Average response time: ~11ms globally (compared to ~34ms for most ISP resolvers)
- Daily queries: over 1 trillion DNS queries handled per day
- Infrastructure: 300+ data centers across 100+ countries, all via anycast
- Privacy: all query logs are purged after 24 hours. KPMG audits this annually.
The IP address 1.1.1.1 itself has an interesting backstory. It was previously owned by APNICThe Asia Pacific Network Information Centre — one of the five Regional Internet Registries (RIRs) that manage IP address allocation worldwide. APNIC handles Asia-Pacific. The others are ARIN (North America), RIPE NCC (Europe/Middle East), AFRINIC (Africa), and LACNIC (Latin America). APNIC owned 1.1.1.0/24 and used it for research, but the memorable address attracted so much junk traffic that it was barely usable — until Cloudflare partnered with them. (the Asia-Pacific IP registry), who used the address range for network research. But because 1.1.1.1 is so memorable, it was constantly flooded with junk traffic from misconfigured devices. Cloudflare struck a deal with APNIC: Cloudflare would absorb the junk traffic and run a public resolver, and APNIC would get anonymized research data about DNS traffic patterns.
$ dig @1.1.1.1 google.com +stats
;; Query time: 8 msec # Often under 10ms
$ dig @8.8.8.8 google.com +stats
;; Query time: 14 msec # Google's is fast too
$ dig @your-isp-dns google.com +stats
;; Query time: 34 msec # ISP resolvers are typically slower
AWS Route 53 — DNS as a Service at Cloud Scale
AWS named their DNS service Route 53 after the port DNS runs on — TCP/UDP port 53. It's the most popular managed DNS service in the world, and it shows how DNS becomes a product at cloud scale.
Route 53 manages DNS for millions of domains. When you create a "hosted zone" (Route 53's term for a domain's DNS records), AWS automatically assigns 4 name server clusters from different TLDs to maximize resilience — for example, you might get ns-123.awsdns-15.com, ns-456.awsdns-57.net, ns-789.awsdns-34.org, and ns-1234.awsdns-26.co.uk. If the entire .com TLD infrastructure went down, your domain would still be reachable through the .net, .org, and .co.uk name servers.
The pricing tells you something about DNS economics: $0.50/month per hosted zone plus $0.40 per million queries. For a site getting 1 million DNS queries per month, that's less than $1/month. DNS is cheap to run at scale because caching does most of the work.
Route 53's killer feature is health-checked routing. It pings your servers every 10-30 seconds. If a server fails its health check, Route 53 stops returning its IP in DNS responses — automatically. No human intervention needed. Combined with failover routing (primary in us-east-1, standby in us-west-2), this gives you automatic disaster recovery through DNS alone.
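Stripped to its core, failover routing is one rule: answer with the primary while its health check passes, otherwise answer with the standby. A toy model of that decision (not the Route 53 API; regions and IPs are illustrative):

```python
PRIMARY = ("us-east-1", "203.0.113.10")
STANDBY = ("us-west-2", "203.0.113.20")

def failover_answer(primary_healthy):
    """The IP the DNS response carries after each health-check cycle."""
    region, ip = PRIMARY if primary_healthy else STANDBY
    return ip

print(failover_answer(True))    # normal operation: primary answers
print(failover_answer(False))   # health check fails: standby, automatically
```

The subtlety in production is the record's TTL: clients keep using the old answer until their cached copy expires, which is why failover records are published with short TTLs.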
The Dyn Attack (2016) — 1.2 Tbps That Broke Half the Internet
We opened this page with the Dyn attack in Section 2, but now you understand DNS well enough to appreciate the full technical story.
In September 2016, a user named "Anna-senpai" released the source code for the Mirai botnet on the HackForums website. Mirai was malware that scanned the internet for IoT devices — security cameras, DVRs, baby monitors, home routers — and tried to log in using a list of 62 default username/password combinations like admin/admin, root/root, and default/default. Shockingly, it found over 100,000 devices that still had factory credentials.
On October 21, someone pointed this botnet at Dyn, a DNS provider based in Manchester, New Hampshire. The attack came in three waves:
- 7:00 AM ET: First wave hits. US East Coast loses DNS. Twitter, Reddit, GitHub go dark.
- 11:50 AM ET: Second wave. Wider geographic impact. Netflix, Spotify, PayPal affected globally.
- 4:00 PM ET: Third wave. More botnet devices join. Peak traffic: 1.2 Tbps.
The peak attack volume — 1.2 terabits per second — was enough to download about 150 DVDs every single second. All of it aimed at Dyn's DNS infrastructure. With Dyn overwhelmed, the hundreds of websites relying on Dyn for authoritative DNS became unreachable. Not because their servers were down — because nobody could find them.
The aftermath reshaped the DNS industry. Dyn was acquired by Oracle shortly after. Twitter, Netflix, and other affected companies added secondary DNS providers — if one provider goes down, the other still answers queries. The attack also led to increased regulation around IoT device security (default passwords were the root cause).
Three college students — Paras Jha (21), Josiah White (20), and Dalton Norman (21) — later pled guilty to creating Mirai. The irony: they originally built it to take down competing Minecraft servers. The DDoS tool they created for a gaming feud ended up breaking half the internet.
The .com TLD — Verisign's $1.26 Billion/Year Operation
.com is the most valuable piece of internet real estate, and one company runs all of it: Verisign. Every single .com domain lookup eventually reaches Verisign's TLD servers. The scale is mind-boggling:
- Daily queries: 170+ billion DNS queries per day to Verisign's infrastructure
- Zone file size: The .com zone file (the database of all .com nameserver records) is over 14 GB
- Domains: 160+ million .com domain registrations worldwide
- Revenue: Verisign charges $7.85 per .com registration per year. With 160M+ domains, that's roughly $1.26 billion/year just from .com registry fees
- Uptime: Verisign has maintained 100% operational accuracy and stability in the .com DNS since it took over operations
Verisign operates under a contract with ICANNThe Internet Corporation for Assigned Names and Numbers — the nonprofit organization that coordinates the global DNS system. ICANN manages the root zone, accredits domain registrars, oversees the creation of new TLDs, and negotiates contracts with registry operators like Verisign. ICANN is based in Los Angeles and operates under a multistakeholder governance model — governments, businesses, civil society, and technical experts all participate in decision-making. (the Internet Corporation for Assigned Names and Numbers), which gives them the exclusive right to operate the .com registry. This contract has been renewed repeatedly since 2000 and allows periodic price increases — the $7.85 per domain is up from the original fee. Verisign also operates the .net TLD and runs two of the 13 root server clusters (A and J).
The .com zone file is updated twice daily. Every time someone registers a new .com domain or changes their nameservers, the change gets batched into the next zone file update. This is why new domain registrations can take up to a few hours to become globally resolvable — the TLD zone file hasn't been regenerated yet.
The Anti-Lesson — Common DNS Mistakes in System Design
DNS is so useful that it's tempting to lean on it for things it wasn't designed to do. Here are three common mistakes that come up in system design interviews and real production systems — along with why they sound reasonable but fall apart under pressure.
This sounds logical at first: DNS can return multiple IPs and rotate them. Why pay for an Nginx or HAProxy cluster when DNS does it for free? The answer is that DNS load balancing is blind, slow, and unforgiving.
The core problem: DNS has no health checks (unless you use a managed service like Route 53). If one of your three servers crashes, DNS keeps returning its IP. Every client that gets that IP hits a dead server and sees an error. Even with Route 53's health checks, clients cache the old answer for the TTL duration. A real load balancer (Nginx, HAProxy, AWS ALB) detects failures in seconds and reroutes instantly — no waiting for caches to expire.
When DNS load balancing IS appropriate: as the first layer in a multi-tier load balancing strategy. Use DNS to route users to the nearest region (GeoDNS), then use a proper load balancer within each region to distribute traffic across servers. DNS for coarse-grained geographic routing, load balancers for fine-grained server-level distribution.
In a microservices architecture, services need to find each other. The "order service" needs to know the IP of the "payment service." DNS seems like a natural fit — after all, it's a name-to-IP lookup system. But DNS was designed for names that change rarely (domains), not names that change constantly (microservices scaling up and down).
The fundamental problem is TTL. When your auto-scaler spins up 5 new instances of the payment service, DNS records need to update. But resolvers have cached the old list. With even a modest TTL of 30 seconds, there's a 30-second window where requests might go to the old list of servers. In a fast-scaling environment where instances come and go every few seconds, DNS simply can't keep up.
Kubernetes does use DNS — its internal DNS server, CoreDNSCoreDNS is the default DNS server inside Kubernetes clusters. It resolves service names like my-service.default.svc.cluster.local to the cluster IP of the service. Kubernetes uses very low TTLs (5-30 seconds) and combines DNS with a separate service discovery mechanism (endpoints and iptables rules) that updates in near-real-time. So even if DNS is slightly stale, the actual routing is current. CoreDNS replaced kube-dns as the default in Kubernetes 1.13., resolves service names like payment-service.default.svc.cluster.local. But Kubernetes keeps TTLs extremely low (5 seconds) and supplements DNS with a separate real-time service discovery mechanism (Endpoints and kube-proxy) that updates routing rules immediately when pods come and go. DNS is just the friendly name layer — the actual traffic routing doesn't depend on DNS TTLs.
For production microservice discovery, most teams use purpose-built tools like HashiCorp Consul, etcd (the key-value store behind Kubernetes), or a service mesh (Istio, Linkerd) that maintains a real-time service registry with health checking and instant updates — not TTL-bound DNS.
If caching causes stale results, why not set TTL to 0 and eliminate caching entirely? Every lookup goes straight to the authoritative server, and changes propagate instantly. Problem solved, right?
In theory, yes. In practice, many resolvers ignore TTL=0. Some cache for a minimum of 30 seconds regardless of what the TTL says. Some cache for up to 300 seconds. The behavior is inconsistent and unpredictable across different ISPs and resolver implementations. You can't rely on TTL=0 actually meaning "don't cache."
Even when resolvers do honor TTL=0, the second problem kicks in: query load. Without caching, every single DNS lookup for your domain hits your authoritative nameserver directly. For a popular site, that's millions of uncached queries per day. Your nameserver becomes a bottleneck and a target. The practical minimum TTL for most scenarios is 30-60 seconds — short enough for reasonably fast updates, long enough for caching to handle the load.
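The "resolvers ignore TTL=0" behavior often amounts to a simple clamp between a floor and a ceiling. A sketch of what many resolvers effectively do; the 30-second floor and one-day ceiling are common choices used here for illustration, not values from any standard:

```python
MIN_TTL = 30       # a floor like this is common (illustrative value)
MAX_TTL = 86_400   # and a ceiling, so absurd TTLs don't stick around for weeks

def effective_ttl(record_ttl):
    """TTL the resolver actually honors, regardless of what you published."""
    return max(MIN_TTL, min(record_ttl, MAX_TTL))

print(effective_ttl(0))         # you asked for no caching; you get 30s anyway
print(effective_ttl(300))       # sane values pass through unchanged
print(effective_ttl(604_800))   # a week gets clamped down to a day
```

Because the floor varies by resolver, TTL=0 buys you unpredictability, not instant updates.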
You're designing a global e-commerce platform. You need users in India, the US, and Europe to reach the nearest data center with minimal latency. If a data center goes down, traffic should automatically reroute within 60 seconds. Which combination of DNS features (from everything we've covered) would you use? And where would DNS end and a traditional load balancer begin?
Common Mistakes — What People Get Wrong About DNS
DNS is one of those topics where "common knowledge" is often flat-out wrong. These six myths show up in blog posts, Stack Overflow answers, and even job interviews. Each one sounds plausible — until you actually run the commands and see what's really happening. Let's bust them one by one, with real commands you can run to prove it to yourself.
This is the most widespread DNS myth on the internet, and it's almost right — which makes it dangerous. The truth is simpler: propagation takes as long as the old TTL that was in place before you made the change.
If your old record had a TTL of 86,400 seconds (24 hours), then yes — resolvers around the world cached it for up to 24 hours, and they won't check again until that cache expires. That's where the "24-48 hours" myth comes from. But if your TTL was 300 seconds (5 minutes), propagation takes about 5 minutes. The TTL is the timer, not some magical internet delay.
# See the TTL at each level of the chain
$ dig google.com +trace
# Check what TTL a specific resolver has cached
$ dig @8.8.8.8 google.com | grep -A1 "ANSWER"
google.com. 217 IN A 142.250.195.68
# ^^^ TTL counting down — 217 seconds left
# Wait 30 seconds, query again — TTL drops:
google.com. 187 IN A 142.250.195.68
# ^^^ now 187 — 30 seconds less
They look similar — both point a domain name somewhere. But they work fundamentally differently. An A record maps a name directly to an IP address. A CNAME maps a name to another name (an alias). The critical rule: you cannot have a CNAME at the zone apex (the bare domain like google.com). Why? Because the apex already has SOA and NS records, and RFC 1034 says CNAME cannot coexist with any other record type.
# CNAME — www is an alias pointing to another name
$ dig www.github.com CNAME +short
github.com.
# A record — the actual IP address
$ dig github.com A +short
140.82.121.4
# This is why www.github.com works as an alias
# but github.com needs a direct A record
The Dyn attack of October 2016 proved this wrong for the entire internet. Dyn was a single DNS provider, and when the Mirai botnet hit it with 1.2 Tbps of traffic, every company that relied solely on Dyn went offline — Twitter, GitHub, Reddit, Netflix, Spotify, all at once.
The fix is straightforward: use two DNS providers. Configure both Route 53 and Cloudflare (for example) to serve the same zone with identical records. Your domain's NS records point to nameservers from both providers. If one provider goes down, the other handles 100% of queries automatically — no manual failover needed.
In practice that means your NS set mixes names from both (e.g., ns1.awsdns.com and ns1.cloudflare.com). Resolvers try any available nameserver. If Route 53 is down, they reach Cloudflare instead. The catch: you need to keep records in sync across both providers. Tools like octodns or dnscontrol automate this.
Most DNS queries do use UDP — it's faster because there's no handshake overhead, and most responses fit in a single 512-byte packet. But DNS uses TCP in several important cases:
- Zone transfers (AXFR/IXFR) — when a secondary nameserver copies the full zone from a primary, it always uses TCP
- Large responses — DNSSEC signatures often push responses past 512 bytes, triggering TCP fallback
- DNS-over-TLS (DoT) — port 853, always TCP
- DNS-over-HTTPS (DoH) — port 443, always TCP (it's HTTPS)
# Force a TCP query (works fine)
$ dig google.com +tcp
;; ->>HEADER<<- opcode: QUERY, status: NOERROR
;; flags: qr rd ra; QUERY: 1, ANSWER: 1
# Check DNSSEC — response is larger than 512 bytes
$ dig google.com +dnssec
;; MSG SIZE rcvd: 676 # Past the classic 512-byte limit — without EDNS0, this would force TCP fallback
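The TCP cases above all share one wire-format detail: because TCP is a byte stream with no packet boundaries, every DNS message sent over it is prefixed with its length as a 2-byte big-endian integer (RFC 1035 section 4.2.2). A minimal sketch of that framing:

```python
import struct

def tcp_frame(dns_message: bytes) -> bytes:
    # DNS over TCP prefixes each message with a 2-byte big-endian
    # length so the receiver can find message boundaries in the stream
    return struct.pack('!H', len(dns_message)) + dns_message

query = b'\xab\xcd\x01\x00' + b'\x00' * 8  # toy 12-byte DNS header
framed = tcp_frame(query)
print(framed[:2])  # → b'\x00\x0c' (length 12)
```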
When you query a domain that doesn't exist, the resolver doesn't just shrug and forget about it. It caches that "doesn't exist" answer too — this is called negative caching (defined in RFC 2308). The NXDOMAINNXDOMAIN stands for "Non-Existent Domain" — it's the DNS response code that means "this domain name does not exist anywhere in DNS." You can see it by querying a domain that doesn't exist: dig thisdomaindoesnotexist12345.com — the response will show "status: NXDOMAIN" response gets cached with a TTL taken from the SOA record's minimum field.
This bites people in a specific scenario: you're setting up DNS for a brand new domain. Someone (or a monitoring bot) queries the domain before you've added any records. The resolver caches the NXDOMAIN response. Now even after you add your records, that resolver still thinks the domain doesn't exist — until the negative cache TTL expires.
# Query a domain that doesn't exist
$ dig thisdomaindoesnotexist.com
;; ->>HEADER<<- status: NXDOMAIN # Doesn't exist
# The SOA record in the AUTHORITY section shows the negative TTL:
;; AUTHORITY SECTION:
com. 900 IN SOA a.gtld-servers.net. ...
# ^^^ negative responses cached for 900 seconds (15 min)
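That 900-second figure comes from a simple rule in RFC 2308: negative answers are cached for the lesser of the SOA record's own TTL and its MINIMUM field. A sketch (the 86400 MINIMUM value here is illustrative, not pulled from the live .com SOA):

```python
def negative_cache_ttl(soa_record_ttl: int, soa_minimum: int) -> int:
    # RFC 2308 section 5: an NXDOMAIN is cached for the lesser of the
    # SOA record's TTL and the SOA MINIMUM field
    return min(soa_record_ttl, soa_minimum)

print(negative_cache_ttl(900, 86400))  # → 900 (15 minutes)
```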
Running ipconfig /flushdns on Windows or sudo dscacheutil -flushcache on Mac clears your machine's DNS cache — and only your machine's. Your ISP's resolver still has the old record cached. Google's 8.8.8.8 still has the old record cached. Cloudflare's 1.1.1.1 still has the old record cached. Every resolver in the world that has queried your domain still has its own cached copy.
Flushing your local cache makes your next query go to your ISP's resolver fresh — but if that resolver still has the old answer cached, you'll get the same stale result right back. The only thing that truly clears all caches is time — waiting for the TTL to expire everywhere.
Interview Playbook — "What Happens When You Type google.com?"
This is the single most popular system design warm-up question in interviews. It sounds simple, but a great answer touches DNS, TCP, TLS, HTTP, rendering — and the depth you go into tells the interviewer exactly what level you're at. DNS is the first step and often where candidates either shine or stumble. Here's the full answer structure:
The DNS part of the answer should cover: checking the browser cache, then the OS cache, then the resolver, then the recursive lookup through root → TLD → authoritative. The level of detail you add depends on the role you're interviewing for:
What they expect you to know:
- DNS translates domain names to IP addresses
- The resolution chain: browser cache → OS cache → resolver → root → TLD → authoritative
- What TTL is and why caching matters
- Common record types: A (IPv4), AAAA (IPv6), CNAME (alias), MX (mail)
Sample answer: "When I type google.com, the browser first checks its own DNS cache. If it's not there, it asks the operating system, which checks its cache. If still not found, the OS sends a query to a recursive resolver — usually the ISP's or a public one like 8.8.8.8. That resolver walks the DNS hierarchy: root server, then .com TLD server, then Google's authoritative nameserver, which returns the actual IP address. The result gets cached at every level with a TTL so the next lookup is instant."
What they expect you to know:
- Everything from Junior, plus:
- Recursive vs iterative resolution — the resolver does the work, not the client
- Caching math: if TTL is 300s and the record was cached 100s ago, you have 200s left
- GeoDNS — how companies like Netflix return different IPs based on user location
- The difference between authoritative nameservers (hold the truth) and recursive resolvers (find the truth)
- Why DNS uses UDP (speed) and when it falls back to TCP (large responses, zone transfers)
Sample answer (add to Junior): "The resolver uses iterative queries — it asks each server 'do you know this?' and follows referrals. Caching happens at every level with TTLs. Google's A record has TTL around 300 seconds, so resolvers re-query every 5 minutes. For a global service, they likely use GeoDNS — the authoritative server checks the resolver's IP, determines geographic location, and returns the IP of the nearest data center. This is how google.com resolves to different IPs in India vs the US."
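The caching math from the list above is worth being able to do on a whiteboard. A minimal sketch:

```python
def remaining_ttl(original_ttl: int, age_seconds: int) -> int:
    # A resolver serves a cached record with its remaining lifetime,
    # not the original TTL; at zero it must re-query upstream
    return max(original_ttl - age_seconds, 0)

print(remaining_ttl(300, 100))  # → 200 (cached 100s ago, 200s left)
print(remaining_ttl(300, 400))  # → 0   (expired; re-query required)
```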
What they expect you to know:
- Everything from Mid, plus:
- Anycast routing — root servers and TLD servers use anycast so the same IP is served from hundreds of locations worldwide. Your packets reach the nearest instance via BGP routing.
- DNSSEC chain of trust — how the root zone signs TLD keys, TLDs sign domain keys, and resolvers validate the entire chain to prevent spoofing
- DNS-over-HTTPS (DoH) — privacy implications, ISP visibility, enterprise challenges (you can't inspect DoH traffic for security monitoring)
- CDN traffic management — how CDNs like Cloudflare use DNS to steer users to optimal edge nodes based on latency, capacity, and health
- Split-horizon DNS — returning different records for internal vs external queries (e.g., api.company.com resolves to a private IP inside the VPN, a public IP outside)
- DDoS resilience — multi-provider DNS, anycast, over-provisioning, rate limiting at the resolver level
Sample answer (add to Mid): "At senior scale, I'd design DNS infrastructure with redundancy in mind. Dual DNS providers — say Route 53 and Cloudflare — both serving the same zone via anycast. DNSSEC for integrity, though it adds latency from signature validation. For internal services, split-horizon DNS so the same hostname resolves differently inside and outside the network. I'd also consider DNS-over-HTTPS implications: it improves user privacy but breaks corporate security monitoring and makes it harder to enforce content policies. For DDoS protection, anycast absorbs volumetric attacks by distributing them across dozens of PoPs."
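The split-horizon idea can be sketched in a few lines. This is a toy illustration only (hypothetical zone data and VPN range); real deployments configure this with BIND views, Route 53 private hosted zones, and similar mechanisms:

```python
import ipaddress

INTERNAL_NET = ipaddress.ip_network('10.0.0.0/8')  # hypothetical VPN range

# Hypothetical split-horizon zone: two answers per name
RECORDS = {'api.company.com': {'internal': '10.1.2.3',
                               'external': '203.0.113.10'}}

def resolve(name: str, client_ip: str) -> str:
    # Pick the view based on where the query came from
    view = ('internal' if ipaddress.ip_address(client_ip) in INTERNAL_NET
            else 'external')
    return RECORDS[name][view]

print(resolve('api.company.com', '10.5.5.5'))      # → 10.1.2.3
print(resolve('api.company.com', '198.51.100.7'))  # → 203.0.113.10
```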
Hands-On Challenges — Real Commands, Real DNS
DNS is one of the few networking topics where you can run real commands and see real infrastructure respond. These five exercises use actual domains and actual DNS servers — nothing simulated. Open a terminal and try each one. You'll learn more from 10 minutes of dig than from 10 hours of reading.
Run dig +trace google.com and identify each server in the chain. Which root server responded? Which TLD server? What's the TTL at each level? Write down the full path from root to final answer.
Time target: 5 minutes
The output has 4 sections. First: the root zone (.) with NS records listing root servers like a.root-servers.net. Second: the .com TLD servers. Third: Google's authoritative nameservers (like ns1.google.com). Fourth: the final A record with the IP.
# Step 1: Root zone → lists all 13 root server clusters
. 518400 IN NS a.root-servers.net. # Verisign
. 518400 IN NS b.root-servers.net. # USC-ISI
# ... (13 total, TTL = 518400 = 6 days)
# Step 2: Root refers us to .com TLD servers
com. 172800 IN NS a.gtld-servers.net. # Verisign
com. 172800 IN NS b.gtld-servers.net.
# ... (TTL = 172800 = 2 days)
# Step 3: .com TLD refers us to Google's nameservers
google.com. 172800 IN NS ns1.google.com.
google.com. 172800 IN NS ns2.google.com.
# Step 4: Google's nameserver returns the A record
google.com. 300 IN A 142.250.195.68
# TTL = 300 seconds (5 minutes)
Key insight: TTLs get shorter as you go down the chain. Root NS records: 6 days. TLD referrals: 2 days. Final A record: 5 minutes. This makes sense — root and TLD servers rarely change, but website IPs change much more often.
Query the same domain against three different resolvers and compare response times. Run dig @8.8.8.8 example.com +stats, then dig @1.1.1.1 example.com +stats, then dig @9.9.9.9 example.com +stats. Which is fastest from your location? Run each query twice — why is the second query faster?
Time target: 5 minutes
Look at the ;; Query time: line at the bottom of each response. The first query for each resolver may need to do a full recursive lookup. The second query hits the resolver's cache — hence the dramatic speed improvement.
Typical results (your numbers will vary by location):
- 1.1.1.1 (Cloudflare) — First: ~12ms, Second: ~2ms. Generally fastest due to extensive anycast network.
- 8.8.8.8 (Google) — First: ~18ms, Second: ~4ms. Slightly slower but very consistent globally.
- 9.9.9.9 (Quad9) — First: ~25ms, Second: ~5ms. Its malware-domain filtering introduces slight overhead.
The second query is faster because the resolver already has the answer in its cache. This is the entire point of DNS caching — the recursive lookup only happens once per TTL period.
For gmail.com, find all of these records and explain what each one tells you: all MX records (mail servers), the SPF TXT record, the DMARC policy, and all NS records. Use dig gmail.com MX, dig gmail.com TXT, and dig gmail.com NS.
Time target: 10 minutes
MX records have a priority number — lower number = higher priority. SPF records start with v=spf1 and list which servers are allowed to send email for that domain. DMARC is a TXT record on _dmarc.gmail.com — try dig _dmarc.gmail.com TXT.
# MX records — where email gets delivered
$ dig gmail.com MX +short
5 gmail-smtp-in.l.google.com. # Primary (priority 5)
10 alt1.gmail-smtp-in.l.google.com. # Backup 1
20 alt2.gmail-smtp-in.l.google.com. # Backup 2
30 alt3.gmail-smtp-in.l.google.com. # Backup 3
40 alt4.gmail-smtp-in.l.google.com. # Backup 4
# SPF record — who's allowed to send email as @gmail.com
$ dig gmail.com TXT +short
"v=spf1 redirect=_spf.google.com"
# Redirects to Google's SPF config
# DMARC policy — what to do with spoofed email
$ dig _dmarc.gmail.com TXT +short
"v=DMARC1; p=none; sp=quarantine; rp=quarantine"
# p=none: don't reject spoofed gmail.com emails (monitor only)
# NS records — Google's authoritative nameservers
$ dig gmail.com NS +short
ns1.google.com.
ns2.google.com.
ns3.google.com.
ns4.google.com.
What this tells you: Gmail has 5 mail servers with cascading priorities for redundancy. Their SPF record delegates to Google's central SPF configuration. The DMARC policy is set to "none" for the main domain (monitoring mode) but "quarantine" for subdomains — meaning spoofed emails from subdomains get flagged.
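MX selection follows exactly the rule in the hint: lowest preference value first, higher values as fallbacks (RFC 5321). A minimal sketch using the records above:

```python
def next_mail_server(mx_records):
    # RFC 5321: try the MX with the lowest preference value first;
    # higher values are backups
    return min(mx_records, key=lambda rec: rec[0])[1]

mx = [(10, 'alt1.gmail-smtp-in.l.google.com.'),
      (5,  'gmail-smtp-in.l.google.com.'),
      (20, 'alt2.gmail-smtp-in.l.google.com.')]
print(next_mail_server(mx))  # → gmail-smtp-in.l.google.com.
```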
Query dig google.com twice, 10 seconds apart. Watch the TTL count down. Calculate when it will expire. Then query dig facebook.com — why is Facebook's TTL so much shorter? What does this tell you about their infrastructure strategy?
Time target: 5 minutes
Google's TTL is around 300 seconds, Facebook's is around 60 seconds. A lower TTL means the DNS record gets refreshed more often, which allows faster failover between data centers. Facebook prioritizes rapid traffic shifting; Google prioritizes reducing DNS query volume.
- Google — TTL ≈ 300s. They prioritize caching efficiency. With billions of daily users, lower TTLs would generate massive query load on their nameservers. 300s is enough for reasonable failover.
- Facebook — TTL ≈ 60s. They prioritize fast traffic steering. Facebook shifts traffic between data centers frequently (capacity management, deployments, incident response). A 60s TTL means they can redirect all traffic within a minute.
- Cloudflare — TTL ≈ 300s for cloudflare.com itself, but many customer domains use much lower TTLs because Cloudflare's proxy handles the actual routing.
The tradeoff: Low TTL = fast changes but more DNS traffic. High TTL = fewer queries but slower propagation. Every company picks their sweet spot based on how often they need to shift traffic.
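The traffic side of that tradeoff is easy to quantify. A rough upper bound, assuming a resolver that always has fresh demand for the record (real resolvers only re-fetch when queried after expiry):

```python
def resolver_fetches_per_day(ttl_seconds: int) -> int:
    # Upper bound: a continuously busy resolver re-fetches the record
    # once per TTL window (86400 seconds in a day)
    return 86400 // ttl_seconds

for ttl in (60, 300, 3600):
    print(f'TTL {ttl:>4}s → up to {resolver_fetches_per_day(ttl)} fetches/day per resolver')
```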
Write a Python script that sends a raw DNS query (UDP packet to 8.8.8.8 on port 53) and parses the binary response to extract the IP address. No libraries except socket and struct. This forces you to understand the actual DNS wire protocol.
Time target: 30-60 minutes
A DNS query packet has a 12-byte header (ID, flags, question count), followed by the question section (domain name encoded as length-prefixed labels, type, class). The response has the same header plus an answer section with the IP in 4 bytes. See RFC 1035 section 4 for the exact format.
import socket, struct

def build_query(domain):
    # Header: ID=0xABCD, flags=0x0100 (standard query, recursion desired)
    # QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack('!HHHHHH', 0xABCD, 0x0100, 1, 0, 0, 0)
    # Question: encode domain as DNS labels
    question = b''
    for part in domain.split('.'):
        question += bytes([len(part)]) + part.encode()
    question += b'\x00'  # null terminator
    question += struct.pack('!HH', 1, 1)  # Type=A (1), Class=IN (1)
    return header + question

def parse_response(data):
    ancount = struct.unpack('!H', data[6:8])[0]  # answer count from header
    # Skip header (12 bytes), then the question section
    offset = 12
    # Skip question name (length-prefixed labels until 0x00)
    while data[offset] != 0:
        offset += data[offset] + 1
    offset += 5  # null byte + type (2) + class (2)
    # Walk the answers: the first may be a CNAME, so keep going until
    # we find an A record
    for _ in range(ancount):
        # Answer name: usually a compressed pointer (0xC0xx, 2 bytes)
        if data[offset] & 0xC0 == 0xC0:
            offset += 2
        else:  # uncompressed length-prefixed labels
            while data[offset] != 0:
                offset += data[offset] + 1
            offset += 1
        # Type (2) + Class (2) + TTL (4) + RDLENGTH (2)
        rtype, rclass, ttl, rdlen = struct.unpack('!HHIH', data[offset:offset+10])
        offset += 10
        if rtype == 1 and rdlen == 4:  # A record, 4-byte IPv4
            return '.'.join(str(b) for b in data[offset:offset+4]), ttl
        offset += rdlen  # skip non-A answers (e.g., a CNAME)
    return None, 0

# Send query to Google's public resolver
domain = 'google.com'
sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.settimeout(5)
sock.sendto(build_query(domain), ('8.8.8.8', 53))
response, _ = sock.recvfrom(4096)  # room for EDNS-sized responses
sock.close()
ip, ttl = parse_response(response)
print(f'{domain} → {ip} (TTL: {ttl}s)')
What you learn from this: DNS is just a binary protocol over UDP. The "magic" of DNS resolution is really just structured bytes — a 12-byte header, length-prefixed domain labels, and a 4-byte IP address in the answer. Every dig command you've run in this page does exactly what this script does, just with better error handling and output formatting.
Quick Reference — DNS Cheat Cards
Bookmark this section. These six cards have everything you need when you're debugging DNS, preparing for an interview, or designing infrastructure. All real data — no filler.
Resolution chain (with typical latency):
- Browser cache → ~0ms
- OS cache → ~0ms
- Resolver cache → ~1-5ms
- Root server → 13 clusters (a-m)
- TLD server → .com, .net, .org...
- Authoritative → ns1.google.com
- Answer: 142.250.195.68

Record types:
- A → name → IPv4 address
- AAAA → name → IPv6 address
- CNAME → name → another name
- MX → domain → mail server
- NS → domain → nameserver
- TXT → domain → text (SPF, etc.)
- SOA → zone → authority info
- PTR → IP → name (reverse)

TTL strategy:
- High (3600s+): fewer queries, slow failover, stable sites
- Med (300s): Google's choice, good balance, 5-min updates
- Low (60s): Facebook's choice, fast failover, more queries
- Pre-change: lower TTL first, wait out the old TTL, then change
- Never: TTL=0 (resolvers ignore it)

Essential commands:
dig domain.com            # A record
dig domain.com MX         # mail servers
dig +trace domain.com     # full chain
dig +short domain.com     # just the IP
dig @8.8.8.8 domain.com   # use a specific resolver
nslookup domain.com       # simpler tool
host domain.com           # simplest tool
whois domain.com          # registration info

Public resolvers:
- 1.1.1.1 (Cloudflare): ~11ms avg; privacy-focused, fastest
- 8.8.8.8 (Google): ~14ms avg; most popular, very reliable
- 9.9.9.9 (Quad9): ~20ms avg; blocks malware domains
- 208.67.222.222 (OpenDNS): Cisco-owned, content filtering
- 76.76.2.0 (Control D): customizable filtering

Root server operators:
- A: Verisign
- B: USC-ISI
- C: Cogent
- D: University of Maryland
- E: NASA
- F: ISC
- G: US DoD
- H: US Army
- I: Netnod
- J: Verisign
- K: RIPE NCC
- L: ICANN
- M: WIDE Project
- 1,700+ instances worldwide via anycast
Connected Topics — Where DNS Leads Next
DNS doesn't exist in isolation — it's the first step in almost every network interaction. Understanding DNS deeply unlocks these related topics, and each one builds on concepts you've learned here.