Domain name system
From Wikipedia, the free encyclopedia
The domain name system (DNS) stores and associates many types of information with domain names, but most importantly, it translates domain names (computer hostnames) to IP addresses. It also lists mail exchange servers accepting e-mail for each domain. In providing a worldwide keyword-based redirection service, DNS is an essential component of contemporary Internet use.
Useful for several reasons, the DNS pre-eminently makes it possible to attach easy-to-remember domain names (such as "wikipedia.org") to hard-to-remember IP addresses (such as 22.214.171.124). People take advantage of this when they recite URLs and e-mail addresses. In a subsidiary function, the domain name system makes it possible for people to assign authoritative names without needing to communicate with a central registrar each time.
History of the DNS
The practice of using a name as a more human-legible abstraction of a machine's numerical address on the network predates even TCP/IP, and goes all the way back to the ARPAnet era. Originally, each computer on the network retrieved a file called HOSTS.TXT from SRI (now SRI International) which mapped an address (such as 126.96.36.199) to a name (such as www.example.net.) The Hosts file still exists on most modern operating systems, either by default or through configuration, and allows users to specify an IP address to use for a hostname without checking the DNS. This file now serves primarily for troubleshooting DNS errors or for mapping local addresses to more organic names. (The Hosts file can also help in ad-blocking, and spyware may utilize it to hijack a computer.) But a system based on a HOSTS.TXT file had inherent limitations, because of the obvious requirement that every time a given computer's address changed, every computer that wanted to communicate with it would need an update to its Hosts file.
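As a rough illustration of how a Hosts-style file maps addresses to names, the following Python sketch parses such text into a lookup table. The function name `parse_hosts` and the sample entries are assumptions made for illustration. The layout (an address followed by one or more names, with `#` comments) follows the common modern Hosts file rather than the historical HOSTS.TXT syntax.

```python
def parse_hosts(text):
    """Return a dict mapping each lowercase hostname to its address."""
    table = {}
    for line in text.splitlines():
        # Drop comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        address, *names = line.split()
        for name in names:
            # Keep the first address seen for a name (an assumption
            # about precedence made for this sketch).
            table.setdefault(name.lower(), address)
    return table


# Hypothetical file contents.
SAMPLE = """
# local overrides
127.0.0.1      localhost
126.96.36.199  www.example.net  example-alias
"""

hosts = parse_hosts(SAMPLE)
print(hosts["www.example.net"])
```

Every machine holding such a table needs its own copy updated whenever an address changes, which is the scaling problem described above.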
The growth of networking called for a more scalable system: one that recorded a change in a host's address in one place only. Other hosts would learn about the change dynamically through a notification system, thus completing a globally accessible network of all hosts' names and their associated IP addresses.
Paul Mockapetris invented the DNS in 1983; the original specifications appear in RFC 882 and 883. In 1987, the publication of RFC 1034 and RFC 1035 updated the DNS specification and made RFC 882 and RFC 883 obsolete. Several more-recent RFCs have proposed various extensions to the core DNS protocols.
Mockapetris wrote the first implementation of DNS. The following year (1984), four Berkeley students (Douglas Terry, Mark Painter, David Riggle and Songnian Zhou) wrote the first Unix implementation. Ralph Campbell maintained Terry et al.'s work after that. In 1985, Kevin Dunlap of Digital Equipment Corporation significantly rewrote the DNS implementation and renamed it BIND. Mike Karels, Phil Almquist and Paul Vixie have maintained BIND since then. A port of BIND to the Windows NT platform took place in the early 1990s. Because of BIND's long history of security issues, others have written and distributed a number of alternative nameserver and resolver programs in recent years.
How the DNS works in theory
The domain name space consists of a tree of domain names. Each node or leaf in the tree has one or more resource records, which hold information associated with the domain name. The tree sub-divides into zones. A zone consists of a collection of connected nodes authoritatively served by an authoritative DNS nameserver. (Note that a single nameserver can host several zones.)
When a system administrator wants to let another administrator control a part of the domain name space within his or her zone of authority, he or she can delegate control to the other administrator. This splits a part of the old zone off into a new zone, which comes under the authority of the second administrator's nameservers. The old zone becomes no longer authoritative for what comes under the authority of the new zone.
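One way to picture zones and delegation is as a search for the most specific zone that contains a given name. The sketch below is a simplified model under that assumption. The set of zones and the function name `find_zone` are hypothetical, and real name servers track delegations through NS records rather than a flat set.

```python
def find_zone(name, zones):
    """Return the most specific zone in `zones` that contains `name`."""
    labels = name.rstrip(".").lower().split(".")
    # Try the full name first, then strip labels from the left.
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in zones:
            return candidate
    return "." if "." in zones else None


# Hypothetical zones: en.wikipedia.org has been delegated away from
# wikipedia.org, so names beneath it belong to the new zone.
ZONES = {".", "org", "wikipedia.org", "en.wikipedia.org"}

print(find_zone("www.wikipedia.org", ZONES))
print(find_zone("something.en.wikipedia.org", ZONES))
```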
A resolver looks up the information associated with nodes. A resolver knows how to communicate with name servers by sending DNS requests, and heeding DNS responses. Resolving usually entails recursing through several name servers to find the needed information.
Some resolvers function simplistically and can only communicate with a single name server. These simple resolvers rely on a recursing name server to perform the work of finding information for them.
Understanding the parts of a domain name
A domain name usually consists of two or more parts (technically labels), separated by dots, for example wikipedia.org.
- The rightmost label conveys the top-level domain (for example, the address en.wikipedia.org has the top-level domain org).
- Each label to the left specifies a subdivision or subdomain of the domain above it. Note that "subdomain" expresses relative dependence, not absolute dependence: for example, wikipedia.org is a subdomain of the org domain, and en.wikipedia.org is a subdomain of the domain wikipedia.org. In theory, this subdivision can go down to 127 levels deep, and each label can contain up to 63 characters, as long as the whole domain name does not exceed a total length of 255 characters. In practice, however, some domain registries have shorter limits.
- A hostname refers to a domain name that has one or more associated IP addresses. For example, the en.wikipedia.org and wikipedia.org domains are both hostnames, but the org domain is not.
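The label structure and the length limits listed above can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and it counts characters of the dotted text form, which only approximates the limits as they apply on the wire.

```python
MAX_LABEL = 63    # characters per label
MAX_NAME = 255    # characters in the whole name
MAX_LEVELS = 127  # labels in the name


def split_labels(name):
    """Split a domain name into labels, top-level label last."""
    return name.rstrip(".").split(".")


def check_limits(name):
    """Return a list of limit violations; an empty list means none found."""
    labels = split_labels(name)
    problems = []
    if len(name.rstrip(".")) > MAX_NAME:
        problems.append("name too long")
    if len(labels) > MAX_LEVELS:
        problems.append("too many levels")
    for label in labels:
        if not label:
            problems.append("empty label")
        elif len(label) > MAX_LABEL:
            problems.append("label too long: " + label[:10] + "...")
    return problems


print(split_labels("en.wikipedia.org"))
print(check_limits("en.wikipedia.org"))
```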
The DNS consists of a hierarchical set of DNS servers. Each domain or subdomain has one or more authoritative DNS servers that publish information about that domain and the name servers of any domains "beneath" it. The hierarchy of authoritative DNS servers matches the hierarchy of domains. At the top of the hierarchy stand the root servers: the servers to query when looking up (resolving) a top-level domain name (TLD).
The address resolution mechanism
(This description deliberately uses the fictional .example TLD in accordance with the DNS guidelines themselves.)
In theory a full host name may have several name segments (e.g. ahost.ofasubnet.ofabiggernet.inadomain.example). In practice, in the experience of the majority of public users of Internet services, full host names will frequently consist of just three segments (ahost.inadomain.example, and most often www.inadomain.example).
For querying purposes, software interprets the name segment by segment, from right to left, using an iterative search procedure. At each step along the way, the program queries a corresponding DNS server to provide a pointer to the next server which it should consult.
As originally envisaged, the process was as simple as:
- The local system is pre-configured with the known addresses of the root servers in a file of root hints, which the local administrator must update periodically from a reliable source to keep up with changes over time.
- The resolver queries one of the root servers to find the server authoritative for the next level down (so in the case of our simple hostname, it asks a root server for the address of a server with detailed knowledge of the example top-level domain).
- It queries this second server for the address of a DNS server with detailed knowledge of the second-level domain (inadomain.example in our example).
- It repeats the previous step to progress down the name, until the final step, which returns the final address sought rather than the address of the next DNS server.
The diagram illustrates this process for the real host www.wikipedia.org.
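The right-to-left walk described in the steps above can be mimicked with an in-memory model. The sketch below makes no network calls. The server names, the `SERVERS` table and the address (taken from a documentation range) are all invented for illustration, and real servers answer with resource records rather than tuples.

```python
# Hypothetical "servers": each maps a name suffix either to a referral
# (the next server to ask) or to a final address.
SERVERS = {
    "root":         {"example": ("referral", "example-tld")},
    "example-tld":  {"inadomain.example": ("referral", "inadomain-ns")},
    "inadomain-ns": {"ahost.inadomain.example": ("answer", "192.0.2.10")},
}


def resolve(name, start="root"):
    """Follow referrals from the root until a server returns an address."""
    labels = name.split(".")
    server = start
    path = [server]
    # Work from right to left: "example", then "inadomain.example", ...
    for i in range(len(labels) - 1, -1, -1):
        suffix = ".".join(labels[i:])
        entry = SERVERS[server].get(suffix)
        if entry is None:
            continue  # this server knows nothing at this level
        kind, value = entry
        if kind == "answer":
            return value, path
        server = value  # a referral: ask the next server down
        path.append(server)
    raise LookupError("no answer for " + name)


address, path = resolve("ahost.inadomain.example")
print(address, path)
```

The returned path shows that every lookup in this naive form begins at the root, which is the burden discussed next.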
The mechanism in this simple form has a difficulty: it places a huge operating burden on the root servers, because every search for an address starts by querying one of them. Since the root servers are critical to the overall function of the system, such heavy use would create an insurmountable bottleneck for the trillions of queries placed every day. In practice there are two key additions to the mechanism.
- Firstly, the DNS resolution process allows for local recording and subsequent consultation of the results of a query (or caching) for a period of time after a successful answer (the server providing the answer initially dictates the period of validity, which may vary from just seconds to days or even weeks). In our illustration, having found a list of addresses of servers capable of answering queries about the .example domain, the local resolver will not need to make the query again until the validity of the currently known list expires, and so on for all subsequent steps. Hence having successfully resolved the address of ahost.inadomain.example it is not necessary to repeat the process for some time since the address already reached will be deemed reliable for a defined period, and resolution of anotherhost.anotherdomain.example can commence with already knowing which servers can answer queries for the .example domain. Caching significantly reduces the rate at which the most critical name servers have to respond to queries, adding the extra benefit that subsequent resolutions are not delayed by network transit times for the queries and responses.
- Secondly, most domestic and small-business clients "hand off" address resolution to their ISP's DNS servers to perform the look-up process, thus allowing for the greatest benefit from those same ISPs having busy local caches serving a wide variety of queries and a large number of users.
For further discussion in greater detail of these additions to the mechanism see below.
Circular dependencies and glue records
Name servers in delegations appear listed by name, rather than by IP address. This means that a resolving name server must issue another DNS request to find out the IP address of the server to which it has been referred. Since this can introduce a circular dependency if the nameserver referred to is under the domain for which it is authoritative, it is occasionally necessary for the nameserver providing the delegation to also provide the IP address of the next nameserver. This record is called a glue record.
For example, assume that the sub-domain en.wikipedia.org contains further sub-domains (such as something.en.wikipedia.org) and that the authoritative nameserver for these lives at ns1.en.wikipedia.org. A computer trying to resolve something.en.wikipedia.org will thus first have to resolve ns1.en.wikipedia.org. Since ns1 is also under the en.wikipedia.org subdomain, resolving something.en.wikipedia.org requires resolving ns1.en.wikipedia.org which is exactly the circular dependency mentioned above. The dependency is broken by the glue record in the nameserver of wikipedia.org that provides the IP address of ns1.en.wikipedia.org directly to the requestor, enabling it to bootstrap the process by figuring out where ns1.en.wikipedia.org is located.
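The condition under which glue is needed can be expressed as a simple suffix test. The following sketch assumes a referral represented as a plain dictionary. That structure, the function names and the address (from a documentation range) are hypothetical and stand in for the NS and A records a real parent zone would return.

```python
def needs_glue(zone, nameserver):
    """True if `nameserver` lies at or below the delegated `zone`."""
    zone = zone.rstrip(".").lower()
    nameserver = nameserver.rstrip(".").lower()
    return nameserver == zone or nameserver.endswith("." + zone)


# A referral as the parent zone (wikipedia.org) might return it.
referral = {
    "zone": "en.wikipedia.org",
    "nameservers": ["ns1.en.wikipedia.org"],
    "glue": {"ns1.en.wikipedia.org": "192.0.2.53"},
}


def next_server_address(referral):
    """Pick an address to query next, using glue where it is required."""
    for ns in referral["nameservers"]:
        if ns in referral["glue"]:
            return referral["glue"][ns]
        if needs_glue(referral["zone"], ns):
            # The only way to find this server would be to ask it.
            raise LookupError("circular dependency: no glue for " + ns)
    return None  # caller must resolve a server name separately


print(next_server_address(referral))
```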
DNS in practice
When an application (such as a web browser) tries to find the IP address of a domain name, it doesn't necessarily follow all of the steps outlined in the Theory section above. We will first look at the concept of caching, and then outline the operation of DNS in "the real world."
Caching and time to live
Because of the huge volume of requests generated by a system like the DNS, the designers wished to provide a mechanism to reduce the load on individual DNS servers. The mechanism devised provided that when a DNS resolver (i.e. client) received a DNS response, it would cache that response for a given period of time. A value (set by the administrator of the DNS server handing out the response) called the time to live (TTL), defines that period of time. Once a response goes into cache, the resolver will consult its cached (stored) answer; only when the TTL expires (or when an administrator manually flushes the response from the resolver's memory) will the resolver contact the DNS server for the same information.
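The caching behaviour described above can be sketched as a small class that remembers each answer until its TTL runs out. The class name `TTLCache`, the injectable clock and the fake upstream lookup are assumptions made so that the example runs without a network. It is not the code of any particular resolver.

```python
import time


class TTLCache:
    """Cache answers until their time to live (TTL) runs out."""

    def __init__(self, lookup, clock=time.monotonic):
        self._lookup = lookup   # function: name -> (address, ttl_seconds)
        self._clock = clock     # injectable so the example is testable
        self._entries = {}      # name -> (address, expiry time)

    def resolve(self, name):
        now = self._clock()
        cached = self._entries.get(name)
        if cached and cached[1] > now:
            return cached[0]    # still valid: no query is sent
        address, ttl = self._lookup(name)
        self._entries[name] = (address, now + ttl)
        return address

    def flush(self, name=None):
        """Drop one entry, or everything, as an administrator might."""
        if name is None:
            self._entries.clear()
        else:
            self._entries.pop(name, None)


# Hypothetical upstream lookup that records how often it is called.
calls = []

def fake_lookup(name):
    calls.append(name)
    return "192.0.2.1", 60      # documentation address, TTL of 60 seconds

now = [0.0]                     # manual clock for the demonstration
cache = TTLCache(fake_lookup, clock=lambda: now[0])
cache.resolve("www.inadomain.example")  # miss: queries upstream
cache.resolve("www.inadomain.example")  # hit: served from the cache
now[0] = 61.0
cache.resolve("www.inadomain.example")  # TTL expired: queries upstream again
print(len(calls))
```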
Generally, the Start of Authority (SOA) record specifies the time to live. The SOA record has the parameters:
- Serial: the zone serial number, incremented when the zone file is modified, so that secondary (slave) name servers know when the zone has changed and should be reloaded.
- Refresh: the number of seconds between update requests from secondary (slave) name servers.
- Retry: the number of seconds a secondary will wait before retrying when the last attempt has failed.
- Expire: the number of seconds a secondary will wait before considering its data stale if it cannot reach the primary name server.
- Minimum: previously used to determine the minimum TTL, this field now sets the TTL for negative caching.
(Newer versions of BIND (named) will accept the suffixes 'M','H','D' or 'W', indicating a time-interval of minutes, hours, days and weeks respectively.)
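To make the SOA timers and the unit suffixes concrete, the sketch below converts suffixed values into seconds and names the five numeric fields. The helper names and the sample values are hypothetical. It handles only a single trailing suffix from the set listed above, so it is narrower than what a real zone-file parser accepts.

```python
UNIT_SECONDS = {"M": 60, "H": 3600, "D": 86400, "W": 604800}


def to_seconds(value):
    """Convert '3600', '2H', '1W' and similar into a number of seconds."""
    value = value.strip().upper()
    if value and value[-1] in UNIT_SECONDS:
        return int(value[:-1]) * UNIT_SECONDS[value[-1]]
    return int(value)


def parse_soa_timers(fields):
    """Map the five numeric SOA fields, in order, to named values."""
    names = ["refresh", "retry", "expire", "minimum"]
    serial, *timers = fields
    parsed = {"serial": int(serial)}
    for name, raw in zip(names, timers):
        parsed[name] = to_seconds(raw)
    return parsed


# Hypothetical values, not taken from any real zone.
soa = parse_soa_timers(["2006030701", "3H", "15M", "1W", "1D"])
print(soa)
```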
As a noteworthy consequence of this distributed and caching architecture, changes to the DNS do not always take effect immediately and globally. This is best explained with an example: if an administrator has set a TTL of 6 hours for the host www.wikipedia.org, and then changes the IP address to which www.wikipedia.org resolves at 12:01pm, the administrator must consider that a person who cached a response with the old IP address at 12:00pm will not consult the DNS server again until 6:00pm. The period between 12:01pm and 6:00pm in this example is called caching time, which is best defined as a period of time that begins when you make a change to a DNS record and ends after the maximum amount of time specified by the TTL expires. This leads to an important logistical consideration when making changes to the DNS: not everyone is necessarily seeing the same thing you're seeing. RFC 1537 helps to convey basic rules for how to set the TTL.
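The arithmetic of the example above can be written out directly. The calendar date is arbitrary and chosen only so that the times can be represented. The variable names are illustrative.

```python
from datetime import datetime, timedelta

ttl = timedelta(hours=6)
cached_at = datetime(2006, 3, 7, 12, 0)   # resolver caches the old address
changed_at = datetime(2006, 3, 7, 12, 1)  # administrator changes the record

expires_at = cached_at + ttl     # when this resolver will ask again
stale_for = expires_at - changed_at   # time this resolver serves old data
worst_case = changed_at + ttl    # latest a cache could still hold old data

print(expires_at.time(), stale_for, worst_case.time())
```

The worst case applies to a resolver that cached the old answer just before the change was made.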
Note that the term "propagation", although very widely used, does not describe the effects of caching well. Specifically, it implies that (1) when you make a DNS change, it somehow spreads to all other DNS servers (instead, other DNS servers check in with yours as needed), and (2) that you do not have control over the amount of time the record is cached (you control the TTL values for all DNS records in your domain, except your NS records and any authoritative DNS servers that use your domain name).
Some resolvers may override TTL values, as the protocol supports caching for up to 68 years or no caching at all. Negative caching (caching the non-existence of records) is determined by the name servers authoritative for a zone, which MUST include the SOA record when reporting that no data of the requested type exists. The lesser of the MINIMUM field of the SOA record and the TTL of the SOA record itself establishes the TTL for the negative answer (RFC 2308).
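Two of the figures in this paragraph can be checked with simple arithmetic. The sketch below assumes the usual reading of RFC 2308, in which the negative TTL is the smaller of the SOA record's own TTL and its MINIMUM field, and takes the largest TTL to be 2^31 - 1 seconds. The function name is illustrative.

```python
MAX_TTL = 2**31 - 1  # largest TTL value, in seconds


def negative_ttl(soa_ttl, soa_minimum):
    """TTL for caching a 'no such data' answer, following RFC 2308."""
    return min(soa_ttl, soa_minimum)


# The maximum TTL expressed in years (using 365.25-day years).
max_years = MAX_TTL / (365.25 * 24 * 3600)

print(negative_ttl(86400, 3600))
print(round(max_years, 1))
```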
Many people incorrectly refer to a mysterious 48 hour or 72 hour propagation time when you make a DNS change. When one changes the NS records for one's domain or the IP addresses for hostnames of authoritative DNS servers using one's domain (if any), there can be a lengthy period of time before all DNS servers use the new information. This is because those records are handled by the zone parent DNS servers (for example, the .com DNS servers if your domain is example.com), which typically cache those records for 48 hours. However, those DNS changes will be immediately available for any DNS servers that do not have them cached. And, any DNS changes on your domain other than the NS records and authoritative DNS server names can be nearly instantaneous, if you choose for them to be (by lowering the TTL once or twice ahead of time, and waiting until the old TTL expires before making the change).
DNS in the real world
Users generally do not communicate directly with a DNS resolver. Instead DNS resolution takes place transparently in client applications such as web browsers (like Internet Explorer, Opera, Mozilla Firefox, Safari, Netscape Navigator, etc), mail clients (Outlook Express, Mozilla Thunderbird, etc), and other Internet applications. When a request is made which necessitates a DNS lookup, such programs send a resolution request to the local DNS resolver in the operating system which in turn handles the communications required.
The DNS resolver will almost invariably have a cache (see above) containing recent lookups. If the cache can provide the answer to the request, the resolver will return the value in the cache to the program that made the request. If the cache does not contain the answer, the resolver will send the request to a designated DNS server or servers. In the case of most home users, the Internet service provider to which the machine connects will usually supply this DNS server: such a user will either configure that server's address manually or allow DHCP to set it; however, where systems administrators have configured systems to use their own DNS servers, their DNS resolvers will generally point to their own nameservers. This name server will then follow the process outlined above in DNS in theory, until it either successfully finds a result, or does not. It then returns its results to the DNS resolver; assuming it has found a result, the resolver duly caches that result for future use, and hands the result back to the software which initiated the request.
An additional level of complexity emerges when resolvers violate the rules of the DNS protocol. Some people have suggested that a number of large ISPs have configured their DNS servers to violate rules (presumably to allow them to run on less-expensive hardware than a fully compliant resolver), such as by disobeying TTLs, or by indicating that a domain name does not exist just because one of its name servers does not respond.
As a final level of complexity, some applications such as Web browsers also have their own DNS cache, in order to reduce use of the DNS resolver library itself. This practice can add extra difficulty to DNS debugging, as it obscures which cache holds a given answer and how fresh that answer is. These caches typically have very short caching times, of the order of 1 minute. A notable exception is Internet Explorer; recent versions cache DNS records for 30 minutes.
Other DNS applications
The system outlined above provides a somewhat simplified scenario. The DNS includes several other functions:
- Hostnames and IP addresses do not necessarily match on a one-to-one basis. Many hostnames may correspond to a single IP address: combined with virtual hosting, this allows a single machine to serve many web sites. Alternatively a single hostname may correspond to many IP addresses: this can facilitate fault tolerance and load distribution, and also allows a site to move physical location seamlessly.
- There are many uses of DNS besides translating names to IP addresses. For instance, Mail transfer agents use DNS to find out where to deliver e-mail for a particular address. The domain to mail exchanger mapping provided by MX records accommodates another layer of fault tolerance and load distribution on top of the name to IP address mapping.
- Sender Policy Framework and DomainKeys, rather than creating their own record types, were designed to take advantage of an existing DNS record type, the TXT record.
- To provide resilience in the event of computer failure, multiple DNS servers provide coverage of each domain. In particular, thirteen root servers exist worldwide. DNS programs or operating systems have the IP addresses of these servers built in. At least nominally, the USA hosts all but three of the root servers. However, because many root servers actually implement anycast, where many different computers can share the same IP address to deliver a single service over a large geographic region, most of the physical (rather than nominal) root servers now operate outside the USA.
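The fault tolerance and load distribution that MX records provide, as mentioned in the list above, can be illustrated by ordering a set of mail exchangers. MX records carry a numeric preference, with lower values tried first. That detail comes from the DNS and mail specifications rather than from the text above. The host names and the function name are hypothetical.

```python
import random


def delivery_order(mx_records, rng=random):
    """Order mail exchangers: lowest preference first, ties shuffled."""
    by_pref = {}
    for preference, host in mx_records:
        by_pref.setdefault(preference, []).append(host)
    ordered = []
    for preference in sorted(by_pref):
        group = by_pref[preference][:]
        rng.shuffle(group)  # spread load across equal-preference hosts
        ordered.extend(group)
    return ordered


# Hypothetical MX data: two equal primaries and one backup.
records = [(20, "backup.example"), (10, "mx1.example"), (10, "mx2.example")]
print(delivery_order(records))
```

A sending mail server would try the hosts in the returned order, moving to the next one when a delivery attempt fails.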
The DNS uses TCP and UDP on port 53 to serve requests. Almost all DNS queries consist of a single UDP request from the client followed by a single UDP reply from the server. TCP typically comes into play only when the response data size exceeds 512 bytes, or for such tasks as zone transfer. Some operating systems such as HP-UX are known to have resolver implementations that use TCP for all queries, even when UDP would suffice.
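To show what a single UDP request contains, the following sketch builds the bytes of a query message in the RFC 1035 wire format and tests a response header for the truncation flag that prompts a retry over TCP. The function names and the query ID are arbitrary choices for illustration, and nothing is sent over the network.

```python
import struct


def encode_name(name):
    """Encode a name as length-prefixed labels ending in a zero byte."""
    out = b""
    for label in name.rstrip(".").split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) <= 63:
            raise ValueError("bad label: " + label)
        out += bytes([len(raw)]) + raw
    return out + b"\x00"


def build_query(name, query_id=0x1234, qtype=1, qclass=1):
    """Build a DNS query message (qtype 1 = A, qclass 1 = IN)."""
    flags = 0x0100  # standard query with "recursion desired" set
    # Header: ID, flags, then question/answer/authority/additional counts.
    header = struct.pack("!HHHHHH", query_id, flags, 1, 0, 0, 0)
    question = encode_name(name) + struct.pack("!HH", qtype, qclass)
    return header + question


def is_truncated(response):
    """True if the TC bit is set, meaning a retry over TCP is expected."""
    (flags,) = struct.unpack("!H", response[2:4])
    return bool(flags & 0x0200)


query = build_query("wikipedia.org")
print(len(query), query.hex())
```

The resulting bytes could be handed to a UDP socket addressed to port 53 of a name server. That step is omitted here so that the example runs offline.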
Extensions to DNS
EDNS is an extension of the DNS protocol which enhances the transport of DNS data in UDP packets, and adds support for expanding the space of request and response codes. It is described in RFC 2671.
Implementations of DNS
For a commented list of DNS server-side implementations, see Comparison of DNS server software.
- RFC 882 Domain Names: Concepts and Facilities (obsoleted by RFC 1034)
- RFC 883 Domain Names: Implementation Specification (obsoleted by RFC 1035)
- RFC 1032 Domain administrators guide
- RFC 1033 Domain administrators operations guide
- RFC 1034 Domain Names - Concepts and Facilities
- RFC 1035 Domain Names - Implementation and Specification
- RFC 1101 DNS Encodings of Network Names and Other Types
- RFC 1123 Requirements for Internet Hosts -- Application and Support
- RFC 1183 New DNS RR Definitions
- RFC 1706 DNS NSAP Resource Records
- RFC 1876 Location Information in the DNS (LOC)
- RFC 1886 DNS Extensions to support IP version 6
- RFC 1912 Common DNS Operational and Configuration Errors
- RFC 1995 Incremental Zone Transfer in DNS
- RFC 1996 A Mechanism for Prompt Notification of Zone Changes (DNS NOTIFY)
- RFC 2136 Dynamic Updates in the domain name system (DNS UPDATE)
- RFC 2181 Clarifications to the DNS Specification
- RFC 2182 Selection and Operation of Secondary DNS Servers
- RFC 2308 Negative Caching of DNS Queries (DNS NCACHE)
- RFC 2317 Classless IN-ADDR.ARPA delegation
- RFC 2671 Extension Mechanisms for DNS (EDNS0)
- RFC 2672 Non-Terminal DNS Name Redirection (DNAME record)
- RFC 2782 A DNS RR for specifying the location of services (DNS SRV)
- RFC 2845 Secret Key Transaction Authentication for DNS (TSIG)
- RFC 2874 DNS Extensions to Support IPv6 Address Aggregation and Renumbering
- RFC 3403 Dynamic Delegation Discovery System (DDDS) (NAPTR records)
- RFC 3696 Application Techniques for Checking and Transformation of Names
- RFC 4398 Storing Certificates in the Domain Name System
- RFC 4408 Sender Policy Framework (SPF) (SPF records)
Types of DNS records
Important categories of data stored in the DNS include the following:
- An A record or address record maps a hostname to a 32-bit IPv4 address.
- An AAAA record or IPv6 address record maps a hostname to a 128-bit IPv6 address.
- A CNAME record or canonical name record is an alias of one name to another. The A record to which the alias points can be either local or remote (on a foreign name server). This is useful when running multiple services from a single IP address, where each service has its own entry in DNS.
- An MX record or mail exchange record maps a domain name to a list of mail exchange servers for that domain.
- A PTR record or pointer record maps an IPv4 address to the canonical name for that host. Setting up a PTR record for a hostname in the in-addr.arpa domain that corresponds to an IP address implements reverse DNS lookup for that address. For example (at the time of writing), www.icann.net has the IP address 188.8.131.52, but a PTR record maps 52.131.8.188.in-addr.arpa to its canonical name, referrals.icann.org.
- An NS record or name server record maps a domain name to a list of DNS servers authoritative for that domain. Delegations depend on NS records.
- An SOA record or start of authority record specifies the DNS server providing authoritative information about an Internet domain, the email of the domain administrator, the domain serial number, and several timers relating to refreshing the zone.
- An SRV record is a generalized service location record.
- A TXT record allows an administrator to insert arbitrary text into a DNS record. For example, this record is used to implement the Sender Policy Framework and DomainKeys specifications.
- NAPTR records ("Naming Authority Pointer") are a newer type of DNS record that support regular expression based rewriting.
Other types of records simply provide information (for example, a LOC record gives the physical location of a host), or experimental data (for example, a WKS record gives a list of servers offering some well known service such as HTTP or POP3 for a domain).
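The reverse-lookup names used by PTR records, described in the list above, are formed by reversing the address and appending a special domain. Python's standard `ipaddress` module exposes this as the `reverse_pointer` attribute. The wrapper function name and the addresses (both from documentation ranges) are chosen for illustration.

```python
import ipaddress


def reverse_name(address):
    """Return the domain name under which a PTR record would be stored."""
    return ipaddress.ip_address(address).reverse_pointer


print(reverse_name("192.0.2.1"))
print(reverse_name("2001:db8::1"))
```

For IPv4 the octets are reversed under in-addr.arpa. For IPv6 the same attribute yields a name of reversed hexadecimal digits under ip6.arpa.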
Internationalised domain names
While domain names in the DNS have no restrictions on the characters they use and can include non-ASCII characters, the same is not true for host names. Host names are the names most people see and use for things like e-mail and web browsing. Host names are restricted to a small subset of the ASCII character set that includes the Roman alphabet in upper and lower case, the digits 0 through 9, the dot, and the hyphen. (See RFC 3696 section 2 for details.) This restriction prevented the native representation of names and words of many languages. ICANN has approved the Punycode-based IDNA system, which maps Unicode strings into the valid DNS character set, as a workaround to this issue. Some registries have adopted IDNA.
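The mapping that IDNA performs can be tried with Python's built-in `idna` codec, which implements an older revision of the standard. The sketch below is therefore only an approximation of what current registries and browsers do. The function names and the sample host name are illustrative.

```python
def to_ascii(hostname):
    """Convert a Unicode host name to its ASCII (Punycode-based) form."""
    return hostname.encode("idna").decode("ascii")


def to_unicode(hostname):
    """Convert an ASCII-encoded host name back to Unicode."""
    return hostname.encode("ascii").decode("idna")


encoded = to_ascii("bücher.example")
print(encoded)
print(to_unicode(encoded))
```

Labels that are already plain ASCII pass through unchanged, so existing host names keep working.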
Security issues in DNS
DNS was not originally designed with security in mind, and thus has a number of security issues. DNS responses are traditionally not cryptographically signed, leading to many attack possibilities; DNSSEC modifies DNS to add support for cryptographically signed responses. There are various extensions to support securing zone transfer information as well.
Some domain names can spoof other, similar-looking domain names. For example, "paypal.com" and "paypa1.com" are different names, yet users may be unable to tell the difference. This problem is much more serious in systems that support internationalized domain names, since many characters that are different (from the point of view of ISO 10646) appear identical on typical computer screens.
Legal users of domains
No one in the world really "owns" a domain name except the Network Information Centre (NIC), or domain name registry. Most of the NICs in the world receive an annual fee from a legal user in order for the legal user to utilize the domain name (i.e. a sort of a leasing agreement exists, subject to the registry's terms and conditions). Depending on the various naming convention of the registries, legal users become commonly known as "registrants" or as "domain holders".
ICANN holds a complete list of domain registries in the world. One can find the legal user of a domain name by looking in the WHOIS database held by most domain registries.
For most of the more than 240 country code top-level domains (ccTLDs), the domain registries hold the authoritative WHOIS (registrant, name servers, expiry dates, etc.). For instance, DENIC, the German NIC, holds the authoritative WHOIS for a .DE domain name.
However, some domain registries, such as those for .COM, .ORG and .INFO, use a registry-registrar model. There are hundreds of domain name registrars that actually perform the domain name registration with the end-user (see lists at ICANN or VeriSign). By using this method of distribution, the registry only has to manage the relationship with the registrar, and the registrar maintains the relationship with the end-users, or "registrants". For .COM and .NET domain names, the registry, VeriSign, holds a basic WHOIS (registrar, name servers, etc.). One can find the detailed WHOIS (registrant, name servers, expiry dates, etc.) at the registrars.
Since about 2001, most gTLD registries (.ORG, .BIZ, .INFO) have adopted a so-called "thick" registry approach, i.e. keeping the authoritative WHOIS with the various registries instead of the registrars.
A registrant usually designates an administrative contact to manage the domain name. In practice, the administrative contact usually has the most immediate power over a domain. Management functions delegated to the administrative contacts may include (for example):
- the obligation to conform to the requirements of the domain registry in order to retain the right to use a domain name
- authorisation to update the physical address, e-mail address and telephone number etc in WHOIS
A technical contact manages the name servers of a domain name. The many functions of a technical contact include:
- making sure the configurations of the domain name conforms to the requirements of the domain registry
- updating the domain zone
- providing the 24x7 functionality of the name servers (that leads to the accessibility of the domain name)
A billing contact is the party whom a NIC invoices.
The name servers of a domain are the authoritative name servers that host the zone of that domain name.
Many investigators have voiced criticism of the methods currently used to control ownership of domains. Critics commonly claim abuse by monopolies or near-monopolies, such as VeriSign, Inc. Particularly noteworthy was the VeriSign Site Finder system which redirected all unregistered .com and .net domains to a VeriSign webpage. Despite widespread criticism, VeriSign only reluctantly removed it after ICANN threatened to revoke its contract to administer the root name servers.
There is also significant disquiet regarding United States political influence over the Internet Corporation for Assigned Names and Numbers (ICANN). This was a significant issue in the attempt to create a .xxx Top-level domain and sparked greater interest in Alternative DNS roots that would be beyond the control of any single country.
Truth in Domain Names Act
In the United States, the "Truth in Domain Names Act", in combination with the PROTECT Act, forbids the use of a misleading domain name with the intention of attracting people into viewing a visual depiction of sexually explicit conduct on the Internet.
See also
- Domain hack
- Dynamic DNS
- DNS cache poisoning
- Root nameserver
- DNS hosting service
References
- How Internet Explorer uses the cache for DNS host entries. Microsoft (2004). Retrieved on 2006-03-07.