Much has been written about the IETF Draft “Client subnet in DNS requests” that Google started a year or so back, also known as The Global Internet Speedup.  Some of it has been good, some of it bad, and some of it very misinformed.  I thought I would take a few minutes and pile on a bit more!

In summary, I am in favor of the work, but I think the problem it tries to solve is small and specific to the users who adopt services like Google’s and OpenDNS – hence their developing it. More importantly, I think the Load Balancing DNS systems used by many CDNs are not accurate enough with the current set of Local DNS IPs, and they might get worse if they are asked to consider device IPs.

The problem this draft addresses has been around for over 15 years and has been contemplated for just as long, predating all involved in the current conversation.

The first time a content owner placed the same content on two different servers, the need for local load balancing arose.  And then, when they put those servers in different geographic locations, the need for global load balancing – or selecting between distributed sites – arose. Load Balancing DNS was one of the early solutions developed to solve the issue of directing devices to these distributed servers, and it’s still in use by many today.

First some background on DNS…

DNS maps human-readable hostnames (www.highwinds.com) to device-readable IP addresses (69.16.184.153) in addition to other things.  All Internet-connected devices (clients), regardless of the operating system in use, are configured – either manually or programmatically using protocols like DHCP – with the IP address of a Local DNS (resolver).  Once configured, this bootstrapping allows the device to send any and all questions pertaining to DNS to their defined Local DNS.  The Local DNS then has the more difficult job of navigating the larger DNS system to find an answer to the device’s question by consulting Authoritative DNS servers.
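To make that division of labor concrete, here is a minimal sketch of the device’s side of the exchange, using the third-party dnspython library (the hostname is just the example from above; resolve() requires dnspython 2.x):

```python
# A minimal sketch of the device-side (stub) half of a DNS lookup,
# using the third-party dnspython library (pip install dnspython).
import dns.resolver

# The Resolver picks up the Local DNS configured on this machine
# (e.g., from /etc/resolv.conf or DHCP-supplied settings).
resolver = dns.resolver.Resolver()
print("Local DNS in use:", resolver.nameservers)

# The device asks one simple question; the Local DNS does the hard
# work of walking the Authoritative DNS hierarchy to find the answer.
answer = resolver.resolve("www.highwinds.com", "A")
for record in answer:
    print("www.highwinds.com ->", record.address)
```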

The Local DNS is typically administered by someone the device owner has a relationship with: their ISP, wireless provider, employer, or other third-party DNS they have chosen to use.  The Authoritative DNS is administered by someone the owner of the name being requested has a relationship with: their own servers, hosting provider, CDN, or someone to whom they have contracted DNS services.

The beauty of DNS is that many devices can use a single Local DNS, which makes caching of answers possible; the name-to-IP address mapping learned on behalf of one device is shared with many other devices that later ask for the mapping of that same name.  Moreover, devices need not worry themselves with navigating the millions of Authoritative DNS servers in the world.
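To illustrate the caching idea, here is a toy sketch (not a real resolver; the authoritative_lookup callback is a hypothetical stand-in for the Local DNS’s recursive work): the first device’s lookup pays the full cost, and any device that asks within the record’s TTL gets the shared answer.

```python
import time

# Toy illustration of Local DNS answer caching (not a real resolver).
# authoritative_lookup is a hypothetical stand-in for the recursive
# work of consulting Authoritative DNS servers.
cache = {}  # name -> (expires_at, addresses)

def lookup(name, authoritative_lookup):
    entry = cache.get(name)
    if entry and entry[0] > time.time():
        return entry[1]  # cache hit: answer shared across devices
    addresses, ttl = authoritative_lookup(name)  # cache miss: do the real work
    cache[name] = (time.time() + ttl, addresses)
    return addresses
```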

Load Balancing DNS adapted things…

Initially the mappings of hostname-to-IP in DNS were simple and static.  At any moment in time, any device anywhere in the world could ask their Local DNS to map a given name to an IP address, and they would all get the same answer as any other device that asked.  Nothing precluded Authoritative DNS servers from giving out different answers to different Local DNS (and, in turn, to the devices) – and nothing required it, either – but that need did not exist initially.

Companies looking to solve global load balancing issues latched onto this idea (as well as a few other ideas) and leveraged DNS to direct different devices to different server sites via tailored mappings from Authoritative Load Balancing DNS.  For example, European devices were directed to IP 1 (the European server farm) and U.S. devices to IP 2 (the U.S. server farm).

That was the easy part.  The real development in Load Balancing DNS came in knowing when and why to return the U.S. server’s IP versus the European server’s IP.  The DNS request from the Local DNS just asks for a hostname to be mapped to an IP.  It doesn’t say, “Oh, and by the way, I am European, so if you have European servers, I would prefer them.”  The companies developing Load Balancing DNS technologies were left to figure out how to intelligently map the DNS requests to specific server farms.

While nothing is explicitly sent in the DNS request to indicate where the requestor is geographically located, the IP of the Local DNS making the request is the primary input to computing a dynamic answer.  Providers of Load Balancing DNS solutions must develop ways of mapping Local DNS IPs to the best server location.  These mappings are developed in many, many different ways.

Some simply have each of their server farms ping each Local DNS periodically to determine which server farm is geographically closest to which Local DNS.  A table is maintained on the Load Balancing DNS servers with this mapping.  Other systems use geolocation software to map the Local DNS IP to a specific location (country or latitude/longitude) and then do simple distance calculations or static mapping to pick the best server farm.  Some, using Anycast on their Load Balancing DNS servers, use the location of the Load Balancing DNS that received the request as a proxy to the location of the Local DNS making the request.  Others still harvest BGP feeds from each of their server farm sites and compare the ASN of the Local DNS IP to the BGP tables of their server sites.  And some do a hybrid of many of these methods and others.
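As one illustration, here is a sketch of the geolocation flavor: map the Local DNS IP to a latitude/longitude via a Geo-to-IP database (stubbed out here as inputs), then pick the closest farm by great-circle distance.  The farm list and coordinates are hypothetical:

```python
from math import radians, sin, cos, asin, sqrt

# Hypothetical server farm locations: name -> (latitude, longitude).
FARMS = {
    "us-east":   (40.7, -74.0),   # New York area
    "eu-west":   (51.5,  -0.1),   # London area
    "asia-east": (35.7, 139.7),   # Tokyo area
}

def haversine(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 \
        + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 6371 * 2 * asin(sqrt(a))

def pick_farm(resolver_lat, resolver_lon):
    """Return the farm closest to the Local DNS location (which a
    Geo-to-IP database would supply from the resolver's IP)."""
    return min(FARMS, key=lambda f: haversine(resolver_lat, resolver_lon, *FARMS[f]))

# A Local DNS geolocated to Paris should map to the European farm.
print(pick_farm(48.9, 2.4))  # -> "eu-west"
```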

The proximity problem…

There is one other problem.  All Load Balancing DNS systems – excluding the work in this Internet-Draft and a few other exceptions – assume that the devices initiating the DNS requests are “near” the Local DNS they are configured to use.  This allows a Load Balancing DNS system that determines their U.S.-based server farm to be the best location for the IP of the Local DNS to know it will also be the best location for the device that actually initiated the request in the first place.  Put another way, if the Local DNS is in San Francisco, the device should be as well; it had better not be in India – or the Draft better get very rapid adoption!

For the most part, this assumption is true.  Devices tend to be near – or geographically close, from a networking perspective – their Local DNS.  Or at least it is true enough of the time that Load Balancing DNS is better than the alternative (say, random selection).  There have always been noted exceptions.  Many devices configured by employers are configured to use the same corporate Local DNS regardless of where in the world the device is actually located.  Some ISPs historically have run very centralized groups of Local DNS for widely geographically dispersed users.  And more recently, in the last five years, third-party Local DNS service providers have grown in popularity (for consumers and businesses) by providing a faster and more reliable DNS service (in addition to other value-added services like phishing prevention, blacklisting, etc).  As users migrate to these services and off of their existing Local DNS, a chain of events begins …

  • More devices are grouped behind fewer Local DNS.
  • Load Balancing DNS systems lose resolution: each Local DNS IP represents a wider set of devices.
  • As resolution is lost, optimal load balancing suffers and some devices are sent to sub-optimal PoPs.
  • As load balancing suffers, device performance suffers.

No one wants any of this to happen.

What the draft proposes…

What the Google Internet-Draft proposes – and what has been implemented by Google and others – is to embed the actual device IP (or a masked portion of it) in the request that is sent from the Local DNS to the Authoritative DNS.  If the Authoritative DNS is a Load Balancing DNS server, it can choose to use the device IP, rather than the Local DNS IP, to tailor the response.  Furthermore, the Authoritative Load Balancing DNS server can indicate the range of device IPs for which the tailored response can be used (for the purpose of caching).
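In concrete terms, a query carrying the client-subnet option can be built with dnspython, which ships an ECSOption class for exactly this extension (the subnet and server IPs below are documentation placeholders):

```python
# A sketch of the EDNS Client Subnet option as the draft describes it,
# using dnspython. The server IP and client subnet are placeholders.
import dns.edns
import dns.message
import dns.query

# The Local DNS would embed a masked portion of the device's IP:
# here, a /24 of a hypothetical client, rather than the full address.
ecs = dns.edns.ECSOption("198.51.100.0", srclen=24)

query = dns.message.make_query("www.example.com", "A",
                               use_edns=0, options=[ecs])
response = dns.query.udp(query, "203.0.113.53", timeout=2)

# A participating Authoritative DNS echoes the option back with a
# "scope" prefix length that tells the Local DNS how widely the
# tailored answer may be cached.
for option in response.options:
    if isinstance(option, dns.edns.ECSOption):
        print("answer may be cached for",
              f"{option.address}/{option.scopelen}")
```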

That’s basically it.  The idea is open for anyone to use and implement (Local DNS and Authoritative Load Balancing DNS operators/implementers alike); its architecture doesn’t preclude any Load Balancing DNS operator from using it.  Some have written that this might eliminate the need for Akamai’s algorithmic magic … which could not be further from the truth.

Just as sending different responses to different Local DNS was the easy part of creating Load Balancing DNS, this is in some ways the easy part of improving it.  The hard part may well again be engineering Load Balancing DNS systems to use this new data effectively.

Problems some will have…

For some Load Balancing DNS systems, this will be easy and they will get the increased resolution for free (after implementing the change to read the device IP).  I would actually put Akamai in this camp based on what is known about their mapping techniques, despite what others have alluded to in their reporting on this topic.

Other Load Balancing DNS systems will have more challenges.  Take some of the systems based on Geo-to-IP, for example.  As they transition from using Local DNS IP to device IP, their Geo-to-IP tables need to be accurate for a much larger set of IPs, whereas in the past, the IPs of many devices would have been grouped behind very few Local DNS IPs.  Will the accuracy be there?  Will any shortcomings in Geo accuracy for device IPs actually outweigh the problem this is intended to solve?

The active measurement systems might have the biggest problem using this new information.  CDNetworks’ Load Balancing DNS system pinged my own Local DNS three times from each of 44 locations, for a total of 132 pings, after I asked for the mapping of a single dynamic hostname (basically, I sent them one packet and got 133 in return).  These pings are meant to determine which of their 44 PoPs would be best suited for me.  If they switch to using the device IP as the destination of their pings, their total ping traffic might go up dramatically.  How much ping traffic will this be, and how upset will end users get?  Worse still, since a DNS lookup against CDNetworks results in pings, if they implemented the Google Draft (a quick test suggests they have not yet), a nefarious user could use the extension to direct CDNetworks to ping targeted IPs.  As the pings seem to be rate limited, it’s not the biggest threat, but it’s not good either.

Scale of the problem and alternatives…

Independent of any Load Balancing DNS system, a ton of requests from devices get directed to different server farms using other methods.  Because many systems don’t use Load Balancing DNS, they don’t suffer from the shortcoming this Draft proposes to solve; specifically, they do not depend on the IP of the Local DNS when selecting a server farm.

While many CDNs use Load Balancing DNS, at Highwinds we use HTTP Anycast (as does Edgecast, which has implemented the Google system, though they use it in combination with Load Balancing DNS).  Anycast makes no assumption about the location of the Local DNS or whether or not the user is near it.  Even more than Anycast, a lot of device requests are routed to different server farms by in-protocol methods: either writing server farm-specific links into HTML, XML, SMIL, etc., or using protocol redirection (like HTTP 302).  Sites like Netflix use customized XML responses to direct users to specific server farms – server farms in their case being different CDNs – as do many companies that do Flash Player-based load balancing (like Conviva).
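As a point of contrast with Load Balancing DNS, here is a bare-bones sketch of the HTTP 302 approach: the redirector sees the device’s actual IP on the TCP connection, so there is no Local DNS proximity guess to make.  The farm URLs and the selection rule are hypothetical stand-ins:

```python
# A toy HTTP 302 redirector: unlike Load Balancing DNS, it sees the
# device's real IP directly on the connection, so no proximity guess
# about the Local DNS is needed. The selection rule is hypothetical.
from http.server import BaseHTTPRequestHandler, HTTPServer

def pick_farm(client_ip):
    # Stand-in for a real mapping (geo lookup, BGP data, load, etc.).
    return ("http://us-east.cdn.example.com"
            if client_ip.startswith("198.")
            else "http://eu-west.cdn.example.com")

class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        farm = pick_farm(self.client_address[0])  # the device's own IP
        self.send_response(302)
        self.send_header("Location", farm + self.path)
        self.end_headers()

HTTPServer(("", 8080), Redirector).serve_forever()
```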

Finally, you should think of the Google feature as fine-tuning.

It may apply to a great number of users in the future, but it currently represents very few.  Also, depending on which Local DNS have implemented it and the number of different server farms managed by a given Load Balancing DNS system, the selection of the optimal PoP might change in only a very small percentage of lookups (in other words, what’s good for the Local DNS will still be good for the device).  So, on today’s Internet, this impacts very few requests.

While I think this is an important development, and I applaud the group for moving it forward, I think many operators of Load Balancing DNS have more important fundamental improvements they should be making to their systems that would lead to bigger impacts on performance – which was the goal of all of this to begin with.

The underlying problem…

Consider the following example.

Using about 50,000 open Local DNS servers all over the world, I mapped the majority of server farms for 12 CDNs that use Load Balancing DNS.  By requesting name-to-IP mappings via 50,000 different Local DNS all over the world, I was able to see which server farm each CDN directed each of the 50,000 Local DNS to.  This works because the Load Balancing DNS assumes I am ‘near’ the Local DNS I used to make the request (the heart of the Google draft problem). Because the set of Local DNS I used was large and globally distributed, the result was a fairly complete map of server farms for each provider.  I repeated the test ten times from each Local DNS to each CDN over the course of a few hours.
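The mechanics of such a survey are straightforward; a sketch along these lines (with a placeholder hostname and a placeholder resolver list standing in for the 50,000) records which answer each open resolver is handed:

```python
# A sketch of the survey mechanics: ask the same question through many
# open resolvers and record which server farm IPs each one is handed.
# The resolver list and hostname below are placeholders.
import dns.message
import dns.query
import dns.rdatatype

OPEN_RESOLVERS = ["203.0.113.1", "198.51.100.2"]  # ~50,000 in practice
HOSTNAME = "cdn.example.com"

answers = {}
for resolver_ip in OPEN_RESOLVERS:
    query = dns.message.make_query(HOSTNAME, "A")
    try:
        response = dns.query.udp(query, resolver_ip, timeout=2)
    except Exception:
        continue  # unreachable or unresponsive resolver
    ips = [item.address
           for rrset in response.answer if rrset.rdtype == dns.rdatatype.A
           for item in rrset]
    answers[resolver_ip] = ips
    print(resolver_ip, "->", ips)
```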

The results were very interesting.  The data shows areas of inconsistency within each single CDN where their Load Balancing DNS system will direct a single Local DNS to many different server farms all over the world over the course of a few hours.  One might expect some variation within a region, but not across multiple continents.  The data also shows inconsistency across CDNs where different CDNs will send the same Local DNS to different continents as if they can’t all agree as to the location of the Local DNS and, as such, where in the world the best server farm may be.

Now, some of these inconsistencies can be explained away by peering, load, different server farm locations, or transient issues.  But some are so striking it appears as if the CDN is flailing about looking for an answer.  Every CDN would be better served by improving the Load Balancing DNS system it has today – and in many cases, doing so may be a prerequisite for making effective use of this particular Internet-Draft at all.

Questions, feedback? Rich.Day@highwinds.com

More on the gory (specific) details of the mapping shortcomings I saw in a follow-up post next week (as well as the obligatory Highwinds plug)!

By Rich Day, President, Highwinds