The following article is re-posted from one of our premier partners, Cedexis.

In this article Chris Haag, Cedexis’ VP of Operations, discusses one of the most difficult problems in employing an effective multi-platform traffic management solution — deciding when to make a change.

While the mechanics of a good traffic management platform are not easy to implement, such a solution is no better than the decision-making processes that sit behind it, analyzing current conditions and moving the levers that drive actual benefits for the customer.  If the brain behind that platform is unable to see through the noise and find the “signal” — the true actionable data — then the platform can’t provide maximum value, and in extreme cases could actually degrade performance.

A big part of being able to support such detailed decision-making processes is the enormous amount of data that the Cedexis platform collects from all over the world: over 1 billion samples a day. Most importantly, this data is collected from the “last mile”, which means that it represents Real User Measurements (RUM) rather than older and less accurate server-to-server measurements.

At Highwinds, we use this information on a regular basis to help ensure that we are delivering the best service possible to our customers.  The graph below shows our recent performance in our largest market — the United States.  The “Multi-CDN Optimized” line is what customers can expect to see with a blended delivery approach using the Cedexis platform, and is typically achieved by selecting the best routes from top CDNs like Highwinds, Akamai, and others.

[Figure: CDN response time]

As you can see, the independent data from Cedexis validates the hard work that our engineering and operations teams perform daily, and gives us the confidence to truthfully assert to our customers that they’ve made a great choice with Highwinds. We are big believers in the Cedexis approach, due in large part to the extremely thoughtful focus they place on this critical area. We are proud to call them a partner, and have included their technology in our SelectPath CDN load balancing solution. If you’d like to know more about SelectPath or the Cedexis platform in general, please don’t hesitate to contact us today.

Signal from Noise: Handling outliers in Radar Data
Each day Cedexis receives upwards of a billion Radar measurements from a plethora of web browsers, tablets and mobile devices. We group these measurements into CIDR blocks based on the Autonomous System Number (ASN) and country of origin; all told, we see measurements from over 80,000 of these groupings.
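As a rough illustration of this grouping step, the Python sketch below buckets measurements by ASN and country so that each network can be scored on its own. The `Measurement` record, its field names, and the sample values are hypothetical, and the sketch keys on (ASN, country) alone rather than reproducing how Radar maps CIDR blocks to ASNs.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    # Hypothetical record layout; Radar's real schema is not described in this article.
    asn: int        # Autonomous System Number of the client's network
    country: str    # country code of the client
    provider: str   # CDN being measured
    rtt_ms: float   # HTTP round-trip time in milliseconds


def group_by_network(measurements):
    """Bucket measurements by (ASN, country) so each network is evaluated separately."""
    groups = defaultdict(list)
    for m in measurements:
        groups[(m.asn, m.country)].append(m)
    return dict(groups)


# Made-up samples: two from Telstra (ASN 1221) in Australia, one from another network.
samples = [
    Measurement(1221, "AU", "Blue", 310.0),
    Measurement(1221, "AU", "Red", 180.0),
    Measurement(7018, "US", "Blue", 45.0),
]
groups = group_by_network(samples)
print(len(groups))                                  # 2
print([m.provider for m in groups[(1221, "AU")]])   # ['Blue', 'Red']
```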

Even after grouping, the data is very noisy, and finding the signal in all that noise is key to delivering on the promise of latency-based load balancing. For example, here is a graph of 5,000 seconds (about 90 minutes) of raw Radar measurements of two global CDNs, taken in Australia from Telstra’s main ASN, 1221. We’ll call them “Blue” and “Red”. Blue has nodes in Australia and is usually strong there; Red has not yet built out an Australian presence. At the start of this graph, Blue is having an outage affecting all of its Australian nodes; about 18,000 seconds in, its Australian Point of Presence comes back online.

The Y axis is HTTP round-trip time in milliseconds for a 50-byte object; the X axis is time in seconds.

[Figure: raw Radar measurements for Blue and Red]

To the human eye, it is quite clear that you are better off serving your content from Red and then switching to Blue about halfway through. As with a lot of problems, what the human eye finds easy turns out to be challenging for computational algorithms. Successful latency-based load balancing requires real-time, big data processing, and there are plenty of ways to process these data that deliver suboptimal results. For example, what if we employ a moving average with exponential decay?
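To make the idea concrete, here is a minimal Python sketch of an exponentially weighted moving average (EWMA) used to pick whichever CDN has the lower average. The smoothing factor `alpha=0.1` and the RTT values are made up for illustration, and this is not Cedexis' implementation. It shows how a single outlier can push a normally faster CDN's average above its competitor's for many consecutive decisions.

```python
def ewma(samples, alpha=0.1):
    """Return the exponentially weighted moving average after each sample.

    alpha is a hypothetical smoothing factor: each new sample contributes
    alpha of the new average, and older history decays by (1 - alpha).
    """
    history = []
    avg = None
    for x in samples:
        # Seed with the first sample, then blend each new sample into the average.
        avg = x if avg is None else alpha * x + (1 - alpha) * avg
        history.append(avg)
    return history


# Made-up RTTs in ms: Blue is steadily faster except for one 2,000 ms outlier.
blue = [50.0] * 10 + [2000.0] + [50.0] * 20
red = [100.0] * 31

blue_avg, red_avg = ewma(blue), ewma(red)
# Naive policy: route to whichever CDN currently has the lower moving average.
choices = ["Blue" if b < r else "Red" for b, r in zip(blue_avg, red_avg)]
print(choices.count("Red"))  # 13
```

With these made-up numbers, one 2,000 ms sample causes 13 consecutive decisions to favor Red even though every other Blue sample is twice as fast. Tuning `alpha` only trades one problem for another: a larger value lets each outlier move the average further, while a smaller value makes the average slow to notice genuine changes, such as Blue's PoP coming back online.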

Note: This graph encompasses 8 hours of data including the 90 minutes shown above:

[Figure: exponentially decaying moving average of Blue and Red over 8 hours]

This approach is closer, but because of all the outliers inherent in Real User Measurements, your load balancer will frequently choose Blue erroneously in the first two-thirds of this graph. Even after Blue fixes its issues, Blue’s moving average still jumps up at times, and you’ll choose Red even when Blue is clearly better. Worse, this approach will cause you to switch between CDNs frequently when switching offers no value, increasing the chance of a cache miss without directing the consumer to a better-performing CDN.
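One common way to limit this kind of flapping is hysteresis: stay with the current CDN until a challenger beats it by a clear margin. The Python sketch below illustrates that general technique; it is not something this article attributes to Cedexis, and the 20% `margin` and the scores are hypothetical.

```python
def count_switches(choices):
    """Number of times the selected CDN changes between consecutive decisions."""
    return sum(1 for prev, cur in zip(choices, choices[1:]) if prev != cur)


def choose_with_hysteresis(blue, red, margin=0.2):
    """Pick "Blue" or "Red" per sample, switching only when the challenger's
    score is more than `margin` (a hypothetical 20%) below the incumbent's."""
    current = "Blue" if blue[0] <= red[0] else "Red"
    choices = []
    for b, r in zip(blue, red):
        incumbent, challenger = (b, r) if current == "Blue" else (r, b)
        if challenger < incumbent * (1 - margin):
            current = "Red" if current == "Blue" else "Blue"
        choices.append(current)
    return choices


# Made-up smoothed scores in ms where the two CDNs are effectively tied.
blue = [100, 98, 101, 97, 102, 99]
red = [99, 100, 98, 101, 97, 100]

naive = ["Blue" if b < r else "Red" for b, r in zip(blue, red)]
damped = choose_with_hysteresis(blue, red)
print(count_switches(naive), count_switches(damped))  # 5 0
```

Hysteresis only dampens switching between near-equal scores; it does nothing about the outliers themselves, so on its own it would not fix the EWMA problems described above.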

Cedexis has been working on this problem for more than three years. Here is what our current best algorithm makes of these data; it corresponds to the decisions our customers made during this time period.

[Figure: the same data as processed by Cedexis’ current algorithm]

With the noise removed, we can see short periods when Blue is the right choice even before its Australian PoP comes back online.

Thanks to Cedexis’ data handling, OpenMix has seen through the noise, reacting only to statistically significant outperformance by Blue. It is also interesting that Blue appears to have had some short issues even after fixing the larger problem. Cedexis customers get the benefit of our real-time, big data processing and can sleep well knowing that we’ll always send their web visitors to the best possible CDN, even during rapidly changing conditions.
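The article does not say how OpenMix decides that outperformance is statistically significant. As one hedged illustration of the idea, the Python sketch below uses a different, well-known technique: a one-sided Mann-Whitney U rank test (normal approximation, no tie correction) on recent RTT windows. Because it compares ranks rather than raw values, a few extreme outliers carry little weight. The window contents and the `alpha=0.05` threshold are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean


def mann_whitney_p(challenger, incumbent):
    """One-sided p-value that `challenger` tends to have lower RTT than `incumbent`.

    Uses the Mann-Whitney U statistic with a normal approximation and no tie
    correction, a simplification that is adequate for this illustration.
    """
    n1, n2 = len(challenger), len(incumbent)
    # Count pairs in which the challenger is faster; ties count as half a win.
    u = sum(1.0 if x < y else 0.5 if x == y else 0.0
            for x in challenger for y in incumbent)
    mu = n1 * n2 / 2                              # expected U if neither is faster
    sigma = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)    # standard deviation of U under that null
    z = (u - mu) / sigma
    return 1 - NormalDist().cdf(z)


def should_switch(challenger, incumbent, alpha=0.05):
    """Switch only when the challenger's advantage is statistically significant."""
    return mann_whitney_p(challenger, incumbent) < alpha


# Made-up RTT windows in ms: Blue is usually fast but has two huge outliers.
blue = [60, 62, 61, 59, 3000, 63, 60, 58, 2500, 61]
red = [180, 182, 179, 181, 183, 180, 178, 182, 181, 180]
print(mean(blue) > mean(red))    # True: the mean is dragged up by two outliers
print(should_switch(blue, red))  # True: the rank test still finds Blue faster

# Two nearly identical CDNs: the difference is not significant, so stay put.
close_a = [100, 102, 104, 106, 108]
close_b = [101, 103, 105, 107, 109]
print(should_switch(close_a, close_b))  # False
```

In this toy example a comparison of means would prefer Red because of two outliers, while the rank test reports that Blue is faster in most samples; for the two near-identical CDNs it declines to switch, which avoids the needless cache misses described earlier.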