How the Weather Company Is Using Big Data to Stay Ahead of the Storm

Although we are well into April, about half the country is still experiencing temperatures barely above freezing, and some parts continue to get nights colder than that. The winter of 2013-2014 was worse than usual for many cities across the country. This had people turning to weather forecasting services on the Internet—that is, when they had Internet access—more than usual.

In an effort to further improve its prediction abilities, The Weather Company is completing a major technology overhaul: consolidating its thirteen data centers down to four, shifting to public cloud providers, and moving to a NoSQL-powered big-data platform that can gather some twenty terabytes of weather data a day.

The Weather Company is the parent organization of the Weather Channel, Weather Underground, WeatherFX, and Intellicast; serves hundreds of thousands of people; and provides the information for thousands of mobile weather apps. That translates to billions of data requests a day, and performance has to be fast.

Bryson Koehler, CIO of the Weather Company, says its SUN (Storage Utility Network) system captures 2.25 billion weather data points fifteen times an hour, up from the 2.2 million data points four times an hour that the company's legacy on-site platform produced.
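To put those two capture rates side by side, here is a minimal sketch that converts each to data points per hour. It uses only the four figures Koehler quotes; the constant and function names are illustrative assumptions, and the hourly totals are derived here rather than reported by the company.

```python
# Figures as quoted in the article; the hourly totals below are derived from them.
NEW_POINTS_PER_CAPTURE = 2_250_000_000   # 2.25 billion data points (SUN platform)
NEW_CAPTURES_PER_HOUR = 15
OLD_POINTS_PER_CAPTURE = 2_200_000       # 2.2 million data points (legacy platform)
OLD_CAPTURES_PER_HOUR = 4


def points_per_hour(points_per_capture: int, captures_per_hour: int) -> int:
    """Total data points captured in one hour."""
    return points_per_capture * captures_per_hour


new_hourly = points_per_hour(NEW_POINTS_PER_CAPTURE, NEW_CAPTURES_PER_HOUR)
old_hourly = points_per_hour(OLD_POINTS_PER_CAPTURE, OLD_CAPTURES_PER_HOUR)
increase = new_hourly / old_hourly

print(f"New platform:    {new_hourly:,} points per hour")
print(f"Legacy platform: {old_hourly:,} points per hour")
print(f"Increase:        about {increase:,.0f}x")
```

On those numbers, the new platform takes in 33.75 billion data points an hour against 8.8 million for the legacy system, roughly 3,800 times as many.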

The NoSQL environment’s ability to scale to extremely large sizes helps with the intake of all this additional data, and faster queries mean quicker, more accurate weather forecasts for users worldwide.

Built on Basho's Riak NoSQL database, the new big-data platform is running in the Amazon Web Services cloud, with backup resources on the Google Compute Cloud. Riak beat out Cassandra, MongoDB, and Hadoop, mainly for its ease of use on a grand scale.

Koehler told InformationWeek:

"When you're globally distributing massive amounts of data across Amazon nodes or Google Compute nodes, you want something that's simple to use and configure. Cassandra, for example, is great at distributing data, but it's complicated and complex to run. Riak was built to handle massive data movement, replication, and data-synchronization on a cloud-based, globally distributed data platform."

Of course, switching to this new platform has not been without challenges. The primary obstacle, Koehler says, has been predicting costs.

"We have to make sure that we engineer [the system] so we understand the exact cost per transaction," Koehler explains. By year's end the company expects to handle more than 15 billion transactions per day on the platform, "so every 100th of a penny starts to add up." Those transactions are mostly web- and mobile-app service calls against the company's hundreds of APIs.
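As a rough illustration of why a hundredth of a penny matters at this volume, the sketch below multiplies the 15 billion daily transactions quoted above by a hypothetical cost change of 0.01 cent per transaction. The function name and the 365-day annualization are assumptions made for illustration; none of this reflects the company's actual costs.

```python
from decimal import Decimal

TRANSACTIONS_PER_DAY = 15_000_000_000       # "more than 15 billion" per the article
HUNDREDTH_OF_A_PENNY = Decimal("0.0001")    # in dollars: one cent divided by 100


def daily_cost_delta(transactions: int, delta_per_transaction: Decimal) -> Decimal:
    """Dollars per day added (or saved) by a per-transaction cost change."""
    return delta_per_transaction * transactions


per_day = daily_cost_delta(TRANSACTIONS_PER_DAY, HUNDREDTH_OF_A_PENNY)
per_year = per_day * 365  # assumes the same volume every day of the year

print(f"${per_day:,.0f} per day, ${per_year:,.0f} per year")
```

At that volume, a swing of one hundredth of a penny per transaction works out to about $1.5 million a day.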

The Weather Company also has to decide how much data to cache and how frequently to refresh data and generate new forecasts. Those choices are not static; they will need to change if the country is hit by a sudden, unpredictable snowstorm, as happened this past season.
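One way to picture a refresh policy that is not static is a cache whose time-to-live shrinks for locations under severe weather. The sketch below is purely hypothetical: the class name, the 15-minute and 1-minute intervals, and the `fetch` callback are all assumptions for illustration and do not describe the Weather Company's system.

```python
import time
from typing import Callable, Dict, Set, Tuple


class AdaptiveForecastCache:
    """Caches forecasts per location, refreshing sooner during severe weather."""

    def __init__(
        self,
        fetch: Callable[[str], str],
        normal_ttl: float = 900.0,   # hypothetical: refresh every 15 minutes
        severe_ttl: float = 60.0,    # hypothetical: refresh every minute in a storm
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._fetch = fetch
        self._normal_ttl = normal_ttl
        self._severe_ttl = severe_ttl
        self._clock = clock
        self._entries: Dict[str, Tuple[float, str]] = {}
        self._severe: Set[str] = set()

    def set_severe(self, location: str, severe: bool) -> None:
        """Mark or unmark a location as experiencing severe weather."""
        if severe:
            self._severe.add(location)
        else:
            self._severe.discard(location)

    def get(self, location: str) -> str:
        """Return the cached forecast, refetching if it is older than the TTL."""
        now = self._clock()
        ttl = self._severe_ttl if location in self._severe else self._normal_ttl
        entry = self._entries.get(location)
        if entry is None or now - entry[0] >= ttl:
            forecast = self._fetch(location)
            self._entries[location] = (now, forecast)
            return forecast
        return entry[1]
```

A caller would flag a location with `set_severe("ATL", True)` when a storm develops, so that the next `get("ATL")` refetches once the cached forecast is more than a minute old rather than waiting out the normal interval.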

"We're still on the learning curve on how to best tune the system, how we monitor, and how we respond when things go wrong," Koehler said.
