hadoop

Breaking Down Apache’s Hadoop Distributed File System

Apache Hadoop is a framework for big data. One of its main components is HDFS, the Hadoop Distributed File System, which stores that data. You might expect a framework that holds such large quantities of data to require state-of-the-art infrastructure and a file system that never fails, but quite the contrary is true.

Deepak Vohra
Comparing Apache Hadoop Data Storage Formats

Apache Hadoop can store data in several supported file formats. To decide which one you should use, analyze their properties and the type of data you want to store. Let's look at query time, data serialization, whether the file format is splittable, and whether it supports compression, then review some common use cases.

Deepak Vohra
The Software behind the PRISM Intelligence-Gathering Program

News of the National Security Agency’s PRISM intelligence-gathering program has reverberated throughout the media. This sophisticated computer system can sift through enormous amounts of data and extract meaning from it, giving the NSA a way to track people and their behaviors.

Jonathan Vanian