Are Tech Companies Keeping Up with the Digitization of Data?
We live in a world where the volume of digital data is growing at an extreme pace: between 2010 and 2020, it is expected to grow fifty-fold. Fortunately, architectures and technologies have been evolving rapidly to keep up with this exponential growth, giving rise to a new species of big data technologies.
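To put the fifty-fold figure in perspective, here is a minimal sketch that converts it into a yearly rate. It assumes growth compounds at a constant rate over the ten years, which is a simplification; the helper names `annual_growth_rate` and `doubling_time` are illustrative only.

```python
import math


def annual_growth_rate(total_factor: float, years: int) -> float:
    """Constant yearly rate that compounds to `total_factor` over `years`."""
    return total_factor ** (1 / years) - 1


def doubling_time(rate: float) -> float:
    """Years needed for the volume to double at a constant yearly `rate`."""
    return math.log(2) / math.log(1 + rate)


# Assumption: fifty-fold growth spread evenly (in compound terms) over 2010-2020.
rate = annual_growth_rate(50, 10)
print(f"Implied annual growth: {rate:.1%}")
print(f"Doubling time: {doubling_time(rate):.2f} years")
```

Under that assumption, fifty-fold growth in a decade works out to roughly 48% growth per year, or a doubling of data volume about every 1.8 years.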
Let's briefly walk through promising technologies that have already proven their effectiveness in real-world use.
Hadoop-Based Platforms
This direction is driven by top vendors such as Cloudera, Hortonworks, and MapR, whose ambitious goal is to become the ultimate data operating system. Their platforms integrate emerging open source technologies and support a wide variety of use cases at petabyte scale and beyond, from acquiring and storing all enterprise data in a data lake (for example, on the Hadoop Distributed File System) to data warehousing and SQL-like interactive analytics (for example, with Impala, Spark SQL, or Hive Stinger).
Extended Database Management Systems
Vendors driving the further evolution of analytical database management systems include Teradata, HP Vertica, MS PDW, IBM Netezza, and Oracle Exadata. They continue to leverage relational principles, which remain unbeaten for data warehousing, high data quality, and fast ad-hoc data analysis. At the same time, this group has managed to extend SQL-compatible scalability to tens and hundreds of terabytes using massively parallel processing and column-oriented architectures.
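The two ideas named above, column-oriented storage and parallel processing, can be illustrated with a toy sketch. It is not how any of the listed products is implemented: the sample records are made up, the helper names are hypothetical, and local threads merely stand in for the independent nodes of a real parallel system.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical sales records in row-oriented form: one dict per row.
rows = [
    {"region": "EU", "product": "A", "revenue": 120.0},
    {"region": "US", "product": "B", "revenue": 80.0},
    {"region": "EU", "product": "B", "revenue": 200.0},
    {"region": "US", "product": "A", "revenue": 50.0},
]


def to_columns(records):
    """Pivot row-oriented records into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def partition(values, parts):
    """Split a column into contiguous chunks, at most one per worker."""
    size = max(1, -(-len(values) // parts))  # ceiling division, never zero
    return [values[i:i + size] for i in range(0, len(values), size)]


def parallel_sum(values, workers=2):
    """Sum each chunk on its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum, partition(values, workers)))


columns = to_columns(rows)
# The aggregate reads only the "revenue" column, not the whole rows.
total_revenue = parallel_sum(columns["revenue"])
print(total_revenue)
```

The point of the columnar layout is that an analytical query touching one attribute scans only that attribute's values, and the point of partitioning is that each chunk can be aggregated independently before the partial results are merged.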
NoSQL Databases
The NoSQL (or "not only SQL") movement is driven primarily by open source communities, although some technologies, such as MongoDB and Cassandra, already come with commercial support. Highly scalable by design, some NoSQL implementations continue to expand their real-time analytics capabilities and to integrate with BI platforms such as Jaspersoft, Tableau, or MicroStrategy.
Distributed Search Engines
The top two open source distributed search engines, Elasticsearch and Apache Solr, have become valuable tools not only for document indexing but also for data analysis and exploration. Splunk demonstrated that a search engine indexer combined with data visualization makes a highly scalable search and analytics platform. Since then, Elasticsearch has actively improved its integrated ELK stack (Elasticsearch, Logstash, Kibana) to solve similar business problems with open source tools.
Once the era of relational database management system dominance ended, the "one-size-fits-all" solution disappeared with it. The concept of big data à la carte offers a wide selection of overlapping modern technologies, each with its own strengths and weaknesses.
Only a thorough analysis of business needs, supported by industry reference architectures, makes it possible to find the optimal trade-off between technologies and to pick the right tool for the right job.