big data
Breaking Down Apache’s Hadoop Distributed File System
Apache Hadoop is a framework for big data. One of its main components is the Hadoop Distributed File System (HDFS), which stores that data. You might expect a storage framework that holds large quantities of data to require state-of-the-art infrastructure for a file system that does not fail, but the opposite is true.

When to Use MapReduce with Big Data
MapReduce is a programming model for distributed, parallel computation on big data sets. It's a module in the open source Apache Hadoop ecosystem, and it supports a range of queries, depending on the algorithms available. Here's when it's suitable (and not suitable) to use MapReduce for generating and processing data.

Before Data Analysis, You Need Data Preparation
One of the prerequisites for any type of analytics in data science is data preparation. Raw data usually has several shortcomings in structure, format, and consistency, so first it has to be converted to a usable form. These are some types of data preparation you can conduct to make your data useful for analysis.

Exploring Big Data Options in the Apache Hadoop Ecosystem
With the emergence of the World Wide Web came the need to manage large, web-scale quantities of data, or “big data.” The most notable tool to manage big data has been Apache Hadoop. Let’s explore some of the open source Apache projects in the Hadoop ecosystem, including what they're used for and how they interact.

Data-Driven Testing Skills in an Agile and DevOps World
In agile and DevOps, understanding the role of data analysis in the test strategy helps teams accelerate development, testing, and deployment. As we continue to enhance our testing effectiveness, data analytics skills are an important dimension in managing risks in a “continuous everything” world.

Test Your Data Quality to Increase the Return on Your QA Investment
With the high volume of data coming into your organization, it’s important that it be complete, correct, and timely. But considering the velocity at which this data is moving, how do you measure its current quality? You must be able to test it wherever it sits still enough to be viewable, without altering it.

What You Should Consider to Make the Best Use of Your Collected Data
We live in a world where data is constantly being recorded. In software, deciding when to use that data is critical to making the most of the information. You should take into account data freshness, the data-gathering processes and any dependencies between them, and when to distribute information.

Here There Be Monsters: The Value of Data Profiling
Monsters appeared on medieval maps to identify the unknown dangers of the sea. Likewise, an organization's data profiles identify the points of risk within its data. A robust data-profiling strategy can provide a more accurate picture of an organization’s data systems and find risks before they become monsters.