Related Content
When to Use MapReduce with Big Data
MapReduce is a programming model for parallel, distributed computation on big data sets. It's a module in the Apache Hadoop open source ecosystem, and it supports a range of queries depending on the algorithms available. Here's when it's suitable (and not suitable) to use MapReduce for generating and processing data.

Migrating to the Cloud: Which Model Is Right for You?
Cloud computing is a relatively recent trend, and several organizations have opted to migrate their services and data to the cloud. Which of the cloud computing models available is right for which situation? Let’s look at the three options (public, private, and hybrid) and discuss when it's a good idea to use each one.

Trusting Your Data: Garbage In, Garbage Out
Poor-quality input will always produce faulty output. Improper validation of data input can affect more than just security; it can also affect your ability to make effective business decisions. If you can’t trust the data you receive, you can’t trust the quantitative decisions or reports built on it.

Migrating a Database? Consider These Factors First
Database migration is usually performed with a migration tool or service. Migrating one database to another actually involves migrating the schemas, tables, and data; the software itself is not migrated. Whatever the reason for migration, before you start, explore the options and take these considerations into account.

Before Data Analysis, You Need Data Preparation
One of the prerequisites for any type of analytics in data science is data preparation. Raw data usually has several shortcomings in structure, format, and consistency, so first it has to be converted to a usable form. These are some types of data preparation you can conduct to make your data useful for analysis.

Getting Support for the Tests You Need Done
It’s often hard for teams to get sufficient time and resources for the amount and quality of tests they think are needed. Management seems to want testing done but doesn’t want to commit what’s needed to do it. If that's your situation, look at the business side, rank priorities, and negotiate resources.

Exploring Big Data Options in the Apache Hadoop Ecosystem
With the emergence of the World Wide Web came the need to manage large, web-scale quantities of data, or “big data.” The most notable tool to manage big data has been Apache Hadoop. Let’s explore some of the open source Apache projects in the Hadoop ecosystem, including what they're used for and how they interact.

When to Use Different Types of NoSQL Databases
Web-scale data requirements exceed those of a single organization, and the data is not always in a structured format. NoSQL databases are a good choice at larger scale because they're flexible in format, structure, and schema. Let’s explore different kinds of NoSQL databases and when it’s appropriate to use each.