Trusting Your Data: Garbage In, Garbage Out
The saying “Garbage in, garbage out” has long been used in software engineering to express the idea that poor-quality input will always produce faulty output.
Yet many applications still fail to apply this axiom. The results range from stored cross-site scripting (XSS), SQL injection, and buffer overflows to more benign malformed output that still degrades the application’s quality and usability.
Improper input validation affects more than the security of your application; it can also undermine your ability to make sound business decisions. If you can’t trust the data you receive, you can’t trust the reports you build from it or the quantitative decisions you base on them.
Any time data is sent to or used by an application, treat it as coming from an untrusted source. Whether the data comes from an end user or from your own database, validate it before it is used or saved to a trusted data store.
Developers should not assume that data in the database is correct, because human input errors and gaps in validation happen all the time. APIs the application depends on may have been compromised, and even the most trustworthy users make simple mistakes that can put the application into an error state or leave it open to compromise.
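As a minimal sketch of treating the database as an untrusted source, the Python below reads rows back from a hypothetical `customers` table and validates each one before the application uses it. The table, its columns, and the `validate_row` and `load_customers` helpers are illustrative assumptions, and the email check is deliberately simplistic. The type checks are not redundant: SQLite uses type affinity rather than strict column types, so a column declared `INTEGER` can still hand back a string such as `'abc'`.

```python
import re
import sqlite3
from dataclasses import dataclass

# Deliberately simple shape check for illustration; real email validation is more involved.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")


@dataclass
class Customer:
    id: int
    email: str
    age: int


def validate_row(row):
    """Turn a raw database row into a Customer, or raise ValueError if it is bad."""
    cust_id, email, age = row
    # SQLite's type affinity means a declared column type is not a guarantee.
    if not isinstance(cust_id, int):
        raise ValueError(f"id is not an integer: {cust_id!r}")
    if not isinstance(email, str) or EMAIL_RE.fullmatch(email) is None:
        raise ValueError(f"row {cust_id}: malformed email {email!r}")
    if not isinstance(age, int) or not 0 <= age <= 150:
        raise ValueError(f"row {cust_id}: implausible age {age!r}")
    return Customer(cust_id, email, age)


def load_customers(conn):
    """Return (valid customers, rejection messages) instead of trusting every row."""
    good, rejected = [], []
    for row in conn.execute("SELECT id, email, age FROM customers ORDER BY id"):
        try:
            good.append(validate_row(row))
        except ValueError as err:
            rejected.append(str(err))
    return good, rejected


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT, age INTEGER)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?, ?)",
        [
            (1, "alice@example.com", 34),
            (2, "bob-at-example.com", 29),  # data-entry typo
            (3, "carol@example.com", -4),   # impossible value
        ],
    )
    good, rejected = load_customers(conn)
    print([c.id for c in good])  # [1]
    print(rejected)
```

Rejected rows are collected rather than silently dropped, so the underlying data can be corrected instead of the application quietly working around it.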
Developers also should not assume that they must only guard against malicious attackers. Anyone and anything, no matter how well-intentioned, could cause problems if you trust all the input implicitly.
Validate data from a variety of sources, including URL parameters, databases, internal and external APIs, other applications, and end users. Input validation should be both syntactic and semantic. Syntactic validation enforces the correct syntax of structured fields (such as Social Security numbers, dates, and currency symbols), while semantic validation enforces the correctness of their values in the business context (for example, that a start date falls before an end date). Treating all data as potentially bad makes your application more resistant to both errors and attacks.
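To make the syntactic/semantic distinction concrete, here is a hedged Python sketch with two hypothetical validators. `validate_ssn` first checks the shape `NNN-NN-NNNN` (syntactic), then rejects well-formed numbers in ranges the Social Security Administration never issues: area 000, 666, or 900 to 999, group 00, and serial 0000 (semantic). `validate_date_range` parses two dates (syntactic) and then requires the start not to be after the end (semantic). The function names and error messages are assumptions for illustration, not a complete SSN or date policy.

```python
from __future__ import annotations

import re
from datetime import date

# [0-9] rather than \d: in Python, \d also matches non-ASCII digits such as "١".
SSN_RE = re.compile(r"([0-9]{3})-([0-9]{2})-([0-9]{4})")


def validate_ssn(raw: str) -> str:
    # Syntactic check: the value must have the shape NNN-NN-NNNN.
    match = SSN_RE.fullmatch(raw)
    if match is None:
        raise ValueError("SSN must have the form NNN-NN-NNNN")
    area, group, serial = match.groups()
    # Semantic check: well-formed values the SSA never issues.
    if area in ("000", "666") or area.startswith("9"):
        raise ValueError("SSN area number is never issued")
    if group == "00" or serial == "0000":
        raise ValueError("SSN group or serial number is never issued")
    return raw


def validate_date_range(start_raw: str, end_raw: str) -> tuple[date, date]:
    # Syntactic check: both values must parse as ISO 8601 dates.
    # date.fromisoformat raises ValueError for malformed or impossible dates.
    start = date.fromisoformat(start_raw)
    end = date.fromisoformat(end_raw)
    # Semantic check: the range must make sense.
    if start > end:
        raise ValueError("start date must not be after end date")
    return start, end
```

For example, `"123-45-6789"` passes both checks, `"666-12-3456"` is syntactically valid but fails the semantic check, and `"2024-02-30"` is rejected while parsing because February has no 30th day.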
There is no theoretical limit to how far a validator can go when verifying inputs. In practice, you can make sure your inputs are sufficiently validated and tested by leveraging built-in input validation libraries, applying boundary testing, and fuzzing with a risk-based approach that concentrates effort on the inputs whose failure would cost the most.
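The sketch below pairs boundary testing and fuzzing on a hypothetical `parse_quantity` validator that accepts order quantities from 1 to 100; the limits, names, and iteration count are assumptions. `boundary_tests` exercises values at, just inside, and just beyond each limit, and `fuzz` feeds seeded random strings to the validator, requiring it either to return an in-range value or to raise `ValueError`. Any other exception propagates and fails the run. It uses only the standard library; a property-based testing tool such as Hypothesis can generate and shrink failing inputs more effectively.

```python
import random
import string

MIN_QTY, MAX_QTY = 1, 100  # hypothetical business limits


def parse_quantity(raw: str) -> int:
    """Parse an order quantity; raise ValueError for anything outside the rules."""
    if not (1 <= len(raw) <= 3) or not raw.isascii() or not raw.isdigit():
        raise ValueError(f"not a 1-3 digit number: {raw!r}")
    qty = int(raw)
    if not MIN_QTY <= qty <= MAX_QTY:
        raise ValueError(f"quantity out of range: {qty}")
    return qty


def boundary_tests():
    # Values at and just inside each limit must be accepted...
    accepted = {"1": 1, "2": 2, "99": 99, "100": 100}
    # ...and values just outside them (plus obvious edge cases) rejected.
    rejected = ["0", "101", "", "-1", "1000"]
    for raw, expected in accepted.items():
        assert parse_quantity(raw) == expected, raw
    for raw in rejected:
        try:
            parse_quantity(raw)
        except ValueError:
            continue
        raise AssertionError(f"accepted out-of-bounds input {raw!r}")


def fuzz(iterations=10_000, seed=1234):
    # Seeded so any failure is reproducible. Each random string must either
    # yield an in-range quantity or raise ValueError; anything else escapes.
    rng = random.Random(seed)
    alphabet = string.printable + "\u00e9\u0661\x00"
    for _ in range(iterations):
        raw = "".join(rng.choice(alphabet) for _ in range(rng.randint(0, 8)))
        try:
            qty = parse_quantity(raw)
        except ValueError:
            continue
        assert MIN_QTY <= qty <= MAX_QTY, raw


if __name__ == "__main__":
    boundary_tests()
    fuzz()
    print("boundary and fuzz checks passed")
```

The `isascii()` check is the kind of gap fuzzing tends to expose: without it, Python's `int()` would accept the Arabic-Indic digit `"\u0661"` as `1`. Under a risk-based approach, fields that feed queries, rendered HTML, or financial calculations would get more fuzzing iterations than low-impact ones.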
Applying a consistent approach to input validation across your application will not only reduce the security risk to your company and customers but also increase quality by reducing the chance of unforeseen bugs caused by bad data.