I had a discussion on the PHP dev mailing list. It appeared that quite a few developers misunderstand what “Input Validation” is. Therefore, I summarize the programming fundamentals and basic principles needed to understand validation.
Without an understanding of these fundamentals, the discussion is meaningless. “Input Validation” is one of the most important countermeasures against cyber attacks. The idea is strongly supported by notable computer scientists (e.g. CMU’s CERT) and security specialists, yet it is misunderstood by the majority of developers. As a result, almost all web applications currently lack proper “Input Validation”.
I come across this discussion too often, so I summarize why “Input Validation”, and “Input Data Validation” or “Application Input Data Validation” especially, is a fundamental and mandatory requirement for applications.
TL;DR;
“Application Input Data Validation” and “Business Logic Validation” are 2 different validations, both in fundamentals and in principles. With few exceptions, ALL web application input data can be validated by “Application Input Data Validation” without user interaction. This achieves a more secure state and a cleaner, more manageable code structure. Thus, applications should always implement “Application Input Data Validation”.
Note: “Application Input Data Validation” is NOT a replacement for “Business Logic Validation”.
Fundamentals and Basic Principles for Programming
Fundamentals:
- Computer programs can NOT execute code correctly without “Valid” input data.
- Every program has an “Input” -> “Program Logic” -> “Output” structure.
- All inputs can be categorized into 3 mutually exclusive groups: “Correct Inputs”, “Input mistakes (by user, etc)” and “Invalid Inputs”.
Basic Principles:
- Fail Fast – If something is going to fail later, it should fail as early as it can.
- Separation of Concerns – Concerns should be separated according to structure/design/task.
- Boundary Protection – This could be categorized as a fundamental. Anything that requires security needs “Boundary Protection(s)”. It requires verifying/assuring everything that crosses the boundaries.
- Defense in Depth – This does not mean that security defenses must be placed deep in the core; it means “Multiple Layer Protections”.
Relations between “Fundamentals / Principles” and “Input Validations”
Fundamentals:
Since computer programs require “Valid” input data, ALL input data must be valid before processing, i.e. “Input Validation” is mandatory.
Since every program has the common “Input” -> “Program Logic” -> “Output” structure, the “Input” handling code is a suitable place for validation.
Since all inputs fall into 3 mutually exclusive groups,
- “Correct Inputs” – Valid and correct input data that programs can process.
- “Input mistakes” – Valid and acceptable input data, but programs cannot proceed with it.
- “Invalid Inputs” – Invalid input data that programs MUST NOT accept at all.
Application Level “Input Validation” MUST accept only “Correct Inputs” and “Input mistakes”. (Whitelisting is required by a number of security standards/guidelines.) Remember that programs can only work correctly with “Valid” data.
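The 3 groups can be illustrated with a minimal sketch. It is written in Python purely for illustration, and the field name ("age"), the format rule and the accepted range are all hypothetical assumptions, not rules taken from any standard.

```python
import re
from enum import Enum


class InputClass(Enum):
    CORRECT = "correct input"
    MISTAKE = "input mistake"
    INVALID = "invalid input"


# Hypothetical whitelist rule for an "age" form field: at most 3 ASCII digits.
AGE_FORMAT = re.compile(r"[0-9]{0,3}")


def classify_age(raw):
    """Classify a raw 'age' value into one of the 3 mutually exclusive groups."""
    # Invalid: wrong type, or a form no legitimate client would ever send.
    if not isinstance(raw, str) or AGE_FORMAT.fullmatch(raw) is None:
        return InputClass.INVALID
    # Input mistake: well-formed, but the program cannot proceed with it.
    if raw == "" or not 0 < int(raw) <= 130:
        return InputClass.MISTAKE
    return InputClass.CORRECT
```

In this sketch the application boundary would reject only the INVALID group; a MISTAKE value is passed on so that the business logic can ask the user to correct it.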
Principles:
Since “Fail Fast” suggests that anything that will fail later should fail as early as possible, “Invalid Input”, which cannot be handled correctly in any way, should fail as early as possible. Every program has the common “Input” -> “Program Logic” -> “Output” structure, so the earliest place a program can reject “Invalid Input” is its “Input” part.
Since the “Separation of Concerns” (SoC) principle requires concerns to be divided [1], structured design requires separating “Validations” (concerns) according to the basic program structure, that is, at least the “Input” -> “Program Logic” -> “Output” structure.
- “Input” – Validate that all inputs are “Correct Inputs” or “Input mistakes”.
- “Program Logic” – Validate the logical correctness of input data. Handle “Input mistakes”.
- “Output” – Validate data when data validation is needed/required for “Output Sanitization”. See CERT Secure Coding Practices for output sanitization details.
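The separation described in the list above could look roughly like the following Python sketch. All names here (validate_input, check_business_rules, render, the "username" parameter and its format rule) are hypothetical and only show where each kind of validation lives.

```python
import html
import re


class InvalidInput(Exception):
    """Raised at the application boundary; the request is rejected outright."""


# Hypothetical whitelist: up to 32 ASCII letters, digits or underscores.
USERNAME_FORMAT = re.compile(r"[A-Za-z0-9_]{0,32}")


def validate_input(params):
    """Input layer: accept only "Correct Inputs" and "Input mistakes"."""
    name = params.get("username")
    if set(params) != {"username"} or not isinstance(name, str):
        raise InvalidInput("unexpected parameters")
    if USERNAME_FORMAT.fullmatch(name) is None:
        raise InvalidInput("username has an impossible form")
    return name


def check_business_rules(name, taken):
    """Logic layer: handle input mistakes; return an error message or None."""
    if len(name) < 3:
        return "User name must be at least 3 characters."
    if name in taken:
        return "User name is already taken."
    return None


def render(name, error):
    """Output layer: escape for the HTML context regardless of earlier checks."""
    if error:
        return '<p class="error">' + html.escape(error) + "</p>"
    return "<p>Welcome, " + html.escape(name) + "</p>"


def handle(params, taken):
    name = validate_input(params)
    return render(name, check_business_rules(name, taken))
```

Note that the output layer still escapes the name even though the input layer has already restricted it; the layers do not rely on each other.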
Since the outermost “Boundary Protection” is a very effective protection in any security defense, one should deploy it whenever possible. For software applications, the “Application Boundary” is the outermost boundary; therefore “Application Boundary Protection” should be deployed. This means “Application Level Input Validation” at the “Application Boundary” is the best practice.
Since “Defense in Depth (Multiple Layer Protections)” requires a multi-layered protection approach, programs should deploy multiple layers of protection according to the common basic program structure, that is, at least the “Input” -> “Program Logic” -> “Output” structure.
As you can see, these Fundamentals and Principles suggest a comprehensive “Application Level Input Validation” implementation with “White Listing” (the known-good approach). Note that they also suggest “Multiple Layer Protections” and “Separation of Concerns”. “Application Level Input Validation” (like input validation at other levels) is NOT the sole task of software security; it is simply one of the mandatory requirements/tasks for programs to work as they should.
Who suggests/requires “Input Validation” as the most important feature for software security
The following 3 references list “Input Validation” as the first element of software security.
Note: ISO 27000/ISMS, PCI DSS, NIST SP800 and many others require this practice as a standard. It may be considered one of the “Principles”.
CWE/SANS Top 25 – Monster Mitigations
Note: PCI DSS requires it.
OWASP Secure Coding – Quick Reference Guide
Note: PCI DSS requires it.
The 2017 OWASP Top 10 introduced a new #10 (A10) vulnerability, “Insufficient Logging & Monitoring”: web applications that lack the capability to detect, log, notify about and respond to attacks by DAST (Dynamic Application Security Test) tools are categorized as vulnerable. To handle DAST tool attacks as OWASP suggests, “Application Level Input Data Validation” is mandatory. The OWASP Top 10 has listed “Input Validation” as a defense against the #1 (A1) vulnerabilities for years. In fact, the first edition listed “Unvalidated Inputs” as the #1 vulnerability. (Many sloppy developers misunderstood this as “Input Data Validation is the whole task for data security”, so the later Top 10 lists were modified. CERT Secure Coding Practices explicitly explains that “Input” and “Output” security measures are independent.)
The OWASP Code Review Guide explains “Application Level Input Validation” and “Business Logic Validation” in “7.6 Input Validation”. (It uses the terms “Data Validation” and “Business Validation” respectively.)
“Input Validation” is a fundamental requirement for software
“Input Validation” is a fundamental requirement for secure software. Applications must have both proper “Application Level Input Validation” and “Business Logic Validation”.
Even though “Input Validation” is mandatory for applications, it is not mandatory for all types of software.
For instance, libraries can/may/should omit “Input Validation” for good reasons. A library can require its users to pass only “Valid” data; there are many such libraries, e.g. free() in C. “Input Validation” in libraries can also be too inefficient. For example, PCRE can perform regex matches a few hundred times faster when encoding validation is omitted. A library is just one part of a program, and it can/may/should omit “Input Validation” for good reasons. (Never trust APIs, unless you’re 120% sure data is processed securely!)
Even when libraries perform “Input Validation” properly (reject invalid data), applications cannot be sure they work correctly without “Application Level Input Validation”.
These are example vulnerabilities that need proper Application Level Input Validation.
This vulnerability executes code via malformed types and data.
A too large password, such as 1MB of password data, results in DoS.
MongoDB can be an injection attack target. With PHP, turning the target parameter into an array achieves a successful attack.
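The array-parameter attack works because the application expects a string but receives a structure that the database interprets as a query operator. In PHP this can happen when a query string such as user[$ne]=x is parsed into an array; a comparable case in other stacks is a JSON body containing {"user": {"$ne": null}}. A minimal Python sketch of type validation at the boundary follows; build_user_filter and require_string are hypothetical helpers, and no real database driver is called.

```python
import json


class InvalidInput(Exception):
    """Raised when a request cannot have come from a legitimate client."""


def require_string(params, key, max_len=64):
    """Return params[key] only if it is a plain string within the length limit."""
    value = params.get(key)
    if not isinstance(value, str) or len(value) > max_len:
        raise InvalidInput(f"{key} must be a string of at most {max_len} chars")
    return value


def build_user_filter(body):
    """Build a query filter from a JSON request body (hypothetical helper)."""
    try:
        params = json.loads(body)
    except ValueError as exc:
        raise InvalidInput("malformed JSON") from exc
    if not isinstance(params, dict):
        raise InvalidInput("expected a JSON object")
    # Only a validated string can reach the filter, never an operator document.
    return {"user": require_string(params, "user")}
```

With this check, the operator document is rejected before any query is built, so the database layer never sees it.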
These are trivial mistakes of not validating inputs. Yet this kind of vulnerability is created over and over. Why is this happening?
It is natural, because programmers are “making applications that do the job”, NOT “handling strange and/or bad inputs from attackers”. The latter is important, but it is not the primary task when building applications. Here the “SoC” principle comes in again. Programmers should separate concerns so that “Input” makes sure parameters have an acceptable form and “Logic” does the remaining tasks.
Applications are extremely vulnerable without “Application Level Input Validations”
Why? It’s easy to understand. There are 2 types of APIs: “encoding-aware APIs that reject invalid chars” and “binary-safe APIs that must accept any data”, e.g. file functions must accept any data.
This fact should ring security specialists’ alarm bells. Mixing these 2 types of APIs has produced countless vulnerabilities, such as “Null Byte Injections” (binary-safe API vs. non-binary-safe API).
If conditions are met, “Invalid Encoding Data Injections” can execute an attacker’s commands. That is difficult, but the same injection can cause DoS very easily. It’s a script kiddie’s task; it does not even require a “script”. An attacker can inject “Invalid Encoding Data” into some storage that is reused later (e.g. cache, database, file, remote API, etc.; a “Comment” field may be a good place to attack, for example).
When victims access the web site, they may see an entirely blank screen (nothing rendered) or partially rendered web pages. (Every web developer should know that browsers can refuse to render pages containing invalid char encoding data, and that functions such as PHP’s htmlspecialchars() return an empty string for invalid char encoding unless ENT_IGNORE or ENT_SUBSTITUTE is set, etc.) Errors raised by libraries and/or external services that perform validation can also put a web application into a DoS state.
“Proper Char Encoding Handling” is a very easy task: validate ALL inputs so that they have valid char encoding. Note that an attacker’s invalid char encoding data may be contained in HTTP headers, and the system may store it and reuse it later. Developers must validate literally ALL inputs. BTW, the “Invalid Encoding Attack” is not the only attack of this type. Attackers can perform any type of “Invalid DATA Attack”.
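As a rough sketch of such an encoding check, assuming the application expects UTF-8 and receives raw bytes from the client, the boundary could decode strictly and reject anything that fails. The function name decode_utf8_or_reject is hypothetical, and rejecting NUL characters is an extra assumption added here because of the “Null Byte Injection” issue mentioned above.

```python
class InvalidInput(Exception):
    """Raised when input data cannot be accepted at all."""


def decode_utf8_or_reject(raw):
    """Decode client bytes as strict UTF-8, rejecting malformed sequences."""
    try:
        # bytes.decode() is strict by default: invalid sequences raise
        # UnicodeDecodeError instead of being silently replaced or dropped.
        text = raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise InvalidInput("input is not valid UTF-8") from exc
    # Assumption: no legitimate text field in this application contains NUL.
    if "\x00" in text:
        raise InvalidInput("input contains a NUL character")
    return text
```

The same function would be applied to every value that arrives from the client, including header and cookie values, before anything is stored.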
Another easy attack is “too large data”. Many applications simply lack size validation in both “Application Level Input Validation” and “Business Logic Validation”. DoS by too large data may not be a serious issue in itself, and many parts of an application can handle it with reasonable resources. However, the question is, for example, “Is accepting 1MB or even 1GB of data for an address correct application behavior?”. There should be predefined upper limits for data even when your web application UI does not limit them. Too large data can easily become a cause of DoS attacks.
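Predefined upper limits can be as simple as a table of maximum lengths checked at the boundary. The sketch below is in Python, and the field names and limits are hypothetical values chosen only for illustration.

```python
class InvalidInput(Exception):
    """Raised when input data cannot be accepted at all."""


# Hypothetical per-field upper limits, in characters.
MAX_CHARS = {"name": 100, "address": 400, "password": 1024}


def enforce_limits(params):
    """Reject unknown fields, non-string values and oversized values."""
    for key, value in params.items():
        limit = MAX_CHARS.get(key)
        if limit is None or not isinstance(value, str):
            raise InvalidInput(f"unexpected field or type: {key!r}")
        if len(value) > limit:
            raise InvalidInput(f"{key!r} exceeds {limit} characters")
    return params
```

A 1MB password is then rejected by a cheap length comparison before it ever reaches an expensive password hashing function.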
That said, almost all web applications that exist today do not have proper “Input Validation”, “Application Level Input Validation” especially, even though many standards have required it as the first software security measure for many years.
GDPR takes effect this coming May 25. I wonder what happens if an application that lacks fundamental security measures has a massive personal information leak. Would a “massive info leak like Struts2’s Content-Type/Length/Disposition attack” be considered unrelated to GDPR?
※ In this discussion, the developer seems (I’m not sure) to validate only “Web Form” values. In the mainframe era, it was OK to validate only “Form” values, since the “Form” values were predefined, i.e. proprietary fixed protocols. We now live in the Web world, and HTTP does not define a fixed “Form”. Values from clients (GET/POST/COOKIE and other HTTP headers) can be anything, attackers’ requests especially. Validating only “Web Form” values is not proper validation at all today. Every developer should realize this.
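To make “validate more than the form” concrete, here is a small Python sketch that applies a default rule to every client-controlled source. The request layout (a dict with "query", "form", "cookies" and "headers" keys) and the default rule (printable ASCII, at most 512 characters) are hypothetical; a real application would define stricter per-parameter rules.

```python
import re


class InvalidInput(Exception):
    """Raised when input data cannot be accepted at all."""


# Hypothetical default rule: printable ASCII only, at most 512 characters.
DEFAULT_RULE = re.compile(r"[\x20-\x7e]{0,512}")

# Every client-controlled source is checked, not only the form fields.
SOURCES = ("query", "form", "cookies", "headers")


def validate_all_sources(request):
    """Raise InvalidInput if any value from any source breaks the default rule."""
    for source in SOURCES:
        for name, value in request.get(source, {}).items():
            if not isinstance(value, str) or DEFAULT_RULE.fullmatch(value) is None:
                raise InvalidInput(f"{source}[{name!r}] is invalid")
```

Because control characters are outside the whitelist, a header value carrying CR/LF is rejected here by the same rule that covers the form fields.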
[1] The “Divide and Conquer” principle is yet another famous principle similar to SoC. Keeping an application in a secure state is an extremely complex task (if it were simple, we would already have overcome software security issues). This fact also suggests that developers should adopt the “Divide and Conquer” principle for application security.