Secure input and output handling
Secure input and output handling are secure programming techniques designed to prevent security bugs and the exploitation thereof.
Objective
The objective of secure input and output handling is to ensure that data passed between computing systems is safe to use and does not cause unexpected behaviour on those systems, for example through code injection bugs.
Methods
There are a few methods that can make data safe to handle, with different levels of security and applicability:
Validation
Validation ensures that data is safe before it is used.
The most secure way to do this is to terminate on suspicious data and to use a whitelist strategy to determine whether execution should be terminated. This behaviour is, however, not always preferred from a usability point of view.
Whitelists and blacklists
In computer security, there is often known good data: data the developer is completely certain is safe. There is also known bad data: data the developer is certain is unsafe (for example, because it can cause code injection). Based on this, two different approaches to managing data exist:
- Whitelist (known goods)
- A whitelist is a list of "known good data". In effect it says "A, B and C are good (and everything else is bad)".
- Blacklist (known bads)
- A blacklist is a list of "known bad data". In effect it says "A, B and C are bad (and everything else is good)".
Security professionals tend to prefer whitelists, because blacklists may accidentally treat bad data as safe. However, in some cases a whitelist solution may not be easy to implement.
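The difference can be sketched as follows, assuming a hypothetical application that lets a user choose a sort column. The names `ALLOWED_COLUMNS`, `BLOCKED_COLUMNS`, `whitelist_accepts` and `blacklist_accepts` are illustrative assumptions and do not come from any library.

```python
# Hypothetical example: deciding whether a user-supplied sort column is acceptable.

ALLOWED_COLUMNS = {"name", "date", "size"}       # whitelist: known good values
BLOCKED_COLUMNS = {"password", "secret_token"}   # blacklist: known bad values


def whitelist_accepts(column: str) -> bool:
    """Accept only values that are explicitly known to be good."""
    return column in ALLOWED_COLUMNS


def blacklist_accepts(column: str) -> bool:
    """Accept everything except values that are explicitly known to be bad."""
    return column not in BLOCKED_COLUMNS


if __name__ == "__main__":
    unexpected = "name; DROP TABLE users"
    print(whitelist_accepts(unexpected))  # False: not on the list of known goods
    print(blacklist_accepts(unexpected))  # True: the developer never listed it as bad
```

In this sketch the blacklist accepts an input its author did not anticipate, which is the failure mode described above.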
Handling non-safe data
Terminate/stop/abort execution
This is a very safe strategy: if unexpected characters occur, execution is aborted. If implemented poorly, however, it can enable a denial-of-service attack in which the attacker floods the system with unexpected input, forcing the system to expend scarce processing and communication resources on rejecting it.
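A minimal sketch of the abort strategy, assuming a hypothetical username field whose policy is 1 to 32 ASCII letters, digits or underscores. The pattern, the exception class `SuspiciousInputError` and the function name are assumptions made for illustration.

```python
import re

# Assumed policy for this sketch: 1 to 32 ASCII letters, digits or underscores.
USERNAME_PATTERN = re.compile(r"[A-Za-z0-9_]{1,32}")


class SuspiciousInputError(ValueError):
    """Raised when input contains anything outside the whitelist."""


def require_valid_username(raw: str) -> str:
    """Return the input unchanged if it matches the whitelist, otherwise abort."""
    # fullmatch requires the whole string to match, not just a prefix.
    if USERNAME_PATTERN.fullmatch(raw) is None:
        raise SuspiciousInputError("unexpected characters in username")
    return raw
```

The check is deliberately cheap and is meant to run before any other work is done on the request, which may help limit the cost of rejecting large volumes of bad input.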
Filter the data
Filtering input is a less strict security measure than terminating execution when input problems are found.
- The benefit of the filter approach is that, to end users, the security mechanism often behaves in a less intrusive manner. For example, if "*" is illegal, then "I ***LOVE*** you" will just become "I LOVE you", which is experienced as a minor but acceptable oddity.
- The downside is that the filter approach is difficult to get right: in practice many applications apply the filter at one place in the code, but the programmer accidentally uses the unfiltered input at another place.
- The filter may change the meaning of the data, potentially leading to other unexpected behaviours.
- Automatic taint checking
Some programming languages, such as Perl in its taint mode, have built-in support for taint checking. These languages raise compile-time or run-time errors whenever a variable derived from user input is used in a risky way, e.g. to execute a shell command.
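Python has no built-in taint mode, so the following is only an emulation of the idea at run time, not a language feature. It assumes user input is wrapped in a hypothetical `Tainted` class that a sensitive function refuses to accept until the value has passed through `untaint`. All names here are assumptions made for illustration.

```python
import re


class TaintError(Exception):
    """Raised when untrusted data reaches a sensitive operation."""


class Tainted:
    """Marks a string as derived from user input (hypothetical helper)."""

    def __init__(self, raw: str) -> None:
        self.raw = raw


def untaint(value: Tainted, allowed: str = r"[A-Za-z0-9_]+") -> str:
    """Return a plain str only if the whole value matches the whitelist."""
    if re.fullmatch(allowed, value.raw) is None:
        raise TaintError("value failed validation")
    return value.raw


def build_listing_command(filename: object) -> list[str]:
    """Sensitive sink: refuses anything that is not a plain, untainted str."""
    if not isinstance(filename, str):
        raise TaintError("tainted or non-string value used in a command")
    # The argument list form avoids handing a single string to a shell.
    return ["ls", "-l", filename]
```

Because `Tainted` is deliberately not a subclass of `str`, passing it to the sink without validation fails immediately instead of silently succeeding.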
- White list example
- An input filter which expects all characters to be in the charset A-Za-z is used to protect a UNIX application from shell injection.
- The attacker supplies the input "; ls -l /" to attempt shell injection.
- The filter is applied to the input.
- The characters ";", "-" and "/", along with the spaces, are thrown away by the filter because they are not in the whitelist.
- The characters "lsl" are kept by the filter because they are in the whitelist.
- The exploit attempt fails because only safe input remains.
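The steps above can be sketched as a character whitelist filter. The function name `whitelist_filter` and the constant `SAFE_CHARS` are assumptions made for this illustration.

```python
import string

# Whitelist from the example above: only the characters A-Z and a-z are kept.
SAFE_CHARS = frozenset(string.ascii_letters)


def whitelist_filter(raw: str) -> str:
    """Drop every character that is not explicitly known to be good."""
    return "".join(ch for ch in raw if ch in SAFE_CHARS)


if __name__ == "__main__":
    print(whitelist_filter("; ls -l /"))  # lsl
```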
- Black list example
A strategy that is usually insufficient is to filter out known bad characters. If the characters in the set [:;.-/] are known to be bad and the input ; ls -l / is received, the original input is replaced with ls l (the characters ;, - and / are thrown away). This strategy has several problems:
- It does not protect against unknown threats. There may be other "bad" inputs that the developer did not consider.
- It does not protect against future threats. Inputs that are safe at present may obtain a dangerous interpretation if the underlying language changes. For example, a UNIX command line security filter designed to stop attacks against C shell will be insecure if the software is moved to an environment using bash.
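A minimal sketch of the blacklist filter described above, using the same set of bad characters. The names `BAD_CHARS` and `blacklist_filter` are assumptions made for this illustration. The second call shows an input the blacklist author did not consider: backticks, which common UNIX shells treat as command substitution, pass through untouched.

```python
# Blacklist from the example above: the characters : ; . - / are removed.
BAD_CHARS = frozenset(":;.-/")


def blacklist_filter(raw: str) -> str:
    """Drop only the characters that are explicitly known to be bad."""
    return "".join(ch for ch in raw if ch not in BAD_CHARS)


if __name__ == "__main__":
    print(repr(blacklist_filter("; ls -l /")))  # ' ls l '
    print(repr(blacklist_filter("`id`")))       # '`id`' (unchanged)
```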
Encode (escape) data
"Encoding" processes content that is about to be used in another application so that any characters which have potentially special meanings to the receiving application are made safe. Characters from a typical known safe charset for the particular destination medium are often left as they are. A simple encoding might leave alone the alphanumerics a–z, A–Z and 0–9. Any other character could possibly be interpreted in an unexpected manner, and is therefore replaced with the appropriate "encoded" representation.
The data is unconditionally encoded according to the syntax of the receiving application, making the data safe for it. For example, when inserting data into an SQL database, ' OR 1=1 --' is encoded to \ \'\ OR\ 1\=1\ \-\-' , making it a valid SQL string rather than modifying the behaviour of the intended SQL statement. Similarly, when generating an HTML page, <script> is encoded to &lt;script&gt;.
Encoding is language-specific, so care needs to be taken to encode the input according to its intended usage and to avoid passing raw (unencoded) input into other languages.
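As a sketch of destination-specific encoding, the Python standard library provides `html.escape` for HTML output and `shlex.quote` for POSIX shell command lines. The wrapper function names below are assumptions made for illustration; the point is that each destination needs its own encoder.

```python
import html
import shlex


def encode_for_html(raw: str) -> str:
    """Encode text so an HTML parser treats it as data, not markup."""
    return html.escape(raw)


def encode_for_shell(raw: str) -> str:
    """Quote text so a POSIX shell treats it as a single literal argument."""
    return shlex.quote(raw)


if __name__ == "__main__":
    print(encode_for_html("<script>"))    # &lt;script&gt;
    print(encode_for_shell("; ls -l /"))  # '; ls -l /'
```

Applying the HTML encoder to data bound for a shell, or the reverse, would not make the data safe, which is why the encoder must match the intended usage.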
Other solutions
There may be other solutions, depending on which programming language is used and what type of code injection is being prevented. E.g., the htmLawed PHP script can be used to remove cross-site scripting code.
In particular, to prevent SQL injection, parameterized queries (also known as prepared statements and bind variables) are excellent for improving security while also improving code clarity and performance.
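A minimal sketch of a parameterized query, using Python's standard `sqlite3` module with an in-memory database. The `users` table, its columns and the function names are assumptions made for this example. The `?` placeholder makes the driver pass the value separately from the SQL text, so the input is never parsed as SQL.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a small hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])
    return conn


def find_user(conn: sqlite3.Connection, name: str) -> list:
    """Look up users by name; the value is bound, not concatenated into the SQL."""
    cursor = conn.execute("SELECT id, name FROM users WHERE name = ?", (name,))
    return cursor.fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    print(find_user(db, "alice"))        # [(1, 'alice')]
    print(find_user(db, "' OR 1=1 --"))  # []
```

The injection attempt is compared against the `name` column as an ordinary string and matches no rows.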
This article "Secure input and output handling" is from Wikipedia. The list of its authors can be seen in its historical. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.