Hash Algorithm and Secure Hash Algorithm

       "A hash function is a reproducible method of turning some kind of data into a (relatively) small number that may serve as a digital "fingerprint" of the data. The algorithm "chops and mixes" (i.e., substitutes or transposes) the data to create such fingerprints. The fingerprints are called hash sums, hash values, hash codes or simply hashes. (Note that hashes can also mean the hash functions.) Hash sums are commonly used as indices into hash tables or hash files. Cryptographic hash functions are used for various purposes in information security applications." ¹

       "Hash functions are designed to be fast and to yield few hash collisions in expected input domains. In hash tables and data processing, collisions inhibit the distinguishing of data, making records more costly to find.

       A fundamental property of all hash functions is that if two hashes (according to the same function) are different, then the two inputs are different in some way. This property is a consequence of hash functions being deterministic. On the other hand, a function is not injective, i.e., the equality of two hash values ideally strongly suggests, but does not guarantee, the equality of the two inputs. If a hash value is calculated for a piece of data, and then one bit of that data is changed, a hash function with a strong mixing property usually produces a completely different hash value.

       Typical hash functions have an infinite domain, such as byte strings of arbitrary length, and a finite range, such as bit sequences of some fixed length. In certain cases, hash functions can be designed with one-to-one mapping between identically sized domain and range. Hash functions that are one-to-one are also called permutations. Reversibility is achieved by using a series of reversible "mixing" operations on the function input." ¹

       "Because of the variety of applications for hash functions (details below), they are often tailored to the application. For example, cryptographic hash functions assume the existence of an adversary who can deliberately try to find inputs with the same hash value. A well designed cryptographic hash function is a "one-way" operation: there is no practical way to calculate a particular data input that will result in a desired hash value, so it is also very difficult to forge. Functions intended for cryptographic hashing, such as MD5, are commonly used as stock hash functions.

       Functions for error detection and correction focus on distinguishing cases in which data has been disturbed by random processes. When hash functions are used for checksums, the relatively small hash value can be used to verify that a data file of any size has not been altered." ¹

       "Hash tables, a major application for hash functions, enable fast lookup of a data record given its key. (Note: Keys are not usually secret as in cryptography, but both are used to "unlock" or access information.) For example, keys in an English dictionary would be English words, and their associated records would contain definitions. In this case, the hash function must map alphabetic strings to indexes for the hash table's internal array.

       The ideal for a hash table's hash function is to map each key to a unique index (see perfect hashing), because this guarantees access to each data record in the first probe into the table. However, this is often impossible or impractical.

       Hash functions that are truly random with uniform output (including most cryptographic hash functions) are good in that, on average, only one or two probes will be needed (depending on the load factor). Perhaps as important is that excessive collision rates with random hash functions are highly improbable—if not computationally infeasible for an adversary. However, a small, predictable number of collisions is virtually inevitable (see birthday paradox).

       In many cases, a heuristic hash function can yield many fewer collisions than a random hash function. Heuristic functions take advantage of regularities in likely sets of keys. For example, one could design a heuristic hash function such that file names such as FILE0000.CHK, FILE0001.CHK, FILE0002.CHK, etc. map to successive indices of the table, meaning that such sequences will not collide. Beating a random hash function on "good" sets of keys usually means performing much worse on "bad" sets of keys, which can arise naturally—not just through attacks. Bad performance of a hash table's hash function means that lookup can degrade to a costly linear search.

       Aside from minimizing collisions, the hash function for a hash table should also be fast relative to the cost of retrieving a record in the table, as the goal of minimizing collisions is minimizing the time needed to retrieve a desired record. Consequently, the optimal balance of performance characteristics depends on the application.

       One of the most respected hash functions for use in typical hash tables is Bob Jenkins' LOOKUP3 hash function, published in an article in Dr. Dobb's Journal. The hash function performs well as long as there is no adversary, for it is trivially reversible and useless as a cryptographic hash function." ¹

       "Using a hash function to detect errors in transmission is straightforward. The hash function is computed for the data at the sender, and the value of this hash is sent with the data. The hash function is performed again at the receiving end, and if the hash values do not match, an error has occurred at some point during the transmission. This is called a redundancy check.

       For error correction, a distribution of likely perturbations is assumed at least approximately. Perturbations to a string are then classified into large (improbable) and small (probable) errors. The second criterion is then restated so that if we are given H(x) and x+s, then we can compute x efficiently if s is small. Such hash functions are known as error correction codes. Important sub-class of these correction codes are cyclic redundancy checks and Reed-Solomon codes." ¹

       "The SHA (Secure Hash Algorithm) hash functions refer to five FIPS-approved algorithms for computing a condensed digital representation (known as a message digest) that is, to a high degree of probability, unique for a given input data sequence (the message). These algorithms are called "secure" because (in the words of the standard), "for a given algorithm, it is computationally infeasible 1) to find a message that corresponds to a given message digest, or 2) to find two different messages that produce the same message digest. Any change to a message will, with a very high probability, result in a different message digest." The five algorithms, denoted SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512, are cryptographic hash functions designed by the National Security Agency (NSA) and published by the NIST as a U. S. government standard. The latter four variants are sometimes collectively referred to as SHA-2.

       SHA-1 is employed in several widely used security applications and protocols, including TLS and SSL, PGP, SSH, S/MIME, and IPsec. It was considered to be the successor to MD5, an earlier, widely-used hash function.

       The security of SHA-1 has been somewhat compromised by cryptography researchers. Although no attacks have yet been reported on the SHA-2 variants, they are algorithmically similar to SHA-1 and so efforts are underway to develop improved alternative hashing algorithms. Due to recent attacks on the SHA-1, "NIST is initiating an effort to develop one or more additional hash algorithms through a public competition, similar to the development process for the Advanced Encryption Standard (AES)." [4]. It's tentatively scheduled to proclaim winner and publish final standard in 2012." &sup4;

       "NIST has published four additional hash functions in the SHA family, each with longer digests, collectively known as SHA-2. The individual variants are named after their digest lengths (in bits): SHA-224, SHA-256, SHA-384, and SHA-512. The latter three were first published in 2001 in the draft FIPS PUB 180-2, at which time review and comment were accepted. FIPS PUB 180-2, which also includes SHA-1, was released as an official standard in 2002. In February 2004, a change notice was published for FIPS PUB 180-2, specifying an additional variant, SHA-224, defined to match the key length of two-key Triple DES. These variants are patented in US patent 6829355.

       SHA-256 and SHA-512 are novel hash functions computed with 32- and 64-bit words, respectively. They use different shift amounts and additive constants, but their structures are otherwise virtually identical, differing only in the number of rounds. SHA-224 and SHA-384 are simply truncated versions of the first two, computed with different initial values.

       These new hash functions have not received as much scrutiny by the public cryptographic community as SHA-1 has, and so their cryptographic security is not yet as well-established. Gilbert and Handschuh (2003) have studied the newer variants and found no weaknesses." &sup4; "SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512 are the secure hash algorithms required by law for use in certain U. S. Government applications, including use within other cryptographic algorithms and protocols, for the protection of sensitive unclassified information. FIPS PUB 180-1 also encouraged adoption and use of SHA-1 by private and commercial organizations. A prime motivation for the publication of the Secure Hash Algorithm was the Digital Signature Standard, in which it is incorporated. The SHA hash functions have been used as the basis for the SHACAL block ciphers." &sup4;

       "Cryptographic hash ciphers are designed to quickly process large quantities of data; for example, to hash data and append hashes to packet headers on the fly as the packets are sent over the network. The processing rate of cryptographic hash ciphers in MB/sec is generally comparable to the processing rate of stream ciphers such as RC4 and is 1.5 to 2 times above the processing rate of AES. Obviously, there is a performance penalty for using more secure, larger hashes, and MD5 would have a higher data throughput than Tiger (on 32-bit CPUs) or SHA-1.

       Cryptographic hashes are fine to sustain data integrity via data fingerprinting or to identify users against databases of hashed passwords. However, by themselves they do not authenticate the data itself; the attacker can alter the original data before hashing takes place. One possible solution for this problem is using a HMAC, also called a keyed message digest. A HMAC is nothing more than a cryptographic hash and shared secret key combined. Thus, the data gets encrypted before it is hashed, and the attacker would have to break the symmetric cipher key after generating the original message from the hash or break the symmetric cipher key if he or she has access to data before hashing takes place. An example of message authentication code specifically designed for improving wireless security is Michael (MIC)." &sup5;

       "The main problem encountered in the design of MIC was developing a HMAC that would run on legacy hardware without imposing significant penalties on network throughput and latency. The client hosts can offload the HMAC computation to the sufficiently powerful laptop or even PDA CPU, even though it is still undesirable! What if a company decides to design and manufacture a tiny 802.11-enabled mobile phone? Besides, many access points do not boast high processing power. Yet, the AP or a wireless bridge should be able to verify both integrity and authenticity of the bypassing packets. Recall the structure of SHA with its 80 iteration rounds and imagine generating such a hash for every packet sent over the wireless network. Would a common access point or a PDA be able to implement that process without significant resource exhaustion? Not very likely!

       Thus, an entirely new algorithm called MIC was designed by Niels Ferguson to provide packet integrity checking and forgery detection on TKIP-enabled WLANs. It was designed as a third attempt, after two previous designs called Mickey and Michelle. MIC is a trade-off between security and resource consumption and implementation capability. It runs on older wireless access points and client hardware without imposing a significant performance penalty, but the security level it provides is only 20 bits. As you should understand by now, in modern cryptographic terms this is not a lot." &sup5;

       "Before discussing the trade-off and its practical outcome possibilities, learning how MIC works is helpful. The MIC secret key consists of 64 bits and is represented as an 8-byte sequence k0...k7. This sequence is converted to two 32-bit little-Endian words, K0 and K1. Throughout the MIC design, all conversions between bytes and 32-bit words use the Little-Endian conventions, because the cipher is expected to run on Little-Endian CPUs. In fact, the majority of access points now majority of access points now manufactured use older Intel line chips such as i386 or i486. MIC operates on the data field, as well as source and destination address fields of the wireless frame. The integrity of IVs is not protected and the data field is not interpreted. Before the cipher runs, the frame is padded at the end with a single byte (value 0x5a), followed by 4 to 7 zero bytes. The number of zero bytes is selected to ensure that the overall length of the padded frame is always a multiple of four. The padding is never transmitted with the frame; it is used only to simplify the computation over the final block. After the padding, the frame is converted into a sequence of 32-bit words M0...MN-1, where N = [(n+5)/4]. By design, MN-1 = 0 and MN-2 != 0.

       The MIC value is computed starting with the key value and applying a block function b for every message word. The cipher loop runs a total of N times (i includes 0 to N-1 values), where N is the number of 32-bit words making up the padded frame. The algorithm produces two words (l,r), which are converted into a sequence of eight Little-Endian octets, the MIC value:

Input: Key (K0, K1) and padded frame (represented as 32-bit words) M0...MN Output: MIC

value (V0, V1) MIC <= ((K 0, K1) , (M0,...,MN)) (l,r) <=(K0, K1) for i = 0 to N-1 do l <= l ^= Mi (l,r) <= b(l,r) return (l,r) The MIC value is appended to the frame as data to be sent.

       The block function b used by MIC is a tiny Feistel algorithm that employs alternating additions and XORing. The <<< signifies left rotation and the >>> indicates right rotation of 32-bit values, and XSWAP is a function that exchanges the position of the two least significant bytes with the position of the two most significant bytes in a word:

Input: (l,r) Output: (l,r) b(L,R) 35 r <= r ^= (l <<< 17) l <= (l + r) mod 232 r <= r ^= XSWAP(l) l <= (l + r) mod 232 r <= r ^= (l <<< 3) l <= (l + r) mod 232 r <= r ^= (l >>> 2) l <= (l + r) mod 232 return (l, r)

       As you can see, the cipher is neither sophisticated nor strong. It was estimated that an attacker has one chance in a million of sneaking in a frame with a compromised payload but correct MIC. One might argue that significant damage can be done by inserting a single modified frame after 1 million frames sent. However, the old WEP ICV (CRC-32) is still used as well, and has to be faked together with MIC. Thus, such attacks are neither easy nor have a high probability of success. Nevertheless, to mitigate their success the so-called TKIP countermeasures were introduced. When more than a single forgery attempt in a second has been detected, the host deletes the groupwise or pairwise key (depending on whenever a unicast or multicast frame was affected), de-associates, and waits for a minute before the re-association. Thus, the possibility of an evil Joe Cracker sending a few million modified frames to sneak in a few of them undetected is eliminated.

       However, the same Joe Cracker might turn desperate and try to send forged frames to trigger the countermeasures and cause a DoS attack, employing not a bug, but a feature. The possibility of such DoS attacks introduced by a new security feature was widely argued. The best example of such discussion is a thread at the Cryptography mail list (http://www.mail-archive.com/cryptography@wasabisystems.com/msg03070.html is the first message in a thread). In this thread Niels Ferguson, the creator of MIC, answers questions considering the possibility of a DoS attack abusing MIC countermeasures. Despite the hullabaloo around the likelihood of this DoS attack and the countermeasures' imperfections, such an attack might not be as realistic and easy to launch as many would think. Remember that the TSC will drop all out-of-sequence frames; the attacker thus has to send a frame with a "future," yet unused, IV. However, recall that the IV is actively used by the TKIP per-packet key generation function. If the IV is changed, the frame will not be decrypted correctly. Because the CRC-32 is still there, it would not give a proper value, leading to the forged frame being eventually dropped. Thus, the attacker has to sniff out valid frames, delete them to prevent them from reaching the receiver, corrupt the MIC, recalculate the CRC-32 to reflect the changes in MIC, and only then forward the "MIC-of-Death" frames to the target (desirably every 59 seconds). Although possible, it is by no means an easy task.

       Because the final 802.11i release-compatible hardware will have to be optimized for running AES, using a CBC-MAC HMAC implementing AES as a one-way hash would be more practical and secure than employing some form of MIC or a well-known message digest like SHA. It will also remove all possible problems with MIC just discussed. Thus, in some specific cases, it could be preferable to use symmetric block ciphers for data integrity preservation as well as for data encryption and message authentication." &sup5.;

References

1. Hash Function, Wikipedia (www.wikipedia.org ), Available from visit this site.

2. Hash Table, Wikipedia (www.wikipedia.org), Available from visit this site.

3. Hash Table, Wikipedia (www.wikipedia.org), Available from visit this site.

4. SHA Hash Function,Wikipedia (www.wikipedia.org), Available from visit this site.

5. Hash Functions, Their Performance and HMACS, available from visit this site.

Done By: Eng. Saleh Alajji

Amazon Kindle

Cryptography

Software Encryption
Multiprocessor Hash
The Correctness Proof ..
The Adaptive-Hash
An O
Software Encryption
 
© Copyright Eng. Saleh Alajji 2007
Reproduction without permission is strictly prohibited.