Password Cracking and Time-Memory Trade Off
Password Cracking and Time-Memory Trade Off
Written by: xXloginXx (Jason R. Davis)
Published Date: 3/13/2005
This paper has a single intention, and that is: "Define 'Time-Memory Trade Off' and it's perfections/flaws, applications, and significance in the future"�.
Every time I go on line, I usually am up to no good. My intentions are "often"� never hostile, but I do take part in the shady business of password cracking. Meaning I actively use unorthodox methodology, that I know for a fact the FBI frowns down upon, to obtain hashes. Once obtained I usually spend a few hours cracking these hashes via good old fashion bruteforcing. Now, bruteforcing is the most reliable method of password cracking in existence today. Theoretically, there is no password you cannot break, but the catch to all this, is time. You can spend 5 seconds or 10,000 years bruteforcing a password depending on it's weaknesses or strengths in this reference. The driving force behind bruteforcing is the ability to crack short passwords in minutes or hours instead of years. The other wonderful thing about bruteforcing is the "dictionary"� aspect. If a password, 10 characters long, it will take quite a while - I'm not going to compute the time - to crack, except if it's a dictionary word. I can simply run a word list through my bruteforcing utility and for lack of a better word, "shazam"�, I break the password.
// 180 Degrees
Now, after 5-10 years of refining, bruteforcing has proved it's success. Meanwhile the world of encryption has grown aware of it's weaknesses. Hashing algorithms have been revamped to overcome certain problems, such as collisions, to ensure their existence and usability. A result of this effort made by the world of encryption has decreased the usefulness of bruteforcing. To make it worse, even the corporations have caught on, enforcing password policies and expirations. So now, by the time I crack a password, its most likely been changed already. This makes bruteforcing look extremely unappetizing, to say the least. Now I ask myself, why spend 3-4 hours of cumulative effort to break-in, if the password I break is only good for 2 weeks? Hence, the 180 degrees, as the world of bruteforcing has begun it's inevitable turn from success to worthless existence.
// Preconception of "Time-Memory Trade off"�
A few years back, a talented individual with great foresight, thought about the scenario we are in today. Where processing power would be excessive and abundant, not to mention storage space even more so. Then with even more foresight, predicts the strengths of algorithms available in this day and age. After much thought, generated the theory of "Time-Memory Trade Off"�. Which when developed was not feasible, and was simply that, a theory. But the principal is the core of what will drive the password cracking of the entire world into the future as bruteforcing becomes a favorite past time.
The actual principal can be summed up as follows (not verbatim): Instead of generating all possibilities every time; generate and store them one time. Once generated and stored, recalling a specific value takes seconds regardless of the original password's length and/or complexity.
This is the utopia of password cracking.
There is a wonderful resource at:
Credit: NullAck - Author and owner of all material on the above link.
That resource details some of the actual byte-by-byte tradeoffs required to develop such a project in the LM realm. Now that link does not apply to MD5, SHA-x, or any other algorithm, just LM hashes. Regardless of algorithm, the idea behind "Time-Memory Trade Off"� still stands.
// Hashed/Encrypted Password Reverse Lookup
I'm not going to explain the difference between encryption and hashing, as this paper only applies to algorithms that return an exact value each time. The diversity of encryption (elliptic, symmetric, asymmetric, etc...) has resulted in a gray area between the two. Simply keep this in mind: If I process the letter "a"� through an algorithm and get the same result every time, then this concept applies; regardless of whether it is an encryption or hashing algorithm.
I like to call the end result - a finished product - of "Time-Memory Trade Off"� , where all possibilities have been generated and stored a "Reverse Lookup"� database rather than a "TMTO"� database. Essentially, that is what it becomes. Once all values have been computed and stored, you supply a hashed or encrypted version of some text and the table returns the original text. Liken this concept to DNS, you supply a FQDN and it returns the associated IP address(es).
// Still out of reach
A complete Hashed/Encrypted Password Reverse Lookup database is still out of reach, even today. In my own research I've run into 1 simple problem: storage shortage. I have been focusing on MD5 as my algorithm of choice, so for my sake I'll use MD5 as the algorithm of example, from now on.
I have processed all possibilities of a-z (lowercase) to the length of 7 for the MD5 algorithm. That results in 26^7 possibilities. Now, a-z to the length of 7 is not that impressive. But it in fact is, considering the database is just over 170GB before indexing and optimizations. This database size also includes a special technique of removing the filename from the file, originally brought to light by NullAck from his link above, whom deserves credit. So without that technique you can add 26^7*2 bytes to 170GB, making it even more intimidatingly large. Now on such a small scale, 170GB is not so small, as I try to move up to 8 length or 26^8. A rough database size can be calculated as follows: if 26^7 = ~170GB then 26^8 or 26*170GB = ~4.5TB. For each length I add to the database you can take the current size and multiply it by the number of characters utilized (a-z or 26 in my case). This exponential database size increase puts a damper on the physical feasibility of TMTO. And has resulted in many headaches as I develop and scale my database to longer lengths or add characters to the running character set.
// Rainbow Tables - Salvation or Poor Substitute?
Under extreme scrutiny of storage limitations, the concept of rainbow tables was developed. Instead of computing all values and storing them, Rainbow Tables compute statistically common combinations and leave out the rest. This has proved very effective in reducing the storage requirements of a TMTO database. The only downfall is it's purpose. In order to reduce the size of the database it leaves out possibilities. This is OK, because statistically these possibilities will never be asked for, so why put them in the database? I agree in some part, but largely I disagree with the concept of reducing the database size by removing valid possibilities that could be used, not whether statistically they will be used. The statistics are sound, but the efficiency of a Rainbow Table isn't even close to that of a true TMTO database. I believe the math behind Rainbow Tables is the stepping stone that needs to be taken in order to reach the end goal of storing all possibilities.
I am not bashing Rainbow Tables, they are effective in most cases. In my research I've cracked 94.4% of a predetermined set of MD5 hashes I ran through Rainbow Tables generated using RainbowCrack. Which is 94.4% efficient, and for password cracking better than average, if not above average in all cases. Now, I've noticed that certain passwords are cracked in different time-frames also; some faster than others. Also, it matters which character set you utilize when generating the tables.
// MD5 Reverse Lookup
I've been developing an MD5 Reverse Lookup database as stated above. And with any password hashed with MD5 utilizing the characters a-z, 1-7 length is cracked 100% of the time. Not to mention in average of 30 milliseconds.
In a comparison I built the same database using the same character set and length with RainbowCrack. I found 94.4% of the 1000 hashes I tried to crack were indeed broken. That's a mere 5.6% disadvantage to my database. Which gives praise to Rainbow Tables for the time being. Keyword being "time"�, as character sets grow, RBCrack falls greatly short of perfect, but proves its effectiveness better than bruteforcing.
Side Note: I'm currently working on a project to determine the "rare"� possibilities that RBCrack doesn't compute and make a database of just those values. This would make RBCrack the tool of choice, until full-scale TMTO databases become feasible. So if RBCrack cannot break the password, then use the reverse lookup database.
To make the MD5 Reverse Lookup more effective I developed a method of inserting word lists into the database. This results in something extremely similar, if not identical, to a dictionary bruteforce. But you only need to run the word list through that database once, meaning a 1,000,000 word, word list once inserted will never have to be inserted again nor will the process have to be ran again. Those values are stored along with all the others and can be recalled in the same time frame. This makes the database more efficient.
Lets say I have a MD5 Reverse Lookup database, character set a-z, with 1-8 length. This will crack a lot of passwords, but anything with a single capital letter or number cannot be broken. This is when I insert my word list, which can contain numbers, caps, and special characters. This boost performance when cracking complex passwords, that can only be broken within a reasonable time frame through dictionary based bruteforcing. Of course, this does not remedy the problem. I still can't break passwords 100% of the time, nor do I have the storage to make a database that can, YET.