I've written about the need for computer users to employ better passwords over the years. Since I try to keep the content light, I've tried to stay away from math. On the other hand, the math in this case is not very hard. In fact, it's much easier to digest than the math behind data encryption software like AlertBoot.
If you're a Gmail user, you've probably run across its password strength bar at some point: as you type a new password, Gmail rates it as poor, fair, good, or strong.
I'm only using it as an example. I'm not saying that Gmail's password checker is optimal or anything (in fact, the Gmail password strength bar has its critics).
Anyhow, how does Google decide whether the password you've entered is poor, fair, or good, criticisms notwithstanding? It goes back to a concept known as password entropy.
Entropy, of course, is a physical property associated with disorder. The classic example is a salad being tossed: the ingredients start out sliced but still sitting in their separate piles in the bowl, and as you toss them they take on the form of a tossed salad. Chances are the salad won't magically sort itself back into those original, albeit sliced, piles.
When we talk about password entropy, we're talking about the disorder of a password, i.e., how random the password can be. The more random it is, the more secure it is. But how do you measure how random it is? Over the years, different ways of calculating a password's entropy have been developed.
Most of them, however, appear to trace back to Claude Shannon, the father of information theory.
The formula that follows from his work bases a password's entropy on the password's length and on the entropy (essentially, the number of possibilities) of each character in that password:
Password entropy = L * log2(n)
where n is the size of the character pool and L is the length of the password.
The log2(n) term represents the "entropy per character." For example, if your password can only contain lowercase letters, n is 26 (a through z), and log2(26) works out to about 4.7 bits of entropy per character. Bigger pools give you more bits per character:
- Upper and lower case letters: n = 52 (5.7 bits)
- Upper and lower case letters, and numbers: n = 62 (5.95 bits)
- All keyboard characters: n = 94 (6.55 bits)
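To make the arithmetic concrete, here's a minimal Python sketch that evaluates L * log2(n) for the four pool sizes mentioned above (26, 52, 62 and 94). The function names are my own, and the formula, like this code, assumes every character is picked uniformly at random from the pool.

```python
import math

def entropy_per_char(pool_size: int) -> float:
    """Bits contributed by one character drawn uniformly from a pool of pool_size symbols."""
    return math.log2(pool_size)

def password_entropy(length: int, pool_size: int) -> float:
    """Password entropy = L * log2(n)."""
    return length * entropy_per_char(pool_size)

# The character pools discussed in the text
POOLS = {
    "lowercase only": 26,
    "upper + lower case": 52,
    "upper + lower case + digits": 62,
    "all keyboard characters": 94,
}

for name, n in POOLS.items():
    per_char = entropy_per_char(n)
    total = password_entropy(8, n)  # an 8-character password from this pool
    print(f"{name:28} n = {n:2}  {per_char:.2f} bits/char  8 chars: {total:.1f} bits")
```

Running it prints 4.70, 5.70, 5.95 and 6.55 bits per character; an 8-character password drawn from the full keyboard comes to roughly 52.4 bits.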
Your password's entropy depends on which character types you combine (that sets n) and on how long the password is. Obviously, the longer the password, the more entropy it has, and hence the more secure it is. In theory.
Just set some thresholds that separate poor from fair, fair from good, and so on, and you've got yourself a password strength checker.
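Here's a minimal sketch of such a checker in Python. Two parts are my own assumptions: n is estimated by checking which character classes appear in the password (counting 32 for the remaining keyboard symbols, so the full pool comes to 94), and the cutoffs are made up purely for illustration. They are not what Gmail or any other site uses.

```python
import math
import string

def pool_size(password: str) -> int:
    """Estimate n from the character classes that appear in the password."""
    classes = [
        (string.ascii_lowercase, 26),
        (string.ascii_uppercase, 26),
        (string.digits, 10),
        (string.punctuation, 32),  # the other printable keyboard symbols (94 - 62)
    ]
    return sum(size for chars, size in classes if any(c in chars for c in password))

def entropy_bits(password: str) -> float:
    """L * log2(n); an empty password (or one with no recognized characters) scores 0."""
    n = pool_size(password)
    return len(password) * math.log2(n) if n else 0.0

# Hypothetical cutoffs in bits, chosen only for illustration
THRESHOLDS = [(28, "poor"), (36, "fair"), (60, "good")]

def rate(password: str) -> str:
    """Return the label of the first cutoff the password falls below, else "strong"."""
    bits = entropy_bits(password)
    for limit, label in THRESHOLDS:
        if bits < limit:
            return label
    return "strong"

# One example password per rating
for pw in ["abc", "letmein", "axC398zzz", "Correct-Horse-42"]:
    print(f"{pw:18} {entropy_bits(pw):5.1f} bits  {rate(pw)}")
```

With these made-up cutoffs, "abc" comes out poor, "letmein" fair, "axC398zzz" good and "Correct-Horse-42" strong.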
I say "in theory" because there are clear instances where this isn't true. Consider, for example, the RockYou hack from 2010. In that incident, a list of passwords stored by RockYou was published online. The big controversy at the time was that these passwords had been stored in plaintext. But every cloud has its silver lining: it was a chance to see what kinds of passwords real people use.
The top ten passwords were:
1. 123456
2. 12345
3. 123456789
4. Password
5. iloveyou
6. princess
7. rockyou
8. 1234567
9. 12345678
10. abc123
Based on our handy formula, #4 would be considered more secure than #10. The truth, though, is that #10 is probably more secure than #4 (the latter being a plain dictionary word). Of course, neither of them is actually secure.
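Running the top ten through the formula shows the problem. This Python sketch reuses the same assumed pool estimate as a simple strength checker would (lowercase, uppercase, digits and symbols counted as 26, 26, 10 and 32); the helper names are my own.

```python
import math
import string

def pool_size(password: str) -> int:
    """Estimate n from the character classes that appear in the password."""
    classes = [
        (string.ascii_lowercase, 26),
        (string.ascii_uppercase, 26),
        (string.digits, 10),
        (string.punctuation, 32),
    ]
    return sum(size for chars, size in classes if any(c in chars for c in password))

def entropy_bits(password: str) -> float:
    """L * log2(n), or 0 if no recognized characters are present."""
    n = pool_size(password)
    return len(password) * math.log2(n) if n else 0.0

top_ten = ["123456", "12345", "123456789", "Password", "iloveyou",
           "princess", "rockyou", "1234567", "12345678", "abc123"]

for rank, pw in enumerate(top_ten, start=1):
    print(f"#{rank:<2} {pw:10} {entropy_bits(pw):5.1f} bits")
```

"Password" (upper and lower case, 8 characters) scores about 45.6 bits, the highest of the ten, while "abc123" (lowercase plus digits, 6 characters) scores about 31.0 bits, even though a dictionary word is one of the first things an attacker will try.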
Or take this example: let's say you're comparing password A with password B:
A. 111111111111111111111111111111111111111111111111111111111111111111111
B. axC398zzz
Under the formula, A is the stronger password because of its length. In reality, B would be considered the stronger password. (Actually, there's some room for debate here, since the point of contention is whether a hacker would stick around long enough to brute-force password A.)
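Here's the arithmetic behind that verdict as a short Python sketch. The exact number of 1s in A doesn't matter for the point, so 60 is assumed; the pool sizes are n = 10 for A (digits only) and n = 62 for B (upper and lower case plus digits).

```python
import math

def entropy_bits(length: int, pool_size: int) -> float:
    """L * log2(n) from the formula above."""
    return length * math.log2(pool_size)

# Stand-in for password A: the exact number of 1s doesn't matter, 60 is assumed here
password_a = "1" * 60
password_b = "axC398zzz"

bits_a = entropy_bits(len(password_a), 10)  # digits only, so n = 10
bits_b = entropy_bits(len(password_b), 62)  # upper + lower case + digits, so n = 62

print(f"A: {bits_a:.1f} bits")  # about 199.3
print(f"B: {bits_b:.1f} bits")  # about 53.6
```

By the numbers, A looks almost four times "stronger," which is exactly why the raw formula can't be the whole story.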
Brute forcing is when candidate passwords are tried one after another in sequence. If "a" doesn't work, try "b". If that doesn't work, try "c". Then "d". Once you finish that round, you try "aa", "ab", "ac", and so on. Done this way, you'll eventually try nonsensical passwords like "ddddddddddcacac" as well as perfectly good words like "invincible". This is but one way of trying to guess a password.
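The sequential search just described can be sketched in Python with itertools.product. The lowercase-only alphabet and the crack helper, which simply counts how many guesses it takes to reach a target, are illustrative assumptions.

```python
from itertools import count, product
from string import ascii_lowercase

def brute_force_candidates(alphabet: str = ascii_lowercase):
    """Yield every string over alphabet, shortest first: a, b, ..., z, aa, ab, ..."""
    for length in count(1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)

def crack(target: str, alphabet: str = ascii_lowercase) -> int:
    """Return how many guesses the sequential search needs before it hits target."""
    if not target or any(c not in alphabet for c in target):
        raise ValueError("target must be non-empty and use only characters from alphabet")
    for guesses, candidate in enumerate(brute_force_candidates(alphabet), start=1):
        if candidate == target:
            return guesses

print(crack("b"))    # 2
print(crack("aa"))   # 27
print(crack("dog"))  # 3101
```

Each extra character multiplies the number of strings of that length by the pool size (26 here), so there are n^L strings of length L; taking log2 of that gives L * log2(n), the same quantity the entropy formula measures.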
Hackers know that many people base their passwords on a word; oftentimes, the password simply is a word. With that in mind, some brute-force using a dictionary (a list of words), from which nonsensical strings are left out.
"Dictionaries" can also include word-number combinations and other passwords that hackers come across, as in the RockYou case. Hence the warning that users change their passwords immediately after such events.
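A dictionary attack swaps the exhaustive sequence for a word list plus a few common variations. In this Python sketch, the tiny word list and the variation rules (the plain word, its capitalized form, and the word followed by 0 through 99) are hypothetical; real lists are far larger and often include leaked passwords like RockYou's.

```python
from typing import Iterable, Iterator, Optional

def dictionary_candidates(words: Iterable[str], max_suffix: int = 99) -> Iterator[str]:
    """Yield each word, its capitalized form, then the word followed by 0..max_suffix."""
    for word in words:
        yield word
        yield word.capitalize()
        for i in range(max_suffix + 1):
            yield f"{word}{i}"

def dictionary_attack(target: str, words: Iterable[str]) -> Optional[int]:
    """Return the number of guesses needed to find target, or None if the list misses it."""
    for guesses, candidate in enumerate(dictionary_candidates(words), start=1):
        if candidate == target:
            return guesses
    return None

# Hypothetical word list for illustration only
WORDLIST = ["iloveyou", "princess", "password", "invincible", "rockyou"]

print(dictionary_attack("Password", WORDLIST))     # 206
print(dictionary_attack("invincible7", WORDLIST))  # 316
print(dictionary_attack("axC398zzz", WORDLIST))    # None
```

"Password" falls after a couple of hundred guesses even though the entropy formula credits it with roughly 45.6 bits, while the random-looking axC398zzz never shows up at all.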
As I noted, the formulas for judging a password's strength all started with Shannon's insight into entropy, but the formula isn't used as-is.
For example, if you go to sign up for a new Gmail account and type in passwords A and B from above, you'll find that axC398zzz is labeled "strong" while the string of repeated 1s (ones) is labeled "fair." Other sites might see it differently.
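Gmail doesn't spell out how it adjusts the formula, so the Python sketch below is not its method. It shows one hypothetical tweak, collapsing runs of the same character before measuring the length, which is enough to flip the A-versus-B ranking. The pool estimate and the 60-character stand-in for A are the same assumptions used for a basic checker.

```python
import math
import re
import string

def pool_size(password: str) -> int:
    """Estimate n from the character classes that appear in the password."""
    classes = [
        (string.ascii_lowercase, 26),
        (string.ascii_uppercase, 26),
        (string.digits, 10),
        (string.punctuation, 32),
    ]
    return sum(size for chars, size in classes if any(c in chars for c in password))

def naive_entropy(password: str) -> float:
    """Plain L * log2(n)."""
    n = pool_size(password)
    return len(password) * math.log2(n) if n else 0.0

def adjusted_entropy(password: str) -> float:
    """Hypothetical tweak: collapse runs of a repeated character before measuring L."""
    collapsed = re.sub(r"(.)\1+", r"\1", password)
    return naive_entropy(collapsed)

password_a = "1" * 60  # stand-in for password A; 60 ones assumed
password_b = "axC398zzz"

for label, pw in [("A", password_a), ("B", password_b)]:
    print(f"{label}: naive {naive_entropy(pw):.1f} bits, adjusted {adjusted_entropy(pw):.1f} bits")
```

The naive scores rank A far ahead (about 199.3 bits versus 53.6), while the adjusted scores rank B well ahead (about 41.7 bits versus 3.3), in line with Gmail's verdict. Real checkers likely use more elaborate rules than this one.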