The code used here is on github. https://github.com/matsulib/bcrypt-dictionary-attack
table of contents
If you want to use the hash function bcrypt suitable for password storage from Python, use the bcrypt module (https://pypi.python.org/pypi/bcrypt/3.1.1).
** Installation method **:
** Features of bcrypt **
bcrypt is a Blowfish crypto-based hash function.
The decisive difference between bcrypt and hash functions such as the MD5 and SHA families is that the former is a fast hash, while the latter is a slow hash.
Since fast hash is fast, it is convenient to get a fixed length digest from a large file, but when used for password management, there is a problem that if the hash value is leaked, it is easily cracked by an offline attack. .. Therefore, there is a technique called stretching (repeating hashing) that intentionally slows down the process.
On the other hand, bcrypt is a slow hash originally designed to iterate over operations, and is not only slow, but also designed to make fast hardware implementation difficult.
This system hashes passwords using a version of Bruce Schneier's Blowfish block cipher with modifications designed to raise the cost of off-line password cracking and frustrate fast hardware implementation.
Basically, computational power can be parallelized cheaply and easily, but memory cannot. This is the cornerstone of bcrypt and scrypt. Obviously, they can still be broken by sheer brute force, and you could just use hardware with integrated memory units to circumvent the problem, but it's much harder and much, much more expensive to do so.
To be honest, I'm not sure if the above explanation is correct or still valid due to lack of understanding.
It seems that bcrypt was also used in the Ashley Madison registrant information leak incident that occurred in July 2015. However, it seems that it was MD5 (and it was also known that it was 26 letters of the lowercase alphabet) that was fatal, and it seems that bcrypt itself is still a painful existence for the attacker.
Instead of cracking the slow bcrypt hashes directly, which is the hot topic at the moment, we took a more efficient approach and simply attacked the md5(lc($username).”::”.lc($pass)) and md5(lc($username).”::”.lc($pass).”:”.lc($email).”:73@^bhhs&#@&^@8@*$”) tokens instead. Having cracked the token, we simply then had to case correct it against its bcrypt counterpart.
** Basic usage of bcrypt **
--bcrypt.gensalt (rounds = 12, prefix = b'2b') # Generate salt --bcrypt.hashpw (password, salt) #Hash password --bcrypt.checkpw (password, hashed_password) #verify password
Hash the password as follows:
python
>>> import bcrypt
>>> salt = bcrypt.gensalt(rounds=10, prefix=b'2a')
>>> salt
b'$2a$10$VpsqBArIfdhGzJY1YO/xyO'
>>> password = b'password'
>>> bcrypt.hashpw(password, salt)
b'$2a$10$VpsqBArIfdhGzJY1YO/xyOiOsCrQc9BZAaonPt3QDsL0HWzoaHXgG'
It has the following structure.
$(2-character version)$(Two-letter work-factor)$(22 character salt)(31-character hash)
** About the argument of bcrypt.gensalt () **
bcrypt.gensalt () will generate a different salt for each call. You can also specify two arguments when calling.
Currently, "2a" seems to be the mainstream.
Below, we will look at the calculation time when changing the work-factor value of bcrypt.gensalt ().
** Execution environment: **
** Code and result **
python
In [1]: import bcrypt
In [2]: password = b'password'
In [3]: for i in range(5, 18):
   ...:     print('rounds: {:02d}'.format(i), end=' => ')
   ...:     %timeit -r1 bcrypt.hashpw(password, bcrypt.gensalt(rounds=i))
   ...:
rounds: 05 => 100 loops, best of 1: 2.73 ms per loop
rounds: 06 => 100 loops, best of 1: 5.4 ms per loop
rounds: 07 => 100 loops, best of 1: 10.7 ms per loop
rounds: 08 => 10 loops, best of 1: 21 ms per loop
rounds: 09 => 10 loops, best of 1: 42.1 ms per loop
rounds: 10 => 10 loops, best of 1: 84.4 ms per loop
rounds: 11 => 10 loops, best of 1: 172 ms per loop
rounds: 12 => 1 loop, best of 1: 369 ms per loop
rounds: 13 => 1 loop, best of 1: 703 ms per loop
rounds: 14 => 1 loop, best of 1: 1.4 s per loop
rounds: 15 => 1 loop, best of 1: 2.71 s per loop
rounds: 16 => 1 loop, best of 1: 5.42 s per loop
rounds: 17 => 1 loop, best of 1: 10.8 s per loop
Not surprisingly, each increment of work-factor doubled the calculation time.
** Execution environment **
** SHA256 (+ salt) code **
hashing_sha256.py
import uuid
import hashlib
# Reference: http://pythoncentral.io/hashing-strings-with-python/
def hash_password(password):
    # uuid is used to generate a random number
    salt = uuid.uuid4().hex
    return hashlib.sha256(salt.encode() + password.encode()).hexdigest() + ':' + salt
    
def check_password(hashed_password, user_password):
    password, salt = hashed_password.split(':')
    return password == hashlib.sha256(salt.encode() + user_password.encode()).hexdigest()
if __name__ == '__main__':
    new_pass = input('Please enter a password: ')
    hashed_password = hash_password(new_pass)
    print('The string to store in the db is: ' + hashed_password)
    old_pass = input('Now please enter the password again to check: ')
    if check_password(hashed_password, old_pass):
        print('You entered the right password')
    else:
        print('I am sorry but the password does not match')
Execution result
> python hashing_sha256.py
Please enter a password: piyo
The string to store in the db is: 25bcf883839511cdb493b3ba25bc0ad3fc809a3688076d198ef13128f5883a66:f0ac11a13a314336951392ff52c9c2d6
Now please enter the password again to check: piyo
You entered the right password
** bcrypt code ** The execution result is the same as the SHA256 code.
hashing_bcrypt.py
import bcrypt
def hash_password(password, rounds=12):
    return bcrypt.hashpw(password.encode(), bcrypt.gensalt(rounds)).decode()
def check_password(hashed_password, user_password):
    return bcrypt.checkpw(user_password.encode(), hashed_password.encode())
** Dictionary attack code **
dictionary_attack.py
import time
import hashing_sha256 as sha256
import hashing_bcrypt as bcrypt
def attack(leaked_hashed_password, hashing):
    # https://www.teamsid.com/worst-passwords-2015/
    dictionary = ['123456', 'password', '12345678', 'qwerty', '12345', 
                '123456789', 'football', '1234', '1234567', 'baseball', 
                'welcome', '1234567890', 'abc123', '111111', '1qaz2wsx', 
                'dragon', 'master', 'monkey', 'letmein', 'login', 'princess', 
                'qwertyuiop', 'solo', 'passw0rd', 'starwars']
            
    for p in dictionary:
        if hashing.check_password(leaked_hashed_password, p):
            return 'Hit! The password is{}is.'.format(p)
    else:
        return 'Off(´ ・ ω ・ `)'
passwords = ['complex.password'] * 9 + ['passw0rd']
print('sha256')
leaked_sha256 = [sha256.hash_password(p) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_sha256):
    result = attack(v, sha256)
    print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))
print('bcrypt-5')
leaked_bcrypt = [bcrypt.hash_password(p, 5) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_bcrypt):
    result = attack(v, bcrypt)
    #print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))
print('bcrypt-12')
leaked_bcrypt = [bcrypt.hash_password(p) for p in passwords]
st = time.time()
for i, v in enumerate(leaked_bcrypt):
    result = attack(v, bcrypt)
    #print('user{:02d}: {}'.format(i, result))
print('Total time: {:.3f} s'.format(time.time()-st))
result
sha256
user00:Off(´ ・ ω ・ `)
user01:Off(´ ・ ω ・ `)
user02:Off(´ ・ ω ・ `)
user03:Off(´ ・ ω ・ `)
user04:Off(´ ・ ω ・ `)
user05:Off(´ ・ ω ・ `)
user06:Off(´ ・ ω ・ `)
user07:Off(´ ・ ω ・ `)
user08:Off(´ ・ ω ・ `)
user09:Hit! The password is passw0rd.
Total time: 0.005 s
bcrypt-5
Total time: 0.715 s
bcrypt-12
Total time: 90.040 s
You can see that dictionary attacks on bcrypt (and brute force attacks, of course) take much longer than SHA256. The actual dictionaries and the number of users are larger, so it should be effective in harassing attackers.
** 1. If you want to use bcrypt from Python, use bcrypt module **
** 2. Choose the right work-factor **
By setting a large value for work-factor, it is possible to save time for an attacker's offline attack, but this also affects the performance during normal operation. It is a trade-off between security and convenience.
Determine the number of iterations the server will check the password within the desired time (10, 200ms, etc.) and use it.
I don’t believe that there is a “correct” work factor; it depends on how strong you want your hashes to be and how much computational power you want to reserve for the hashing process.
end.
Recommended Posts