In Python, a hash function takes a byte sequence of variable length and converts it into a sequence of fixed length. The function is one-way.
Contents
- What a hash function is in Python
- Popular hash functions in Python
- Code examples with hash functions in Python
- MD5 hashing example
- SHA1 hashing example
- Hashing with SHA224
- Hashing with SHA256
- SHA384 hashing example
- SHA512 hashing example
- Using OpenSSL algorithms
- A practical example: hashing passwords in Python
This means that if f is a hash function, f(x) is computed quickly and without much effort, but recovering x from the result would take a very long time. The value returned by a hash function is usually called a hash, a message digest, a hash value, or a checksum. In the overwhelming majority of cases a hash function produces a unique output for a given input. However, because the output has a fixed length while the set of possible inputs is unbounded, collisions (two different inputs with the same hash) can occur, and how easy they are to find depends on the algorithm.
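The fixed-length property described above can be checked directly. The following is a minimal sketch; the choice of SHA-256 and the sample inputs are assumptions made for illustration only.

```python
import hashlib

# Inputs of very different lengths produce digests of the same length.
short_digest = hashlib.sha256(b"a").hexdigest()
long_digest = hashlib.sha256(b"a" * 1_000_000).hexdigest()

# A different input produces a different digest.
other_digest = hashlib.sha256(b"b").hexdigest()

print(len(short_digest), len(long_digest))  # both 64 hex characters (32 bytes)
print(short_digest == other_digest)         # False
```

Nothing in the digest reveals the length or content of the original input, which is what makes the function practical to compute in one direction only.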
Hash functions are used in cryptographic algorithms, digital signatures, message authentication codes, manipulation detection, fingerprinting, checksums (message integrity checks), hash tables, password storage, and much more.
As a Python developer, you may need these functions to detect duplicate data and files, to verify data integrity when sending information over a network, to store passwords safely in a database, or for other work related to cryptography.
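Duplicate detection, the first use case mentioned above, might look like the sketch below. The helper `find_duplicates` and the sample data are hypothetical and exist only for illustration; they are not part of `hashlib`.

```python
import hashlib
from collections import defaultdict


def find_duplicates(blobs):
    """Group names by the SHA-256 digest of their content.

    `blobs` is a hypothetical mapping of name -> bytes content.
    Returns the groups that contain more than one name.
    """
    groups = defaultdict(list)
    for name, content in blobs.items():
        groups[hashlib.sha256(content).hexdigest()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


samples = {
    "a.txt": b"same content",
    "b.txt": b"same content",
    "c.txt": b"different content",
}
print(find_duplicates(samples))
```

Comparing short digests instead of whole contents keeps the comparison cheap even when the items themselves are large.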
Note that hash functions are not a cryptographic protocol: they neither encrypt nor decrypt information. They are, however, a fundamental building block of many cryptographic protocols and tools.
Popular hash functions in Python
Some commonly used hash functions:
- MD5: The algorithm produces a 128-bit hash value. It is widely used for data integrity checks. It is not suitable for other purposes because of MD5's known security vulnerabilities.
- SHA: A family of algorithms designed by the US National Security Agency (NSA). They are part of the US Federal Information Processing Standard. These algorithms are widely used in many cryptographic applications. The digest length ranges from 160 to 512 bits.
The hashlib module, included in the Python standard library, provides an interface to the most popular hashing algorithms. hashlib implements some algorithms itself; if OpenSSL is installed, hashlib can also use the algorithms that OpenSSL provides.
This code is intended for Python 3.5 and above. If you want to run these examples in Python 2.x, simply remove the uses of algorithms_available and algorithms_guaranteed.
First, import the hashlib module:

    import hashlib

Now use algorithms_available and algorithms_guaranteed to list the available algorithms:

    print(hashlib.algorithms_available)
    print(hashlib.algorithms_guaranteed)
The algorithms_available attribute is a set of the names of all algorithms available on the system, including those provided through OpenSSL. The same algorithm may appear in it under several names. algorithms_guaranteed lists only the algorithms that the module itself always provides. md5, sha1, sha224, sha256, sha384, and sha512 are always present.
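The relationship between the two attributes can be inspected with ordinary set operations. This is a small sketch; the printed contents depend on the Python build and on the OpenSSL version, so no particular output is assumed.

```python
import hashlib

guaranteed = hashlib.algorithms_guaranteed
available = hashlib.algorithms_available

# Names the module always provides.
print(sorted(guaranteed))

# Extra names contributed by the platform (typically through OpenSSL).
print(sorted(available - guaranteed))

# The guaranteed names are always a subset of the available ones.
print(guaranteed <= available)
```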
Code examples with hash functions in Python
The code below takes the string "Hello World" and prints its hexadecimal digest. hexdigest returns a hexadecimal string that represents the hash; if you need the raw byte sequence instead, use digest.
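The two forms carry the same information, which the following sketch illustrates. SHA-256 is used here only as an example; any algorithm with a fixed digest size behaves the same way.

```python
import hashlib

h = hashlib.sha256(b"Hello World")

raw = h.digest()      # bytes object
text = h.hexdigest()  # str containing hexadecimal digits

# The hex string is simply the raw digest written out in hexadecimal.
print(raw.hex() == text)
print(bytes.fromhex(text) == raw)

# The hex form is twice as long as the raw form.
print(len(raw), len(text))
```

The raw form is convenient for binary storage or further cryptographic processing, while the hex form is safe to print, log, or embed in text formats.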
MD5 hashing example

    import hashlib

    hash_object = hashlib.md5(b'Hello World')
    print(hash_object.hexdigest())

Note the "b" in front of the string literal: it makes the literal a bytes object, because the hash function accepts only a sequence of bytes as its argument. In Python 2 a plain string literal was accepted.
So, if you need to read some input from the console and hash it, do not forget to encode the string into a sequence of bytes:

    import hashlib

    mystring = input('Enter String to hash: ')
    # encode() uses UTF-8 by default
    hash_object = hashlib.md5(mystring.encode())
    print(hash_object.hexdigest())
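Suppose we need to hash the string "Hello World" with MD5. The result is b10a8db164e0754105b7a99be72e3fe5, which is 32 hexadecimal characters. The 40-character value 0a4d55a8d778e5022fab701977c5d840bbc486d0 is the SHA1 digest of the same string.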
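Because the hash is computed over bytes, the encoding chosen for a string affects the result. The sketch below uses an arbitrary sample string and SHA-256, both chosen only for illustration, and also shows that passing a `str` directly is rejected.

```python
import hashlib

text = "Привет"

# The same text encoded differently is a different byte sequence,
# so it produces a different digest.
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
utf16_digest = hashlib.sha256(text.encode("utf-16")).hexdigest()
print(utf8_digest == utf16_digest)  # False

# Passing a str without encoding it raises TypeError.
rejected = False
try:
    hashlib.sha256(text)
except TypeError:
    rejected = True
print(rejected)  # True
```

When digests have to be compared across systems, it helps to state the encoding explicitly rather than rely on a default.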
SHA1 hashing example

    import hashlib

    hash_object = hashlib.sha1(b'Hello World')
    hex_dig = hash_object.hexdigest()
    print(hex_dig)

Hashing with SHA224

    import hashlib

    hash_object = hashlib.sha224(b'Hello World')
    hex_dig = hash_object.hexdigest()
    print(hex_dig)

Hashing with SHA256

    import hashlib

    hash_object = hashlib.sha256(b'Hello World')
    hex_dig = hash_object.hexdigest()
    print(hex_dig)

SHA384 hashing example

    import hashlib

    hash_object = hashlib.sha384(b'Hello World')
    hex_dig = hash_object.hexdigest()
    print(hex_dig)

SHA512 hashing example

    import hashlib

    hash_object = hashlib.sha512(b'Hello World')
    hex_dig = hash_object.hexdigest()
    print(hex_dig)
Using OpenSSL algorithms
Suppose you need an algorithm provided by OpenSSL. You can find the name of the algorithm you need in algorithms_available.
In this case, "DSA" is available on my computer; the set of names differs between systems, so check algorithms_available on yours first. You can then use the new and update methods:

    import hashlib

    # Replace 'DSA' with a name listed in hashlib.algorithms_available on your system.
    hash_object = hashlib.new('DSA')
    hash_object.update(b'Hello World')
    print(hash_object.hexdigest())
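Since the available names vary between systems, a lookup can be guarded before calling `new`. The helper `new_hash` below is hypothetical and only sketches the idea; it is not part of `hashlib`.

```python
import hashlib


def new_hash(name, data=b""):
    """Hypothetical helper: create a hash object only if `name` is listed as available."""
    if name not in hashlib.algorithms_available:
        raise ValueError(f"{name!r} is not available in this interpreter")
    h = hashlib.new(name)
    h.update(data)
    return h


# 'sha256' is guaranteed, so this works on any platform.
print(new_hash("sha256", b"Hello World").hexdigest())
```

For algorithms that have a named constructor, such as `hashlib.sha256`, calling the constructor directly is the simpler choice; `new` is mainly useful when the name is only known at run time.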
A practical example: hashing passwords in Python
In the following example, passwords are hashed so that they can be stored in a database. Here we use a salt. A salt is a random sequence added to the password string before the hash function is applied. The salt is used to prevent dictionary attacks and rainbow table attacks.
However, if you are building a real application and handling user passwords, keep track of the latest known vulnerabilities in this area. A single pass of a fast hash such as SHA-256, even with a salt, is cheap to brute-force, which is why key derivation functions such as hashlib.pbkdf2_hmac exist. For a more detailed look at password protection, see the following article.
Code for Python 3.x

    import uuid
    import hashlib

    def hash_password(password):
        # uuid is used to generate a random salt
        salt = uuid.uuid4().hex
        return hashlib.sha256(salt.encode() + password.encode()).hexdigest() + ':' + salt

    def check_password(hashed_password, user_password):
        # The stored string has the form "<hash>:<salt>"
        password, salt = hashed_password.split(':')
        return password == hashlib.sha256(salt.encode() + user_password.encode()).hexdigest()

    new_pass = input('Enter a password: ')
    hashed_password = hash_password(new_pass)
    print('String to store in the database: ' + hashed_password)
    old_pass = input('Enter the password again to check it: ')
    if check_password(hashed_password, old_pass):
        print('You entered the correct password')
    else:
        print('Sorry, the passwords do not match')

Code for Python 2.x

    import uuid
    import hashlib

    def hash_password(password):
        # uuid is used to generate a random salt
        salt = uuid.uuid4().hex
        return hashlib.sha256(salt.encode() + password.encode()).hexdigest() + ':' + salt

    def check_password(hashed_password, user_password):
        # The stored string has the form "<hash>:<salt>"
        password, salt = hashed_password.split(':')
        return password == hashlib.sha256(salt.encode() + user_password.encode()).hexdigest()

    new_pass = raw_input('Enter a password: ')
    hashed_password = hash_password(new_pass)
    print('String to store in the database: ' + hashed_password)
    old_pass = raw_input('Enter the password again to check it: ')
    if check_password(hashed_password, old_pass):
        print('You entered the correct password')
    else:
        print('Sorry, the passwords do not match')
I administer several portals for learning the Python, Golang, and Kotlin programming languages. As part of a small team of like-minded people, we work on popularizing programming languages for a Russian-speaking audience. Most of the articles were adapted into Russian by us and are distributed free of charge.
E-mail: vasile.buldumac@ati.utm.md
Source code: Lib/hashlib.py
This module implements a common interface to many different secure hash and
message digest algorithms. Included are the FIPS secure hash algorithms SHA1,
SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180-2) as well as RSA’s MD5
algorithm (defined in Internet RFC 1321). The terms “secure hash” and
“message digest” are interchangeable. Older algorithms were called message
digests. The modern term is secure hash.
Note
If you want the adler32 or crc32 hash functions, they are available in the zlib module.
Warning
Some algorithms have known hash collision weaknesses, refer to the “See
also” section at the end.
15.1.1. Hash algorithms
There is one constructor method named for each type of hash. All return a hash object with the same simple interface. For example: use sha256() to create a SHA-256 hash object. You can now feed this object with bytes-like objects (normally bytes) using the update() method. At any point you can ask it for the digest of the concatenation of the data fed to it so far using the digest() or hexdigest() methods.
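The phrase "at any point" means that asking for a digest does not finalize the object. A short sketch of this behavior, using the sample strings from this page:

```python
import hashlib

h = hashlib.sha256()
h.update(b"Nobody inspects")

# Digest of the data fed so far; the object stays usable afterwards.
partial = h.hexdigest()

h.update(b" the spammish repetition")
full = h.hexdigest()

print(partial == hashlib.sha256(b"Nobody inspects").hexdigest())
print(full == hashlib.sha256(b"Nobody inspects the spammish repetition").hexdigest())
```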
Note
For better multithreading performance, the Python GIL is released for
data larger than 2047 bytes at object creation or on update.
Note
Feeding string objects into update() is not supported, as hashes work on bytes, not on characters.
Constructors for hash algorithms that are always present in this module are sha1(), sha224(), sha256(), sha384(), sha512(), blake2b(), and blake2s(). md5() is normally available as well, though it may be missing if you are using a rare “FIPS compliant” build of Python. Additional algorithms may also be available depending upon the OpenSSL library that Python uses on your platform. On most platforms the sha3_224(), sha3_256(), sha3_384(), sha3_512(), shake_128(), shake_256() are also available.
New in version 3.6: SHA3 (Keccak) and SHAKE constructors sha3_224(), sha3_256(), sha3_384(), sha3_512(), shake_128(), shake_256().
New in version 3.6: blake2b() and blake2s() were added.
For example, to obtain the digest of the byte string b'Nobody inspects the spammish repetition':

    >>> import hashlib
    >>> m = hashlib.sha256()
    >>> m.update(b"Nobody inspects")
    >>> m.update(b" the spammish repetition")
    >>> m.digest()
    b'\x03\x1e\xdd}Ae\x15\x93\xc5\xfe\\\x00o\xa5u+7\xfd\xdf\xf7\xbcN\x84:\xa6\xaf\x0c\x95\x0fK\x94\x06'
    >>> m.digest_size
    32
    >>> m.block_size
    64

More condensed:

    >>> hashlib.sha224(b"Nobody inspects the spammish repetition").hexdigest()
    'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'
hashlib.new(name[, data])
Is a generic constructor that takes the string name of the desired algorithm as its first parameter. It also exists to allow access to the above listed hashes as well as any other algorithms that your OpenSSL library may offer. The named constructors are much faster than new() and should be preferred.
Using new() with an algorithm provided by OpenSSL:

    >>> h = hashlib.new('ripemd160')
    >>> h.update(b"Nobody inspects the spammish repetition")
    >>> h.hexdigest()
    'cc4a5ce1b3df48aec5d22d1f16b894a0b894eccc'
Hashlib provides the following constant attributes:
hashlib.algorithms_guaranteed
A set containing the names of the hash algorithms guaranteed to be supported by this module on all platforms. Note that ‘md5’ is in this list despite some upstream vendors offering an odd “FIPS compliant” Python build that excludes it.
New in version 3.2.
hashlib.algorithms_available
A set containing the names of the hash algorithms that are available in the running Python interpreter. These names will be recognized when passed to new(). algorithms_guaranteed will always be a subset. The same algorithm may appear multiple times in this set under different names (thanks to OpenSSL).
New in version 3.2.
The following values are provided as constant attributes of the hash objects returned by the constructors:
hash.digest_size
The size of the resulting hash in bytes.
hash.block_size
The internal block size of the hash algorithm in bytes.
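These attributes can be read from any hash object. The sketch below prints them for a few of the always-present algorithms; the selection of algorithms is arbitrary.

```python
import hashlib

# Collect (digest_size, block_size) for a few guaranteed algorithms.
sizes = {}
for name in ("sha1", "sha256", "sha512", "blake2b"):
    h = hashlib.new(name)
    sizes[name] = (h.digest_size, h.block_size)
    print(name, h.digest_size, h.block_size)

# The hex digest is always twice as long as digest_size.
h = hashlib.sha256(b"data")
print(len(h.hexdigest()) == 2 * h.digest_size)
```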
A hash object has the following attributes:
hash.name
The canonical name of this hash, always lowercase and always suitable as a parameter to new() to create another hash of this type.
Changed in version 3.4: The name attribute has been present in CPython since its inception, but until Python 3.4 was not formally specified, so may not exist on some platforms.
A hash object has the following methods:
hash.update(arg)
Update the hash object with the object arg, which must be interpretable as a buffer of bytes. Repeated calls are equivalent to a single call with the concatenation of all the arguments: m.update(a); m.update(b) is equivalent to m.update(a+b).
Changed in version 3.1: The Python GIL is released to allow other threads to run while hash updates on data larger than 2047 bytes is taking place when using hash algorithms supplied by OpenSSL.
hash.digest()
Return the digest of the data passed to the update() method so far. This is a bytes object of size digest_size which may contain bytes in the whole range from 0 to 255.
hash.hexdigest()
Like digest() except the digest is returned as a string object of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments.
hash.copy()
Return a copy (“clone”) of the hash object. This can be used to efficiently compute the digests of data sharing a common initial substring.
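The common-prefix use of copy() mentioned above can be sketched as follows. The prefix and record contents are made-up sample data.

```python
import hashlib

# Hash the shared prefix once.
prefix = hashlib.sha256(b"common header|")

# Each copy continues independently from the shared state.
h1 = prefix.copy()
h1.update(b"record 1")

h2 = prefix.copy()
h2.update(b"record 2")

# Same result as hashing each full message from scratch.
print(h1.hexdigest() == hashlib.sha256(b"common header|record 1").hexdigest())
print(h2.hexdigest() == hashlib.sha256(b"common header|record 2").hexdigest())
```

The saving grows with the length of the shared prefix, because the prefix bytes are processed only once.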
15.1.2. SHAKE variable length digests
The shake_128() and shake_256() algorithms provide variable length digests with length_in_bits//2 up to 128 or 256 bits of security. As such, their digest methods require a length. Maximum length is not limited by the SHAKE algorithm.
shake.digest(length)
Return the digest of the data passed to the update() method so far. This is a bytes object of size length which may contain bytes in the whole range from 0 to 255.
shake.hexdigest(length)
Like digest() except the digest is returned as a string object of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments.
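A brief sketch of requesting different output lengths from a SHAKE object. The lengths 16 and 32 are arbitrary choices for illustration.

```python
import hashlib

s = hashlib.shake_128(b"Nobody inspects the spammish repetition")

# The length argument is in bytes; the hex form is twice as long.
short = s.hexdigest(16)
long = s.hexdigest(32)
print(len(short), len(long))

# SHAKE is an extendable-output function: a shorter output is a
# prefix of a longer one computed from the same input.
print(long.startswith(short))
```

This prefix behavior differs from BLAKE2, where digests of different sizes are unrelated.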
15.1.3. Key derivation
Key derivation and key stretching algorithms are designed for secure password hashing. Naive algorithms such as sha1(password) are not resistant against brute-force attacks. A good password hashing function must be tunable, slow, and include a salt.
hashlib.pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None)
The function provides PKCS#5 password-based key derivation function 2. It uses HMAC as pseudorandom function.
The string hash_name is the desired name of the hash digest algorithm for HMAC, e.g. ‘sha1’ or ‘sha256’. password and salt are interpreted as buffers of bytes. Applications and libraries should limit password to a sensible length (e.g. 1024). salt should be about 16 or more bytes from a proper source, e.g. os.urandom().
The number of iterations should be chosen based on the hash algorithm and computing power. As of 2013, at least 100,000 iterations of SHA-256 are suggested.
dklen is the length of the derived key. If dklen is None then the digest size of the hash algorithm hash_name is used, e.g. 64 for SHA-512.

    >>> import hashlib, binascii
    >>> dk = hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 100000)
    >>> binascii.hexlify(dk)
    b'0394a2ede332c9a13eb82e9b24631604c31df978b4e2f0fbd2c549944f9d79a5'

New in version 3.4.
Note
A fast implementation of pbkdf2_hmac is available with OpenSSL. The Python implementation uses an inline version of hmac. It is about three times slower and doesn’t release the GIL.
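Putting the recommendations above together, a password could be stored and checked roughly as follows. This is a sketch under stated assumptions: the helper names `hash_password` and `check_password`, the 16-byte salt, and the iteration count (taken from the 2013 figure quoted above) are illustrative choices, not a vetted policy.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # assumption: the minimum figure quoted above


def hash_password(password):
    """Hypothetical helper: return (salt, derived_key) for storage."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, key


def check_password(password, salt, key):
    """Hypothetical helper: re-derive the key and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return hmac.compare_digest(candidate, key)


salt, key = hash_password("correct horse")
print(check_password("correct horse", salt, key))
print(check_password("wrong guess", salt, key))
```

The salt is stored next to the derived key; it does not need to be secret, only unique per password.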
hashlib.scrypt(password, *, salt, n, r, p, maxmem=0, dklen=64)
The function provides scrypt password-based key derivation function as defined in RFC 7914.
password and salt must be bytes-like objects. Applications and libraries should limit password to a sensible length (e.g. 1024). salt should be about 16 or more bytes from a proper source, e.g. os.urandom().
n is the CPU/Memory cost factor, r the block size, p parallelization factor and maxmem limits memory (OpenSSL 1.1.0 defaults to 32 MB). dklen is the length of the derived key.
Availability: OpenSSL 1.1+
New in version 3.6.
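A call to scrypt might look like the sketch below. It assumes a Python build linked against OpenSSL 1.1 or newer, and the cost parameters (n=2**14, r=8, p=1) are commonly seen illustrative values rather than a recommendation from this page.

```python
import hashlib
import os

salt = os.urandom(16)
password = b"correct horse battery staple"

# n, r and p are assumed example values; they must fit within maxmem.
key = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)

# The same password, salt and parameters always give the same key.
again = hashlib.scrypt(password, salt=salt, n=2**14, r=8, p=1, dklen=32)

print(len(key))
print(key == again)
```

Raising n increases both the time and the memory needed per guess, which is what distinguishes scrypt from PBKDF2.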
15.1.4. BLAKE2
BLAKE2 is a cryptographic hash function defined in RFC-7693 that comes in two flavors:
- BLAKE2b, optimized for 64-bit platforms and produces digests of any size between 1 and 64 bytes,
- BLAKE2s, optimized for 8- to 32-bit platforms and produces digests of any size between 1 and 32 bytes.
BLAKE2 supports keyed mode (a faster and simpler replacement for HMAC), salted hashing, personalization, and tree hashing.
Hash objects from this module follow the API of standard library’s hashlib objects.
15.1.4.1. Creating hash objects
New hash objects are created by calling constructor functions:
hashlib.blake2b(data=b'', digest_size=64, key=b'', salt=b'', person=b'', fanout=1, depth=1, leaf_size=0, node_offset=0, node_depth=0, inner_size=0, last_node=False)
hashlib.blake2s(data=b'', digest_size=32, key=b'', salt=b'', person=b'', fanout=1, depth=1, leaf_size=0, node_offset=0, node_depth=0, inner_size=0, last_node=False)
These functions return the corresponding hash objects for calculating BLAKE2b or BLAKE2s. They optionally take these general parameters:
- data: initial chunk of data to hash, which must be interpretable as buffer of bytes.
- digest_size: size of output digest in bytes.
- key: key for keyed hashing (up to 64 bytes for BLAKE2b, up to 32 bytes for BLAKE2s).
- salt: salt for randomized hashing (up to 16 bytes for BLAKE2b, up to 8 bytes for BLAKE2s).
- person: personalization string (up to 16 bytes for BLAKE2b, up to 8 bytes for BLAKE2s).
The following table shows limits for general parameters (in bytes):
Hash | digest_size | len(key) | len(salt) | len(person) |
---|---|---|---|---|
BLAKE2b | 64 | 64 | 16 | 16 |
BLAKE2s | 32 | 32 | 8 | 8 |
Note
BLAKE2 specification defines constant lengths for salt and personalization parameters, however, for convenience, this implementation accepts byte strings of any size up to the specified length. If the length of the parameter is less than specified, it is padded with zeros, thus, for example, b'salt' and b'salt\x00' is the same value. (This is not the case for key.)
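The zero-padding rule in the note can be observed directly. In this sketch the message and the salt, personalization, and key values are arbitrary sample bytes.

```python
from hashlib import blake2b

msg = b"some message"

# Salt shorter than SALT_SIZE is padded with zero bytes,
# so an explicit trailing zero byte changes nothing.
salt_a = blake2b(msg, salt=b"salt").hexdigest()
salt_b = blake2b(msg, salt=b"salt\x00").hexdigest()
print(salt_a == salt_b)  # True

# The same holds for the personalization string.
person_a = blake2b(msg, person=b"app").hexdigest()
person_b = blake2b(msg, person=b"app\x00").hexdigest()
print(person_a == person_b)  # True

# Keys are not padded this way: a trailing zero byte is a different key.
key_a = blake2b(msg, key=b"key").hexdigest()
key_b = blake2b(msg, key=b"key\x00").hexdigest()
print(key_a == key_b)  # False
```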
These sizes are available as module constants described below.
Constructor functions also accept the following tree hashing parameters:
- fanout: fanout (0 to 255, 0 if unlimited, 1 in sequential mode).
- depth: maximal depth of tree (1 to 255, 255 if unlimited, 1 in sequential mode).
- leaf_size: maximal byte length of leaf (0 to 2**32-1, 0 if unlimited or in sequential mode).
- node_offset: node offset (0 to 2**64-1 for BLAKE2b, 0 to 2**48-1 for BLAKE2s, 0 for the first, leftmost, leaf, or in sequential mode).
- node_depth: node depth (0 to 255, 0 for leaves, or in sequential mode).
- inner_size: inner digest size (0 to 64 for BLAKE2b, 0 to 32 for BLAKE2s, 0 in sequential mode).
- last_node: boolean indicating whether the processed node is the last one (False for sequential mode).
See section 2.10 in BLAKE2 specification for comprehensive review of tree hashing.
15.1.4.2. Constants
blake2b.SALT_SIZE
blake2s.SALT_SIZE
Salt length (maximum length accepted by constructors).
blake2b.PERSON_SIZE
blake2s.PERSON_SIZE
Personalization string length (maximum length accepted by constructors).
blake2b.MAX_KEY_SIZE
blake2s.MAX_KEY_SIZE
Maximum key size.
blake2b.MAX_DIGEST_SIZE
blake2s.MAX_DIGEST_SIZE
Maximum digest size that the hash function can output.
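The constants can be read from the constructors themselves, and they match the limits table earlier in this section. A small sketch, which also shows that a salt longer than the limit is rejected:

```python
from hashlib import blake2b, blake2s

for label, cls in (("blake2b", blake2b), ("blake2s", blake2s)):
    print(label, cls.SALT_SIZE, cls.PERSON_SIZE,
          cls.MAX_KEY_SIZE, cls.MAX_DIGEST_SIZE)

# A salt one byte longer than SALT_SIZE raises ValueError.
try:
    blake2b(salt=b"x" * (blake2b.SALT_SIZE + 1))
    oversized_accepted = True
except ValueError:
    oversized_accepted = False
print(oversized_accepted)  # False
```

Using the constants instead of literal numbers keeps code correct for both BLAKE2b and BLAKE2s.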
15.1.4.3. Examples
15.1.4.3.1. Simple hashing
To calculate hash of some data, you should first construct a hash object by calling the appropriate constructor function (blake2b() or blake2s()), then update it with the data by calling update() on the object, and, finally, get the digest out of the object by calling digest() (or hexdigest() for hex-encoded string).

    >>> from hashlib import blake2b
    >>> h = blake2b()
    >>> h.update(b'Hello world')
    >>> h.hexdigest()
    '6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'

As a shortcut, you can pass the first chunk of data to update directly to the constructor as the first argument (or as data keyword argument):

    >>> from hashlib import blake2b
    >>> blake2b(b'Hello world').hexdigest()
    '6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'

You can call hash.update() as many times as you need to iteratively update the hash:

    >>> from hashlib import blake2b
    >>> items = [b'Hello', b' ', b'world']
    >>> h = blake2b()
    >>> for item in items:
    ...     h.update(item)
    >>> h.hexdigest()
    '6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'
15.1.4.3.2. Using different digest sizes
BLAKE2 has configurable size of digests up to 64 bytes for BLAKE2b and up to 32 bytes for BLAKE2s. For example, to replace SHA-1 with BLAKE2b without changing the size of output, we can tell BLAKE2b to produce 20-byte digests:

    >>> from hashlib import blake2b
    >>> h = blake2b(digest_size=20)
    >>> h.update(b'Replacing SHA1 with the more secure function')
    >>> h.hexdigest()
    'd24f26cf8de66472d58d4e1b1774b4c9158b1f4c'
    >>> h.digest_size
    20
    >>> len(h.digest())
    20

Hash objects with different digest sizes have completely different outputs (shorter hashes are not prefixes of longer hashes); BLAKE2b and BLAKE2s produce different outputs even if the output length is the same:

    >>> from hashlib import blake2b, blake2s
    >>> blake2b(digest_size=10).hexdigest()
    '6fa1d8fcfd719046d762'
    >>> blake2b(digest_size=11).hexdigest()
    'eb6ec15daf9546254f0809'
    >>> blake2s(digest_size=10).hexdigest()
    '1bf21a98c78a1c376ae9'
    >>> blake2s(digest_size=11).hexdigest()
    '567004bf96e4a25773ebf4'
15.1.4.3.3. Keyed hashing
Keyed hashing can be used for authentication as a faster and simpler replacement for Hash-based message authentication code (HMAC). BLAKE2 can be securely used in prefix-MAC mode thanks to the indifferentiability property inherited from BLAKE.
This example shows how to get a (hex-encoded) 128-bit authentication code for message b'message data' with key b'pseudorandom key':

    >>> from hashlib import blake2b
    >>> h = blake2b(key=b'pseudorandom key', digest_size=16)
    >>> h.update(b'message data')
    >>> h.hexdigest()
    '3d363ff7401e02026f4a4687d4863ced'
As a practical example, a web application can symmetrically sign cookies sent to users and later verify them to make sure they weren’t tampered with:

    >>> from hashlib import blake2b
    >>> from hmac import compare_digest
    >>>
    >>> SECRET_KEY = b'pseudorandomly generated server secret key'
    >>> AUTH_SIZE = 16
    >>>
    >>> def sign(cookie):
    ...     h = blake2b(digest_size=AUTH_SIZE, key=SECRET_KEY)
    ...     h.update(cookie)
    ...     return h.hexdigest().encode('utf-8')
    >>>
    >>> def verify(cookie, sig):
    ...     # Recompute the signature and compare it in constant time.
    ...     good_sig = sign(cookie)
    ...     return compare_digest(good_sig, sig)
    >>>
    >>> cookie = b'user:vatrogasac'
    >>> sig = sign(cookie)
    >>> print("{0},{1}".format(cookie.decode('utf-8'), sig.decode('utf-8')))
    user:vatrogasac,349cf904533767ed2d755279a8df84d0
    >>> verify(cookie, sig)
    True
    >>> verify(b'user:policajac', sig)
    False
    >>> verify(cookie, b'0102030405060708090a0b0c0d0e0f00')
    False
Even though there’s a native keyed hashing mode, BLAKE2 can, of course, be used in HMAC construction with hmac module:

    >>> import hmac, hashlib
    >>> m = hmac.new(b'secret key', digestmod=hashlib.blake2s)
    >>> m.update(b'message')
    >>> m.hexdigest()
    'e3c8102868d28b5ff85fc35dda07329970d1a01e273c37481326fe0c861c8142'
15.1.4.3.4. Randomized hashing
By setting salt parameter users can introduce randomization to the hash
function. Randomized hashing is useful for protecting against collision attacks
on the hash function used in digital signatures.
Randomized hashing is designed for situations where one party, the message
preparer, generates all or part of a message to be signed by a second
party, the message signer. If the message preparer is able to find
cryptographic hash function collisions (i.e., two messages producing the
same hash value), then she might prepare meaningful versions of the message
that would produce the same hash value and digital signature, but with
different results (e.g., transferring $1,000,000 to an account, rather than
$10). Cryptographic hash functions have been designed with collision
resistance as a major goal, but the current concentration on attacking
cryptographic hash functions may result in a given cryptographic hash
function providing less collision resistance than expected. Randomized
hashing offers the signer additional protection by reducing the likelihood
that a preparer can generate two or more messages that ultimately yield the
same hash value during the digital signature generation process — even if
it is practical to find collisions for the hash function. However, the use
of randomized hashing may reduce the amount of security provided by a
digital signature when all portions of the message are prepared
by the signer.
(NIST SP-800-106 “Randomized Hashing for Digital Signatures”)
In BLAKE2 the salt is processed as a one-time input to the hash function during initialization, rather than as an input to each compression function.
Warning
Salted hashing (or just hashing) with BLAKE2 or any other general-purpose cryptographic hash function, such as SHA-256, is not suitable for hashing passwords. See BLAKE2 FAQ for more information.

    >>> import os
    >>> from hashlib import blake2b
    >>> msg = b'some message'
    >>> # Calculate the first hash with a random salt.
    >>> salt1 = os.urandom(blake2b.SALT_SIZE)
    >>> h1 = blake2b(salt=salt1)
    >>> h1.update(msg)
    >>> # Calculate the second hash with a different random salt.
    >>> salt2 = os.urandom(blake2b.SALT_SIZE)
    >>> h2 = blake2b(salt=salt2)
    >>> h2.update(msg)
    >>> # The digests are different.
    >>> h1.digest() != h2.digest()
    True
15.1.4.3.5. Personalization
Sometimes it is useful to force hash function to produce different digests for the same input for different purposes. Quoting the authors of the Skein hash function:
We recommend that all application designers seriously consider doing this; we have seen many protocols where a hash that is computed in one part of the protocol can be used in an entirely different part because two hash computations were done on similar or related data, and the attacker can force the application to make the hash inputs the same. Personalizing each hash function used in the protocol summarily stops this type of attack.
(The Skein Hash Function Family, p. 21)
BLAKE2 can be personalized by passing bytes to the person argument:

    >>> from hashlib import blake2b
    >>> FILES_HASH_PERSON = b'MyApp Files Hash'
    >>> BLOCK_HASH_PERSON = b'MyApp Block Hash'
    >>> h = blake2b(digest_size=32, person=FILES_HASH_PERSON)
    >>> h.update(b'the same content')
    >>> h.hexdigest()
    '20d9cd024d4fb086aae819a1432dd2466de12947831b75c5a30cf2676095d3b4'
    >>> h = blake2b(digest_size=32, person=BLOCK_HASH_PERSON)
    >>> h.update(b'the same content')
    >>> h.hexdigest()
    'cf68fb5761b9c44e7878bfb2c4c9aea52264a80b75005e65619778de59f383a3'

Personalization together with the keyed mode can also be used to derive different keys from a single one.

    >>> from hashlib import blake2s
    >>> from base64 import b64decode, b64encode
    >>> orig_key = b64decode(b'Rm5EPJai72qcK3RGBpW3vPNfZy5OZothY+kHY6h21KM=')
    >>> enc_key = blake2s(key=orig_key, person=b'kEncrypt').digest()
    >>> mac_key = blake2s(key=orig_key, person=b'kMAC').digest()
    >>> print(b64encode(enc_key).decode('utf-8'))
    rbPb15S/Z9t+agffno5wuhB77VbRi6F9Iv2qIxU7WHw=
    >>> print(b64encode(mac_key).decode('utf-8'))
    G9GtHFE1YluXY1zWPlYk1e/nWfu0WSEb0KRcjhDeP/o=
15.1.4.3.6. Tree mode
Here’s an example of hashing a minimal tree with two leaf nodes. This example uses 64-byte internal digests, and returns the 32-byte final digest:

    >>> from hashlib import blake2b
    >>>
    >>> FANOUT = 2
    >>> DEPTH = 2
    >>> LEAF_SIZE = 4096
    >>> INNER_SIZE = 64
    >>>
    >>> buf = bytearray(6000)
    >>>
    >>> # Left leaf
    ... h00 = blake2b(buf[0:LEAF_SIZE], fanout=FANOUT, depth=DEPTH,
    ...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
    ...               node_offset=0, node_depth=0, last_node=False)
    >>> # Right leaf
    ... h01 = blake2b(buf[LEAF_SIZE:], fanout=FANOUT, depth=DEPTH,
    ...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
    ...               node_offset=1, node_depth=0, last_node=True)
    >>> # Root node
    ... h10 = blake2b(digest_size=32, fanout=FANOUT, depth=DEPTH,
    ...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
    ...               node_offset=0, node_depth=1, last_node=True)
    >>> h10.update(h00.digest())
    >>> h10.update(h01.digest())
    >>> h10.hexdigest()
    '3ad2a9b37c6070e374c7a8c508fe20ca86b6ed54e286e93a0318e95e881db5aa'
15.1.4.4. Credits
BLAKE2 was designed by Jean-Philippe Aumasson, Samuel Neves, Zooko
Wilcox-O’Hearn, and Christian Winnerlein based on SHA-3 finalist BLAKE
created by Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and
Raphael C.-W. Phan.
It uses core algorithm from ChaCha cipher designed by Daniel J. Bernstein.
The stdlib implementation is based on pyblake2 module. It was written by
Dmitry Chestnykh based on C implementation written by Samuel Neves. The
documentation was copied from pyblake2 and written by Dmitry Chestnykh.
The C code was partly rewritten for Python by Christian Heimes.
The following public domain dedication applies for both C hash function
implementation, extension code, and this documentation:
To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty.
You should have received a copy of the CC0 Public Domain Dedication along with this software. If not, see http://creativecommons.org/publicdomain/zero/1.0/.
The following people have helped with development or contributed their changes
to the project and the public domain according to the Creative Commons Public
Domain Dedication 1.0 Universal:
- Alexandr Sokolovskiy
A cryptographic hash function is a function that takes input data and produces a fixed-length output that is, for practical purposes, unique to that particular data. The hash is a fixed-length byte string used to verify the integrity of the data. In this article, you will learn to use the hashlib module to obtain the hash of a file in Python. The hashlib module is part of the Python standard library, so it is available in any standard Python installation and does not need to be installed with pip.
What is the Hashlib Module?
The hashlib module implements a common interface for many secure cryptographic hash and message digest algorithms. There is one constructor method named for each type of hash. All return a hash object with the same simple interface. Constructors for the guaranteed algorithms are always present in this module.
hashlib.algorithms_guaranteed
A set containing the names of the hash algorithms is guaranteed to be supported by this module on all platforms.
>>> print(hashlib.algorithms_guaranteed)
{‘sha3_512’, ‘sha1’, ‘sha224’, ‘shake_256’, ‘sha3_384’, ‘sha512’, ‘sha384’, ‘blake2s’, ‘md5’, ‘sha3_224’, ‘sha256’, ‘blake2b’, ‘sha3_256’, ‘shake_128’}
hashlib.algorithms_available
A set containing the names of the hash algorithms available in the running Python interpreter. The same algorithm may appear multiple times in this set under different names (due to OpenSSL).
>>> print(hashlib.algorithms_available)
{'sha384', 'sha3_224', 'whirlpool', 'ripemd160', 'blake2s', 'md5-sha1', 'sm3', 'sha256', 'shake_256', 'sha1', 'sha3_384',
'sha512', 'blake2b', 'sha512_256', 'sha3_256', 'shake_128', 'sha3_512', 'sha224', 'md5', 'mdc2', 'sha512_224', 'md4'}
Explanation of SHA-256 Algorithm and its Features
This article will use the FIPS secure hash algorithm SHA-256 to obtain the file hash. Other hash algorithms available in hashlib include:
- MD5 (Message Digest 5), which is no longer considered secure because practical collision attacks exist
- SHA-512 (Secure Hash Algorithm, 512 bits)
- SHA3-256 (a member of the SHA-3 family)
SHA-256 is used here because it is widely deployed, has no known practical attacks, and is fast to compute. It belongs to the SHA-2 family. The newer SHA-3 family, which is based on the sponge construction, is a standardized alternative to SHA-2 rather than a replacement for it.
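As a rough illustration of the fixed-length output, the sketch below hashes inputs of very different sizes and then counts how many of the 256 output bits change when one character of the input changes. The helper name `bit_difference` and the sample messages are made up for this example.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bits that differ between the SHA-256 digests of a and b."""
    da = hashlib.sha256(a).digest()
    db = hashlib.sha256(b).digest()
    # XOR each pair of bytes and count the set bits in the result
    return sum(bin(x ^ y).count("1") for x, y in zip(da, db))


short_digest = hashlib.sha256(b"a").hexdigest()
long_digest = hashlib.sha256(b"a" * 100_000).hexdigest()

# Both digests are 64 hex characters (256 bits), whatever the input size
print(len(short_digest), len(long_digest))

# A one-character change typically flips roughly half of the output bits
print(bit_difference(b"hello world", b"hello worle"))
```

The exact number of differing bits depends on the inputs, so it is printed rather than stated here.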
Obtaining a Cryptographic Hash of a File
In the following example, the path to a file is provided as a command-line argument. The SHA-256 (Secure Hash Algorithm, 256 bits) hash of the file is then computed and displayed.
Hash of the following file:
test.txt
First, the hashlib and sys modules are imported. The sys module gives access to command-line arguments. Then the function that computes the SHA-256 hash of a file is defined. In the function, a buffer size is defined (65536 bytes in our case). This is the number of bytes read from the file at a time and fed into the SHA-256 hash object, which allows large files to be hashed without loading them into memory in full. At the end of the function, hexdigest is called on the hash object to produce the hexadecimal representation of the digest. The function (hashfile) is called with the first command-line argument (sys.argv[1]); sys.argv[0] is the name of the Python script. Finally, the hash of the file is displayed.
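Python3
import sys
import hashlib


def hashfile(file):
    # Read the file in 64 KiB chunks so large files need not fit in memory
    BUF_SIZE = 65536
    sha256 = hashlib.sha256()
    # Open the path that was passed in, in binary mode
    with open(file, 'rb') as f:
        while True:
            data = f.read(BUF_SIZE)
            # An empty bytes object means the end of the file was reached
            if not data:
                break
            sha256.update(data)
    return sha256.hexdigest()


# The first command-line argument is the path of the file to hash
file_hash = hashfile(sys.argv[1])
print(f"Hash:{file_hash}")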
Output:
Obtaining a Cryptographic Hash of a String
The same approach can be used to obtain the hash of a string. The string must first be converted to bytes before it is passed to the hash object. For short strings, the whole input can be supplied in a single call. The following example demonstrates this in practice:
First, a bytes literal (marked by the b prefix) is stored in a variable. Then a SHA-256 hash object is created, and the bytes are passed to its update function, which feeds the data into the hash. After that, the hexadecimal form of the digest is requested using the hexdigest function. At the end, this hash value is displayed.
Python3
import hashlib

# The b prefix makes this a bytes literal, which is what update() expects
string = b"My name is apple and I am a vegetable?"
sha256 = hashlib.sha256()
sha256.update(string)
string_hash = sha256.hexdigest()
print(f"Hash:{string_hash}")
Output:
Hash:252f8ca07a6fcaae293e5097151c803a7f16504e48c4eb60f651c11341e83217
Last Updated :
06 Feb, 2023
:mod:`hashlib` — Secure hashes and message digests
.. module:: hashlib
   :synopsis: Secure hash and message digest algorithms.
.. moduleauthor:: Gregory P. Smith <greg@krypto.org>
.. sectionauthor:: Gregory P. Smith <greg@krypto.org>
Source code: :source:`Lib/hashlib.py`
.. index::
   single: message digest, MD5
   single: secure hash algorithm, SHA1, SHA224, SHA256, SHA384, SHA512

.. testsetup::

   import hashlib
This module implements a common interface to many different secure hash and
message digest algorithms. Included are the FIPS secure hash algorithms SHA1,
SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180-2) as well as RSA’s MD5
algorithm (defined in internet RFC 1321). The terms "secure hash" and
"message digest" are interchangeable. Older algorithms were called message
digests. The modern term is secure hash.
Note
If you want the adler32 or crc32 hash functions, they are available in
the :mod:`zlib` module.
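For comparison, here is a minimal sketch of the two `zlib` checksums mentioned in the note. They are non-cryptographic checksums, useful for detecting accidental corruption only; the sample data is arbitrary.

```python
import zlib

data = b"Nobody inspects the spammish repetition"

# One-shot checksums; both return an unsigned 32-bit integer
crc = zlib.crc32(data)
adler = zlib.adler32(data)
print(f"crc32={crc:08x} adler32={adler:08x}")

# Like hash objects, the checksum can be computed incrementally by
# passing the running value as the second argument
running = zlib.crc32(b"Nobody inspects")
running = zlib.crc32(b" the spammish repetition", running)
print(running == crc)
```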
Warning
Some algorithms have known hash collision weaknesses, refer to the "See
also" section at the end.
Hash algorithms
There is one constructor method named for each type of :dfn:`hash`. All return
a hash object with the same simple interface. For example: use :func:`sha256` to
create a SHA-256 hash object. You can now feed this object with :term:`bytes-like
objects <bytes-like object>` (normally :class:`bytes`) using the :meth:`update` method.
At any point you can ask it for the :dfn:`digest` of the
concatenation of the data fed to it so far using the :meth:`digest` or
:meth:`hexdigest` methods.
Note
For better multithreading performance, the Python :term:`GIL` is released for
data larger than 2047 bytes at object creation or on update.
Note
Feeding string objects into :meth:`update` is not supported, as hashes work
on bytes, not on characters.
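A small sketch of what this note means in practice: passing a `str` raises `TypeError`, and the text has to be encoded explicitly. The sample text and the two encodings are chosen only for illustration.

```python
import hashlib

text = "héllo wörld"  # sample text, chosen only for illustration

error_message = None
try:
    hashlib.sha256(text)  # passing str instead of bytes raises TypeError
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Encode explicitly; the chosen encoding changes the bytes and so the digest
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
latin1_digest = hashlib.sha256(text.encode("latin-1")).hexdigest()
print(utf8_digest != latin1_digest)
```

Because the digest depends on the encoding, both sides of a comparison need to agree on it; UTF-8 is a common choice.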
.. index:: single: OpenSSL; (use in module hashlib)
Constructors for hash algorithms that are always present in this module are
:func:`sha1`, :func:`sha224`, :func:`sha256`, :func:`sha384`,
:func:`sha512`, :func:`blake2b`, and :func:`blake2s`.
:func:`md5` is normally available as well, though it
may be missing or blocked if you are using a rare "FIPS compliant" build of Python.
Additional algorithms may also be available depending upon the OpenSSL
library that Python uses on your platform. On most platforms the
:func:`sha3_224`, :func:`sha3_256`, :func:`sha3_384`, :func:`sha3_512`,
:func:`shake_128`, :func:`shake_256` are also available.
.. versionadded:: 3.6
   SHA3 (Keccak) and SHAKE constructors :func:`sha3_224`, :func:`sha3_256`,
   :func:`sha3_384`, :func:`sha3_512`, :func:`shake_128`, :func:`shake_256`.

.. versionadded:: 3.6
   :func:`blake2b` and :func:`blake2s` were added.

.. versionchanged:: 3.9
   All hashlib constructors take a keyword-only argument *usedforsecurity*
   with default value ``True``. A false value allows the use of insecure and
   blocked hashing algorithms in restricted environments. ``False`` indicates
   that the hashing algorithm is not used in a security context, e.g. as a
   non-cryptographic one-way compression function.

   Hashlib now uses SHA3 and SHAKE from OpenSSL 1.1.1 and newer.
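One possible use of *usedforsecurity* is sketched below: MD5 serves as a plain fingerprint for a cache key, and the flag states that nothing depends on its security. The helper `cache_key` is hypothetical, and the keyword requires Python 3.9 or newer.

```python
import hashlib


def cache_key(payload: bytes) -> str:
    """Return a short, non-security identifier for payload (hypothetical helper)."""
    # usedforsecurity=False (Python 3.9+) declares that MD5 is used here only
    # as a fingerprint, not to protect anything
    return hashlib.md5(payload, usedforsecurity=False).hexdigest()


print(cache_key(b"hello"))
```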
For example, to obtain the digest of the byte string ``b"Nobody inspects the
spammish repetition"``:

   >>> import hashlib
   >>> m = hashlib.sha256()
   >>> m.update(b"Nobody inspects")
   >>> m.update(b" the spammish repetition")
   >>> m.digest()
   b'\x03\x1e\xdd}Ae\x15\x93\xc5\xfe\\\x00o\xa5u+7\xfd\xdf\xf7\xbcN\x84:\xa6\xaf\x0c\x95\x0fK\x94\x06'
   >>> m.hexdigest()
   '031edd7d41651593c5fe5c006fa5752b37fddff7bc4e843aa6af0c950f4b9406'
More condensed:
   >>> hashlib.sha256(b"Nobody inspects the spammish repetition").hexdigest()
   '031edd7d41651593c5fe5c006fa5752b37fddff7bc4e843aa6af0c950f4b9406'
.. function:: new(name[, data], *, usedforsecurity=True)

   Is a generic constructor that takes the string *name* of the desired
   algorithm as its first parameter. It also exists to allow access to the
   above listed hashes as well as any other algorithms that your OpenSSL
   library may offer. The named constructors are much faster than :func:`new`
   and should be preferred.
Using :func:`new` with an algorithm provided by OpenSSL:
   >>> h = hashlib.new('sha256')
   >>> h.update(b"Nobody inspects the spammish repetition")
   >>> h.hexdigest()
   '031edd7d41651593c5fe5c006fa5752b37fddff7bc4e843aa6af0c950f4b9406'
Hashlib provides the following constant attributes:
.. data:: algorithms_guaranteed

   A set containing the names of the hash algorithms guaranteed to be supported
   by this module on all platforms. Note that 'md5' is in this list despite
   some upstream vendors offering an odd "FIPS compliant" Python build that
   excludes it.

   .. versionadded:: 3.2

.. data:: algorithms_available

   A set containing the names of the hash algorithms that are available in the
   running Python interpreter. These names will be recognized when passed to
   :func:`new`. :attr:`algorithms_guaranteed` will always be a subset. The
   same algorithm may appear multiple times in this set under different names
   (thanks to OpenSSL).

   .. versionadded:: 3.2
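The two sets can be combined with :func:`new` when the algorithm name is only known at run time, for example when it comes from a configuration file. The following is a minimal sketch; the helper `digest_with` is an illustrative name, not part of hashlib.

```python
import hashlib

# Every guaranteed name is also reported as available
print(hashlib.algorithms_guaranteed <= hashlib.algorithms_available)


def digest_with(name: str, data: bytes) -> str:
    """Hash data with the algorithm called name, chosen at run time.

    Note: the SHAKE algorithms need a length for hexdigest(), so this
    helper is only suitable for fixed-length algorithms.
    """
    if name not in hashlib.algorithms_available:
        raise ValueError(f"unsupported hash algorithm: {name}")
    return hashlib.new(name, data).hexdigest()


print(digest_with("sha256", b"Nobody inspects the spammish repetition"))
```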
The following values are provided as constant attributes of the hash objects
returned by the constructors:
.. data:: hash.digest_size

   The size of the resulting hash in bytes.

.. data:: hash.block_size

   The internal block size of the hash algorithm in bytes.
A hash object has the following attributes:
.. attribute:: hash.name

   The canonical name of this hash, always lowercase and always suitable as a
   parameter to :func:`new` to create another hash of this type.

   .. versionchanged:: 3.4
      The name attribute has been present in CPython since its inception, but
      until Python 3.4 was not formally specified, so may not exist on some
      platforms.
A hash object has the following methods:
.. method:: hash.update(data)

   Update the hash object with the :term:`bytes-like object`.
   Repeated calls are equivalent to a single call with the
   concatenation of all the arguments: ``m.update(a); m.update(b)`` is
   equivalent to ``m.update(a+b)``.

   .. versionchanged:: 3.1
      The Python GIL is released to allow other threads to run while hash
      updates on data larger than 2047 bytes is taking place when using hash
      algorithms supplied by OpenSSL.
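The equivalence between repeated :meth:`update` calls and a single call on the concatenated data can be checked directly. This is a small sketch with arbitrary chunk boundaries:

```python
import hashlib

chunks = [b"Nobody inspects", b" the spammish", b" repetition"]

# Feed the data piece by piece
incremental = hashlib.sha256()
for chunk in chunks:
    incremental.update(chunk)

# Hash the concatenation in one call
one_shot = hashlib.sha256(b"".join(chunks))

print(incremental.hexdigest() == one_shot.hexdigest())
```

This property is what makes it possible to hash a large file or a network stream without holding all of it in memory.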
.. method:: hash.digest()

   Return the digest of the data passed to the :meth:`update` method so far.
   This is a bytes object of size :attr:`digest_size` which may contain bytes in
   the whole range from 0 to 255.
.. method:: hash.hexdigest()

   Like :meth:`digest` except the digest is returned as a string object of
   double length, containing only hexadecimal digits. This may be used to
   exchange the value safely in email or other non-binary environments.
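To make the relationship between the two methods concrete, the sketch below compares the raw digest, its hexadecimal form, and a Base64 form of the same bytes. The Base64 variant is an addition for illustration only.

```python
import base64
import hashlib

h = hashlib.sha256(b"Nobody inspects the spammish repetition")

raw = h.digest()          # bytes, length == h.digest_size
hex_text = h.hexdigest()  # str, twice as long as the raw digest

print(h.digest_size, len(raw), len(hex_text))
print(bytes.fromhex(hex_text) == raw)

# Base64 is a more compact text form of the same bytes
b64_text = base64.b64encode(raw).decode("ascii")
print(len(b64_text))
```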
.. method:: hash.copy()

   Return a copy ("clone") of the hash object. This can be used to efficiently
   compute the digests of data sharing a common initial substring.
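A minimal sketch of the "common initial substring" use case: the shared prefix is hashed once, and each variant continues from a clone of that state. The prefix and suffixes are placeholder data.

```python
import hashlib

header = b"common-header:" * 1000  # stand-in for a large shared prefix

base = hashlib.sha256(header)  # hash the shared prefix once

digests = {}
for suffix in (b"alpha", b"beta"):
    h = base.copy()    # clone the state reached after the prefix
    h.update(suffix)   # the original object is left unchanged
    digests[suffix] = h.hexdigest()

print(digests[b"alpha"] != digests[b"beta"])
```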
SHAKE variable length digests
The :func:`shake_128` and :func:`shake_256` algorithms provide variable
length digests with length_in_bits//2 up to 128 or 256 bits of security.
As such, their digest methods require a length. Maximum length is not limited
by the SHAKE algorithm.
.. method:: shake.digest(length)

   Return the digest of the data passed to the :meth:`update` method so far.
   This is a bytes object of size *length* which may contain bytes in
   the whole range from 0 to 255.
.. method:: shake.hexdigest(length)

   Like :meth:`digest` except the digest is returned as a string object of
   double length, containing only hexadecimal digits. This may be used to
   exchange the value safely in email or other non-binary environments.
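The following sketch requests two different output lengths from :func:`shake_128` for the same input. The lengths 16 and 32 are arbitrary choices for the example.

```python
import hashlib

data = b"Nobody inspects the spammish repetition"

short = hashlib.shake_128(data).hexdigest(16)  # 16 bytes -> 32 hex characters
long = hashlib.shake_128(data).hexdigest(32)   # 32 bytes -> 64 hex characters

print(len(short), len(long))
# For SHAKE, a shorter output is a prefix of a longer one for the same input
print(long.startswith(short))
```

Note that this prefix behavior is specific to SHAKE and does not hold for BLAKE2 digests of different sizes.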
File hashing
The hashlib module provides a helper function for efficient hashing of
a file or file-like object.
.. function:: file_digest(fileobj, digest, /)

   Return a digest object that has been updated with contents of file object.

   *fileobj* must be a file-like object opened for reading in binary mode.
   It accepts file objects from builtin :func:`open`, :class:`~io.BytesIO`
   instances, SocketIO objects from :meth:`socket.socket.makefile`, and
   similar. The function may bypass Python's I/O and use the file descriptor
   from :meth:`~io.IOBase.fileno` directly. *fileobj* must be assumed to be
   in an unknown state after this function returns or raises. It is up to
   the caller to close *fileobj*.

   *digest* must either be a hash algorithm name as a *str*, a hash
   constructor, or a callable that returns a hash object.

   Example:

      >>> import io, hashlib, hmac
      >>> with open(hashlib.__file__, "rb") as f:
      ...     digest = hashlib.file_digest(f, "sha256")
      ...
      >>> digest.hexdigest()  # doctest: +ELLIPSIS
      '...'

      >>> buf = io.BytesIO(b"somedata")
      >>> mac1 = hmac.HMAC(b"key", digestmod=hashlib.sha512)
      >>> digest = hashlib.file_digest(buf, lambda: mac1)

      >>> digest is mac1
      True
      >>> mac2 = hmac.HMAC(b"key", b"somedata", digestmod=hashlib.sha512)
      >>> mac1.digest() == mac2.digest()
      True

   .. versionadded:: 3.11
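Since :func:`file_digest` only exists from Python 3.11, code that must also run on older versions can fall back to a manual chunked loop. The sketch below is one way to do this; the helper name `sha256_of_file`, the chunk size, and the temporary demo file are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the SHA-256 hex digest of the file at path."""
    with open(path, "rb") as f:
        if hasattr(hashlib, "file_digest"):  # Python 3.11 and newer
            return hashlib.file_digest(f, "sha256").hexdigest()
        # Fallback for older versions: feed the file in fixed-size chunks
        h = hashlib.sha256()
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
        return h.hexdigest()


# Demonstration on a temporary file with known contents
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"somedata" * 10_000)
try:
    file_hash = sha256_of_file(tmp.name)
finally:
    os.remove(tmp.name)
print(file_hash)
```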
Key derivation
Key derivation and key stretching algorithms are designed for secure password
hashing. Naive algorithms such as ``sha1(password)`` are not resistant against
brute-force attacks. A good password hashing function must be tunable, slow, and
include a salt.
.. function:: pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None)

   The function provides PKCS#5 password-based key derivation function 2. It
   uses HMAC as pseudorandom function.

   The string *hash_name* is the desired name of the hash digest algorithm for
   HMAC, e.g. 'sha1' or 'sha256'. *password* and *salt* are interpreted as
   buffers of bytes. Applications and libraries should limit *password* to
   a sensible length (e.g. 1024). *salt* should be about 16 or more bytes from
   a proper source, e.g. :func:`os.urandom`.

   The number of *iterations* should be chosen based on the hash algorithm and
   computing power. As of 2022, hundreds of thousands of iterations of SHA-256
   are suggested. For rationale as to why and how to choose what is best for
   your application, read *Appendix A.2.2* of NIST-SP-800-132_. The answers
   on the `stackexchange pbkdf2 iterations question`_ explain in detail.

   *dklen* is the length of the derived key. If *dklen* is ``None`` then the
   digest size of the hash algorithm *hash_name* is used, e.g. 64 for SHA-512.

   >>> from hashlib import pbkdf2_hmac
   >>> our_app_iters = 500_000  # Application specific, read above.
   >>> dk = pbkdf2_hmac('sha256', b'password', b'bad salt' * 2, our_app_iters)
   >>> dk.hex()
   '15530bba69924174860db778f2c6f8104d3aaf9d26241840c8c4a641c8d000a9'

   Function only available when Python is compiled with OpenSSL.

   .. versionadded:: 3.4

   .. versionchanged:: 3.12
      Function now only available when Python is built with OpenSSL. The slow
      pure Python implementation has been removed.
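A minimal sketch of how :func:`pbkdf2_hmac` might be wrapped for storing and checking passwords. The function names `hash_password` and `verify_password`, the 16-byte salt, and the default iteration count are choices made for this example, not a prescribed scheme. The annotations assume Python 3.9 or newer.

```python
import hashlib
import hmac
import os

ITERATIONS = 500_000  # application specific; tune for your hardware


def hash_password(password: str, iterations: int = ITERATIONS) -> tuple[bytes, bytes]:
    """Return (salt, derived_key) to store alongside the iteration count."""
    salt = os.urandom(16)  # fresh random salt for every password
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


def verify_password(password: str, salt: bytes, key: bytes,
                    iterations: int = ITERATIONS) -> bool:
    """Re-derive the key from the candidate password and compare."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    # compare_digest avoids leaking where the two values first differ
    return hmac.compare_digest(candidate, key)
```

The salt and the iteration count are not secret and are normally stored next to the derived key, so that verification can repeat the same derivation.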
.. function:: scrypt(password, *, salt, n, r, p, maxmem=0, dklen=64)

   The function provides scrypt password-based key derivation function as
   defined in :rfc:`7914`.

   *password* and *salt* must be :term:`bytes-like objects
   <bytes-like object>`. Applications and libraries should limit *password*
   to a sensible length (e.g. 1024). *salt* should be about 16 or more
   bytes from a proper source, e.g. :func:`os.urandom`.

   *n* is the CPU/Memory cost factor, *r* the block size, *p* parallelization
   factor and *maxmem* limits memory (OpenSSL 1.1.0 defaults to 32 MiB).
   *dklen* is the length of the derived key.

   .. versionadded:: 3.6
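A short sketch of a :func:`scrypt` call. The cost parameters below are illustrative values often seen in examples, not a recommendation from this document, and the function is only present when Python is built against a suitable OpenSSL.

```python
import hashlib
import os

salt = os.urandom(16)

# Illustrative parameters: with n=2**14 and r=8, scrypt needs roughly
# 128 * n * r bytes = 16 MiB, which fits the default 32 MiB memory limit.
key = hashlib.scrypt(b"correct horse", salt=salt, n=2**14, r=8, p=1, dklen=32)

# The same password, salt and parameters always give the same key
again = hashlib.scrypt(b"correct horse", salt=salt, n=2**14, r=8, p=1, dklen=32)
print(key == again, len(key))
```

Raising *n* increases both the time and the memory an attacker needs per guess; if the required memory exceeds the limit, *maxmem* has to be raised as well.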
BLAKE2
.. sectionauthor:: Dmitry Chestnykh
.. index:: single: blake2b, blake2s
BLAKE2 is a cryptographic hash function defined in RFC 7693 that comes in two
flavors:
- BLAKE2b, optimized for 64-bit platforms and produces digests of any size
  between 1 and 64 bytes,
- BLAKE2s, optimized for 8- to 32-bit platforms and produces digests of any
  size between 1 and 32 bytes.
BLAKE2 supports keyed mode (a faster and simpler replacement for HMAC),
salted hashing, personalization, and tree hashing.
Hash objects from this module follow the API of standard library’s
:mod:`hashlib` objects.
Creating hash objects
New hash objects are created by calling constructor functions:
.. function:: blake2b(data=b'', *, digest_size=64, key=b'', salt=b'', person=b'', fanout=1, depth=1, leaf_size=0, node_offset=0, node_depth=0, inner_size=0, last_node=False, usedforsecurity=True)
.. function:: blake2s(data=b'', *, digest_size=32, key=b'', salt=b'', person=b'', fanout=1, depth=1, leaf_size=0, node_offset=0, node_depth=0, inner_size=0, last_node=False, usedforsecurity=True)
These functions return the corresponding hash objects for calculating
BLAKE2b or BLAKE2s. They optionally take these general parameters:
- data: initial chunk of data to hash, which must be
  :term:`bytes-like object`. It can be passed only as positional argument.
- digest_size: size of output digest in bytes.
- key: key for keyed hashing (up to 64 bytes for BLAKE2b, up to 32 bytes for
  BLAKE2s).
- salt: salt for randomized hashing (up to 16 bytes for BLAKE2b, up to 8
  bytes for BLAKE2s).
- person: personalization string (up to 16 bytes for BLAKE2b, up to 8 bytes
  for BLAKE2s).
The following table shows limits for general parameters (in bytes):
Hash | digest_size | len(key) | len(salt) | len(person) |
---|---|---|---|---|
BLAKE2b | 64 | 64 | 16 | 16 |
BLAKE2s | 32 | 32 | 8 | 8 |
Note
BLAKE2 specification defines constant lengths for salt and personalization
parameters, however, for convenience, this implementation accepts byte
strings of any size up to the specified length. If the length of the
parameter is less than specified, it is padded with zeros, thus, for
example, ``b'salt'`` and ``b'salt\x00'`` is the same value. (This is not
the case for key.)
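The padding rule described in the note can be observed directly. In this small sketch the data, salt, and key values are arbitrary:

```python
from hashlib import blake2b

# A short salt is zero-padded, so these two constructions are equivalent
a = blake2b(b"data", salt=b"salt").hexdigest()
b = blake2b(b"data", salt=b"salt\x00").hexdigest()
print(a == b)

# A key is not interchangeable in this way: a trailing zero byte changes it
k1 = blake2b(b"data", key=b"key").hexdigest()
k2 = blake2b(b"data", key=b"key\x00").hexdigest()
print(k1 != k2)
```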
These sizes are available as module constants described below.
Constructor functions also accept the following tree hashing parameters:
- fanout: fanout (0 to 255, 0 if unlimited, 1 in sequential mode).
- depth: maximal depth of tree (1 to 255, 255 if unlimited, 1 in
  sequential mode).
- leaf_size: maximal byte length of leaf (0 to ``2**32-1``, 0 if unlimited or in
  sequential mode).
- node_offset: node offset (0 to ``2**64-1`` for BLAKE2b, 0 to ``2**48-1`` for
  BLAKE2s, 0 for the first, leftmost, leaf, or in sequential mode).
- node_depth: node depth (0 to 255, 0 for leaves, or in sequential mode).
- inner_size: inner digest size (0 to 64 for BLAKE2b, 0 to 32 for
  BLAKE2s, 0 in sequential mode).
- last_node: boolean indicating whether the processed node is the last
  one (``False`` for sequential mode).
See section 2.10 in BLAKE2 specification for comprehensive review of tree
hashing.
Constants
.. data:: blake2b.SALT_SIZE
.. data:: blake2s.SALT_SIZE
Salt length (maximum length accepted by constructors).
.. data:: blake2b.PERSON_SIZE
.. data:: blake2s.PERSON_SIZE
Personalization string length (maximum length accepted by constructors).
.. data:: blake2b.MAX_KEY_SIZE
.. data:: blake2s.MAX_KEY_SIZE
Maximum key size.
.. data:: blake2b.MAX_DIGEST_SIZE
.. data:: blake2s.MAX_DIGEST_SIZE
Maximum digest size that the hash function can output.
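These constants can be read at run time, for instance to validate user-supplied parameters before constructing a hash object. A minimal sketch follows; the helper `checked_blake2b` is a hypothetical name.

```python
from hashlib import blake2b, blake2s

# Print the limits for both flavors
for ctor in (blake2b, blake2s):
    print(ctor.__name__, ctor.SALT_SIZE, ctor.PERSON_SIZE,
          ctor.MAX_KEY_SIZE, ctor.MAX_DIGEST_SIZE)


def checked_blake2b(data: bytes, salt: bytes) -> str:
    """Hash data with blake2b, rejecting an over-long salt with a clear message."""
    if len(salt) > blake2b.SALT_SIZE:
        raise ValueError(f"salt must be at most {blake2b.SALT_SIZE} bytes")
    return blake2b(data, salt=salt).hexdigest()
```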
Examples
Simple hashing
To calculate hash of some data, you should first construct a hash object by
calling the appropriate constructor function (:func:`blake2b` or
:func:`blake2s`), then update it with the data by calling :meth:`update` on the
object, and, finally, get the digest out of the object by calling
:meth:`digest` (or :meth:`hexdigest` for hex-encoded string).
   >>> from hashlib import blake2b
   >>> h = blake2b()
   >>> h.update(b'Hello world')
   >>> h.hexdigest()
   '6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'
As a shortcut, you can pass the first chunk of data to update directly to the
constructor as the positional argument:
   >>> from hashlib import blake2b
   >>> blake2b(b'Hello world').hexdigest()
   '6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'
You can call :meth:`hash.update` as many times as you need to iteratively
update the hash:
   >>> from hashlib import blake2b
   >>> items = [b'Hello', b' ', b'world']
   >>> h = blake2b()
   >>> for item in items:
   ...     h.update(item)
   ...
   >>> h.hexdigest()
   '6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'
Using different digest sizes
BLAKE2 has configurable size of digests up to 64 bytes for BLAKE2b and up to 32
bytes for BLAKE2s. For example, to replace SHA-1 with BLAKE2b without changing
the size of output, we can tell BLAKE2b to produce 20-byte digests:
   >>> from hashlib import blake2b
   >>> h = blake2b(digest_size=20)
   >>> h.update(b'Replacing SHA1 with the more secure function')
   >>> h.hexdigest()
   'd24f26cf8de66472d58d4e1b1774b4c9158b1f4c'
   >>> h.digest_size
   20
   >>> len(h.digest())
   20
Hash objects with different digest sizes have completely different outputs
(shorter hashes are not prefixes of longer hashes); BLAKE2b and BLAKE2s
produce different outputs even if the output length is the same:
   >>> from hashlib import blake2b, blake2s
   >>> blake2b(digest_size=10).hexdigest()
   '6fa1d8fcfd719046d762'
   >>> blake2b(digest_size=11).hexdigest()
   'eb6ec15daf9546254f0809'
   >>> blake2s(digest_size=10).hexdigest()
   '1bf21a98c78a1c376ae9'
   >>> blake2s(digest_size=11).hexdigest()
   '567004bf96e4a25773ebf4'
Keyed hashing
Keyed hashing can be used for authentication as a faster and simpler
replacement for Hash-based message authentication code (HMAC).
BLAKE2 can be securely used in prefix-MAC mode thanks to the
indifferentiability property inherited from BLAKE.
This example shows how to get a (hex-encoded) 128-bit authentication code for
message ``b'message data'`` with key ``b'pseudorandom key'``:

   >>> from hashlib import blake2b
   >>> h = blake2b(key=b'pseudorandom key', digest_size=16)
   >>> h.update(b'message data')
   >>> h.hexdigest()
   '3d363ff7401e02026f4a4687d4863ced'
As a practical example, a web application can symmetrically sign cookies sent
to users and later verify them to make sure they weren’t tampered with:
   >>> from hashlib import blake2b
   >>> from hmac import compare_digest
   >>>
   >>> SECRET_KEY = b'pseudorandomly generated server secret key'
   >>> AUTH_SIZE = 16
   >>>
   >>> def sign(cookie):
   ...     h = blake2b(digest_size=AUTH_SIZE, key=SECRET_KEY)
   ...     h.update(cookie)
   ...     return h.hexdigest().encode('utf-8')
   >>>
   >>> def verify(cookie, sig):
   ...     good_sig = sign(cookie)
   ...     return compare_digest(good_sig, sig)
   >>>
   >>> cookie = b'user-alice'
   >>> sig = sign(cookie)
   >>> print("{0},{1}".format(cookie.decode('utf-8'), sig))
   user-alice,b'43b3c982cf697e0c5ab22172d1ca7421'
   >>> verify(cookie, sig)
   True
   >>> verify(b'user-bob', sig)
   False
   >>> verify(cookie, b'0102030405060708090a0b0c0d0e0f00')
   False
Even though there’s a native keyed hashing mode, BLAKE2 can, of course, be used
in HMAC construction with :mod:`hmac` module:
   >>> import hmac, hashlib
   >>> m = hmac.new(b'secret key', digestmod=hashlib.blake2s)
   >>> m.update(b'message')
   >>> m.hexdigest()
   'e3c8102868d28b5ff85fc35dda07329970d1a01e273c37481326fe0c861c8142'
Randomized hashing
By setting salt parameter users can introduce randomization to the hash
function. Randomized hashing is useful for protecting against collision attacks
on the hash function used in digital signatures.
Randomized hashing is designed for situations where one party, the message
preparer, generates all or part of a message to be signed by a second
party, the message signer. If the message preparer is able to find
cryptographic hash function collisions (i.e., two messages producing the
same hash value), then they might prepare meaningful versions of the message
that would produce the same hash value and digital signature, but with
different results (e.g., transferring $1,000,000 to an account, rather than
$10). Cryptographic hash functions have been designed with collision
resistance as a major goal, but the current concentration on attacking
cryptographic hash functions may result in a given cryptographic hash
function providing less collision resistance than expected. Randomized
hashing offers the signer additional protection by reducing the likelihood
that a preparer can generate two or more messages that ultimately yield the
same hash value during the digital signature generation process — even if
it is practical to find collisions for the hash function. However, the use
of randomized hashing may reduce the amount of security provided by a
digital signature when all portions of the message are prepared
by the signer.

(NIST SP-800-106 "Randomized Hashing for Digital Signatures")
In BLAKE2 the salt is processed as a one-time input to the hash function during
initialization, rather than as an input to each compression function.
Warning
Salted hashing (or just hashing) with BLAKE2 or any other general-purpose
cryptographic hash function, such as SHA-256, is not suitable for hashing
passwords. See BLAKE2 FAQ for more
information.
   >>> import os
   >>> from hashlib import blake2b
   >>> msg = b'some message'
   >>> # Calculate the first hash with a random salt.
   >>> salt1 = os.urandom(blake2b.SALT_SIZE)
   >>> h1 = blake2b(salt=salt1)
   >>> h1.update(msg)
   >>> # Calculate the second hash with a different random salt.
   >>> salt2 = os.urandom(blake2b.SALT_SIZE)
   >>> h2 = blake2b(salt=salt2)
   >>> h2.update(msg)
   >>> # The digests are different.
   >>> h1.digest() != h2.digest()
   True
Personalization
Sometimes it is useful to force hash function to produce different digests for
the same input for different purposes. Quoting the authors of the Skein hash
function:
We recommend that all application designers seriously consider doing this;
we have seen many protocols where a hash that is computed in one part of
the protocol can be used in an entirely different part because two hash
computations were done on similar or related data, and the attacker can
force the application to make the hash inputs the same. Personalizing each
hash function used in the protocol summarily stops this type of attack.

(The Skein Hash Function Family, p. 21)
BLAKE2 can be personalized by passing bytes to the person argument:
   >>> from hashlib import blake2b
   >>> FILES_HASH_PERSON = b'MyApp Files Hash'
   >>> BLOCK_HASH_PERSON = b'MyApp Block Hash'
   >>> h = blake2b(digest_size=32, person=FILES_HASH_PERSON)
   >>> h.update(b'the same content')
   >>> h.hexdigest()
   '20d9cd024d4fb086aae819a1432dd2466de12947831b75c5a30cf2676095d3b4'
   >>> h = blake2b(digest_size=32, person=BLOCK_HASH_PERSON)
   >>> h.update(b'the same content')
   >>> h.hexdigest()
   'cf68fb5761b9c44e7878bfb2c4c9aea52264a80b75005e65619778de59f383a3'
Personalization together with the keyed mode can also be used to derive different
keys from a single one.
   >>> from hashlib import blake2s
   >>> from base64 import b64decode, b64encode
   >>> orig_key = b64decode(b'Rm5EPJai72qcK3RGBpW3vPNfZy5OZothY+kHY6h21KM=')
   >>> enc_key = blake2s(key=orig_key, person=b'kEncrypt').digest()
   >>> mac_key = blake2s(key=orig_key, person=b'kMAC').digest()
   >>> print(b64encode(enc_key).decode('utf-8'))
   rbPb15S/Z9t+agffno5wuhB77VbRi6F9Iv2qIxU7WHw=
   >>> print(b64encode(mac_key).decode('utf-8'))
   G9GtHFE1YluXY1zWPlYk1e/nWfu0WSEb0KRcjhDeP/o=
Tree mode
Here’s an example of hashing a minimal tree with two leaf nodes:
     10
    /  \
   00  01
This example uses 64-byte internal digests, and returns the 32-byte final
digest:
   >>> from hashlib import blake2b
   >>>
   >>> FANOUT = 2
   >>> DEPTH = 2
   >>> LEAF_SIZE = 4096
   >>> INNER_SIZE = 64
   >>>
   >>> buf = bytearray(6000)
   >>>
   >>> # Left leaf
   ... h00 = blake2b(buf[0:LEAF_SIZE], fanout=FANOUT, depth=DEPTH,
   ...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
   ...               node_offset=0, node_depth=0, last_node=False)
   >>> # Right leaf
   ... h01 = blake2b(buf[LEAF_SIZE:], fanout=FANOUT, depth=DEPTH,
   ...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
   ...               node_offset=1, node_depth=0, last_node=True)
   >>> # Root node
   ... h10 = blake2b(digest_size=32, fanout=FANOUT, depth=DEPTH,
   ...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
   ...               node_offset=0, node_depth=1, last_node=True)
   >>> h10.update(h00.digest())
   >>> h10.update(h01.digest())
   >>> h10.hexdigest()
   '3ad2a9b37c6070e374c7a8c508fe20ca86b6ed54e286e93a0318e95e881db5aa'
Credits
BLAKE2 was designed by Jean-Philippe Aumasson, Samuel Neves, Zooko
Wilcox-O’Hearn, and Christian Winnerlein based on SHA-3 finalist BLAKE
created by Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and
Raphael C.-W. Phan.
It uses core algorithm from ChaCha cipher designed by Daniel J. Bernstein.
The stdlib implementation is based on pyblake2 module. It was written by
Dmitry Chestnykh based on C implementation written by Samuel Neves. The
documentation was copied from pyblake2 and written by Dmitry Chestnykh.
The C code was partly rewritten for Python by Christian Heimes.
The following public domain dedication applies for both C hash function
implementation, extension code, and this documentation:
To the extent possible under law, the author(s) have dedicated all copyright
and related and neighboring rights to this software to the public domain
worldwide. This software is distributed without any warranty.

You should have received a copy of the CC0 Public Domain Dedication along
with this software. If not, see
https://creativecommons.org/publicdomain/zero/1.0/.
The following people have helped with development or contributed their changes
to the project and the public domain according to the Creative Commons Public
Domain Dedication 1.0 Universal:
- Alexandr Sokolovskiy
.. seealso::

   Module :mod:`hmac`
      A module to generate message authentication codes using hashes.

   Module :mod:`base64`
      Another way to encode binary hashes for non-binary environments.

   https://www.blake2.net
      Official BLAKE2 website.

   https://csrc.nist.gov/csrc/media/publications/fips/180/2/archive/2002-08-01/documents/fips180-2.pdf
      The FIPS 180-2 publication on Secure Hash Algorithms.

   https://en.wikipedia.org/wiki/Cryptographic_hash_function#Cryptographic_hash_algorithms
      Wikipedia article with information on which algorithms have known issues
      and what that means regarding their use.

   https://www.ietf.org/rfc/rfc8018.txt
      PKCS #5: Password-Based Cryptography Specification Version 2.1

   https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-132.pdf
      NIST Recommendation for Password-Based Key Derivation.
Learn to calculate the hash of a file in Python, with examples. The hash is also called the file checksum or digest. A checksum is a fixed-length sequence of characters computed from the content by a hash algorithm. It is not an encryption of the content and cannot be reversed to recover it.
1. Hash Algorithms
Hash functions are cryptographic functions that produce a fixed-length output for any given input. For any given input, a given hash function will always generate the same output, no matter how many times we execute it.
Hash functions are commonly applied to passwords so that only the resulting hash is stored in the database. In this way, the user's password is never stored in plain text in the system. For passwords, a salted and deliberately slow key derivation function should be used rather than a single fast hash.
Similarly, when we want to transfer a file to another host, we also pass the file digest so that the remote host can compute the digest of the transferred file and compare it with the digest sent by the source host. If both match, the file content has not been tampered with.
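The transfer check described above can be sketched as follows. The helper `matches_digest` and the sample payload are hypothetical; in practice the bytes would come from the received file and the expected digest from the source host.

```python
import hashlib
import hmac


def matches_digest(data: bytes, expected_hex: str) -> bool:
    """Check received data against the SHA-256 hex digest sent with it."""
    actual_hex = hashlib.sha256(data).hexdigest()
    # Normalise whitespace and case, then compare in constant time
    return hmac.compare_digest(actual_hex, expected_hex.strip().lower())


payload = b"contents of the transferred file"    # hypothetical received bytes
published = hashlib.sha256(payload).hexdigest()  # digest sent by the source host

print(matches_digest(payload, published))
print(matches_digest(payload + b"!", published))
```

A plain digest only detects changes if the digest itself arrives over a trusted channel; otherwise an attacker who alters the file can alter the digest too.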
This post does not describe the individual hashing algorithms in detail. It only shows examples of using them.
2. Python File Hash Algorithms
We can use the hashlib module, which provides the methods needed to generate various kinds of hashes.
- To list the algorithms that this module always provides, use its algorithms_guaranteed attribute.
- Use the update() function to append bytes to the message being hashed.
- Use the digest() or hexdigest() function to get the resulting hash.
import hashlib
print(hashlib.algorithms_guaranteed)
Program output.
{'shake_256', 'sha1', 'blake2s', 'sha3_224', 'sha512', 'sha3_256', 'sha224', 'sha256', 'sha3_384', 'sha384', 'blake2b', 'sha3_512', 'shake_128', 'md5'}
3. Python File Hash Examples
Let's see a few examples of generating the hash of a given file in Python.
Example 1: Find SHA256 Hash of a File in Python
Python program to find the SHA256 hash of a given file.
import hashlib  # hashlib module
from os import path  # for the file existence check


def hash_file(filename):
    if not path.isfile(filename):
        raise Exception("File not found for hash operation")

    # make a hash object
    h_sha256 = hashlib.sha256()

    # open file for reading in binary mode
    with open(filename, 'rb') as file:
        # read file in chunks and update hash; read() returns b'' at end of file
        chunk = file.read(1024)
        while chunk != b'':
            h_sha256.update(chunk)
            chunk = file.read(1024)

    # return the hex digest
    return h_sha256.hexdigest()


####### Example Usage
message = hash_file("Python")
print(message)
Program output.
8964ed69cac034a6bc88ad33089500b6a26a62f19e1574f2f8fbfddeb9a30667
Example 2: Find SHA1 Hash of a File in Python (Not recommended)
Python program to find the SHA1 hash of a given file.
import hashlib  # hashlib module
from os import path  # for the file existence check


def hash_file(filename):
    if not path.isfile(filename):
        raise Exception("File not found for hash operation")

    # make a hash object
    h = hashlib.sha1()

    # open file for reading in binary mode
    with open(filename, 'rb') as file:
        # read file in chunks and update hash; read() returns b'' at end of file
        chunk = file.read(1024)
        while chunk != b'':
            h.update(chunk)
            chunk = file.read(1024)

    # return the hex digest
    return h.hexdigest()


####### Example Usage
message = hash_file("Python")
print(message)
Program output.
498fd2d318447f9d0fac30c6e0997c03861c679b
Example 3: Find MD5 Hash of a File in Python (Not recommended)
Python program to find the MD5 hash of a given file.
import hashlib  # hashlib module
from os import path  # for the file existence check


def hash_file(filename):
    if not path.isfile(filename):
        raise Exception("File not found for hash operation")

    # make a hash object
    md5_h = hashlib.md5()

    # open file for reading in binary mode
    with open(filename, 'rb') as file:
        # read file in chunks and update hash; read() returns b'' at end of file
        chunk = file.read(1024)
        while chunk != b'':
            md5_h.update(chunk)
            chunk = file.read(1024)

    # return the hex digest
    return md5_h.hexdigest()


####### Example Usage
message = hash_file("Python")
print(message)
Program output.
ee6dcaf64b270667125a33f9b7bebb75