How to compute a hash in Python

In Python, a hash function takes an input byte sequence of variable length and converts it into a sequence of fixed length. The function is one-way.

Contents

  • What a hash function is in Python
  • Popular hash functions in Python
  • Code examples with hash functions in Python
  • MD5 hashing example
  • SHA1 hashing example
  • Hashing with SHA224
  • Hashing with SHA256
  • SHA384 hashing example
  • SHA512 hashing example
  • Using OpenSSL algorithms
  • A real-world example: hashing passwords in Python

One-way means that if f is a hash function, f(x) is quick and easy to compute, but recovering x from f(x) would take a very long time. The value returned by a hash function is usually called a hash, message digest, hash value, or checksum. In the vast majority of cases a hash function produces a unique output for a given input. However, because inputs of arbitrary length are mapped to an output of fixed length, two different inputs can produce the same hash (a collision); how likely that is depends on the algorithm.
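The properties described above can be observed directly. The following is a minimal sketch using hashlib.sha256; the input values are illustrative assumptions, not taken from the article.

```python
import hashlib

# Inputs of very different lengths (illustrative values).
short_input = b"a"
long_input = b"a" * 10**6

short_digest = hashlib.sha256(short_input).hexdigest()
long_digest = hashlib.sha256(long_input).hexdigest()

# SHA-256 always yields 256 bits, i.e. 64 hexadecimal characters.
print(len(short_digest), len(long_digest))

# The same input always gives the same digest.
same_again = hashlib.sha256(short_input).hexdigest()
print(short_digest == same_again)

# A different input gives a different digest.
other_digest = hashlib.sha256(b"b").hexdigest()
print(short_digest != other_digest)
```

The digest length depends only on the algorithm, never on the size of the input.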

Hash functions are used in cryptographic algorithms, digital signatures, message authentication codes, manipulation detection, fingerprinting, checksums (message integrity checks), hash tables, password storage, and much more.

As a Python developer, you may need these functions to detect duplicate data and files, to verify data integrity when sending information over a network, to store passwords securely in databases, or for other cryptography-related work.
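As one illustration of the duplicate-detection use case, the sketch below groups items by the SHA-256 digest of their content. The function name find_duplicates and the in-memory "files" are hypothetical; real code would read the bytes from disk.

```python
import hashlib
from collections import defaultdict

def find_duplicates(blobs):
    """Group item names by the SHA-256 digest of their content."""
    groups = defaultdict(list)
    for name, data in blobs.items():
        groups[hashlib.sha256(data).hexdigest()].append(name)
    # Keep only digests shared by more than one item.
    return [sorted(names) for names in groups.values() if len(names) > 1]

# Hypothetical in-memory "files" used only for this example.
blobs = {
    "a.txt": b"same content",
    "b.txt": b"same content",
    "c.txt": b"different content",
}
duplicates = find_duplicates(blobs)
print(duplicates)
```

Comparing digests avoids comparing every pair of files byte by byte; items with equal digests can then be compared directly if absolute certainty is needed.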


Note that a hash function is not a cryptographic protocol: it neither encrypts nor decrypts information. Hash functions are, however, a fundamental building block of many cryptographic protocols and tools.

Popular hash functions in Python

Some commonly used hash functions:

  • MD5: Produces a 128-bit hash value. It is widely used for checking data integrity. Because of known security vulnerabilities, MD5 is not suitable for other uses.
  • SHA: A group of algorithms developed by the United States NSA. They are part of the US Federal Information Processing Standard and are widely used in cryptographic applications. The digest length ranges from 160 to 512 bits.
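The digest lengths quoted above can be checked with the digest_size attribute, which reports the size in bytes. This is a small sketch; the dictionary name sizes is an assumption made for the example.

```python
import hashlib

names = ("md5", "sha1", "sha224", "sha256", "sha384", "sha512")

# digest_size is given in bytes, so multiply by 8 to get bits.
sizes = {name: hashlib.new(name).digest_size * 8 for name in names}

for name, bits in sizes.items():
    print(name, bits)
```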

The hashlib module, part of the Python standard library, provides an interface to the most popular hashing algorithms. hashlib implements some algorithms itself; if OpenSSL is installed, hashlib can also use the algorithms that OpenSSL provides.

The code below is intended for Python 3.5 and above. To run these examples in Python 2.x, simply remove the algorithms_available and algorithms_guaranteed calls.

First, import the hashlib module:

import hashlib

Then use algorithms_available and algorithms_guaranteed to list the available algorithms.

print(hashlib.algorithms_available)

print(hashlib.algorithms_guaranteed)

The algorithms_available attribute lists every algorithm available on the system, including those provided through OpenSSL, so the same algorithm may appear under several names. algorithms_guaranteed lists only the algorithms that the module itself guarantees. md5, sha1, sha224, sha256, sha384, and sha512 are always present.
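Both attributes are sets, so they can be compared with ordinary set operations. The sketch below shows the relationship between them; the variable names are assumptions, and the list of extra algorithms varies from system to system.

```python
import hashlib

guaranteed = hashlib.algorithms_guaranteed
available = hashlib.algorithms_available

# Every guaranteed algorithm is also listed as available.
is_subset = guaranteed <= available
print(is_subset)

# Names provided only through OpenSSL; the result depends on the system.
extra = sorted(available - guaranteed)
print(extra)
```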

Code examples with hash functions in Python

The code below takes the string "Hello World" and prints its hex digest. hexdigest() returns a hexadecimal string representing the hash; if you need the raw sequence of bytes, use digest() instead.
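To make the difference between the two methods concrete, here is a short sketch that compares them on the same hash object. SHA-256 is chosen only as an example.

```python
import hashlib

h = hashlib.sha256(b"Hello World")

raw = h.digest()      # bytes object, digest_size bytes long
text = h.hexdigest()  # str with two hexadecimal characters per byte

print(type(raw).__name__, len(raw))
print(type(text).__name__, len(text))

# The two forms carry the same information.
print(raw.hex() == text)
print(bytes.fromhex(text) == raw)
```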

MD5 hashing example

import hashlib

hash_object = hashlib.md5(b'Hello World')
print(hash_object.hexdigest())

Note the "b" preceding the string literal: it turns the literal into bytes, because the hash function accepts only a sequence of bytes as its argument. Earlier versions of the library (Python 2) accepted a plain string literal.

So, if you need to take some input from the console and hash it, do not forget to encode the string into a sequence of bytes:

import hashlib

mystring = input('Enter String to hash: ')
# encode() uses UTF-8 by default
hash_object = hashlib.md5(mystring.encode())
print(hash_object.hexdigest())

For the string "Hello World", MD5 produces b10a8db164e0754105b7a99be72e3fe5 (32 hexadecimal characters, 128 bits), while SHA1 produces 0a4d55a8d778e5022fab701977c5d840bbc486d0 (40 hexadecimal characters, 160 bits).
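Because the hash is computed over bytes, the chosen encoding affects the result. The sketch below, which uses an arbitrary sample string, shows that a str is rejected and that different encodings of the same text give different digests.

```python
import hashlib

text = "naïve café"  # arbitrary sample text with non-ASCII characters

# Passing a str directly is rejected: the data must be bytes.
try:
    hashlib.sha256(text)
    rejected = False
except TypeError:
    rejected = True
print(rejected)

default_digest = hashlib.sha256(text.encode()).hexdigest()
utf8_digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
utf16_digest = hashlib.sha256(text.encode("utf-16")).hexdigest()

# encode() without arguments is the same as encode("utf-8").
print(default_digest == utf8_digest)
# A different encoding produces different bytes, hence a different hash.
print(utf8_digest != utf16_digest)
```

When hashes are compared across systems, both sides therefore need to agree on the encoding.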

SHA1 hashing example

import hashlib

hash_object = hashlib.sha1(b'Hello World')
hex_dig = hash_object.hexdigest()
print(hex_dig)

Hashing with SHA224

import hashlib

hash_object = hashlib.sha224(b'Hello World')
hex_dig = hash_object.hexdigest()
print(hex_dig)

Hashing with SHA256

import hashlib

hash_object = hashlib.sha256(b'Hello World')
hex_dig = hash_object.hexdigest()
print(hex_dig)

SHA384 hashing example

import hashlib

hash_object = hashlib.sha384(b'Hello World')
hex_dig = hash_object.hexdigest()
print(hex_dig)

SHA512 hashing example

import hashlib

hash_object = hashlib.sha512(b'Hello World')
hex_dig = hash_object.hexdigest()
print(hex_dig)

Using OpenSSL algorithms

Suppose you need an algorithm provided by OpenSSL. Using algorithms_available, you can find the name of the algorithm you need.

In this case, "DSA" is available on my computer; which names are available depends on the OpenSSL build, so check algorithms_available on your own system first. You can use the new and update methods:

import hashlib

hash_object = hashlib.new('DSA')
hash_object.update(b'Hello World')
print(hash_object.hexdigest())
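Since OpenSSL-provided names differ between systems, it can help to check for a name before using it. The following is a sketch only: the helper make_hash and its SHA-256 fallback are assumptions made for illustration, not part of hashlib.

```python
import hashlib

def make_hash(name, fallback="sha256"):
    """Return a hash object for `name`, or for `fallback` if `name` is unusable."""
    if name in hashlib.algorithms_available:
        try:
            return hashlib.new(name)
        except ValueError:
            # The name is listed but cannot be used in this build.
            pass
    return hashlib.new(fallback)

h = make_hash("ripemd160")
h.update(b"Hello World")
print(h.name, h.hexdigest())

# An unknown name falls back to the default algorithm.
print(make_hash("no-such-algorithm").name)
```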

A real-world example: hashing passwords in Python

In the following example, passwords are hashed for subsequent storage in a database. Here we will use a salt. A salt is a random sequence added to the password string before the hash function is applied. The salt is used to thwart dictionary attacks and rainbow table attacks.

However, if you are building a production application that handles user passwords, keep up with the latest known vulnerabilities in this area. For a more detailed look at password protection, see the following article.

Code for Python 3.x

import uuid
import hashlib

def hash_password(password):
    # uuid is used to generate a random value for the salt
    salt = uuid.uuid4().hex
    return hashlib.sha256(salt.encode() + password.encode()).hexdigest() + ':' + salt

def check_password(hashed_password, user_password):
    # The stored string has the form "<hex digest>:<salt>"
    password, salt = hashed_password.split(':')
    return password == hashlib.sha256(salt.encode() + user_password.encode()).hexdigest()

new_pass = input('Enter a password: ')
hashed_password = hash_password(new_pass)
print('The string to store in the database is: ' + hashed_password)
old_pass = input('Enter the password again to check: ')
if check_password(hashed_password, old_pass):
    print('You entered the right password')
else:
    print('Sorry, but the passwords do not match')
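The hashlib reference warns that a single pass of a fast hash such as SHA-256 does not resist brute-force attacks, and it provides hashlib.pbkdf2_hmac for password hashing. The sketch below is one possible variant of the functions above built on that call. The storage format "iterations:salt:key", the 16-byte salt, and the iteration count are assumptions chosen for illustration, not a recommendation from the article.

```python
import hashlib
import hmac
import os

ITERATIONS = 100000  # assumed value; tune it to your hardware

def hash_password(password, iterations=ITERATIONS):
    """Return 'iterations:salt_hex:key_hex' for storage (hypothetical format)."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return "{}:{}:{}".format(iterations, salt.hex(), key.hex())

def check_password(stored, candidate):
    """Recompute the key with the stored salt and compare in constant time."""
    iterations, salt_hex, key_hex = stored.split(":")
    key = hashlib.pbkdf2_hmac(
        "sha256", candidate.encode("utf-8"), bytes.fromhex(salt_hex), int(iterations)
    )
    return hmac.compare_digest(key, bytes.fromhex(key_hex))

stored = hash_password("correct horse")
print(check_password(stored, "correct horse"))
print(check_password(stored, "wrong guess"))
```

Storing the iteration count next to the salt lets the count be raised later without invalidating existing records.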

Code for Python 2.x

import uuid
import hashlib

def hash_password(password):
    # uuid is used to generate a random value for the salt
    salt = uuid.uuid4().hex
    return hashlib.sha256(salt.encode() + password.encode()).hexdigest() + ':' + salt

def check_password(hashed_password, user_password):
    # The stored string has the form "<hex digest>:<salt>"
    password, salt = hashed_password.split(':')
    return password == hashlib.sha256(salt.encode() + user_password.encode()).hexdigest()

new_pass = raw_input('Enter a password: ')
hashed_password = hash_password(new_pass)
print('The string to store in the database is: ' + hashed_password)
old_pass = raw_input('Enter the password again to check: ')
if check_password(hashed_password, old_pass):
    print('You entered the right password')
else:
    print('Sorry, but the passwords do not match')

I administer several portals for learning the Python, Golang, and Kotlin programming languages. Together with a small team of like-minded people, I work on popularizing programming languages among a Russian-speaking audience. Most of the articles were adapted by us into Russian and are distributed free of charge.

E-mail: vasile.buldumac@ati.utm.md


Source code: Lib/hashlib.py


This module implements a common interface to many different secure hash and
message digest algorithms. Included are the FIPS secure hash algorithms SHA1,
SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180-2) as well as RSA’s MD5
algorithm (defined in Internet RFC 1321). The terms “secure hash” and
“message digest” are interchangeable. Older algorithms were called message
digests. The modern term is secure hash.

Note

If you want the adler32 or crc32 hash functions, they are available in
the zlib module.

Warning

Some algorithms have known hash collision weaknesses, refer to the “See
also” section at the end.

15.1.1. Hash algorithms

There is one constructor method named for each type of hash. All return
a hash object with the same simple interface. For example: use sha256() to
create a SHA-256 hash object. You can now feed this object with bytes-like
objects (normally bytes) using the update() method.
At any point you can ask it for the digest of the
concatenation of the data fed to it so far using the digest() or
hexdigest() methods.

Note

For better multithreading performance, the Python GIL is released for
data larger than 2047 bytes at object creation or on update.

Note

Feeding string objects into update() is not supported, as hashes work
on bytes, not on characters.

Constructors for hash algorithms that are always present in this module are
sha1(), sha224(), sha256(), sha384(),
sha512(), blake2b(), and blake2s().
md5() is normally available as well, though it
may be missing if you are using a rare “FIPS compliant” build of Python.
Additional algorithms may also be available depending upon the OpenSSL
library that Python uses on your platform. On most platforms the
sha3_224(), sha3_256(), sha3_384(), sha3_512(),
shake_128(), shake_256() are also available.

New in version 3.6: SHA3 (Keccak) and SHAKE constructors sha3_224(), sha3_256(),
sha3_384(), sha3_512(), shake_128(), shake_256().

New in version 3.6: blake2b() and blake2s() were added.

For example, to obtain the digest of the byte string
b'Nobody inspects the spammish repetition':

>>> import hashlib
>>> m = hashlib.sha256()
>>> m.update(b"Nobody inspects")
>>> m.update(b" the spammish repetition")
>>> m.digest()
b'\x03\x1e\xdd}Ae\x15\x93\xc5\xfe\\\x00o\xa5u+7\xfd\xdf\xf7\xbcN\x84:\xa6\xaf\x0c\x95\x0fK\x94\x06'
>>> m.digest_size
32
>>> m.block_size
64

More condensed:

>>> hashlib.sha224(b"Nobody inspects the spammish repetition").hexdigest()
'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2'

hashlib.new(name[, data])

Is a generic constructor that takes the string name of the desired
algorithm as its first parameter. It also exists to allow access to the
above listed hashes as well as any other algorithms that your OpenSSL
library may offer. The named constructors are much faster than new()
and should be preferred.

Using new() with an algorithm provided by OpenSSL:

>>> h = hashlib.new('ripemd160')
>>> h.update(b"Nobody inspects the spammish repetition")
>>> h.hexdigest()
'cc4a5ce1b3df48aec5d22d1f16b894a0b894eccc'

Hashlib provides the following constant attributes:

hashlib.algorithms_guaranteed

A set containing the names of the hash algorithms guaranteed to be supported
by this module on all platforms. Note that ‘md5’ is in this list despite
some upstream vendors offering an odd “FIPS compliant” Python build that
excludes it.

New in version 3.2.

hashlib.algorithms_available

A set containing the names of the hash algorithms that are available in the
running Python interpreter. These names will be recognized when passed to
new(). algorithms_guaranteed will always be a subset. The
same algorithm may appear multiple times in this set under different names
(thanks to OpenSSL).

New in version 3.2.

The following values are provided as constant attributes of the hash objects
returned by the constructors:

hash.digest_size

The size of the resulting hash in bytes.

hash.block_size

The internal block size of the hash algorithm in bytes.

A hash object has the following attributes:

hash.name

The canonical name of this hash, always lowercase and always suitable as a
parameter to new() to create another hash of this type.

Changed in version 3.4: The name attribute has been present in CPython since its inception, but
until Python 3.4 was not formally specified, so may not exist on some
platforms.
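As a brief illustration of these attributes, the sketch below reads them from a SHA-256 object and uses name to create a second object of the same type. The variable names are assumptions made for the example.

```python
import hashlib

h = hashlib.sha256()

# Sizes are reported in bytes.
print(h.name, h.digest_size, h.block_size)

# The name is suitable as an argument to new().
same_kind = hashlib.new(h.name)
print(same_kind.name == h.name)
```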

A hash object has the following methods:

hash.update(arg)

Update the hash object with the object arg, which must be interpretable as
a buffer of bytes. Repeated calls are equivalent to a single call with the
concatenation of all the arguments: m.update(a); m.update(b) is
equivalent to m.update(a+b).

Changed in version 3.1: The Python GIL is released to allow other threads to run while hash
updates on data larger than 2047 bytes is taking place when using hash
algorithms supplied by OpenSSL.
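The equivalence between repeated update() calls and a single call on the concatenated data can be checked as follows. This is a small sketch reusing the byte strings from the example earlier in this section.

```python
import hashlib

a = b"Nobody inspects"
b = b" the spammish repetition"

# Feed the data in two steps.
m1 = hashlib.sha256()
m1.update(a)
m1.update(b)

# Feed the concatenation in one step.
m2 = hashlib.sha256()
m2.update(a + b)

print(m1.hexdigest() == m2.hexdigest())
```

This property is what makes it possible to hash large inputs chunk by chunk without loading them into memory at once.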

hash.digest()

Return the digest of the data passed to the update() method so far.
This is a bytes object of size digest_size which may contain bytes in
the whole range from 0 to 255.

hash.hexdigest()

Like digest() except the digest is returned as a string object of
double length, containing only hexadecimal digits. This may be used to
exchange the value safely in email or other non-binary environments.

hash.copy()

Return a copy (“clone”) of the hash object. This can be used to efficiently
compute the digests of data sharing a common initial substring.
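A possible use of copy() is sketched below: the common prefix is hashed once, and each copy continues from that state. The prefix and record values are illustrative assumptions.

```python
import hashlib

# Hash the shared prefix once.
prefix = hashlib.sha256(b"common header|")

# Each copy continues independently from the same internal state.
h1 = prefix.copy()
h1.update(b"record 1")
h2 = prefix.copy()
h2.update(b"record 2")

print(h1.hexdigest() == hashlib.sha256(b"common header|record 1").hexdigest())
print(h1.hexdigest() != h2.hexdigest())
```

Updating a copy does not affect the original object, so prefix can be reused for any number of records.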

15.1.2. SHAKE variable length digests

The shake_128() and shake_256() algorithms provide variable
length digests with length_in_bits//2 up to 128 or 256 bits of security.
As such, their digest methods require a length. Maximum length is not limited
by the SHAKE algorithm.

shake.digest(length)

Return the digest of the data passed to the update() method so far.
This is a bytes object of size length which may contain bytes in
the whole range from 0 to 255.

shake.hexdigest(length)

Like digest() except the digest is returned as a string object of
double length, containing only hexadecimal digits. This may be used to
exchange the value safely in email or other non-binary environments.
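The length argument can be illustrated with a short sketch. The requested lengths of 16 and 32 bytes are arbitrary choices for the example.

```python
import hashlib

h = hashlib.shake_128(b"Nobody inspects the spammish repetition")

# The length is given in bytes; the hex string is twice as long.
short = h.hexdigest(16)
longer = h.hexdigest(32)
print(len(short), len(longer))

# SHAKE produces an output stream, so a shorter digest of the same
# data is a prefix of a longer one.
print(longer.startswith(short))
```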

15.1.3. Key derivation

Key derivation and key stretching algorithms are designed for secure password
hashing. Naive algorithms such as sha1(password) are not resistant against
brute-force attacks. A good password hashing function must be tunable, slow, and
include a salt.

hashlib.pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None)

The function provides PKCS#5 password-based key derivation function 2. It
uses HMAC as pseudorandom function.

The string hash_name is the desired name of the hash digest algorithm for
HMAC, e.g. ‘sha1’ or ‘sha256’. password and salt are interpreted as
buffers of bytes. Applications and libraries should limit password to
a sensible length (e.g. 1024). salt should be about 16 or more bytes from
a proper source, e.g. os.urandom().

The number of iterations should be chosen based on the hash algorithm and
computing power. As of 2013, at least 100,000 iterations of SHA-256 are
suggested.

dklen is the length of the derived key. If dklen is None then the
digest size of the hash algorithm hash_name is used, e.g. 64 for SHA-512.

>>> import hashlib, binascii
>>> dk = hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 100000)
>>> binascii.hexlify(dk)
b'0394a2ede332c9a13eb82e9b24631604c31df978b4e2f0fbd2c549944f9d79a5'

New in version 3.4.

Note

A fast implementation of pbkdf2_hmac is available with OpenSSL. The
Python implementation uses an inline version of hmac. It is about
three times slower and doesn’t release the GIL.

hashlib.scrypt(password, *, salt, n, r, p, maxmem=0, dklen=64)

The function provides scrypt password-based key derivation function as
defined in RFC 7914.

password and salt must be bytes-like objects. Applications and
libraries should limit password to a sensible length (e.g. 1024). salt
should be about 16 or more bytes from a proper source, e.g. os.urandom().

n is the CPU/Memory cost factor, r the block size, p parallelization
factor and maxmem limits memory (OpenSSL 1.1.0 defaults to 32 MB).
dklen is the length of the derived key.

Availability: OpenSSL 1.1+

New in version 3.6.
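A call to scrypt() might look like the sketch below. The cost parameters (n=2**14, r=8, p=1), the explicit maxmem of 64 MiB, and the 32-byte key length are assumptions chosen for illustration; suitable values depend on the hardware and on the OpenSSL build in use.

```python
import hashlib
import os

password = b"correct horse battery staple"  # example value only
salt = os.urandom(16)

# Assumed cost parameters; memory use is roughly 128 * n * r bytes (16 MiB here),
# so maxmem is set explicitly to leave headroom.
params = dict(salt=salt, n=2**14, r=8, p=1, maxmem=64 * 1024 * 1024, dklen=32)

key = hashlib.scrypt(password, **params)
print(len(key))

# The same password, salt, and parameters always derive the same key.
same = hashlib.scrypt(password, **params)
print(key == same)
```

The salt and the cost parameters have to be stored alongside the derived key so that the computation can be repeated during verification.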

15.1.4. BLAKE2

BLAKE2 is a cryptographic hash function defined in RFC-7693 that comes in two
flavors:

  • BLAKE2b, optimized for 64-bit platforms and produces digests of any size
    between 1 and 64 bytes,
  • BLAKE2s, optimized for 8- to 32-bit platforms and produces digests of any
    size between 1 and 32 bytes.

BLAKE2 supports keyed mode (a faster and simpler replacement for HMAC),
salted hashing, personalization, and tree hashing.

Hash objects from this module follow the API of standard library’s
hashlib objects.

15.1.4.1. Creating hash objects

New hash objects are created by calling constructor functions:

hashlib.blake2b(data=b'', digest_size=64, key=b'', salt=b'', person=b'', fanout=1, depth=1, leaf_size=0, node_offset=0, node_depth=0, inner_size=0, last_node=False)
hashlib.blake2s(data=b'', digest_size=32, key=b'', salt=b'', person=b'', fanout=1, depth=1, leaf_size=0, node_offset=0, node_depth=0, inner_size=0, last_node=False)

These functions return the corresponding hash objects for calculating
BLAKE2b or BLAKE2s. They optionally take these general parameters:

  • data: initial chunk of data to hash, which must be interpretable as buffer
    of bytes.
  • digest_size: size of output digest in bytes.
  • key: key for keyed hashing (up to 64 bytes for BLAKE2b, up to 32 bytes for
    BLAKE2s).
  • salt: salt for randomized hashing (up to 16 bytes for BLAKE2b, up to 8
    bytes for BLAKE2s).
  • person: personalization string (up to 16 bytes for BLAKE2b, up to 8 bytes
    for BLAKE2s).

The following table shows limits for general parameters (in bytes):

Hash      digest_size  len(key)  len(salt)  len(person)
BLAKE2b   64           64        16         16
BLAKE2s   32           32        8          8

Note

BLAKE2 specification defines constant lengths for salt and personalization
parameters, however, for convenience, this implementation accepts byte
strings of any size up to the specified length. If the length of the
parameter is less than specified, it is padded with zeros, thus, for
example, b'salt' and b'salt\x00' is the same value. (This is not
the case for key.)
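The zero-padding behaviour described in this note can be observed with a short sketch. The data, salt, and key values are arbitrary examples.

```python
from hashlib import blake2b

data = b"some data"

# A short salt is padded with zero bytes, so these two calls are equivalent.
salt_equal = (
    blake2b(data, salt=b"salt").digest() == blake2b(data, salt=b"salt\x00").digest()
)
print(salt_equal)

# Keys are not padded in this way: a trailing zero byte gives a different key.
key_equal = (
    blake2b(data, key=b"key").digest() == blake2b(data, key=b"key\x00").digest()
)
print(key_equal)
```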

These sizes are available as module constants described below.

Constructor functions also accept the following tree hashing parameters:

  • fanout: fanout (0 to 255, 0 if unlimited, 1 in sequential mode).
  • depth: maximal depth of tree (1 to 255, 255 if unlimited, 1 in
    sequential mode).
  • leaf_size: maximal byte length of leaf (0 to 2**32-1, 0 if unlimited or in
    sequential mode).
  • node_offset: node offset (0 to 2**64-1 for BLAKE2b, 0 to 2**48-1 for
    BLAKE2s, 0 for the first, leftmost, leaf, or in sequential mode).
  • node_depth: node depth (0 to 255, 0 for leaves, or in sequential mode).
  • inner_size: inner digest size (0 to 64 for BLAKE2b, 0 to 32 for
    BLAKE2s, 0 in sequential mode).
  • last_node: boolean indicating whether the processed node is the last
    one (False for sequential mode).

Explanation of tree mode parameters.

See section 2.10 in BLAKE2 specification for comprehensive review of tree
hashing.

15.1.4.2. Constants

blake2b.SALT_SIZE
blake2s.SALT_SIZE

Salt length (maximum length accepted by constructors).

blake2b.PERSON_SIZE
blake2s.PERSON_SIZE

Personalization string length (maximum length accepted by constructors).

blake2b.MAX_KEY_SIZE
blake2s.MAX_KEY_SIZE

Maximum key size.

blake2b.MAX_DIGEST_SIZE
blake2s.MAX_DIGEST_SIZE

Maximum digest size that the hash function can output.
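These constants can be read directly from the constructor functions, as in the sketch below. The dictionary name limits is an assumption made for the example; the values match the table of limits given earlier.

```python
from hashlib import blake2b, blake2s

limits = {
    func.__name__: (
        func.MAX_DIGEST_SIZE,
        func.MAX_KEY_SIZE,
        func.SALT_SIZE,
        func.PERSON_SIZE,
    )
    for func in (blake2b, blake2s)
}

# Order of each tuple: digest size, key size, salt size, person size (bytes).
for name, values in limits.items():
    print(name, values)
```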

15.1.4.3. Examples

15.1.4.3.1. Simple hashing

To calculate hash of some data, you should first construct a hash object by
calling the appropriate constructor function (blake2b() or
blake2s()), then update it with the data by calling update() on the
object, and, finally, get the digest out of the object by calling
digest() (or hexdigest() for hex-encoded string).

>>> from hashlib import blake2b
>>> h = blake2b()
>>> h.update(b'Hello world')
>>> h.hexdigest()
'6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'

As a shortcut, you can pass the first chunk of data to update directly to the
constructor as the first argument (or as data keyword argument):

>>> from hashlib import blake2b
>>> blake2b(b'Hello world').hexdigest()
'6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'

You can call hash.update() as many times as you need to iteratively
update the hash:

>>> from hashlib import blake2b
>>> items = [b'Hello', b' ', b'world']
>>> h = blake2b()
>>> for item in items:
...     h.update(item)
>>> h.hexdigest()
'6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'

15.1.4.3.2. Using different digest sizes

BLAKE2 has configurable size of digests up to 64 bytes for BLAKE2b and up to 32
bytes for BLAKE2s. For example, to replace SHA-1 with BLAKE2b without changing
the size of output, we can tell BLAKE2b to produce 20-byte digests:

>>> from hashlib import blake2b
>>> h = blake2b(digest_size=20)
>>> h.update(b'Replacing SHA1 with the more secure function')
>>> h.hexdigest()
'd24f26cf8de66472d58d4e1b1774b4c9158b1f4c'
>>> h.digest_size
20
>>> len(h.digest())
20

Hash objects with different digest sizes have completely different outputs
(shorter hashes are not prefixes of longer hashes); BLAKE2b and BLAKE2s
produce different outputs even if the output length is the same:

>>> from hashlib import blake2b, blake2s
>>> blake2b(digest_size=10).hexdigest()
'6fa1d8fcfd719046d762'
>>> blake2b(digest_size=11).hexdigest()
'eb6ec15daf9546254f0809'
>>> blake2s(digest_size=10).hexdigest()
'1bf21a98c78a1c376ae9'
>>> blake2s(digest_size=11).hexdigest()
'567004bf96e4a25773ebf4'

15.1.4.3.3. Keyed hashing

Keyed hashing can be used for authentication as a faster and simpler
replacement for Hash-based message authentication code (HMAC).
BLAKE2 can be securely used in prefix-MAC mode thanks to the
indifferentiability property inherited from BLAKE.

This example shows how to get a (hex-encoded) 128-bit authentication code for
message b'message data' with key b'pseudorandom key':

>>> from hashlib import blake2b
>>> h = blake2b(key=b'pseudorandom key', digest_size=16)
>>> h.update(b'message data')
>>> h.hexdigest()
'3d363ff7401e02026f4a4687d4863ced'

As a practical example, a web application can symmetrically sign cookies sent
to users and later verify them to make sure they weren’t tampered with:

>>> from hashlib import blake2b
>>> from hmac import compare_digest
>>>
>>> SECRET_KEY = b'pseudorandomly generated server secret key'
>>> AUTH_SIZE = 16
>>>
>>> def sign(cookie):
...     h = blake2b(digest_size=AUTH_SIZE, key=SECRET_KEY)
...     h.update(cookie)
...     return h.hexdigest().encode('utf-8')
>>>
>>> def verify(cookie, sig):
...     # Recompute the signature and compare it in constant time.
...     good_sig = sign(cookie)
...     return compare_digest(good_sig, sig)
>>>
>>> cookie = b'user:vatrogasac'
>>> sig = sign(cookie)
>>> print("{0},{1}".format(cookie.decode('utf-8'), sig))
user:vatrogasac,b'349cf904533767ed2d755279a8df84d0'
>>> verify(cookie, sig)
True
>>> verify(b'user:policajac', sig)
False
>>> verify(cookie, b'0102030405060708090a0b0c0d0e0f00')
False

Even though there’s a native keyed hashing mode, BLAKE2 can, of course, be used
in HMAC construction with hmac module:

>>> import hmac, hashlib
>>> m = hmac.new(b'secret key', digestmod=hashlib.blake2s)
>>> m.update(b'message')
>>> m.hexdigest()
'e3c8102868d28b5ff85fc35dda07329970d1a01e273c37481326fe0c861c8142'

15.1.4.3.4. Randomized hashing

By setting salt parameter users can introduce randomization to the hash
function. Randomized hashing is useful for protecting against collision attacks
on the hash function used in digital signatures.

Randomized hashing is designed for situations where one party, the message
preparer, generates all or part of a message to be signed by a second
party, the message signer. If the message preparer is able to find
cryptographic hash function collisions (i.e., two messages producing the
same hash value), then she might prepare meaningful versions of the message
that would produce the same hash value and digital signature, but with
different results (e.g., transferring $1,000,000 to an account, rather than
$10). Cryptographic hash functions have been designed with collision
resistance as a major goal, but the current concentration on attacking
cryptographic hash functions may result in a given cryptographic hash
function providing less collision resistance than expected. Randomized
hashing offers the signer additional protection by reducing the likelihood
that a preparer can generate two or more messages that ultimately yield the
same hash value during the digital signature generation process — even if
it is practical to find collisions for the hash function. However, the use
of randomized hashing may reduce the amount of security provided by a
digital signature when all portions of the message are prepared
by the signer.

(NIST SP-800-106 “Randomized Hashing for Digital Signatures”)

In BLAKE2 the salt is processed as a one-time input to the hash function during
initialization, rather than as an input to each compression function.

Warning

Salted hashing (or just hashing) with BLAKE2 or any other general-purpose
cryptographic hash function, such as SHA-256, is not suitable for hashing
passwords. See BLAKE2 FAQ for more
information.

>>> import os
>>> from hashlib import blake2b
>>> msg = b'some message'
>>> # Calculate the first hash with a random salt.
>>> salt1 = os.urandom(blake2b.SALT_SIZE)
>>> h1 = blake2b(salt=salt1)
>>> h1.update(msg)
>>> # Calculate the second hash with a different random salt.
>>> salt2 = os.urandom(blake2b.SALT_SIZE)
>>> h2 = blake2b(salt=salt2)
>>> h2.update(msg)
>>> # The digests are different.
>>> h1.digest() != h2.digest()
True

15.1.4.3.5. Personalization

Sometimes it is useful to force hash function to produce different digests for
the same input for different purposes. Quoting the authors of the Skein hash
function:

We recommend that all application designers seriously consider doing this;
we have seen many protocols where a hash that is computed in one part of
the protocol can be used in an entirely different part because two hash
computations were done on similar or related data, and the attacker can
force the application to make the hash inputs the same. Personalizing each
hash function used in the protocol summarily stops this type of attack.

(The Skein Hash Function Family, p. 21)

BLAKE2 can be personalized by passing bytes to the person argument:

>>> from hashlib import blake2b
>>> FILES_HASH_PERSON = b'MyApp Files Hash'
>>> BLOCK_HASH_PERSON = b'MyApp Block Hash'
>>> h = blake2b(digest_size=32, person=FILES_HASH_PERSON)
>>> h.update(b'the same content')
>>> h.hexdigest()
'20d9cd024d4fb086aae819a1432dd2466de12947831b75c5a30cf2676095d3b4'
>>> h = blake2b(digest_size=32, person=BLOCK_HASH_PERSON)
>>> h.update(b'the same content')
>>> h.hexdigest()
'cf68fb5761b9c44e7878bfb2c4c9aea52264a80b75005e65619778de59f383a3'

Personalization together with the keyed mode can also be used to derive different
keys from a single one.

>>> from hashlib import blake2s
>>> from base64 import b64decode, b64encode
>>> orig_key = b64decode(b'Rm5EPJai72qcK3RGBpW3vPNfZy5OZothY+kHY6h21KM=')
>>> enc_key = blake2s(key=orig_key, person=b'kEncrypt').digest()
>>> mac_key = blake2s(key=orig_key, person=b'kMAC').digest()
>>> print(b64encode(enc_key).decode('utf-8'))
rbPb15S/Z9t+agffno5wuhB77VbRi6F9Iv2qIxU7WHw=
>>> print(b64encode(mac_key).decode('utf-8'))
G9GtHFE1YluXY1zWPlYk1e/nWfu0WSEb0KRcjhDeP/o=

15.1.4.3.6. Tree mode

Here’s an example of hashing a minimal tree with two leaf nodes:

This example uses 64-byte internal digests, and returns the 32-byte final
digest:

>>> from hashlib import blake2b
>>>
>>> FANOUT = 2
>>> DEPTH = 2
>>> LEAF_SIZE = 4096
>>> INNER_SIZE = 64
>>>
>>> buf = bytearray(6000)
>>>
>>> # Left leaf
... h00 = blake2b(buf[0:LEAF_SIZE], fanout=FANOUT, depth=DEPTH,
...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
...               node_offset=0, node_depth=0, last_node=False)
>>> # Right leaf
... h01 = blake2b(buf[LEAF_SIZE:], fanout=FANOUT, depth=DEPTH,
...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
...               node_offset=1, node_depth=0, last_node=True)
>>> # Root node
... h10 = blake2b(digest_size=32, fanout=FANOUT, depth=DEPTH,
...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
...               node_offset=0, node_depth=1, last_node=True)
>>> h10.update(h00.digest())
>>> h10.update(h01.digest())
>>> h10.hexdigest()
'3ad2a9b37c6070e374c7a8c508fe20ca86b6ed54e286e93a0318e95e881db5aa'

15.1.4.4. Credits

BLAKE2 was designed by Jean-Philippe Aumasson, Samuel Neves, Zooko
Wilcox-O’Hearn, and Christian Winnerlein based on SHA-3 finalist BLAKE
created by Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and
Raphael C.-W. Phan.

It uses core algorithm from ChaCha cipher designed by Daniel J. Bernstein.

The stdlib implementation is based on pyblake2 module. It was written by
Dmitry Chestnykh based on C implementation written by Samuel Neves. The
documentation was copied from pyblake2 and written by Dmitry Chestnykh.

The C code was partly rewritten for Python by Christian Heimes.

The following public domain dedication applies for both C hash function
implementation, extension code, and this documentation:

To the extent possible under law, the author(s) have dedicated all copyright
and related and neighboring rights to this software to the public domain
worldwide. This software is distributed without any warranty.

You should have received a copy of the CC0 Public Domain Dedication along
with this software. If not, see
http://creativecommons.org/publicdomain/zero/1.0/.

The following people have helped with development or contributed their changes
to the project and the public domain according to the Creative Commons Public
Domain Dedication 1.0 Universal:

  • Alexandr Sokolovskiy

A cryptographic hash function takes input data and produces a fixed-length output that is, for practical purposes, unique to that data. The hash is a fixed-length byte string used to verify the integrity of the data. In this article, you will learn to use the hashlib module to obtain the hash of a file in Python. The hashlib module is part of the Python standard library, so it is available in any standard Python installation and does not need to be installed with pip.

What is the Hashlib Module?

The hashlib module implements a common interface for many secure cryptographic hash and message digest algorithms. There is one constructor method named for each type of hash. All return a hash object with the same simple interface. Constructors for the guaranteed algorithms listed below are always present in this module.

hashlib.algorithms_guaranteed

A set containing the names of the hash algorithms guaranteed to be supported by this module on all platforms.

>>> print(hashlib.algorithms_guaranteed)

{'sha3_512', 'sha1', 'sha224', 'shake_256', 'sha3_384', 'sha512', 'sha384', 'blake2s', 'md5', 'sha3_224', 'sha256', 'blake2b', 'sha3_256', 'shake_128'}

hashlib.algorithms_available

A set containing the names of the hash algorithms available in the running Python interpreter.  The same algorithm may appear multiple times in this set under different names (due to OpenSSL).

>>> print(hashlib.algorithms_available)

{'sha384', 'sha3_224', 'whirlpool', 'ripemd160', 'blake2s', 'md5-sha1', 'sm3', 'sha256', 'shake_256', 'sha1', 'sha3_384', 'sha512', 'blake2b', 'sha512_256', 'sha3_256', 'shake_128', 'sha3_512', 'sha224', 'md5', 'mdc2', 'sha512_224', 'md4'}
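As a minimal sketch of how these sets might be used, the helper below checks a requested algorithm name against algorithms_available before calling hashlib.new. The name make_hasher is hypothetical and is not part of hashlib.

```python
import hashlib


def make_hasher(name):
    """Return a new hash object for `name`, or raise ValueError if unsupported.

    make_hasher is an illustrative helper, not part of hashlib.
    """
    if name not in hashlib.algorithms_available:
        raise ValueError(f"Unsupported hash algorithm: {name!r}")
    return hashlib.new(name)


hasher = make_hasher("sha256")
hasher.update(b"example data")
print(hasher.name, hasher.digest_size)  # sha256 32
```

Checking the name first turns an unknown algorithm into a clear error message of our own choosing rather than relying on whatever hashlib.new reports.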

Explanation of SHA-256 Algorithm and its Features

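This article will use the FIPS secure hash algorithm SHA-256 to obtain the file hash. Other hash algorithms available in hashlib include:

  • MD5 (Message Digest 5), which has known collision weaknesses and should not be used for security purposes
  • SHA-512 (Secure Hash Algorithm, 512-bit)
  • SHA3-256 (a member of the SHA-3 family)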

SHA-256 is used here because it is one of the most widely deployed hash algorithms, has no known practical collision attacks, and is fast to compute. It belongs to the SHA-2 family. The newer SHA-3 family, which is based on a sponge construction, is available as an alternative.

Obtaining a Cryptographic Hash of a File

In the following example, the path to a file is provided as a command-line argument. The SHA-256 (Secure Hash Algorithm, 256-bit) hash of the file is then computed and displayed.

The file hashed in this example is test.txt.

First, the hashlib and sys modules are imported; sys gives access to the command-line arguments. Next, the function that computes the SHA-256 hash of a file (hashfile) is defined. Inside it, a buffer size is set (65536 bytes in our case). This is the number of bytes read from the file at a time and fed into the SHA-256 hash object, which lets large files be hashed without loading them into memory all at once. At the end of the function, hexdigest is called on the hash object to produce its hexadecimal representation. The function is called with sys.argv[1], the first argument given on the command line (sys.argv[0] is the name of the Python script). Finally, the hash of the file is displayed.

Python3

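import sys
import hashlib

def hashfile(file):
    # Number of bytes read from the file per iteration
    BUF_SIZE = 65536
    sha256 = hashlib.sha256()
    # Open the file passed in by the caller, in binary mode
    with open(file, 'rb') as f:
        while True:
            data = f.read(BUF_SIZE)
            # An empty bytes object means the end of the file was reached
            if not data:
                break
            sha256.update(data)
    return sha256.hexdigest()

file_hash = hashfile(sys.argv[1])
print(f"Hash:{file_hash}")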

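Output: the script prints Hash: followed by the 64-character hexadecimal SHA-256 digest of the file (the value depends on the file's contents).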

Hash of a String in Python

Obtaining a Cryptographic Hash of a String

The same approach can be used to obtain the hash of a string. The string must first be converted to bytes before it is passed to the hash object. For short strings, the whole input can be supplied in a single update call. The following example demonstrates this in practice:

First, a bytes literal (note the b prefix) is stored in a variable. Then a SHA-256 hash object is created, and the bytes are passed to its update method, which feeds the data into the hash. After that, hexdigest is called to obtain the digest in hexadecimal form. Finally, the hash value is displayed.

Python3

import hashlib

string = b"My name is apple and I am a vegetable?"
sha256 = hashlib.sha256()
sha256.update(string)
string_hash = sha256.hexdigest()

print(f"Hash:{string_hash}")

Output:

Hash:252f8ca07a6fcaae293e5097151c803a7f16504e48c4eb60f651c11341e83217
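When the input is an ordinary str rather than a bytes literal, it has to be encoded first. The sketch below assumes UTF-8 as the encoding. The helper name sha256_of_text is hypothetical.

```python
import hashlib


def sha256_of_text(text, encoding="utf-8"):
    """Hash a str by encoding it to bytes first (illustrative helper)."""
    return hashlib.sha256(text.encode(encoding)).hexdigest()


print(sha256_of_text("abc"))

# Passing a str directly is rejected, because hashes operate on bytes.
try:
    hashlib.sha256("abc")
except TypeError as exc:
    print("TypeError:", exc)
```

The encoding matters: the same text encoded differently yields different bytes and therefore a different hash, so both sides of a comparison need to agree on it.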

Last Updated :
06 Feb, 2023


:mod:`hashlib` — Secure hashes and message digests

.. module:: hashlib
   :synopsis: Secure hash and message digest algorithms.

.. moduleauthor:: Gregory P. Smith <greg@krypto.org>
.. sectionauthor:: Gregory P. Smith <greg@krypto.org>

Source code: :source:`Lib/hashlib.py`

.. index::
   single: message digest, MD5
   single: secure hash algorithm, SHA1, SHA224, SHA256, SHA384, SHA512

.. testsetup::

   import hashlib



This module implements a common interface to many different secure hash and
message digest algorithms. Included are the FIPS secure hash algorithms SHA1,
SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180-2) as well as RSA’s MD5
algorithm (defined in internet RFC 1321). The terms "secure hash" and
"message digest" are interchangeable. Older algorithms were called message
digests. The modern term is secure hash.

Note

If you want the adler32 or crc32 hash functions, they are available in
the :mod:`zlib` module.

Warning

Some algorithms have known hash collision weaknesses, refer to the "See
also" section at the end.

Hash algorithms

There is one constructor method named for each type of :dfn:`hash`. All return
a hash object with the same simple interface. For example: use :func:`sha256` to
create a SHA-256 hash object. You can now feed this object with :term:`bytes-like
objects <bytes-like object>`
(normally :class:`bytes`) using the :meth:`update` method.
At any point you can ask it for the :dfn:`digest` of the
concatenation of the data fed to it so far using the :meth:`digest` or
:meth:`hexdigest` methods.

Note

For better multithreading performance, the Python :term:`GIL` is released for
data larger than 2047 bytes at object creation or on update.

Note

Feeding string objects into :meth:`update` is not supported, as hashes work
on bytes, not on characters.

.. index:: single: OpenSSL; (use in module hashlib)

Constructors for hash algorithms that are always present in this module are
:func:`sha1`, :func:`sha224`, :func:`sha256`, :func:`sha384`,
:func:`sha512`, :func:`blake2b`, and :func:`blake2s`.
:func:`md5` is normally available as well, though it
may be missing or blocked if you are using a rare "FIPS compliant" build of Python.
Additional algorithms may also be available depending upon the OpenSSL
library that Python uses on your platform. On most platforms the
:func:`sha3_224`, :func:`sha3_256`, :func:`sha3_384`, :func:`sha3_512`,
:func:`shake_128`, :func:`shake_256` are also available.

.. versionadded:: 3.6
   SHA3 (Keccak) and SHAKE constructors :func:`sha3_224`, :func:`sha3_256`,
   :func:`sha3_384`, :func:`sha3_512`, :func:`shake_128`, :func:`shake_256`.

.. versionadded:: 3.6
   :func:`blake2b` and :func:`blake2s` were added.

.. versionchanged:: 3.9
   All hashlib constructors take a keyword-only argument *usedforsecurity*
   with default value ``True``. A false value allows the use of insecure and
   blocked hashing algorithms in restricted environments. ``False`` indicates
   that the hashing algorithm is not used in a security context, e.g. as a
   non-cryptographic one-way compression function.

   Hashlib now uses SHA3 and SHAKE from OpenSSL 1.1.1 and newer.
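As a small sketch of the *usedforsecurity* flag described above, the following marks an MD5 call as non-security use. The cache-key scenario is an assumed illustration, and the keyword requires Python 3.9 or newer.

```python
import hashlib

data = b"cache entry contents"

# MD5 used only as a non-security fingerprint (for example, a cache key).
# usedforsecurity=False signals that intent to restricted builds.
fingerprint = hashlib.md5(data, usedforsecurity=False).hexdigest()
print(fingerprint)
```

On ordinary builds the flag does not change the resulting digest. It only affects whether a blocked algorithm may be used.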

For example, to obtain the digest of the byte string
b"Nobody inspects the spammish repetition":

>>> import hashlib
>>> m = hashlib.sha256()
>>> m.update(b"Nobody inspects")
>>> m.update(b" the spammish repetition")
>>> m.digest()
b'\x03\x1e\xdd}Ae\x15\x93\xc5\xfe\\\x00o\xa5u+7\xfd\xdf\xf7\xbcN\x84:\xa6\xaf\x0c\x95\x0fK\x94\x06'
>>> m.hexdigest()
'031edd7d41651593c5fe5c006fa5752b37fddff7bc4e843aa6af0c950f4b9406'

More condensed:

>>> hashlib.sha256(b"Nobody inspects the spammish repetition").hexdigest()
'031edd7d41651593c5fe5c006fa5752b37fddff7bc4e843aa6af0c950f4b9406'

.. function:: new(name[, data], *, usedforsecurity=True)

   Is a generic constructor that takes the string *name* of the desired
   algorithm as its first parameter.  It also exists to allow access to the
   above listed hashes as well as any other algorithms that your OpenSSL
   library may offer.  The named constructors are much faster than :func:`new`
   and should be preferred.

Using :func:`new` with an algorithm provided by OpenSSL:

>>> h = hashlib.new('sha256')
>>> h.update(b"Nobody inspects the spammish repetition")
>>> h.hexdigest()
'031edd7d41651593c5fe5c006fa5752b37fddff7bc4e843aa6af0c950f4b9406'

Hashlib provides the following constant attributes:

.. data:: algorithms_guaranteed

   A set containing the names of the hash algorithms guaranteed to be supported
   by this module on all platforms.  Note that 'md5' is in this list despite
   some upstream vendors offering an odd "FIPS compliant" Python build that
   excludes it.

   .. versionadded:: 3.2

.. data:: algorithms_available

   A set containing the names of the hash algorithms that are available in the
   running Python interpreter.  These names will be recognized when passed to
   :func:`new`.  :attr:`algorithms_guaranteed` will always be a subset.  The
   same algorithm may appear multiple times in this set under different names
   (thanks to OpenSSL).

   .. versionadded:: 3.2

The following values are provided as constant attributes of the hash objects
returned by the constructors:

.. data:: hash.digest_size

   The size of the resulting hash in bytes.

.. data:: hash.block_size

   The internal block size of the hash algorithm in bytes.

A hash object has the following attributes:

.. attribute:: hash.name

   The canonical name of this hash, always lowercase and always suitable as a
   parameter to :func:`new` to create another hash of this type.

   .. versionchanged:: 3.4
      The name attribute has been present in CPython since its inception, but
      until Python 3.4 was not formally specified, so may not exist on some
      platforms.

A hash object has the following methods:

.. method:: hash.update(data)

   Update the hash object with the :term:`bytes-like object`.
   Repeated calls are equivalent to a single call with the
   concatenation of all the arguments: ``m.update(a); m.update(b)`` is
   equivalent to ``m.update(a+b)``.

   .. versionchanged:: 3.1
      The Python GIL is released to allow other threads to run while hash
      updates on data larger than 2047 bytes is taking place when using hash
      algorithms supplied by OpenSSL.
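The equivalence between repeated :meth:`update` calls and a single call on the concatenated data can be checked directly. This is only an illustrative sketch, and the variable names are arbitrary.

```python
import hashlib

a = b"Nobody inspects"
b = b" the spammish repetition"

# Feed the data in two pieces.
incremental = hashlib.sha256()
incremental.update(a)
incremental.update(b)

# Feed the concatenation in one call.
one_shot = hashlib.sha256(a + b)

print(incremental.hexdigest() == one_shot.hexdigest())  # True
```

This property is what makes it possible to hash large inputs chunk by chunk without holding them in memory.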


.. method:: hash.digest()

   Return the digest of the data passed to the :meth:`update` method so far.
   This is a bytes object of size :attr:`digest_size` which may contain bytes in
   the whole range from 0 to 255.


.. method:: hash.hexdigest()

   Like :meth:`digest` except the digest is returned as a string object of
   double length, containing only hexadecimal digits.  This may be used to
   exchange the value safely in email or other non-binary environments.


.. method:: hash.copy()

   Return a copy ("clone") of the hash object.  This can be used to efficiently
   compute the digests of data sharing a common initial substring.
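A brief sketch of that use of :meth:`copy`, with an assumed shared prefix and two made-up record values:

```python
import hashlib

# Hash the shared prefix once.
prefix = hashlib.sha256(b"common-header:")

# Each copy continues from the state reached after the prefix.
first = prefix.copy()
first.update(b"record-1")

second = prefix.copy()
second.update(b"record-2")

# The copies match hashes computed from scratch over the full data.
print(first.hexdigest() == hashlib.sha256(b"common-header:record-1").hexdigest())
print(second.hexdigest() == hashlib.sha256(b"common-header:record-2").hexdigest())
```

Updating a copy leaves the original object untouched, so the prefix state can be reused for any number of suffixes.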


SHAKE variable length digests

The :func:`shake_128` and :func:`shake_256` algorithms provide variable
length digests with length_in_bits//2 up to 128 or 256 bits of security.
As such, their digest methods require a length. Maximum length is not limited
by the SHAKE algorithm.

.. method:: shake.digest(length)

   Return the digest of the data passed to the :meth:`update` method so far.
   This is a bytes object of size *length* which may contain bytes in
   the whole range from 0 to 255.


.. method:: shake.hexdigest(length)

   Like :meth:`digest` except the digest is returned as a string object of
   double length, containing only hexadecimal digits.  This may be used to
   exchange the value safely in email or other non-binary environments.
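A short sketch of requesting digests of different lengths from the same SHAKE object. The lengths 16 and 32 are arbitrary example values.

```python
import hashlib

xof = hashlib.shake_128(b"Nobody inspects the spammish repetition")

short = xof.hexdigest(16)   # 16 bytes -> 32 hexadecimal characters
longer = xof.hexdigest(32)  # 32 bytes -> 64 hexadecimal characters

print(len(short), len(longer))
# SHAKE output is a byte stream, so a shorter digest is a prefix of a longer one.
print(longer.startswith(short))
```

Note that this prefix behavior is specific to SHAKE. It does not hold for BLAKE2 with different digest sizes.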


File hashing

The hashlib module provides a helper function for efficient hashing of
a file or file-like object.

.. function:: file_digest(fileobj, digest, /)

   Return a digest object that has been updated with contents of file object.

   *fileobj* must be a file-like object opened for reading in binary mode.
   It accepts file objects from  builtin :func:`open`, :class:`~io.BytesIO`
   instances, SocketIO objects from :meth:`socket.socket.makefile`, and
   similar. The function may bypass Python's I/O and use the file descriptor
   from :meth:`~io.IOBase.fileno` directly. *fileobj* must be assumed to be
   in an unknown state after this function returns or raises. It is up to
   the caller to close *fileobj*.

   *digest* must either be a hash algorithm name as a *str*, a hash
   constructor, or a callable that returns a hash object.

   Example:

      >>> import io, hashlib, hmac
      >>> with open(hashlib.__file__, "rb") as f:
      ...     digest = hashlib.file_digest(f, "sha256")
      ...
      >>> digest.hexdigest()  # doctest: +ELLIPSIS
      '...'

      >>> buf = io.BytesIO(b"somedata")
      >>> mac1 = hmac.HMAC(b"key", digestmod=hashlib.sha512)
      >>> digest = hashlib.file_digest(buf, lambda: mac1)

      >>> digest is mac1
      True
      >>> mac2 = hmac.HMAC(b"key", b"somedata", digestmod=hashlib.sha512)
      >>> mac1.digest() == mac2.digest()
      True

   .. versionadded:: 3.11


Key derivation

Key derivation and key stretching algorithms are designed for secure password
hashing. Naive algorithms such as sha1(password) are not resistant against
brute-force attacks. A good password hashing function must be tunable, slow, and
include a salt.

.. function:: pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None)

   The function provides PKCS#5 password-based key derivation function 2. It
   uses HMAC as pseudorandom function.

   The string *hash_name* is the desired name of the hash digest algorithm for
   HMAC, e.g. 'sha1' or 'sha256'. *password* and *salt* are interpreted as
   buffers of bytes. Applications and libraries should limit *password* to
   a sensible length (e.g. 1024). *salt* should be about 16 or more bytes from
   a proper source, e.g. :func:`os.urandom`.

   The number of *iterations* should be chosen based on the hash algorithm and
   computing power. As of 2022, hundreds of thousands of iterations of SHA-256
   are suggested. For rationale as to why and how to choose what is best for
   your application, read *Appendix A.2.2* of NIST-SP-800-132_. The answers
   on the `stackexchange pbkdf2 iterations question`_ explain in detail.

   *dklen* is the length of the derived key. If *dklen* is ``None`` then the
   digest size of the hash algorithm *hash_name* is used, e.g. 64 for SHA-512.

   >>> from hashlib import pbkdf2_hmac
   >>> our_app_iters = 500_000  # Application specific, read above.
   >>> dk = pbkdf2_hmac('sha256', b'password', b'bad salt' * 2, our_app_iters)
   >>> dk.hex()
   '15530bba69924174860db778f2c6f8104d3aaf9d26241840c8c4a641c8d000a9'

   Function only available when Python is compiled with OpenSSL.

   .. versionadded:: 3.4

   .. versionchanged:: 3.12
      Function now only available when Python is built with OpenSSL. The slow
      pure Python implementation has been removed.
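One possible way to apply :func:`pbkdf2_hmac` to password storage is sketched below. The helper names hash_password and verify_password, the 16-byte salt, and the default iteration count are assumptions made for illustration, not recommendations from this module.

```python
import hashlib
import hmac
import os


def hash_password(password, *, iterations=500_000):
    """Return (salt, derived_key) for a new password (illustrative helper)."""
    salt = os.urandom(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, key


def verify_password(password, salt, expected_key, *, iterations=500_000):
    """Re-derive the key with the stored salt and compare in constant time."""
    key = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(key, expected_key)


salt, key = hash_password("correct horse")
print(verify_password("correct horse", salt, key))  # True
print(verify_password("wrong guess", salt, key))    # False
```

The salt and iteration count have to be stored next to the derived key, since both are needed to repeat the derivation at verification time.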

.. function:: scrypt(password, *, salt, n, r, p, maxmem=0, dklen=64)

   The function provides scrypt password-based key derivation function as
   defined in :rfc:`7914`.

   *password* and *salt* must be :term:`bytes-like objects
   <bytes-like object>`.  Applications and libraries should limit *password*
   to a sensible length (e.g. 1024).  *salt* should be about 16 or more
   bytes from a proper source, e.g. :func:`os.urandom`.

   *n* is the CPU/Memory cost factor, *r* the block size, *p* parallelization
   factor and *maxmem* limits memory (OpenSSL 1.1.0 defaults to 32 MiB).
   *dklen* is the length of the derived key.

   .. versionadded:: 3.6
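A minimal sketch of calling :func:`scrypt`. The cost parameters n=2**14, r=8, p=1 are assumed example values (needing roughly 16 MiB of memory), and the function is only present when Python is built against an OpenSSL version that provides scrypt.

```python
import hashlib
import os

salt = os.urandom(16)

# n must be a power of two; larger n, r, p make the derivation more expensive.
key = hashlib.scrypt(b"password", salt=salt, n=2**14, r=8, p=1, dklen=32)
print(len(key))  # 32
```

As with PBKDF2, the salt and the cost parameters must be kept alongside the derived key so the derivation can be repeated later.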


BLAKE2

.. sectionauthor:: Dmitry Chestnykh

.. index::
   single: blake2b, blake2s

BLAKE2 is a cryptographic hash function defined in RFC 7693 that comes in two
flavors:

  • BLAKE2b, optimized for 64-bit platforms and produces digests of any size
    between 1 and 64 bytes,
  • BLAKE2s, optimized for 8- to 32-bit platforms and produces digests of any
    size between 1 and 32 bytes.

BLAKE2 supports keyed mode (a faster and simpler replacement for HMAC),
salted hashing, personalization, and tree hashing.

Hash objects from this module follow the API of standard library’s
:mod:`hashlib` objects.

Creating hash objects

New hash objects are created by calling constructor functions:

.. function:: blake2b(data=b'', *, digest_size=64, key=b'', salt=b'', 
                person=b'', fanout=1, depth=1, leaf_size=0, node_offset=0,  
                node_depth=0, inner_size=0, last_node=False, 
                usedforsecurity=True)

.. function:: blake2s(data=b'', *, digest_size=32, key=b'', salt=b'', 
                person=b'', fanout=1, depth=1, leaf_size=0, node_offset=0,  
                node_depth=0, inner_size=0, last_node=False, 
                usedforsecurity=True)


These functions return the corresponding hash objects for calculating
BLAKE2b or BLAKE2s. They optionally take these general parameters:

  • data: initial chunk of data to hash, which must be
    :term:`bytes-like object`. It can be passed only as positional argument.
  • digest_size: size of output digest in bytes.
  • key: key for keyed hashing (up to 64 bytes for BLAKE2b, up to 32 bytes for
    BLAKE2s).
  • salt: salt for randomized hashing (up to 16 bytes for BLAKE2b, up to 8
    bytes for BLAKE2s).
  • person: personalization string (up to 16 bytes for BLAKE2b, up to 8 bytes
    for BLAKE2s).

The following table shows limits for general parameters (in bytes):

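======= =========== ======== ========= ===========
Hash    digest_size len(key) len(salt) len(person)
======= =========== ======== ========= ===========
BLAKE2b     64         64       16        16
BLAKE2s     32         32       8         8
======= =========== ======== ========= ===========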

Note

BLAKE2 specification defines constant lengths for salt and personalization
parameters, however, for convenience, this implementation accepts byte
strings of any size up to the specified length. If the length of the
parameter is less than specified, it is padded with zeros, thus, for
example, b'salt' and b'salt\x00' is the same value. (This is not
the case for key.)
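The padding behavior described in the note can be observed with a quick check. The message and parameter values below are arbitrary.

```python
from hashlib import blake2b

# A short salt is zero-padded, so an explicit trailing zero byte changes nothing.
salt_same = (blake2b(b"msg", salt=b"salt").digest()
             == blake2b(b"msg", salt=b"salt\x00").digest())

# Keys are not treated this way: a trailing zero byte gives a different key.
key_differs = (blake2b(b"msg", key=b"key").digest()
               != blake2b(b"msg", key=b"key\x00").digest())

print(salt_same, key_differs)  # True True
```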

These sizes are available as module constants described below.

Constructor functions also accept the following tree hashing parameters:

  • fanout: fanout (0 to 255, 0 if unlimited, 1 in sequential mode).
  • depth: maximal depth of tree (1 to 255, 255 if unlimited, 1 in
    sequential mode).
  • leaf_size: maximal byte length of leaf (0 to 2**32-1, 0 if unlimited or in
    sequential mode).
  • node_offset: node offset (0 to 2**64-1 for BLAKE2b, 0 to 2**48-1 for
    BLAKE2s, 0 for the first, leftmost, leaf, or in sequential mode).
  • node_depth: node depth (0 to 255, 0 for leaves, or in sequential mode).
  • inner_size: inner digest size (0 to 64 for BLAKE2b, 0 to 32 for
    BLAKE2s, 0 in sequential mode).
  • last_node: boolean indicating whether the processed node is the last
    one (False for sequential mode).

Explanation of tree mode parameters.

See section 2.10 in BLAKE2 specification for comprehensive review of tree
hashing.

Constants

.. data:: blake2b.SALT_SIZE
.. data:: blake2s.SALT_SIZE

Salt length (maximum length accepted by constructors).

.. data:: blake2b.PERSON_SIZE
.. data:: blake2s.PERSON_SIZE

Personalization string length (maximum length accepted by constructors).

.. data:: blake2b.MAX_KEY_SIZE
.. data:: blake2s.MAX_KEY_SIZE

Maximum key size.

.. data:: blake2b.MAX_DIGEST_SIZE
.. data:: blake2s.MAX_DIGEST_SIZE

Maximum digest size that the hash function can output.
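As an illustration, the constants can be read directly from the constructors, and a parameter longer than its limit is rejected. The dictionary name limits is just a local variable for this sketch.

```python
from hashlib import blake2b, blake2s

# (max digest size, max key size, salt size, person size), all in bytes
limits = {
    "BLAKE2b": (blake2b.MAX_DIGEST_SIZE, blake2b.MAX_KEY_SIZE,
                blake2b.SALT_SIZE, blake2b.PERSON_SIZE),
    "BLAKE2s": (blake2s.MAX_DIGEST_SIZE, blake2s.MAX_KEY_SIZE,
                blake2s.SALT_SIZE, blake2s.PERSON_SIZE),
}
for name, values in limits.items():
    print(name, *values)

# A salt one byte longer than SALT_SIZE raises ValueError.
try:
    blake2b(salt=b"x" * (blake2b.SALT_SIZE + 1))
except ValueError as exc:
    print("rejected:", exc)
```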

Examples

Simple hashing

To calculate hash of some data, you should first construct a hash object by
calling the appropriate constructor function (:func:`blake2b` or
:func:`blake2s`), then update it with the data by calling :meth:`update` on the
object, and, finally, get the digest out of the object by calling
:meth:`digest` (or :meth:`hexdigest` for hex-encoded string).

>>> from hashlib import blake2b
>>> h = blake2b()
>>> h.update(b'Hello world')
>>> h.hexdigest()
'6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'

As a shortcut, you can pass the first chunk of data to update directly to the
constructor as the positional argument:

>>> from hashlib import blake2b
>>> blake2b(b'Hello world').hexdigest()
'6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'

You can call :meth:`hash.update` as many times as you need to iteratively
update the hash:

>>> from hashlib import blake2b
>>> items = [b'Hello', b' ', b'world']
>>> h = blake2b()
>>> for item in items:
...     h.update(item)
...
>>> h.hexdigest()
'6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183'

Using different digest sizes

BLAKE2 has configurable size of digests up to 64 bytes for BLAKE2b and up to 32
bytes for BLAKE2s. For example, to replace SHA-1 with BLAKE2b without changing
the size of output, we can tell BLAKE2b to produce 20-byte digests:

>>> from hashlib import blake2b
>>> h = blake2b(digest_size=20)
>>> h.update(b'Replacing SHA1 with the more secure function')
>>> h.hexdigest()
'd24f26cf8de66472d58d4e1b1774b4c9158b1f4c'
>>> h.digest_size
20
>>> len(h.digest())
20

Hash objects with different digest sizes have completely different outputs
(shorter hashes are not prefixes of longer hashes); BLAKE2b and BLAKE2s
produce different outputs even if the output length is the same:

>>> from hashlib import blake2b, blake2s
>>> blake2b(digest_size=10).hexdigest()
'6fa1d8fcfd719046d762'
>>> blake2b(digest_size=11).hexdigest()
'eb6ec15daf9546254f0809'
>>> blake2s(digest_size=10).hexdigest()
'1bf21a98c78a1c376ae9'
>>> blake2s(digest_size=11).hexdigest()
'567004bf96e4a25773ebf4'

Keyed hashing

Keyed hashing can be used for authentication as a faster and simpler
replacement for Hash-based message authentication code (HMAC).
BLAKE2 can be securely used in prefix-MAC mode thanks to the
indifferentiability property inherited from BLAKE.

This example shows how to get a (hex-encoded) 128-bit authentication code for
message b'message data' with key b'pseudorandom key':

>>> from hashlib import blake2b
>>> h = blake2b(key=b'pseudorandom key', digest_size=16)
>>> h.update(b'message data')
>>> h.hexdigest()
'3d363ff7401e02026f4a4687d4863ced'

As a practical example, a web application can symmetrically sign cookies sent
to users and later verify them to make sure they weren’t tampered with:

>>> from hashlib import blake2b
>>> from hmac import compare_digest
>>>
>>> SECRET_KEY = b'pseudorandomly generated server secret key'
>>> AUTH_SIZE = 16
>>>
>>> def sign(cookie):
...     h = blake2b(digest_size=AUTH_SIZE, key=SECRET_KEY)
...     h.update(cookie)
...     return h.hexdigest().encode('utf-8')
>>>
>>> def verify(cookie, sig):
...     good_sig = sign(cookie)
...     return compare_digest(good_sig, sig)
>>>
>>> cookie = b'user-alice'
>>> sig = sign(cookie)
>>> print("{0},{1}".format(cookie.decode('utf-8'), sig))
user-alice,b'43b3c982cf697e0c5ab22172d1ca7421'
>>> verify(cookie, sig)
True
>>> verify(b'user-bob', sig)
False
>>> verify(cookie, b'0102030405060708090a0b0c0d0e0f00')
False

Even though there’s a native keyed hashing mode, BLAKE2 can, of course, be used
in HMAC construction with :mod:`hmac` module:

>>> import hmac, hashlib
>>> m = hmac.new(b'secret key', digestmod=hashlib.blake2s)
>>> m.update(b'message')
>>> m.hexdigest()
'e3c8102868d28b5ff85fc35dda07329970d1a01e273c37481326fe0c861c8142'

Randomized hashing

By setting salt parameter users can introduce randomization to the hash
function. Randomized hashing is useful for protecting against collision attacks
on the hash function used in digital signatures.

Randomized hashing is designed for situations where one party, the message
preparer, generates all or part of a message to be signed by a second
party, the message signer. If the message preparer is able to find
cryptographic hash function collisions (i.e., two messages producing the
same hash value), then they might prepare meaningful versions of the message
that would produce the same hash value and digital signature, but with
different results (e.g., transferring $1,000,000 to an account, rather than
$10). Cryptographic hash functions have been designed with collision
resistance as a major goal, but the current concentration on attacking
cryptographic hash functions may result in a given cryptographic hash
function providing less collision resistance than expected. Randomized
hashing offers the signer additional protection by reducing the likelihood
that a preparer can generate two or more messages that ultimately yield the
same hash value during the digital signature generation process — even if
it is practical to find collisions for the hash function. However, the use
of randomized hashing may reduce the amount of security provided by a
digital signature when all portions of the message are prepared
by the signer.

(NIST SP-800-106 "Randomized Hashing for Digital Signatures")

In BLAKE2 the salt is processed as a one-time input to the hash function during
initialization, rather than as an input to each compression function.

Warning

Salted hashing (or just hashing) with BLAKE2 or any other general-purpose
cryptographic hash function, such as SHA-256, is not suitable for hashing
passwords. See BLAKE2 FAQ for more information.

>>> import os
>>> from hashlib import blake2b
>>> msg = b'some message'
>>> # Calculate the first hash with a random salt.
>>> salt1 = os.urandom(blake2b.SALT_SIZE)
>>> h1 = blake2b(salt=salt1)
>>> h1.update(msg)
>>> # Calculate the second hash with a different random salt.
>>> salt2 = os.urandom(blake2b.SALT_SIZE)
>>> h2 = blake2b(salt=salt2)
>>> h2.update(msg)
>>> # The digests are different.
>>> h1.digest() != h2.digest()
True

Personalization

Sometimes it is useful to force hash function to produce different digests for
the same input for different purposes. Quoting the authors of the Skein hash
function:

We recommend that all application designers seriously consider doing this;
we have seen many protocols where a hash that is computed in one part of
the protocol can be used in an entirely different part because two hash
computations were done on similar or related data, and the attacker can
force the application to make the hash inputs the same. Personalizing each
hash function used in the protocol summarily stops this type of attack.

(The Skein Hash Function Family, p. 21)

BLAKE2 can be personalized by passing bytes to the person argument:

>>> from hashlib import blake2b
>>> FILES_HASH_PERSON = b'MyApp Files Hash'
>>> BLOCK_HASH_PERSON = b'MyApp Block Hash'
>>> h = blake2b(digest_size=32, person=FILES_HASH_PERSON)
>>> h.update(b'the same content')
>>> h.hexdigest()
'20d9cd024d4fb086aae819a1432dd2466de12947831b75c5a30cf2676095d3b4'
>>> h = blake2b(digest_size=32, person=BLOCK_HASH_PERSON)
>>> h.update(b'the same content')
>>> h.hexdigest()
'cf68fb5761b9c44e7878bfb2c4c9aea52264a80b75005e65619778de59f383a3'

Personalization together with the keyed mode can also be used to derive different
keys from a single one.

>>> from hashlib import blake2s
>>> from base64 import b64decode, b64encode
>>> orig_key = b64decode(b'Rm5EPJai72qcK3RGBpW3vPNfZy5OZothY+kHY6h21KM=')
>>> enc_key = blake2s(key=orig_key, person=b'kEncrypt').digest()
>>> mac_key = blake2s(key=orig_key, person=b'kMAC').digest()
>>> print(b64encode(enc_key).decode('utf-8'))
rbPb15S/Z9t+agffno5wuhB77VbRi6F9Iv2qIxU7WHw=
>>> print(b64encode(mac_key).decode('utf-8'))
G9GtHFE1YluXY1zWPlYk1e/nWfu0WSEb0KRcjhDeP/o=

Tree mode

Here’s an example of hashing a minimal tree with two leaf nodes:

  10
 /  \
00  01

This example uses 64-byte internal digests, and returns the 32-byte final
digest:

>>> from hashlib import blake2b
>>>
>>> FANOUT = 2
>>> DEPTH = 2
>>> LEAF_SIZE = 4096
>>> INNER_SIZE = 64
>>>
>>> buf = bytearray(6000)
>>>
>>> # Left leaf
... h00 = blake2b(buf[0:LEAF_SIZE], fanout=FANOUT, depth=DEPTH,
...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
...               node_offset=0, node_depth=0, last_node=False)
>>> # Right leaf
... h01 = blake2b(buf[LEAF_SIZE:], fanout=FANOUT, depth=DEPTH,
...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
...               node_offset=1, node_depth=0, last_node=True)
>>> # Root node
... h10 = blake2b(digest_size=32, fanout=FANOUT, depth=DEPTH,
...               leaf_size=LEAF_SIZE, inner_size=INNER_SIZE,
...               node_offset=0, node_depth=1, last_node=True)
>>> h10.update(h00.digest())
>>> h10.update(h01.digest())
>>> h10.hexdigest()
'3ad2a9b37c6070e374c7a8c508fe20ca86b6ed54e286e93a0318e95e881db5aa'

Credits

BLAKE2 was designed by Jean-Philippe Aumasson, Samuel Neves, Zooko
Wilcox-O’Hearn, and Christian Winnerlein based on SHA-3 finalist BLAKE
created by Jean-Philippe Aumasson, Luca Henzen, Willi Meier, and
Raphael C.-W. Phan.

It uses core algorithm from ChaCha cipher designed by Daniel J. Bernstein.

The stdlib implementation is based on pyblake2 module. It was written by
Dmitry Chestnykh based on C implementation written by Samuel Neves. The
documentation was copied from pyblake2 and written by Dmitry Chestnykh.

The C code was partly rewritten for Python by Christian Heimes.

The following public domain dedication applies for both C hash function
implementation, extension code, and this documentation:

To the extent possible under law, the author(s) have dedicated all copyright
and related and neighboring rights to this software to the public domain
worldwide. This software is distributed without any warranty.

You should have received a copy of the CC0 Public Domain Dedication along
with this software. If not, see
https://creativecommons.org/publicdomain/zero/1.0/.

The following people have helped with development or contributed their changes
to the project and the public domain according to the Creative Commons Public
Domain Dedication 1.0 Universal:

  • Alexandr Sokolovskiy

.. seealso::

   Module :mod:`hmac`
      A module to generate message authentication codes using hashes.

   Module :mod:`base64`
      Another way to encode binary hashes for non-binary environments.

   https://www.blake2.net
      Official BLAKE2 website.

   https://csrc.nist.gov/csrc/media/publications/fips/180/2/archive/2002-08-01/documents/fips180-2.pdf
      The FIPS 180-2 publication on Secure Hash Algorithms.

   https://en.wikipedia.org/wiki/Cryptographic_hash_function#Cryptographic_hash_algorithms
      Wikipedia article with information on which algorithms have known issues and
      what that means regarding their use.

   https://www.ietf.org/rfc/rfc8018.txt
      PKCS #5: Password-Based Cryptography Specification Version 2.1

   https://nvlpubs.nist.gov/nistpubs/Legacy/SP/nistspecialpublication800-132.pdf
      NIST Recommendation for Password-Based Key Derivation.

Learn to calculate the hash of a file in Python, with examples. The result is also called the file checksum or digest. A checksum is a fixed-length sequence of characters computed from the content by a hash algorithm. It is not encrypted, and the original content cannot be recovered from it.


1. Hash Algorithms

Hash functions are functions that produce a fixed-length encoded output for any given input. For any given input, a given hash function will always generate the same output, no matter how many times we execute it.

Hash functions are commonly applied to passwords so that only the resulting hashes are stored in the database. This way, a user's password is never stored in plain text in the system, which makes security more robust. For passwords specifically, a slow, salted key derivation function should be used rather than a single plain hash.

Similarly, when we want to transfer a file securely to another host, we also send the file's digest so that the remote host can compute the digest of the received file and compare it with the one sent from the source host. If both match, the file content has not been tampered with.

We can read more about individual hashing algorithms elsewhere. In this post, we will only see examples of various hashing algorithms.
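The digest comparison described above could be sketched as follows. The helper names file_sha256 and matches_digest are hypothetical, and a temporary file stands in for the transferred file.

```python
import hashlib
import hmac
import os
import tempfile


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare the file's digest with the digest sent by the source host."""
    return hmac.compare_digest(file_sha256(path), expected_hex.lower())


# A temporary file containing b"abc" plays the role of the received file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")

sent_digest = "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"
print(matches_digest(tmp.name, sent_digest))  # True
os.remove(tmp.name)
```

Lowercasing the expected value lets the check accept digests published in uppercase hexadecimal.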

2. Python File Hash Algorithms

We can use the hashlib module, which provides the necessary methods for generating various kinds of hashes.

  • To list the algorithms that this module supports on all platforms, use its algorithms_guaranteed attribute.
  • The update() method feeds bytes into the hash object; repeated calls are equivalent to a single call with the concatenated data.
  • The digest() and hexdigest() methods return the resulting hash as bytes or as a hexadecimal string, respectively.
import hashlib

print(hashlib.algorithms_guaranteed)

Program output.

{'shake_256', 'sha1', 'blake2s', 'sha3_224', 'sha512', 
 'sha3_256', 'sha224', 'sha256', 'sha3_384', 
 'sha384', 'blake2b', 'sha3_512', 'shake_128', 'md5'}
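A short sketch of how update(), digest() and hexdigest() fit together, using SHA-256 as an arbitrary choice: feeding the data in pieces gives the same result as hashing it in one call.

```python
import hashlib

# Feed the message in two pieces
h = hashlib.sha256()
h.update(b"Hello, ")
h.update(b"world")

# Hash the same message in a single call
one_shot = hashlib.sha256(b"Hello, world")

print(h.hexdigest() == one_shot.hexdigest())    # incremental == one-shot
print(h.digest().hex() == h.hexdigest())        # hexdigest is the hex form of digest
print("sha256" in hashlib.algorithms_guaranteed)
```

Note that algorithms_guaranteed is a set, so the order in which its members are printed can differ between runs.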

3. Python File Hash Examples

Let's see a few examples of generating the hash of a given file in Python.

Example 1: Find SHA256 Hash of a File in Python

Python program to find the SHA256 hash of a given file.

import hashlib  # hash algorithms
from os import path  # file existence check

def hash_file(filename):

    # fail early with a specific error if the path is not a regular file
    if not path.isfile(filename):
        raise FileNotFoundError("File not found for hash operation")

    # make a hash object
    h_sha256 = hashlib.sha256()

    # open file for reading in binary mode
    with open(filename, 'rb') as file:

        # read file in 1024-byte chunks and update hash;
        # read() returns b'' at end of file, which ends the loop
        chunk = file.read(1024)
        while chunk != b'':
            h_sha256.update(chunk)
            chunk = file.read(1024)

    # return the hex digest
    return h_sha256.hexdigest()

####### Example Usage
# hashes the file named "Python" in the current working directory
message = hash_file("Python")
print(message)

Program output.

8964ed69cac034a6bc88ad33089500b6a26a62f19e1574f2f8fbfddeb9a30667
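On Python 3.11 and later, hashlib also provides file_digest(), which performs the chunked reading itself. The sketch below uses it when available and otherwise falls back to a manual loop; the function name `sha256_of_file` and the 64 KiB chunk size are assumptions made for this illustration.

```python
import hashlib

def sha256_of_file(filename: str) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    with open(filename, "rb") as f:
        # hashlib.file_digest exists only on Python 3.11+
        if hasattr(hashlib, "file_digest"):
            return hashlib.file_digest(f, "sha256").hexdigest()

        # Fallback for older versions: read fixed-size chunks until EOF
        h = hashlib.sha256()
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
        return h.hexdigest()
```

Calling `sha256_of_file("Python")` should give the same digest as `hash_file("Python")` above, since both hash the same bytes with the same algorithm.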

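Example 2: Find SHA1 Hash of a File in Python (Not recommended)

Python program to find the SHA1 hash of a given file. SHA1 is not recommended for security-sensitive uses because practical collision attacks against it are known.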

import hashlib  # hash algorithms
from os import path  # file existence check

def hash_file(filename):

    # fail early with a specific error if the path is not a regular file
    if not path.isfile(filename):
        raise FileNotFoundError("File not found for hash operation")

    # make a hash object
    h = hashlib.sha1()

    # open file for reading in binary mode
    with open(filename, 'rb') as file:

        # read file in 1024-byte chunks and update hash;
        # read() returns b'' at end of file, which ends the loop
        chunk = file.read(1024)
        while chunk != b'':
            h.update(chunk)
            chunk = file.read(1024)

    # return the hex digest
    return h.hexdigest()

####### Example Usage
# hashes the file named "Python" in the current working directory
message = hash_file("Python")
print(message)

Program output.

498fd2d318447f9d0fac30c6e0997c03861c679b

Example 3: Find MD5 Hash of a File in Python (Not recommended)

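Python program to find the MD5 hash of a given file. MD5 is not recommended for security-sensitive uses because practical collision attacks against it are known.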

import hashlib  # hash algorithms
from os import path  # file existence check

def hash_file(filename):

    # fail early with a specific error if the path is not a regular file
    if not path.isfile(filename):
        raise FileNotFoundError("File not found for hash operation")

    # make a hash object
    md5_h = hashlib.md5()

    # open file for reading in binary mode
    with open(filename, 'rb') as file:

        # read file in 1024-byte chunks and update hash;
        # read() returns b'' at end of file, which ends the loop
        chunk = file.read(1024)
        while chunk != b'':
            md5_h.update(chunk)
            chunk = file.read(1024)

    # return the hex digest
    return md5_h.hexdigest()

####### Example Usage
# hashes the file named "Python" in the current working directory
message = hash_file("Python")
print(message)

Program output.

ee6dcaf64b270667125a33f9b7bebb75
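The three examples differ only in which constructor they call, so they can be folded into one function that takes the algorithm name. This is a sketch under that assumption; `hash_file_with` is a hypothetical name, and it relies on hashlib.new() and the assignment expression operator (Python 3.8+).

```python
import hashlib

def hash_file_with(filename: str, algorithm: str = "sha256") -> str:
    """Return the hex digest of a file using the named hashlib algorithm."""
    if algorithm not in hashlib.algorithms_available:
        raise ValueError(f"Unsupported algorithm: {algorithm}")

    # hashlib.new() builds a hash object from an algorithm name
    h = hashlib.new(algorithm)

    with open(filename, "rb") as f:
        # read() returns b'' at end of file, which is falsy and ends the loop
        while chunk := f.read(65536):
            h.update(chunk)

    return h.hexdigest()
```

For example, `hash_file_with("Python", "sha1")` would reproduce Example 2. The variable-length algorithms shake_128 and shake_256 need a length argument to hexdigest(), so this sketch does not cover them.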

Happy Learning !!
