277 lines
18 KiB
Markdown
277 lines
18 KiB
Markdown
# Foldhash
|
|
|
|
This repository contains foldhash, a fast, non-cryptographic, minimally
|
|
DoS-resistant hashing algorithm implemented in Rust designed for computational
|
|
uses such as hash maps, bloom filters, count sketching, etc.
|
|
|
|
When should you **not** use foldhash:
|
|
|
|
- You are afraid of people studying your long-running program's behavior to
|
|
reverse engineer its internal random state and using this knowledge to create
|
|
many colliding inputs for computational complexity attacks. For more details
|
|
see the section "HashDoS resistance".
|
|
|
|
- You expect foldhash to have a consistent output across versions or
|
|
platforms, such as for persistent file formats or communication protocols.
|
|
|
|
- You are relying on foldhash's properties for any kind of security.
|
|
Foldhash is **not appropriate for any cryptographic purpose**.
|
|
|
|
Foldhash has two variants, one optimized for speed which is ideal for data
|
|
structures such as hash maps and bloom filters, and one optimized for
|
|
statistical quality which is ideal for algorithms such as
|
|
[HyperLogLog](https://en.wikipedia.org/wiki/HyperLogLog) and
|
|
[MinHash](https://en.wikipedia.org/wiki/MinHash).
|
|
|
|
Foldhash can be used in a `#![no_std]` environment by disabling its default
|
|
`"std"` feature.
|
|
|
|
|
|
## Performance
|
|
|
|
We evaluated foldhash against three commonly used hashes in Rust:
|
|
[aHash](https://github.com/tkaitchuck/aHash) v0.8.11,
|
|
[fxhash](https://github.com/cbreeden/fxhash) v0.2.1, and
|
|
[SipHash-1-3](https://en.wikipedia.org/wiki/SipHash), the default hash algorithm
|
|
in Rust at the time of writing. We evaluated both variants foldhash provides,
|
|
`foldhash-f` and `foldhash-q`, which correspond to `foldhash::fast` and
|
|
`foldhash::quality` in the crate respectively.
|
|
|
|
First we note that hashers with random state inflate the size of your `HashMap`,
|
|
which may or may not be important for your performance:
|
|
```rust
|
|
std::mem::size_of::<foldhash::HashMap<u32, u32>>() = 40 // (both variants)
|
|
std::mem::size_of::<ahash::HashMap<u32, u32>>() = 64
|
|
std::mem::size_of::<fxhash::FxHashMap<u32, u32>>() = 32
|
|
std::mem::size_of::<std::collections::HashMap<u32, u32>>() = 48
|
|
```
|
|
|
|
We tested runtime performance on two machines, one with a 2023 Apple M2 CPU, one
|
|
with a 2023 Intel Xeon Platinum 8481C server CPU, both with stable Rust 1.80.1.
|
|
Since one of our competitors (aHash) is reliant on AES-based instructions for
|
|
optimal performance we have included both a benchmark with and without
|
|
`-C target-cpu=native` for the Intel machine.
|
|
|
|
We tested across a wide variety of data types we consider representative of
|
|
types / distributions one might hash in the real world, in the context of a hash
|
|
table key:
|
|
|
|
- `u32` - random 32-bit unsigned integers,
|
|
- `u32pair` - pairs of random 32-bit unsigned integers,
|
|
- `u64` - random 64-bit unsigned integers,
|
|
- `u64pair` - pairs of random 64-bit unsigned integers,
|
|
- `u64lobits` - 64-bit unsigned integers where only the bottom 16 bits vary,
|
|
- `u64hibits` - 64-bit unsigned integers where only the top 16 bits vary,
|
|
- `ipv4` - [`std::net::Ipv4Addr`](https://doc.rust-lang.org/std/net/struct.Ipv4Addr.html), which is equivalent to `[u8; 4]`,
|
|
- `ipv6` - [`std::net::Ipv6Addr`](https://doc.rust-lang.org/std/net/struct.Ipv6Addr.html), which is equivalent to `[u8; 16]`,
|
|
- `rgba` - random `(u8, u8, u8, u8)` tuples,
|
|
- `strenglishword` - strings containing words sampled uniformly from the top 10,000 most common English words,
|
|
- `struuid` - random UUIDs, hashed in string representation,
|
|
- `strurl` - strings containing URLs sampled uniformly from a corpus of 10,000 URLs,
|
|
- `strdate` - random `YYYY-MM-DD` date strings,
|
|
- `accesslog` - `(u128, u32, chrono::NaiveDate, bool)`, meant to simulate a typical
|
|
larger compound type, in this case `(resource_id, user_id, date, success)`
|
|
for an access log.
|
|
- `kilobyte` - random bytestrings one kilobyte in length,
|
|
- `tenkilobyte` - random bytestrings ten kilobytes in length.
|
|
|
|
We tested the performance of hashing the above data types in the following four contexts:
|
|
|
|
- `hashonly` - only the time it takes to hash the value,
|
|
- `lookupmiss` - the time it takes to do a lookup in a 1,000 element hash map
|
|
of random elements, only sampling keys of which we know that are not in the hash map,
|
|
- `lookuphit` - similar to `lookupmiss`, except the keys are sampled from keys
|
|
known to be in the hash map,
|
|
- `setbuild` - the time it takes to construct a `HashSet` of 1,000 elements
|
|
from 1,000 randomly sampled elements each repeated 10 times (so 10,000 inserts,
|
|
with ~90% duplicates).
|
|
|
|
All times are reported as expected time per operation, so one hash, one lookup,
|
|
or one insert respectively. The full results [can be found
|
|
here](https://gist.github.com/orlp/1271ad5b8b775c651cc55773888858eb). To
|
|
summarize, we will only show the results for `u64` and `strenglishword` here, as
|
|
well as the observed geometric mean and average rank over the full benchmark.
|
|
|
|
```
|
|
Xeon 8481c
|
|
|
|
┌────────────────┬────────────┬────────────┬────────────┬─────────┬─────────┬─────────┐
|
|
│ avg_rank ┆ 1.58 ┆ 2.66 ┆ 2.09 ┆ 3.70 ┆ 4.97 │
|
|
│ geometric_mean ┆ 6.21 ┆ 7.01 ┆ 7.56 ┆ 8.74 ┆ 28.70 │
|
|
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
|
|
│ distr ┆ bench ┆ foldhash-f ┆ foldhash-q ┆ fxhash ┆ ahash ┆ siphash │
|
|
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
|
|
│ u64 ┆ hashonly ┆ 0.79 ┆ 1.03 ┆ 0.67 ┆ 1.23 ┆ 9.09 │
|
|
│ u64 ┆ lookupmiss ┆ 2.01 ┆ 2.44 ┆ 1.73 ┆ 2.73 ┆ 12.03 │
|
|
│ u64 ┆ lookuphit ┆ 3.04 ┆ 3.59 ┆ 2.64 ┆ 3.84 ┆ 12.65 │
|
|
│ u64 ┆ setbuild ┆ 6.13 ┆ 6.52 ┆ 4.88 ┆ 6.66 ┆ 17.80 │
|
|
| ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... |
|
|
│ strenglishword ┆ hashonly ┆ 2.63 ┆ 2.98 ┆ 3.24 ┆ 3.57 ┆ 11.87 │
|
|
│ strenglishword ┆ lookupmiss ┆ 4.63 ┆ 5.03 ┆ 4.51 ┆ 5.86 ┆ 15.19 │
|
|
│ strenglishword ┆ lookuphit ┆ 8.62 ┆ 9.25 ┆ 8.28 ┆ 10.06 ┆ 21.35 │
|
|
│ strenglishword ┆ setbuild ┆ 14.77 ┆ 15.57 ┆ 18.86 ┆ 15.72 ┆ 35.36 │
|
|
└────────────────┴────────────┴────────────┴────────────┴─────────┴─────────┴─────────┘
|
|
|
|
Xeon 8481c with RUSTFLAGS="-C target-cpu=native"
|
|
|
|
┌────────────────┬────────────┬────────────┬────────────┬─────────┬─────────┬─────────┐
|
|
│ avg_rank ┆ 1.89 ┆ 3.12 ┆ 2.25 ┆ 2.77 ┆ 4.97 │
|
|
│ geometric_mean ┆ 6.00 ┆ 6.82 ┆ 7.39 ┆ 6.94 ┆ 29.49 │
|
|
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
|
|
│ distr ┆ bench ┆ foldhash-f ┆ foldhash-q ┆ fxhash ┆ ahash ┆ siphash │
|
|
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
|
|
│ u64 ┆ hashonly ┆ 0.79 ┆ 1.01 ┆ 0.67 ┆ 1.34 ┆ 9.24 │
|
|
│ u64 ┆ lookupmiss ┆ 1.68 ┆ 2.12 ┆ 1.62 ┆ 1.96 ┆ 12.04 │
|
|
│ u64 ┆ lookuphit ┆ 2.68 ┆ 3.19 ┆ 2.28 ┆ 3.16 ┆ 13.09 │
|
|
│ u64 ┆ setbuild ┆ 6.16 ┆ 6.42 ┆ 4.75 ┆ 7.03 ┆ 18.88 │
|
|
| ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... |
|
|
│ strenglishword ┆ hashonly ┆ 2.60 ┆ 2.97 ┆ 3.25 ┆ 3.04 ┆ 11.58 │
|
|
│ strenglishword ┆ lookupmiss ┆ 4.41 ┆ 4.96 ┆ 4.82 ┆ 4.79 ┆ 32.31 │
|
|
│ strenglishword ┆ lookuphit ┆ 8.68 ┆ 9.35 ┆ 8.46 ┆ 8.63 ┆ 21.48 │
|
|
│ strenglishword ┆ setbuild ┆ 15.01 ┆ 16.34 ┆ 19.34 ┆ 15.37 ┆ 35.22 │
|
|
└────────────────┴────────────┴────────────┴────────────┴─────────┴─────────┴─────────┘
|
|
|
|
Apple M2
|
|
|
|
┌────────────────┬────────────┬────────────┬────────────┬─────────┬─────────┬─────────┐
|
|
│ avg_rank ┆ 1.62 ┆ 2.81 ┆ 2.02 ┆ 3.58 ┆ 4.97 │
|
|
│ geometric_mean ┆ 4.41 ┆ 4.86 ┆ 5.39 ┆ 5.71 ┆ 21.94 │
|
|
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
|
|
│ distr ┆ bench ┆ foldhash-f ┆ foldhash-q ┆ fxhash ┆ ahash ┆ siphash │
|
|
╞════════════════╪════════════╪════════════╪════════════╪═════════╪═════════╪═════════╡
|
|
│ u64 ┆ hashonly ┆ 0.60 ┆ 0.70 ┆ 0.41 ┆ 0.78 ┆ 6.61 │
|
|
│ u64 ┆ lookupmiss ┆ 1.50 ┆ 1.61 ┆ 1.23 ┆ 1.65 ┆ 8.28 │
|
|
│ u64 ┆ lookuphit ┆ 1.78 ┆ 2.10 ┆ 1.57 ┆ 2.25 ┆ 8.53 │
|
|
│ u64 ┆ setbuild ┆ 4.74 ┆ 5.19 ┆ 3.61 ┆ 5.38 ┆ 15.36 │
|
|
| ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... ┆ ... |
|
|
│ strenglishword ┆ hashonly ┆ 1.84 ┆ 2.13 ┆ 1.85 ┆ 2.13 ┆ 11.61 │
|
|
│ strenglishword ┆ lookupmiss ┆ 2.71 ┆ 2.96 ┆ 2.47 ┆ 2.99 ┆ 9.27 │
|
|
│ strenglishword ┆ lookuphit ┆ 7.54 ┆ 8.77 ┆ 7.83 ┆ 8.77 ┆ 18.65 │
|
|
│ strenglishword ┆ setbuild ┆ 16.61 ┆ 17.09 ┆ 14.83 ┆ 16.52 ┆ 26.42 │
|
|
└────────────────┴────────────┴────────────┴────────────┴─────────┴─────────┴─────────┘
|
|
```
|
|
|
|
We note from the above benchmark that for hash table performance the extra
|
|
quality that `foldhash-q` provides is almost never actually worth the small but
|
|
also non-negligible computational overhead it has over `foldhash-f`. This is our
|
|
justification for providing `foldhash::fast` as a default choice for hash
|
|
tables, even though it has measurable biases (see also the "Quality" section).
|
|
|
|
fxhash generally does fairly well for small inputs on the benchmarks, however it
|
|
has structural weaknesses as a hash which makes it ill-advised to use as a
|
|
general-purpose hash function in our opinion. For example the `lookuphit`
|
|
benchmark on Apple M2 for `u64hibits` takes 1.77 nanoseconds per lookup for
|
|
foldhash, but 67.72 nanoseconds for fxhash (due to everything colliding - the
|
|
effects would be even worse with a larger hash map). In our opinion foldhash-f
|
|
strikes the right balance between quality and performance for hash tables,
|
|
whereas fxhash flies a bit too close to the sun.
|
|
|
|
aHash is faster than foldhash for medium-long strings when compiled with
|
|
AES instruction support, but is slower in almost every other scenario or when
|
|
AES instructions are unavailable.
|
|
|
|
|
|
## Quality
|
|
|
|
Foldhash-f is a fairly strong hash in terms of collisions *on its full 64-bit
|
|
output*. However, statistical tests such as
|
|
[SMHasher3](https://gitlab.com/fwojcik/smhasher3) can distinguish it from an ideal
|
|
hash function in tests that focus on the relationship between individual
|
|
input/output bits. One such property is avalanching: changing a single bit in
|
|
the input does not flip every other bit with 50% probability when using
|
|
foldhash-f like it should if it behaved like a proper random oracle.
|
|
|
|
As the benchmarks above show, spending more effort in foldhash-f to improve the
|
|
hash quality does not lead to better hash table performance. However, there are
|
|
also use cases for hash functions where it is important that (each bit of) the
|
|
hash is unbiased and a random function of all bits of the input, such as in
|
|
algorithms as HyperLogLog or MinHash.
|
|
|
|
For this purpose we also provide foldhash-q, which is simply a post-processed
|
|
version of foldhash-f to properly avalanche all the bits. Foldhash-q passes the
|
|
[SMHasher3](https://gitlab.com/fwojcik/smhasher3) test suite [without any
|
|
failures](https://github.com/orlp/foldhash_smhasher3). You can also plot the
|
|
worst-case probability (where 50% is ideal) that any particular output bit flips
|
|
if you flip an input bit, which nicely visualizes how fxhash and foldhash-f
|
|
fail this avalanche property but foldhash-q and SipHash-1-3 pass:
|
|
|
|
|
|
| FxHash | Foldhash-f | Foldhash-q | SipHash-1-3 |
|
|
|--------|------------|------------|-------------|
|
|
| <img src="assets/avalanche-fxhash.png" width=300> | <img src="assets/avalanche-foldhash-fast.png" width=300> | <img src="assets/avalanche-foldhash-quality.png" width=300> | <img src="assets/avalanche-siphash.png" width=300>
|
|
|
|
|
|
## Background
|
|
|
|
The name foldhash is derived from the *folded multiply*. This technique
|
|
compresses two 64-bit words into a single 64-bit word while simultaneously
|
|
thoroughly mixing the bits. It does this using a 64 x 64 bit -> 128 bit
|
|
multiplication followed by folding the two halves of the 128-bit product
|
|
together using a XOR operation:
|
|
|
|
```rust
|
|
let full = (x as u128) * (y as u128);
|
|
let lo = full as u64;
|
|
let hi = (full >> 64) as u64;
|
|
let folded = lo ^ hi;
|
|
```
|
|
|
|
We're not aware of a formal analysis of this operation, but empirically it works
|
|
very well. An informal intuition for why it should work is that multiplication
|
|
can be seen as the sum of many shifted copies of one of the arguments, only
|
|
including those shifted copies where the other argument has set bits, e.g. for
|
|
multiplying 4-bit words `abcd` and `efgh`:
|
|
|
|
```
|
|
abcd * efgh =
|
|
|
|
abcd * e
|
|
abcd * f
|
|
abcd * g
|
|
abcd * h
|
|
--------------- +
|
|
```
|
|
|
|
Note that the middle bits of the product are a function of many of the input
|
|
bits, whereas the top-most and bottom-most bits are impacted by fewer of the
|
|
input bits. By folding the top half back onto the bottom half these effects
|
|
compensate each other, making all the output bits affected by much of the input.
|
|
|
|
We did not invent the folded multiply, it was previously used in essentially the
|
|
same way in [aHash](https://github.com/tkaitchuck/aHash),
|
|
[wyhash](https://github.com/wangyi-fudan/wyhash), and
|
|
[xxhash3](https://github.com/Cyan4973/xxHash). The operation was also used
|
|
in [mum-hash](https://github.com/vnmakarov/mum-hash), and probably others.
|
|
We do not know who originally invented it, the earliest reference
|
|
we could find was Steven Fuerst [blogging about it](https://web.archive.org/web/20121213174842/http://locklessinc.com/articles/crypto_hash/)
|
|
in 2012.
|
|
|
|
|
|
## HashDoS resistance
|
|
|
|
The folded multiply has a fairly glaring flaw: if one of the halves is zero, the
|
|
output is zero. This makes it trivial to create a large number of hash
|
|
collisions (even by accident, as zeroes are a common input to hashes). To combat
|
|
this, every folded multiply in foldhash has the following form:
|
|
|
|
```rust
|
|
folded_multiply(input1 ^ secret1, input2 ^ secret2)
|
|
```
|
|
|
|
Here `secret1` or `secret2` are either secret random numbers generated by
|
|
foldhash beforehand, or partial hash results influenced by such a secret prior.
|
|
This (plus other careful design throughout the hash function) ensures that it is
|
|
not possible to create a list of inputs that collide for every instance of
|
|
foldhash, and also prevents certain access patterns on hash tables going
|
|
quadratric by ensuring that each hash table uses a different seed and thus a
|
|
different access pattern. It is these two properties that we refer to when we
|
|
claim foldhash is "minimally DoS-resistant": it does the bare minimum to defeat
|
|
very simple attacks.
|
|
|
|
However, to be crystal clear, **foldhash does not claim to provide HashDoS
|
|
resistance against interactive attackers**. For a student of cryptography it
|
|
should be trivial to derive the secret values from direct observation of hash
|
|
outputs, and feasible to derive the secret values from indirect observation of
|
|
hashes, such as through timing attacks or hash table iteration. Once an attacker
|
|
knows the secret values, they can once again create infinite hash collisions
|
|
with ease.
|