Ignore confidentiality, availability for now, discussed later.
Example: 100 GB data, can read/write anywhere, want to detect tampering anywhere
Physical security
- Disk in safe, keep in your pocket
- con: expensive, no remote access
- Better: leverage small, secure medium & use large, insecure storage server
Checksums/hashes
- Store hash of data on small device
- Standard checksum: good at detecting random errors, bad at engineered errors
Better: cryptographic hash
- Efficient to compute
- One-way (i.e., given x and H(x), no efficient way to find an x' such that H(x') = H(x) )
- One-way
- Vulnerable to random-guessing attack (and birthday attack)
- 160 bit hash means 1 in 2^160 chance for 2 inputs to have same output H(x).
- After trying 2^159 inputs, have a 50-50 chance of success (finding two x's with same hash)
- Have MANY uses
Are hashes irreversible? We don't know. So far, have been difficult to crack.
SHA-1 Hash Function
- Run chunks of data through tricky "compression function" many times
- For any data size, returns 160 bit hash
Have a 20 byte hash (160 bits, output of SHA-1) over 1024 bytes of data (roughly, 50 times compression)
Then, repeat trick. Have a 20 byte hash for groups of hashes (another 50x compression). Eventually, get a 20 byte hash over ALL the hashes. Just secure this.
To update: recompute hash over 1k
- Fix this hash in table, recompute parent hash
- lgN number of fixups
Why 1k? If over a huge amount (100k) then tree very shallow. Takes a long time to recompute hash if simple change was made.
Why not less than 1k? If too low, tree very deep (not much compression). End up wasting a lot of space storing hashes.
Result:
- about 3% overhead to store hashes
- The hashes can be stored insecurely. Tampering will be detected.
- Don't get something for nothing (just transform one security problem into another).
- Instead of needing 2GB of secure storage, need 20 bytes
- Amplifies what security you have
- Works with high probability (can be defeated with guessing)
- Just make probability of guessing correctly very, very small
- Based on unproven mathematical conjectures