Some VIA processors support hardware acceleration of various cryptographic algorithms, e.g. AES, SHA1 or SHA256. The hardware support offers better performance than pure software implementations.
On this page you can get phe_sum, a simple tool that aims to replace coreutils' sha1sum program on systems with hardware SHA1/SHA256 support. The interface, options and output format are the same as those of the original sha1sum, which makes it a drop-in replacement.
Please refer to the PadLock project homepage for other patches, tools and information.
phe_sum source code: to be added
PHE instructions available in VIA C7 processors (called xsha1 and xsha256) can't work in the init, update, update, update, ..., final mode that is common when hashing large amounts of data, or when we don't know in advance how much data we'll get (e.g. when it's coming over the network). Instead, PHE instructions always try to finalize the hash, which makes subsequent updates impossible. In other words, you need to load all your data into memory first and then run PHE once to get the digest. The question is what happens when checksumming e.g. a DVD image that is not only bigger than your physical RAM but even bigger than the whole virtual address space of a single 32-bit process. You simply can't load it into memory, which means you can't compute the hash in hardware, which means you'll have to fall back to a software implementation, which obviously means it will take ages to get the result. All right, what now?
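For readers unfamiliar with the init / update / final pattern mentioned above, here is a minimal sketch of what such an interface looks like. It is not taken from phe_sum: the hash is a stand-in (32-bit FNV-1a, not SHA-1) and every name (toy_ctx, toy_init, toy_update, toy_final, toy_hash_stream) is hypothetical. The point is only that the context carries an intermediate state between calls, so the input can arrive in chunks of any size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Stand-in streaming hash (32-bit FNV-1a), NOT SHA-1. It only illustrates
 * the init/update/final calling pattern; all names are hypothetical. */
typedef struct {
    uint32_t state;   /* intermediate value carried between updates */
    uint64_t nbytes;  /* total number of bytes consumed so far */
} toy_ctx;

void toy_init(toy_ctx *c)
{
    c->state = 2166136261u;  /* FNV-1a 32-bit offset basis */
    c->nbytes = 0;
}

void toy_update(toy_ctx *c, const void *data, size_t len)
{
    const unsigned char *p = data;

    assert(data != NULL || len == 0);
    for (size_t i = 0; i < len; i++) {
        c->state ^= p[i];
        c->state *= 16777619u;  /* FNV 32-bit prime */
    }
    c->nbytes += len;
}

uint32_t toy_final(const toy_ctx *c)
{
    return c->state;  /* FNV-1a needs no padding or length block */
}

/* Hash a whole stream chunk by chunk; memory use is bounded by buf,
 * no matter how large the input is. Returns 0 on success, -1 on error. */
int toy_hash_stream(FILE *fp, uint32_t *out)
{
    unsigned char buf[4096];
    toy_ctx c;
    size_t n;

    toy_init(&c);
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        toy_update(&c, buf, n);
    if (ferror(fp))
        return -1;
    *out = toy_final(&c);
    return 0;
}
```

Feeding "hello " and then "world" through toy_update() gives the same result as feeding "hello world" in one call. That resumability is exactly what a one-shot, always-finalizing instruction does not offer on its own.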
This idea comes from Andy Polyakov. PHE saves its current state to memory on every process switch and also on any page fault that occurs during the run. This state includes the number of bytes hashed and an intermediate result that can be used as the initial value for subsequent rounds.
So far so good. The only remaining question is how to trigger a context switch or a page fault at the place we need. Solution: mmap(2) two or more pages and mprotect(2) the last one to deny all access (PROT_NONE). This creates an inaccessible piece of memory exactly at the place we need. Now we put all our input data just before this barrier and engage PHE. However, we tell it to hash slightly more data than we put into the buffer. With these instructions PHE will crunch all our input and attempt to hash some more. At that point it hits the protected area, triggers an exception, saves the current intermediate state into memory and calls the exception handler (well, not exactly and not exactly in this order, never mind ;-). Anyway, the exception handler skips over the PHE instruction (hacky hack, EIP+=4 ;-) and returns. This way we get a non-finalized result that can be fed into PHE as the initial value for the next update. Repeat and repeat and hash terabytes of data at hardware speed. Finalizing is done half-manually / half-hardware at the end. See the functions padlock_sha1_nonfinalizing(), segv_action() and padlock_sha1_final() in the source for more details.
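The buffer layout described above can be tried without PadLock hardware. The sketch below is not the phe_sum code and does not execute xsha1. It only shows the mmap(2) / mprotect(2) barrier and a reader that is asked for more bytes than the buffer holds. Two things are assumptions made for portability: a plain byte loop stands in for the PHE instruction, and the signal handler recovers with siglongjmp() instead of advancing EIP. The names guarded_alloc, read_until_fault and on_fault are hypothetical, and a Linux/POSIX system is assumed.

```c
#define _DEFAULT_SOURCE          /* for MAP_ANONYMOUS and sigsetjmp on glibc */
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf fault_env;     /* where the handler jumps back to */

static void on_fault(int sig)
{
    (void)sig;
    siglongjmp(fault_env, 1);    /* abandon the faulting read */
}

/* Map `pages` read/write pages followed by one PROT_NONE guard page.
 * Returns the address of the guard page (the barrier), or NULL on error. */
unsigned char *guarded_alloc(size_t pages, size_t *pagesz_out)
{
    long ps = sysconf(_SC_PAGESIZE);
    assert(pages > 0);
    if (ps <= 0)
        return NULL;

    size_t pagesz = (size_t)ps;
    size_t total = (pages + 1) * pagesz;
    unsigned char *base = mmap(NULL, total, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    unsigned char *guard = base + pages * pagesz;
    if (mprotect(guard, pagesz, PROT_NONE) != 0) {
        munmap(base, total);
        return NULL;
    }
    *pagesz_out = pagesz;
    return guard;
}

/* Software stand-in for the over-long hardware run: try to consume `want`
 * bytes starting at src and return how many were read before a fault. */
size_t read_until_fault(const unsigned char *src, size_t want)
{
    static volatile size_t done;          /* survives the siglongjmp */
    static volatile unsigned char sink;
    const volatile unsigned char *p = src;
    struct sigaction sa, old_segv, old_bus;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_fault;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old_segv);
    sigaction(SIGBUS, &sa, &old_bus);     /* some systems raise SIGBUS */

    done = 0;
    if (sigsetjmp(fault_env, 1) == 0) {
        while (done < want) {
            sink = p[done];               /* faults on the guard page */
            done++;
        }
    }
    sigaction(SIGSEGV, &old_segv, NULL);
    sigaction(SIGBUS, &old_bus, NULL);
    return done;
}
```

Placing 100 bytes at guard - 100 and asking read_until_fault() for 164 bytes stops after exactly the 100 bytes in front of the barrier. With real PHE hardware, that stopping point is where the intermediate digest would be picked up and fed into the next run.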