phe_sum: PadLock Hash Engine SHA1/SHA256 checksum tool

Some VIA processors support hardware acceleration of various cryptographic algorithms, e.g. AES, SHA1 or SHA256. The hardware support offers performance superior to pure software implementations.

On this page you can get phe_sum, a simple tool that aims to replace coreutils' sha1sum program on systems with hardware SHA1/SHA256 support. The interface, options and output format are the same as those of the original sha1sum, which makes it a drop-in replacement.

Please refer to the PadLock project homepage for other patches, tools and information.

Download

phe_sum source code
phe_sum.c (Colorized)

Benchmarks

to be added

Technical background

The PHE instructions available in VIA C7 processors (called xsha1 and xsha256) can't work in the init, update, update, update, ..., final mode that is common when hashing large amounts of data, or when we don't know in advance how much data we'll get (e.g. when it's coming over the network). Instead, the PHE instructions always try to finalize the hash, which makes subsequent updates impossible. In other words, you need to load all your data into memory first and then run PHE once to get the digest.

The question is what happens when checksumming e.g. a DVD image that is not only bigger than your physical RAM but even bigger than the whole virtual address space of a single 32-bit process. You simply can't load it into memory, which means you can't compute the hash in hardware, which means you have to fall back to a software implementation, which means it will take much longer to get the result. All right, what now?
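For reference, the init / update / final mode mentioned above can be sketched as follows. This is only an illustration of the calling pattern: the toy_* names are hypothetical and the hash is 32-bit FNV-1a used as a stand-in, not SHA1 and not anything taken from phe_sum.c. The point is that the context carries an intermediate state between calls, so the input can arrive in chunks of any size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical streaming context: intermediate state plus a byte count.
 * A real SHA1 final step would need the count for its length padding. */
typedef struct {
    uint32_t state;
    uint64_t total;
} toy_ctx;

/* init: load the FNV-1a offset basis as the starting state. */
void toy_init(toy_ctx *c)
{
    assert(c != NULL);
    c->state = 2166136261u;
    c->total = 0;
}

/* update: fold another chunk into the state; may be called repeatedly. */
void toy_update(toy_ctx *c, const void *data, size_t len)
{
    const unsigned char *p = data;
    size_t i;

    assert(c != NULL);
    for (i = 0; i < len; i++) {
        c->state ^= p[i];
        c->state *= 16777619u;   /* FNV-1a 32-bit prime */
    }
    c->total += len;
}

/* final: FNV-1a has no padding step, so the state is the digest. */
uint32_t toy_final(const toy_ctx *c)
{
    assert(c != NULL);
    return c->state;
}
```

Feeding "hello " and then "world" through two toy_update() calls gives the same digest as one call with "hello world". That chaining is exactly what a single finalizing PHE run does not offer.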

This idea comes from Andy Polyakov. PHE saves its current state into memory on every process switch, as well as on any page fault that occurs during the run. This state includes the number of bytes hashed and an intermediate result that can be used as the initial value for subsequent rounds. So far so good. The only remaining question is how to trigger a context switch or a page fault at the place we need.

Solution: mmap(2) two or more pages and mprotect(2) the last one to deny all access (PROT_NONE). This creates an inaccessible piece of memory exactly at the place we need. Now we put all our input data just before this barrier and engage PHE, but tell it to hash slightly more data than we put into the buffer. With these instructions PHE will crunch all our input and attempt to hash some more. At that point it hits the protected area, triggers an exception, saves the current intermediate state into memory and calls the exception handler (well, not exactly and not exactly in this order, never mind ;-). The exception handler then skips over the PHE instruction (hacky hack, EIP+=4 ;-) and returns.

This way we get a non-finalized result that can be fed into PHE as the initial value for the next update. Repeat and repeat and hash terabytes of data at hardware speed. Finalizing is done half-manually / half-hardware at the end. See the functions padlock_sha1_nonfinalizing(), segv_action() and padlock_sha1_final() in the source for more details.
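The guard-page part of this trick can be tried without PadLock hardware. The following is a minimal sketch, assuming Linux, and it is not the code from phe_sum.c: the function names are made up for illustration, a plain byte-reading loop stands in for the xsha1 instruction, and the handler recovers with sigsetjmp()/siglongjmp() instead of advancing EIP past the instruction.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf guard_jmp;   /* recovery point used by the handler */

/* SIGSEGV handler: jump back to the recovery point.
 * (phe_sum's segv_action() skips the faulting instruction instead.) */
static void on_segv(int sig)
{
    (void)sig;
    siglongjmp(guard_jmp, 1);
}

/* Copy len bytes so that they end right below a PROT_NONE page, then try
 * to read len + extra bytes. Returns how many bytes were read before the
 * fault, or (size_t)-1 if the mapping could not be set up. */
size_t consume_until_fault(const void *data, size_t len, size_t extra)
{
    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *base, *guard;
    struct sigaction sa, old;
    volatile size_t done = 0;          /* volatile: survives siglongjmp */
    volatile unsigned char sink = 0;   /* keeps the reads from being elided */

    assert(len <= ps);                 /* the sketch uses one data page */

    /* Two pages: one readable/writable, one that will become the barrier. */
    base = mmap(NULL, 2 * ps, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return (size_t)-1;
    guard = base + ps;
    if (mprotect(guard, ps, PROT_NONE) != 0) {
        munmap(base, 2 * ps);
        return (size_t)-1;
    }
    memcpy(guard - len, data, len);    /* input sits just before the barrier */

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_segv;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGSEGV, &sa, &old);

    if (sigsetjmp(guard_jmp, 1) == 0) {
        const volatile unsigned char *p = guard - len;
        /* Ask for more than we have; the read at p[len] hits the guard. */
        for (done = 0; done < len + extra; done++)
            sink ^= p[done];
    }

    sigaction(SIGSEGV, &old, NULL);    /* restore the previous handler */
    munmap(base, 2 * ps);
    (void)sink;
    return done;
}
```

Calling consume_until_fault("abcdefgh", 8, 16) asks for 24 bytes but should return 8: the loop stops at the barrier after consuming exactly the input that was placed in front of it. That is the point at which PHE would have saved its intermediate state.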

Place for your feedback...
15th May 2005 at 10:43
HD encryption performance insights
Hi,
in the current issue (11/2005) of the German computer magazine C't there is an interesting article about PadLock.

They tested hard disk en-/decryption with cryptoloop on a 1GHz C3 (EPIA MS 1000E, 512MB RAM, 80GB HD Maxtor DiamondMax Plus 9, Kernel 2.6.11, ext3). Copying data, they measured 19.3 MB/s without encryption, 6.6 MB/s using software encryption and 9.6 MB/s using PadLock.
They suspect the low transfer rate with PadLock is caused by misalignment of the data in memory. PadLock needs the data to be 16-byte aligned (the start address of the data block is divisible by 16 without remainder). If cryptoloop/dm-crypt uses malloc(), the data block will be 4-byte aligned. So statistically the padlock driver has to realign the data 3 out of 4 times, which slows it down.

If these assumptions are correct and the realignment problem causes a noteworthy performance drop, can't the dm-crypt driver be patched so that the memory it allocates for the data is 16-byte aligned (e.g. by replacing malloc() with memalign() from the GNU C library)?
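To make the alignment argument concrete, here is a small user-space sketch. It is an illustration only: the helper names are hypothetical, it uses POSIX posix_memalign() rather than the memalign() mentioned above, and it says nothing about how cryptoloop/dm-crypt actually allocate their buffers inside the kernel.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p lies on an align-byte boundary (align: power of two). */
int is_aligned(const void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Allocate size bytes on a 16-byte boundary; release with free(). */
void *alloc_aligned16(size_t size)
{
    void *p = NULL;

    if (posix_memalign(&p, 16, size) != 0)
        return NULL;
    return p;
}

/* Of the 4-byte-aligned offsets inside one 16-byte block (0, 4, 8, 12),
 * count how many are also 16-byte aligned. */
int aligned_offsets_in_block(void)
{
    int hits = 0;
    unsigned off;

    for (off = 0; off < 16; off += 4)
        hits += (off % 16 == 0);
    return hits;
}
```

Only one of the four possible 4-byte-aligned offsets is also 16-byte aligned, which is where the "3 out of 4 times" estimate comes from. A buffer obtained from alloc_aligned16() passes is_aligned(p, 16), so it would not need realignment.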

They also discovered that PadLock only achieves its high en-/decryption performance if the size of the data is not greater than the L2 cache size (max. performance around 32k data size, ECB 256key >1000MB/s, CBC / CFB 256key > 400MB/s). If the data size is greater than 64K the performance drops to 100-140MB/s. In this case the memory connection seems to be the bottleneck, which means using a mainboard with a CN400 chipset and DDR400 RAM would be a good idea.
The performance drop will most likely be noticeable for per-file encryption (file data encrypted en bloc, data block size = file size) with files greater than 64KB. To my understanding cryptoloop/dm-crypt works by encrypting the sectors of the hard disk (independently), so the data block size will be the sector size. If the sector size is less than 64KB, no memory transfer is necessary _during_ the encryption, only before and after. This decoupling may attenuate the performance problem a bit.

The URL of the benchmark software used is http://www.heise.de/ct/ftp/05/11/230/
May 16   9:31 via padlock support (by daniel w.)
Jun 25   22:18 Re: via padlock support (by anonymous)
Jun 8   17:17 Why is it so slow? (by john)
Jun 8   17:23 Re: Why is it so slow? (by Michal Ludvig)
Jan 26   14:37 Re: Re: Why is it so slow? (by Frank)
Aug 8   20:24 Updates? (by Sadara)
Oct 2   19:51 I like it (by Markus)
Oct 3   22:13 Re: I like it (by Michal Ludvig)
Oct 26   22:09 padlock gentoo (by flipstar)
Oct 27   18:52 hardware entropy support (by coderman)
Mar 14   17:16 Via padlock transparent (by Ben Jones)
Mar 14   21:25 Re: Via padlock transparent (by Michal Ludvig)
Mar 15   17:01 Re: Re: Via padlock transparent (by Ben Jones)
Apr 8   18:16 multiblock vs. singleblock (by Arnd)
Apr 10   18:13 Re: multiblock vs. singleblock (by Arnd)
Apr 10   22:17 Re: Re: multiblock vs. singleblock (by Michal Ludvig)
May 15   10:43 HD encryption performance insights (by EvilOverlord)
Nov 29   4:35 Re: HD encryption performance insights (by klaus_kleber)
Jun 29   13:25 How can I confirm.. (by james c)
Apr 17   23:34 Re: How can I confirm.. (by Fabien Wernli)
Apr 17   23:52 Re: Re: How can I confirm.. (by Michal)
Dec 17   13:26 Re: Re: How can I confirm.. (by Arnd Hannemann)
Jul 10   22:43 padlock detection failed (by padlock detection failed)
Jul 10   22:45 Re: padlock detection failed (by padlock detection failed)
Jul 10   22:50 Re: padlock detection failed (by Michal Ludvig)
Jul 11   20:02 Re: padlock detection failed (by padlock detection failed)
Jul 11   21:10 Re: Re: padlock detection failed (by padlock detection failed)
Jul 11   23:39 Re: Re: Re: padlock detection failed (by Michal Ludvig)
Jul 14   14:48 no-RNG, ACE (by Goetz Bock)
Jul 24   21:11 Re: no-RNG, ACE (by Michal Ludvig)
Aug 3   3:30 Re: Re: no-RNG, ACE (by ET Tan)
Feb 16   21:50 Re: Re: no-RNG, ACE--openssl (by anonymous)
Sep 15   21:40 Re: Re: Re: no-RNG, ACE--openssl (by anonymous)
Oct 9   5:27 Padlock vpn support? (by Matt S)
Jan 10   16:57 xstore in etherboot (by Robert Hamilton)
Jan 10   21:05 Re: xstore in etherboot (by Robert Hamilton)
Jan 11   0:12 Re: Re: xstore in etherboot (by Robert Hamilton)
Feb 6   4:56 new DP-310 dual-cpu-boards (by Stephan)
Feb 6   9:24 Re: new DP-310 dual-cpu-boards (by Michal)
Feb 24   3:02 Re: Re: new DP-310 dual-cpu-boards (by Stephan)
Dec 21   11:28 Re: Re: new DP-310 dual-cpu-boards (by Witek Baryluk)
Feb 17   7:36 selecting aes type (by udo)
Feb 17   12:50 patching OpenSSL 0.9.7f (by udo)
Sep 16   0:56 Re: patching OpenSSL 0.9.7f (by anonymous)
Feb 17   16:34 Further openssl investigations (by udo)
Mar 24   17:20 openssl 0.9.8a-5.2 on FC5 (by udo)
Mar 26   14:22 openssl on FC5 (by udo)
Apr 27   15:28 optimisation - large blocks (by peter)
Apr 27   22:59 Re: optimisation - large blocks (by Michal Ludvig)
May 8   15:57 Padlock by default (by anonymous)
Jul 25   2:50 Re: Padlock by default (by Michal Ludvig)
Sep 13   11:28 OpenSSH Patch (by Michael)
Sep 17   15:23 Re: OpenSSH Patch (by anonymous)
Sep 29   19:30 Re: OpenSSH Patch (by G.)
Sep 23   19:08 Buggy OpenSSL AES-CFB decryption? (by François)
Oct 19   2:37 Re: Buggy OpenSSL AES-CFB decryption? (by eloj)
Oct 23   17:31 Re: Re: Buggy OpenSSL AES-CFB decryption? (by Lasse Bigum)
Nov 27   12:42 Re: Re: Buggy OpenSSL AES-CFB decryption? (by honx)
Dec 22   20:23 libpadlock.so missing (by Markus Koetter)
Dec 28   11:33 Re: libpadlock.so missing (by Michal)
Dec 29   7:52 Re: Re: libpadlock.so missing (by Markus)
Dec 25   11:30 missing engine (by Franz Wudy)
Dec 26   18:34 Re: missing engine (by Markus)
Dec 28   7:58 patches online :) (by Markus)
Apr 9   14:32 Re: patches online :) (by Markus Kötter)
Jan 14   20:35 openssl -engine padlock is somewhat slow (by ONes)
Jan 22   0:27 Re: openssl -engine padlock is somewhat slow (by Zoidberg)
Jan 22   0:48 Re: Re: openssl -engine padlock is somewhat slow (by Michal Ludvig)
Aug 13   23:42 Re: Re: Re: openssl -engine padlock is somewhat slow (by Ove Andersen)
Mar 26   15:51 Re: openssl -engine padlock is somewhat slow (by Daniele)
Jan 18   22:47 Apply benchmark to "real world" file encryption (by H. Latzko)
Feb 3   18:11 gpg can use padlock ace? (by udo)
Feb 4   13:58 Re: gpg can use padlock ace? (by Michal Ludvig)
Feb 4   13:41 VIA PadLock: RNG ACE2 PHE(8192) PMM (by udo)
Feb 4   13:56 Re: VIA PadLock: RNG ACE2 PHE(8192) PMM (by Michal Ludvig)
Jun 20   12:21 Re: Re: VIA PadLock: RNG ACE2 PHE(8192) PMM (by Daniel Kalchev)
May 13   9:59 aes-types when using cryptsetup-luks? (by udo)
May 13   11:33 Re: aes-types when using cryptsetup-luks? (by Michal)
May 23   8:02 OpenVPN/padlock.so (by Prasanna)
Oct 2   1:22 openssl 0.9.8e is horrible slow (by Markus Kötter)
May 26   1:35 latest patch for ssh (by Tofu)
May 26   7:55 Re: latest patch for ssh (by Michal Ludvig)
Jun 8   20:29 Re: latest patch for ssh (by eloj)
Aug 4   0:50 OpenSSH SCP no speed diff (by Justin)
Aug 13   0:30 Re: OpenSSH SCP no speed diff (by horst)
Oct 15   2:40 Re: OpenSSH SCP no speed diff (by Denny)
Aug 6   9:07 Error in FC7 (by warren)
Aug 6   17:24 Re: Error in FC7 (by warren)
Aug 28   22:02 RNG Hack (by henric)
Feb 8   23:43 Re: RNG Hack (by eloj)
Sep 2   10:45 patch for Linux 2.4? (by dennis khoo)
Nov 17   16:57 rng difference? (by udo)
Nov 18   22:43 Re: rng difference? (by Michal)
Nov 24   14:46 Re: Re: rng difference? (by udo)
Nov 24   14:36 phe_sum (by udo)
Jan 25   18:37 padlock in squid (by Konstantin Gavrilenko)
Jun 3   4:31 Re: padlock in squid (by Wade Mealing)
Aug 3   15:38 VIA releases PadLock documentation (by Lasse Bigum)
Aug 3   17:12 Re: VIA releases PadLock documentation (by Michal Ludvig)
Dec 1   9:22 speed differences not understood (by Paul)
Dec 1   9:27 Re: speed differences not understood (by Paul)
Jun 20   2:05 padlock_sha_copy(): malloc() failed (by udovdh)
Dec 21   11:14 HMAC slowdown, AES in Core i9 (by Witek Baryluk)
Feb 8   5:40 64 bit (by Carsten)
Sep 15   3:47 padlock for ubuntu 10.04 (source + binary) (by Ciaby)