Improved python gzip reading speed

While working with large protein trajectory files, I noticed that some of my Python scripts are incredibly slow compared with C++ code. In particular, unzipping a trajectory before reading it is faster than reading directly from the gzipped file with the gzip module ^^.

I benchmarked five different approaches to reading the following two files, which hold the same trajectory uncompressed and gzipped:

-rw-r--r-- 1 doep doep 2.4G Feb 15 16:05 traj.pdb
-rw-r--r-- 1 doep doep 609M Feb 15 15:59 traj.pdb.gz

Each runtime was measured twice using the real-time of the ‘time’ command. Each approach reads in every single line via:

while True:
    line = f.readline()
    if not line: break
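The loop above can be wrapped into a small helper so that every method is read in exactly the same way. This is only a minimal sketch: the names count_lines and timed_count are hypothetical, and it times in-process with time.perf_counter instead of the external ‘time’ command that was used for the measurements in this post.

```python
import io
import time


def count_lines(f):
    """Read f line by line with readline() and return the number of lines."""
    n = 0
    while True:
        line = f.readline()
        if not line:
            break
        n += 1
    return n


def timed_count(f):
    """Return (number_of_lines, elapsed_seconds) for reading f to the end."""
    start = time.perf_counter()
    n = count_lines(f)
    return n, time.perf_counter() - start


if __name__ == "__main__":
    # Tiny in-memory stand-in for traj.pdb; any object with readline() works.
    sample = io.BytesIO(b"ATOM 1\nATOM 2\nEND\n")
    n, seconds = timed_count(sample)
    print(n, "lines read in", seconds, "seconds")
```

Any of the file objects from the five methods can be passed to timed_count, since it only relies on readline().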

The five methods are:

  1. Reading from uncompressed file via: open()
  2. Reading from uncompressed file using the io module: io.open()
  3. Reading from compressed file using the gzip module: gzip.open()
  4. Reading from compressed file using a small class based on the zlib module: zlib_file()
  5. Reading from compressed file using named pipes: os.mkfifo()

Results:

[Figure "zlib": benchmark runtimes of the five methods; image not preserved]

Conclusion:
Because storing and reading the uncompressed file is not an option, named pipes via os.mkfifo() are the fastest solution for simply reading files. Note that this approach also uses the second CPU of the system, so the real time is smaller than the user time (90 ± 4.5). If you need seeks etc., you should extend the zlib_file class to your needs and gain a speedup of a factor of ~2. The performance of the gzip.open() approach is disappointing, as ‘zcat traj.pdb.gz > /dev/null’ took only 21.165 seconds.

For uncompressed reads, open() was the faster approach here, but on a different machine io.open() was 20 times faster than open(). So you should compare open() and io.open() on your machine before choosing one.

Complete code:

"""This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
 
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
 
You should have received a copy of the GNU Lesser General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>."""
 
from __future__ import print_function
 
import io
import zlib
import sys
 
class zlib_file():
    def __init__(self, buffer_size=1024*1024*8):
        self.dobj = zlib.decompressobj(16+zlib.MAX_WBITS) #16+zlib.MAX_WBITS -> zlib can decompress gzip
        self.decomp = []
        self.lines = []
        self.buffer_size = buffer_size
 
    def open(self, filename):
        self.fhwnd = io.open(filename, "rb")
        self.eof = False
 
    def close(self):
        self.fhwnd.close()
        self.dobj.flush()
        self.decomp = []
 
    def decompress(self):
        raw = self.fhwnd.read(self.buffer_size)
        if not raw:
            self.eof = True
            self.decomp.insert(0, self.dobj.flush())
 
        else:
            self.decomp.insert(0, self.dobj.decompress(raw))
 
    def readline(self):
        #split
        out_str = []
 
        while True:
            if len(self.lines) > 0:
                return self.lines.pop() + "\n"
 
            elif len(self.decomp) > 0:
                out = self.decomp.pop()
                arr = out.split("\n")
 
                if len(arr) == 1:
                    out_str.append(arr[0])
 
                else:
                    self.decomp.append(arr.pop())
                    arr.reverse()
                    out_str.append(arr.pop())
                    self.lines.extend(arr)
 
                    out_str.append("\n")
                    return "".join(out_str)
 
            else:
                if self.eof: break
                self.decompress()
 
        if len(out_str) > 0:
            return "".join(out_str)
 
    def readlines(self):
        lines = []
        while True:
            line = self.readline()
            if not line: break
 
            lines.append(line)
 
        return lines
 
if __name__ == "__main__":
    mode = int(sys.argv[1])
 
    if mode == 1:
        f = open("traj.pdb")
 
        while True:
            line = f.readline()
            if not line: break
 
        f.close()
 
    elif mode == 2:
        f = io.open("traj.pdb")
 
        while True:
            line = f.readline()
            if not line: break
 
        f.close()
 
    elif mode == 3:
        import gzip
        gz = gzip.open(filename="traj.pdb.gz", mode="r")
 
        while True:
            line = gz.readline()
            if not line: break
 
        gz.close()
 
    elif mode == 4:
        f = zlib_file()
        f.open("traj.pdb.gz")
 
        while True:
            line = f.readline()
            if not line: break
 
        f.close()
 
    elif mode == 5:
        import os
        import subprocess
 
        tmp_fifo = "tmp_fifo"
 
        os.mkfifo(tmp_fifo)
 
        p = subprocess.Popen("gzip --stdout -d traj.pdb.gz > %s" % tmp_fifo, shell=True)
        f = io.open(tmp_fifo, "r")
 
        while True:
            line = f.readline()
            if not line: break
 
        f.close()
        p.wait()
 
        os.remove(tmp_fifo)
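The zlib_file class above splits the decompressed data on a str "\n", which only works where zlib returns str (Python 2); under Python 3 zlib returns bytes. The following is a minimal sketch of the same zlib idea for Python 3, not a drop-in replacement: the generator name iter_gzip_lines is hypothetical, it yields bytes, it has no seek support, and it assumes a gzip file with a single member.

```python
import gzip
import io
import zlib


def iter_gzip_lines(fileobj, chunk_size=8 * 1024 * 1024):
    """Yield lines as bytes (newline included) from a gzip-compressed binary stream."""
    # 16 + zlib.MAX_WBITS makes zlib expect a gzip header and trailer.
    dobj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    tail = b""  # incomplete last line carried over from the previous chunk
    while True:
        raw = fileobj.read(chunk_size)
        if not raw:
            break
        pieces = (tail + dobj.decompress(raw)).split(b"\n")
        tail = pieces.pop()  # empty if the data ended exactly on a newline
        for piece in pieces:
            yield piece + b"\n"
    # Emit whatever is left once the input is exhausted.
    pieces = (tail + dobj.flush()).split(b"\n")
    tail = pieces.pop()
    for piece in pieces:
        yield piece + b"\n"
    if tail:
        yield tail


if __name__ == "__main__":
    # Tiny in-memory stand-in for traj.pdb.gz.
    sample = io.BytesIO(gzip.compress(b"ATOM 1\nATOM 2\nEND\n"))
    for line in iter_gzip_lines(sample, chunk_size=16):
        print(line)
```

For a real file, pass io.open("traj.pdb.gz", "rb") as fileobj. The default chunk size mirrors the 8 MiB buffer of zlib_file.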


Password generator

If you also need a new WPA password for your router, try this:

doep ~ # cat /dev/urandom | tr -dc a-zA-Z0-9 | head -c 63
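The same idea can be sketched in Python with the standard secrets module, assuming the goal is a 63-character passphrase (the maximum length of a WPA passphrase) drawn from a-zA-Z0-9. The function name wpa_passphrase is hypothetical.

```python
import secrets
import string

# Same character set as `tr -dc a-zA-Z0-9` in the shell one-liner.
ALPHABET = string.ascii_letters + string.digits


def wpa_passphrase(length=63):
    """Return a random passphrase built from ALPHABET using the OS random source."""
    return "".join(secrets.choice(ALPHABET) for _ in range(length))


if __name__ == "__main__":
    print(wpa_passphrase())
```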

Encrypting individual $HOME folders

Too much spare time and a notebook with an unencrypted hard disk lying next to me were reason enough to look for a comfortable way to encrypt individual user home directories.

The basic setup looks like:

  • encrypt a container file or partition with dm-crypt
  • use openssl to encrypt the needed keyfile
  • configure pam_mount to automatically mount the home dir on login

After you have read some howtos about using dm-crypt, you should have a usable kernel running. Now it’s time to install the packages we need with the following commands:

doep ~ # emerge -av sys-fs/cryptsetup
doep ~ # emerge -av sys-auth/pam_mount

Because I couldn’t use a whole partition for the home dirs, I had to use a container file instead. Depending on the container size, the creation takes a while, so relax.

doep ~ # dd if=/dev/urandom of=/home/doep.img bs=1M count=2048

After the container has been written, you have to create a loopback device for it. If loop0 is already taken, use the next available one, but remember to use it in the later commands as well 😉

doep ~ # losetup /dev/loop0 /home/doep.img

As mentioned before, the container will be encrypted using a key file, which in turn has to be encrypted with the user’s password. The perl regexp removes the trailing \n; otherwise you can run into problems mounting the device.

doep ~ # KEY=$(tr -cd [:graph:] < /dev/urandom | head -c 1024)
doep ~ # echo $KEY | perl -pe 's{\n}{}gs' | openssl aes-256-cbc > /home/doep.key
doep ~ # chown doep:doep /home/doep.key
doep ~ # chmod 0400 /home/doep.key

Now we can format the container with cryptsetup. With cat /proc/crypto you can see which ciphers are available on your system. If your preferred one is missing, change your kernel config. If everything went fine, you should have a new mapper device at /dev/mapper/doep.img; you can also delete the plain-text key now.

doep ~ # echo $KEY | cryptsetup luksFormat -c aes-cbc-essiv:sha256 -s 256 /dev/loop0
doep ~ # echo $KEY | cryptsetup luksOpen /dev/loop0 doep.img
doep ~ # KEY=""

This mapper device can now be formatted with your preferred filesystem, ext4 in my case. If you want, you can test if the filesystem is mountable. Then remove the LUKS mapping with cryptsetup and the loop device.

doep ~ # mkfs.ext4 /dev/mapper/doep.img
doep ~ # mount /dev/mapper/doep.img /home/doep_new
doep ~ # umount /home/doep_n
doep ~ # cryptsetup luksClose doep.img
doep ~ # losetup -d /dev/loop0

Now we need to configure the pam_mount plugin to automatically mount the user directory when a new session starts. To do so, add these two lines to your /etc/pam.d/login file

session    optional     /lib/security/pam_mount.so
auth       optional     /lib/security/pam_mount.so

and add this entry to the <pam_mount> section of the /etc/security/pam_mount.conf.xml file. If you used different ciphers earlier, specify them here.

<volume
fstype="crypt"
user="doep"
options="loop,cipher=aes-cbc-essiv:sha256"
mountpoint="/home/%(USER)_new"
path="/home/doep.img"
fskeycipher="aes-256-cbc"
fskeypath="/home/%(USER).key"
fskeyhash="md5"
/>
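If several users need their own container, the entry above can be generated per user. The following is only a sketch under assumptions: the helper name volume_entry is hypothetical, and it assumes each user's container lives at /home/<user>.img, following the naming used in this post. The attribute names and values are copied from the entry above.

```python
import xml.etree.ElementTree as ET


def volume_entry(user, cipher="aes-cbc-essiv:sha256"):
    """Return a pam_mount <volume/> element for `user` as a string."""
    attrs = {
        "fstype": "crypt",
        "user": user,
        "options": "loop,cipher=" + cipher,
        # %(USER) is left literal here; pam_mount expands it at login.
        "mountpoint": "/home/%(USER)_new",
        # Assumption: one container file per user, named after the user.
        "path": "/home/" + user + ".img",
        "fskeycipher": "aes-256-cbc",
        "fskeypath": "/home/%(USER).key",
        "fskeyhash": "md5",
    }
    return ET.tostring(ET.Element("volume", attrs), encoding="unicode")


if __name__ == "__main__":
    print(volume_entry("doep"))
```

The output is a single-line element that can be pasted into the <pam_mount> section.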

Now switch to a virtual terminal (VT) and try to log in. You will get some verbose output, but your container file should be mounted.

doep ~ # df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/loop0            2.0G   35M  1.9G   2% /home/doep_new

If you got some errors, try to mount the container manually with mount.crypt_LUKS to locate the error.

doep ~ # mount.crypt_LUKS -v -ov -o keyfile=/home/doep.key -o fsk_cipher=aes-256-cbc -o fsk_hash=MD5 /dev/loop0 /home/doep_new

If everything works, you can move your files into the container and then replace /home/doep_new with your real $HOME. To work with your desktop manager (XXX), also add the 2 lines you added to /etc/pam.d/login to /etc/pam.d/XXX.


Using subversion over ssh gate server

If you have the same problem as I do, connecting to an svn server that is only reachable through a gate server, try this:

  • Add your root public ssh key to the ~/.ssh/authorized_keys on the gate server.
  • And your user public ssh key to the ~/.ssh/authorized_keys on the svn server.
  • Try this: ssh -L 24:<svn_server>:22 <gate_login>@<gate_server>
  • Now nmap should show something like this:
  • doep ~ # nmap localhost
    PORT      STATE SERVICE
    24/tcp    open  priv-mail
  • You can also create an init script. /etc/init.d/sshtun
  • #!/sbin/runscript
    SSHTUN_OPTIONS="-L 24:<svn_server>:22 <gate_login>@<gate_server> -N"
    
    depend() {
    need net
    }
    
    start() {
    ebegin "Starting sshtun"
    start-stop-daemon --start --quiet --exec /usr/bin/ssh --background \
    --pidfile /var/run/sshtun.pid --make-pidfile -- $SSHTUN_OPTIONS
    eend $?
    }
    
    stop() {
    ebegin "Stopping sshtun"
    start-stop-daemon --stop --quiet --pidfile /var/run/sshtun.pid
    eend $?
    }
  • Don’t forget to make it executable:
  • doep ~ # chmod +x /etc/init.d/sshtun
  • Then start the ssh tunnel:
  • doep ~ # /etc/init.d/sshtun start
  • To create the tunnel on startup, use rc-update:
  • doep ~ # rc-update add sshtun default
  • Now open your ssh config ~/.ssh/config and add a new server:
  • Host svn
    HostName localhost
    User <svn_user>
    Port 24
  • If you have already checked out a repository you can easily change the location:
  • doep@doep ~ $ svn switch --relocate http://<svn_server>/svn/repository svn+ssh://svn/var/svn/repos/repository
  • Otherwise check it out:
  • doep@doep ~ $ svn co svn+ssh://svn/var/svn/repos/repository
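As an alternative to the nmap check in the steps above, the forwarded port can also be probed from Python. This is a minimal sketch, assuming the tunnel listens on localhost port 24 as configured above; the function name port_is_open is hypothetical.

```python
import socket


def port_is_open(host="localhost", port=24, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unresolvable hosts.
        return False


if __name__ == "__main__":
    print("tunnel up" if port_is_open() else "tunnel down")
```

This only tells you that something accepts connections on the port, not that it is the ssh daemon of the svn server.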

Using subversion through a ssh tunnel with windows

  • Download and install PuTTY.
  • Download and install RapidSVN.
  • Copy plink.exe from the PuTTY folder to the RapidSVN/bin folder.
  • Generate a public/private key pair with PuTTYgen and save the two keys to any folder by pressing the ‘any key’.
  • Add the public key to your ~/.ssh/authorized_keys on the svn server. You may have to correct the format to “ssh-rsa <KEY> user@host”.
  • Create a new ssh session with PuTTY using the following option:
    • host = localhost
    • port = 24
    • connection/data/auto-login username = your username
    • connection/ssh/auth/private key file for authentication = path to your generated private key file
  • Then save this session under the name svn.
  • Create a new session called tun with these options:
    • host = tunnel server
    • port = 22
    • connection/data/auto-login username = username for the tunnel server
    • in connection/ssh/tunnels/ add: L24 svnserver:22
  • Now tell RapidSVN how to deal with svn+ssh:// URLs:
    • Find your svn config file. On Windows 7 it is located here: C:\Users\username\AppData\Roaming\Subversion\
    • Open it and go to the [tunnels] section. Then add this line: ssh = $SVN_SSH "C:/Program Files (x86)/RapidSVN-0.10.0/bin/plink.exe"
  • If RapidSVN is already open, then restart it.
  • Start the tun session with PuTTY and login.
  • You can also start the svn session to see if the key auth works. If not, check the authorized_keys file on the svn server.
  • Now you can check out your repo with a URL like: svn+ssh://svn/absolute path to repo on svn server/. Here, the ‘svn’ stands for the PuTTY session name.

Using synaptics Touchpad via hal

When reinstalling my notebook, I wanted the Xorg-Server to use the event based device system, but also have the advantage of the synaptics driver.

The easiest way to achieve that is to use the hardware abstraction layer (hal) to detect X11 input devices.

First of all we need evdev and synaptics as input devices, as well as the hal and dbus USE flags in our make.conf.

/etc/make.conf:

USE="${USE} hal dbus"
INPUT_DEVICES="synaptics evdev"

Now remerge the xorg-packages if use flags have changed:

emerge -uNDav --oneshot xorg-drivers xorg-server

Although they should start automatically, it’s not a bad idea to add hald and dbus to a runlevel.

rc-update add hald boot
rc-update add dbus boot

The next step is to tell hal what to do with input devices. Fortunately, Gentoo ships predefined rules we can copy:

cp /usr/share/hal/fdi/policy/10osvendor/{10-x11-input.fdi,11-x11-synaptics.fdi} /etc/hal/fdi/policy/

These two files define the X11 options for every detected keyboard, mouse and synaptics touchpad. If you want to keep your old xorg configuration, you have to rewrite the Options from the input related sections of your xorg.conf into these files.

For example to activate clicking by tapping for the synaptics touchpad, the line

Option "TapButton1" "1"

of the xorg.conf will now move to 11-x11-synaptics.fdi:

<merge key="input.x11_options.TapButton1" type="string">1</merge>
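If there are many options to move, the rewrite can be automated. The following is a minimal sketch that converts one xorg.conf Option line with a name and a value into the corresponding merge element, following the pattern of the TapButton1 example; the function name option_to_merge is hypothetical, and options without a value are not handled.

```python
import re
from xml.sax.saxutils import escape, quoteattr

# Matches lines of the form: Option "Name" "Value"
OPTION_RE = re.compile(r'^\s*Option\s+"([^"]+)"\s+"([^"]*)"\s*$')


def option_to_merge(line):
    """Convert an xorg.conf Option line into a hal fdi <merge> element."""
    match = OPTION_RE.match(line)
    if match is None:
        raise ValueError("not an Option line with name and value: %r" % line)
    name, value = match.groups()
    key = "input.x11_options." + name
    return '<merge key=%s type="string">%s</merge>' % (quoteattr(key), escape(value))


if __name__ == "__main__":
    print(option_to_merge('Option "TapButton1" "1"'))
```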

However, on my system this tweak didn’t work correctly, so I had to add a little startup script:

~/.kde4/Autostart/synaptics:

#!/bin/sh
synclient TapButton1=1
syndaemon -i1 -k -t -d

Don’t forget to make it executable:

chmod a+x ~/.kde4/Autostart/synaptics

If you’re not using kde4, you may place these lines in your ~/.xinitrc

Note: syndaemon is a utility that locks the touchpad while typing; feel free to leave this line out if you don’t like that.

Now comes the fun part, commenting out every input related section of the xorg.conf, because all this is now managed by hald. In my case, the new /etc/X11/xorg.conf looks like this:

Section "ServerLayout"
    Identifier     "Layout0"
    Screen      0  "Screen0"
#    InputDevice    "Keyboard0" "CoreKeyboard"
#    InputDevice    "Mouse0" "CorePointer"
    Option "AllowEmptyInput" "true"
EndSection
#Section "InputDevice"
#    # generated from data in "/etc/conf.d/gpm"
#    Identifier     "Mouse0"
#    Driver         "mouse"
#    Option         "Protocol" "ImPS/2"
#    Option         "Device" "/dev/input/mice"
#    Option         "Emulate3Buttons" "no"
#    Option         "ZAxisMapping" "4 5"
#EndSection

#Section "InputDevice"
#    # generated from default
#    Identifier     "Keyboard0"
#    Driver         "evdev"
#    Option         "XkbModel"   "pc105"
#    Option         "XkbLayout"  "de"
#    Option         "XkbVariant" "nodeadkeys"
#    Option         "AutoRepeat" "500 30"
#EndSection

I had some strange results when starting X the first time, so I set the “AllowEmptyInput” option, which fixed it for me.

That’s it. Now reboot and enjoy.


Using acerhk with kernel >= 2.6.30

For my notebook (MD 42200) I need acerhk to switch on the wireless LED to get the wifi working. But acerhk doesn’t compile with newer kernels. The last time I had this problem, I found a patched version of it somewhere: acerhk-0.5.35_patched.tar.gz

Simply decompress and compile it like this:

doep ~ # tar xvzf acerhk-0.5.35_patched.tar.gz
doep ~ # cd acerhk-0.5.35_patched
doep ~ # make clean; make -j

Then install the module:

doep ~ # make install

KMS for Radeon HD 3200 (RS780)

With 2.6.32 it’s now also possible for r6xx/r7xx radeon cards to use kernel based mode switching.

First you have to emerge a 2.6.32 kernel:

doep ~ # emerge -av vanilla-sources

After running oldconfig, enable “staging drivers” and “Enable modesetting on radeon by default (NEW)” in the device drivers section. Don’t forget to remove any framebuffer, and also remove the framebuffer boot options from your boot loader config. Then compile the kernel and boot it without vesa command line options. If you have an AGP card, it’s a good idea to add this boot option: radeon.agpmode=-1

Hmm, kms works, so I get a resolution of 1920×1200 px while booting, but when X starts, the screen is completely white. On my notebook, kms works fine with an rv350 chip.

I had to use the radeon driver, x11-drivers/xf86-video-ati, to get it working.


Setup 3D support for Radeon HD 3200 (RS780)

What can you do with an ATI graphics card? Use the proprietary fglrx drivers and stick to old kernels? By the way, they didn’t work for me; I had some strange stripes on my screen. Or use the open source drivers radeon or radeonhd, but without 3D acceleration? Since 2.6.31 you can solve this problem using unstable drivers.

Here are all the steps I needed to get working 2D/3D acceleration with the open source video drivers. I’m using a 2.6.31 vanilla kernel. 2.6.32 also supports kms for r6xx/r7xx, but it’s not yet in my portage :/

doep ~ # emerge -av  =vanilla-sources-2.6.31.1
and then the other stuff..

My xorg-server version is 1.6.3.901. After booting the kernel and installing the modules I reinstalled x11-drivers/xf86-video-radeonhd, media-libs/mesa and x11-libs/libdrm.

doep ~ # emerge -av x11-libs/libdrm media-libs/mesa x11-drivers/xf86-video-radeonhd
[ebuild   R   ] x11-libs/libdrm-9999  USE="-debug" 0 kB [1]
[ebuild   R   ] x11-drivers/xf86-video-radeonhd-9999  USE="-debug" 0 kB [2]
[ebuild   R   ] media-libs/mesa-9999  USE="nptl xcb -debug -gallium -motif -pic"
VIDEO_CARDS="radeonhd -intel -mach64 -mga -none -nouveau -r128 -radeon -s3virge -savage
-sis (-sunffb) -tdfx -trident -via" 0 kB [1]

Total: 3 packages (3 reinstalls), Size of downloads: 0 kB
Portage tree and overlays:
 [0] /usr/portage
 [1] /usr/local/portage/layman/x11
 [2] /usr/local/portage/layman/zen-overlay

Of course you need some portage overlays to get the newest git sources. Then I followed the guide from the xorg wiki:

doep ~ # git clone git://anongit.freedesktop.org/~agd5f/drm
doep ~ # cd drm
doep ~ # git checkout -t -b r6xx-r7xx-3d origin/r6xx-r7xx-3d
doep ~ # ./autogen.sh --prefix=$(pkg-config --variable=prefix libdrm) --libdir=$(pkg-config --variable=libdir libdrm) --includedir=$(pkg-config --variable=includedir libdrm)
doep ~ # make
doep ~ # make install
doep ~ # cd linux-core
doep ~ # make
doep ~ # make install

After stopping the X server and unloading the old radeon and drm modules, I started X with the new modules and the following config:

Section "Device"
 Option     "NoAccel"    "False"
 Option     "AccelMethod"    "exa"
 Option     "DRI"        "True"

 Identifier  "Card0"
 Driver      "radeonhd"
 VendorName  "ATI Technologies Inc"
 BoardName   "Radeon HD 3200 Graphics"
 BusID       "PCI:1:5:0"
EndSection

A quick test shows that direct rendering is enabled:

doep ~ # glxinfo | grep direct
IRQ's not enabled, falling back to busy waits: 2 0
direct rendering: Yes

and glxgears gets over 800 fps

doep ~ # glxgears
IRQ's not enabled, falling back to busy waits: 2 0
4179 frames in 5.0 seconds = 835.678 FPS

Now xv is working and I can play games like quake3, supertux and supertuxkart without problems 😀


Cross compile Qt-Apps for windows

I hate rebooting my computer just to compile my Qt apps for some Windows users, and my VM doesn’t work anymore. So I was looking for a better way to solve my problem.

Well, I assume you already have Qt installed on your machine 😉 If not:

doep@doep ~ $ emerge -av x11-libs/qt-core

To build a win32 cross compiler, crossdev is helpful:

doep@doep ~ $ emerge -av sys-devel/crossdev

Now we can start with our gcc. You have to do the following steps as root. To list all supported architectures use:

doep ~ # crossdev -t help

The one we need is mingw32. Now we have to wait a bit until gcc, libc and binutils are ready.

doep ~ # crossdev -t mingw32

If no error occurs, gcc-config should create an output like this.

doep ~ # gcc-config -l
[1] mingw32-4.4.2 *
[2] i686-pc-linux-gnu-4.2.4
[3] i686-pc-linux-gnu-4.3.2 *
[4] i686-pc-linux-gnu-4.4.0
[5] x86_64-pc-linux-gnu-4.3.3
[6] x86_64-pc-linux-gnu-4.4.1 *

Download a windows qt version and install it. Then link the created folder to /usr/share/qt4win . You can also use wine to install the Qt package.

doep ~ # ln -s /mnt/win/Qt/4.5.0 /usr/share/qt4win

Now we have to create a qt spec folder in /usr/share/qt4/mkspecs/ called win32-x-g++. You can download the spec file here: win32-x-g++.tar

doep ~ # tar xvzf win32-x-g++.tar.gz -C /usr/share/qt4/mkspecs/

You are now ready to “try” to cross compile your program !! But first do a cleanup:

doep@doep ~ $ make distclean

Then tell qmake to use our win32-x-g++ spec:

doep@doep ~ $ qmake -spec win32-x-g++

And then start the compilation:

doep@doep ~ $ make -j4

Your application now additionally needs libgcc_s_sjlj-1.dll to start. It’s located in:

/usr/x86_64-pc-linux-gnu/i686-mingw32/gcc-bin/4.4.0/libgcc_s_sjlj-1.dll