Monthly Archives: February 2013

Decrypting RSDF files

A small code snippet to decrypt the links stored in an RSDF container (it needs the Crypto package, i.e. PyCrypto or PyCryptodome):

import base64
import binascii

from Crypto.Cipher import AES  # PyCrypto or PyCryptodome


def decryptRSDF(filename):
    links = []

    # The container is one long hex string; drop newlines and other
    # whitespace before decoding it.
    with open(filename, "r") as f:
        hex_data = "".join(f.read().split())

    data = binascii.unhexlify(hex_data)
    array = data.split(b"\n")

    key = binascii.unhexlify("8C35192D964DC3182C6F84F3252239EB4A320D2500000000")
    iv = bytearray(binascii.unhexlify("a3d5a33cb95ac1f5cbdb1ad25cb0a7aa"))

    # AES in ECB mode is only used as the raw block function; the loop below
    # implements CFB decryption with 8-bit segments by hand. ECB takes no IV.
    aes_context = AES.new(key, AES.MODE_ECB)

    for line in array:
        url_input = bytearray(base64.b64decode(line))
        length = len(url_input)
        if length == 0:
            continue

        url_output = bytearray(length)

        for n in range(length):
            # The first byte of the encrypted shift register decrypts one byte.
            output_block = bytearray(aes_context.encrypt(bytes(iv)))
            url_output[n] = url_input[n] ^ output_block[0]

            # Shift the register left by one byte and feed back the
            # ciphertext byte. The register is not reset between links.
            iv[:15] = iv[1:]
            iv[15] = url_input[n]

        links.append(url_output.decode("utf-8", "replace"))

    return links
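The loop above is a hand-written CFB decryption with 8-bit segments (CFB-8): AES only ever runs in the encrypt direction on a 16-byte shift register, the first byte of each output block is XORed with one ciphertext byte, and that ciphertext byte is then shifted into the register. Because iv is never reset, the register state carries over from one link to the next. The sketch below shows the same construction for both directions with the block cipher passed in as a function. To keep it runnable without PyCrypto, it uses a truncated SHA-256 as a hypothetical stand-in called toy_block_encrypt (an assumption for illustration only; it is not AES and not a cipher). With PyCrypto/PyCryptodome you would pass AES.new(key, AES.MODE_ECB).encrypt instead.

```python
import hashlib


def cfb8(block_encrypt, iv, data, decrypt):
    """CFB mode with 8-bit segments; block_encrypt maps 16 bytes -> 16 bytes.

    Returns (output, new_register) so the register can be carried over to
    the next message, as decryptRSDF does between links.
    """
    register = bytearray(iv)
    out = bytearray(len(data))
    for n, byte in enumerate(bytearray(data)):
        keystream = bytearray(block_encrypt(bytes(register)))[0]
        out[n] = byte ^ keystream
        # Feed back the ciphertext byte: the input when decrypting,
        # the freshly produced output when encrypting.
        cipher_byte = byte if decrypt else out[n]
        register[:15] = register[1:]
        register[15] = cipher_byte
    return bytes(out), bytes(register)


def toy_block_encrypt(block):
    # Hypothetical stand-in for AES-ECB encryption (NOT a real cipher).
    return hashlib.sha256(block).digest()[:16]


iv = bytes(range(16))
links = [b"http://example.com/a", b"http://example.com/b"]

# Encrypt several links with one continuous register...
state, encrypted = iv, []
for link in links:
    ct, state = cfb8(toy_block_encrypt, state, link, decrypt=False)
    encrypted.append(ct)

# ...and decrypt them the same way, starting again from the initial IV.
state, decrypted = iv, []
for ct in encrypted:
    pt, state = cfb8(toy_block_encrypt, state, ct, decrypt=True)
    decrypted.append(pt)

print(decrypted == links)  # True
```

With PyCryptodome, AES.new(key, AES.MODE_CFB, iv=iv, segment_size=8) should implement the same mode, so a single such cipher object kept alive across all links could replace the manual loop.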

PyQt4 QFileDialog freezes when qt4reactor is running

To use the Twisted reactor inside a Qt GUI application, I’m using the qt4reactor package:

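import sys

from PyQt4 import QtGui

app = QtGui.QApplication(sys.argv)

# Install the Qt-based reactor before twisted.internet.reactor is imported.
import qt4reactor
qt4reactor.install()

from twisted.internet import reactor
factory = Factory()  # placeholder: your own Twisted protocol factory

gui = Gui(app)  # placeholder: your own main window class

gui.show()
reactor.runReturn()

sys.exit(app.exec_())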

However, I noticed that once reactor.runReturn() has been called, each QFileDialog, such as the following one, freezes the entire program:

filename = QtGui.QFileDialog.getOpenFileName()

A solution is to use the non-native dialog instead:

filename = QtGui.QFileDialog.getOpenFileName(options=QtGui.QFileDialog.DontUseNativeDialog)

Improved Python gzip reading speed

While working with large protein trajectory files, I realized that some of my Python scripts are incredibly slow compared with C++ code. I also noticed that unzipping a trajectory before reading it is faster than using the gzip module to read directly from the gzipped file ^^.

I benchmarked the reading speed of five different approaches on the following two files (the same trajectory, uncompressed and gzipped):

-rw-r--r-- 1 doep doep 2.4G Feb 15 16:05 traj.pdb
-rw-r--r-- 1 doep doep 609M Feb 15 15:59 traj.pdb.gz

Each runtime was measured twice using the real time reported by the ‘time’ command. Each approach reads every single line via:

while True:
    line = f.readline()
    if not line: break

The five methods are:

  1. Reading the uncompressed file with open()
  2. Reading the uncompressed file with the io module: io.open()
  3. Reading the compressed file with the gzip module: gzip.open()
  4. Reading the compressed file with a small class based on the zlib module: zlib_file()
  5. Reading the compressed file through a named pipe: os.mkfifo()

Results:

[Figure: measured runtimes of the five methods]

Conclusion:
Because storing and reading uncompressed files is not an option, named pipes via os.mkfifo() are the best and fastest solution for simply reading files. Keep in mind, though, that this approach also keeps a second CPU core busy (the external gzip process decompresses while Python reads), so its real time is smaller than its user time (90 ± 4.5 s). If you need seeks etc., you should extend the zlib_file class to your needs, which still gains a speedup of about a factor of 2. It is sad to see the performance of the gzip.open() approach, as ‘zcat traj.pdb.gz > /dev/null’ took only 21.165 seconds.
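The parallelism in the named-pipe approach comes from the external decompressor running as its own process. A possible variant, sketched below, reads the decompressor's standard output through an ordinary (anonymous) pipe created by subprocess.Popen(stdout=subprocess.PIPE), which avoids creating and removing a FIFO file on disk. This variant was not part of the benchmark; the function name iter_lines_via_pipe is hypothetical, and the default command assumes gzip is on the PATH.

```python
import subprocess


def iter_lines_via_pipe(path, cmd=("gzip", "-dc")):
    """Yield the decompressed lines (bytes) of `path`.

    Decompression happens in a separate process (by default `gzip -dc`),
    so it can run on another CPU core while Python consumes the lines.
    """
    proc = subprocess.Popen(list(cmd) + [path], stdout=subprocess.PIPE)
    try:
        for line in proc.stdout:
            yield line
    finally:
        # Closing our end first lets the child exit even if we stop early.
        proc.stdout.close()
        proc.wait()
```

Usage follows the same pattern as the other methods, e.g. `for line in iter_lines_via_pipe("traj.pdb.gz"): ...`.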

For uncompressed reads, open() is the faster approach here, but on a different machine the picture was reversed: io.open() was 20 times faster than open(). So you should check the speed of open() on your machine before relying on it.
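A quick way to do that check is to time both functions on the same file using the readline loop from the benchmark. The sketch below measures in-process with timeit.default_timer instead of the shell's ‘time’; the helper names time_readline_loop and compare_openers are hypothetical. Run it a few times, since the first run may be dominated by disk I/O rather than by the page cache.

```python
import io
from timeit import default_timer as timer


def time_readline_loop(opener, filename):
    """Return (seconds, line_count) for reading `filename` line by line."""
    start = timer()
    count = 0
    f = opener(filename)
    try:
        while True:
            line = f.readline()
            if not line:
                break
            count += 1
    finally:
        f.close()
    return timer() - start, count


def compare_openers(filename):
    # Same file, same loop; only the function used to open it differs.
    for name, opener in (("open", open), ("io.open", io.open)):
        seconds, count = time_readline_loop(opener, filename)
        print("%-8s %8.3f s  (%d lines)" % (name, seconds, count))
```

For example, compare_openers("traj.pdb"). Note that on Python 3 the built-in open() is the same function as io.open(), so a large gap between the two should only show up on Python 2.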

Complete code:

"""This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published by
the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
 
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Lesser General Public License for more details.
 
You should have received a copy of the GNU Lesser General Public License
along with this program.  If not, see <http://www.gnu.org/licenses/>."""

from __future__ import print_function

import io
import zlib
import sys


class zlib_file():
    """Line reader for gzip files built directly on zlib.

    Works on bytes, so readline() returns bytes (str on Python 2),
    just like gzip.open(..., "rb").readline().
    """

    def __init__(self, buffer_size=1024*1024*8):
        self.buffer_size = buffer_size

    def open(self, filename):
        # 16 + zlib.MAX_WBITS -> zlib expects a gzip header and trailer.
        # A fresh decompressor per file, so the object can be reused.
        self.dobj = zlib.decompressobj(16 + zlib.MAX_WBITS)
        self.decomp = []  # decompressed chunks not yet split (insert at 0, pop from end)
        self.lines = []   # complete lines, stored in reverse order
        self.fhwnd = io.open(filename, "rb")
        self.eof = False

    def close(self):
        self.fhwnd.close()
        self.decomp = []
        self.lines = []

    def decompress(self):
        # Read one block of compressed data and queue its decompressed bytes.
        raw = self.fhwnd.read(self.buffer_size)
        if not raw:
            self.eof = True
            self.decomp.insert(0, self.dobj.flush())
        else:
            self.decomp.insert(0, self.dobj.decompress(raw))

    def readline(self):
        out_str = []  # pieces of the current line, which may span chunks

        while True:
            if self.lines:
                return self.lines.pop() + b"\n"

            elif self.decomp:
                arr = self.decomp.pop().split(b"\n")

                if len(arr) == 1:
                    # No newline in this chunk: the line continues in the next one.
                    out_str.append(arr[0])

                else:
                    # Keep the trailing partial line for the next call.
                    self.decomp.append(arr.pop())
                    arr.reverse()
                    out_str.append(arr.pop())
                    self.lines.extend(arr)

                    out_str.append(b"\n")
                    return b"".join(out_str)

            else:
                if self.eof:
                    break
                self.decompress()

        # Last line without a trailing newline, or b"" at end of file.
        return b"".join(out_str)

    def readlines(self):
        lines = []
        while True:
            line = self.readline()
            if not line:
                break
            lines.append(line)
        return lines

if __name__ == "__main__":
    # Usage: python <script> MODE, where MODE is 1-5 as in the list above.
    mode = int(sys.argv[1])

    if mode == 1:
        # Uncompressed file, built-in open()
        f = open("traj.pdb")

        while True:
            line = f.readline()
            if not line: break

        f.close()

    elif mode == 2:
        # Uncompressed file, io.open()
        f = io.open("traj.pdb")

        while True:
            line = f.readline()
            if not line: break

        f.close()

    elif mode == 3:
        # Compressed file, gzip module
        import gzip
        gz = gzip.open("traj.pdb.gz", "rb")

        while True:
            line = gz.readline()
            if not line: break

        gz.close()

    elif mode == 4:
        # Compressed file, zlib-based reader defined above
        f = zlib_file()
        f.open("traj.pdb.gz")

        while True:
            line = f.readline()
            if not line: break

        f.close()

    elif mode == 5:
        # Compressed file, decompressed by an external gzip process
        # that writes into a named pipe (FIFO)
        import os
        import subprocess

        tmp_fifo = "tmp_fifo"
        os.mkfifo(tmp_fifo)

        try:
            p = subprocess.Popen("gzip --stdout -d traj.pdb.gz > %s" % tmp_fifo, shell=True)
            f = io.open(tmp_fifo, "r")

            while True:
                line = f.readline()
                if not line: break

            f.close()
            p.wait()
        finally:
            # Remove the FIFO even if reading fails
            os.remove(tmp_fifo)
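For comparison, the same zlib trick (wbits = 16 + zlib.MAX_WBITS, so that zlib accepts the gzip header and trailer) can be written as a short generator. This is a sketch, not the benchmarked zlib_file class, and it has not been timed against the variants above; iter_gzip_lines is a hypothetical name.

```python
import zlib


def iter_gzip_lines(filename, buffer_size=8 * 1024 * 1024):
    """Yield the lines (bytes, newline included) of a gzip file."""
    dobj = zlib.decompressobj(16 + zlib.MAX_WBITS)
    pending = b""  # decompressed bytes after the last complete line
    with open(filename, "rb") as f:
        while True:
            raw = f.read(buffer_size)
            chunk = dobj.decompress(raw) if raw else dobj.flush()
            lines = (pending + chunk).split(b"\n")
            pending = lines.pop()  # possibly incomplete last line
            for line in lines:
                yield line + b"\n"
            if not raw:
                break
    if pending:
        yield pending  # last line without a trailing newline
```

Like zlib_file, this only decodes the first member of a multi-member gzip file; anything after it ends up in dobj.unused_data.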
