So, last night my friend Morg asked me to port a certain tool (namely, ttm_unpack) from C to Python, since he didn't want to introduce a C compiler as a dependency for the GOGonLinux project. A quick note about GOGonLinux: the project's main goal is to make installing games from GOG on Linux as easy as possible. I'm an avid supporter and follower, and this is my first major contribution to the project. Join us on #gogonlinux @ freenode and contribute to the codebase!
Back to the topic at hand: since we already had the C code available, I thought "why not?" and said yes. I figured it would take an hour at most, testing included.
How wrong can a man be? Very wrong, I'll tell you that. But in the end we succeeded, and you can download ttm_unpack_py from my GitHub page.
So, what kind of problems did we have while porting the tool?
- Even though the reference Python interpreter is written in C, Python is very different from C. That should not be a surprise, but some of Python's inner workings still caught us off guard. For example, integers in Python don't overflow the way they do in C: they simply grow as large as needed. We learned this the hard way, because the original tool depended on C's uint32_t wrapping around to 0 after exceeding its maximum value (0xFFFFFFFF). This meant we had to implement the wraparound manually to get the data files extracted correctly.
- Python has been described a multitude of times as an "explicit language", where you need to declare self when dealing with an object's inner workings and so forth. That's all fine and dandy, but you can't actually have a short int type, at least not without some hacky stuff. You want to know how to do it? You need to declare an array (for example with the array module) and have it store only short ints. So, yeah, no go.
- If you need to convert an integer to its binary representation in Python, you'll use bin(). This, however, returns a string (such as '0b101'), not a number, which may cause some problems (as we found). If you want to convert that string back into an integer, you need to explicitly pass the base to int(), as in int(s, 2); otherwise int() assumes base 10 and raises a ValueError.
- Python has immutable strings. Immutable strings are a pet peeve of mine, and having grown up programming mostly in C and C++, I've never really understood why you'd want strings to be immutable unless you explicitly ask for it. Anyhow, it can be worked around, so this is nothing big.
- When unpacking data from binary files with struct.unpack() into a string, you need to specify the number of chars you want. For example, we needed to use struct.unpack(str(fnameLen) + 's', pfile.read(char_size * fnameLen)), since with only 's' as the format parameter the unpacker expects exactly one character (even though 's' is supposed to mean string, as per the documentation!). For a dynamic language this seems awfully non-dynamic and C-like.
- General complaint: when you write software with magic numbers and operations, explain what you are doing and why! We spent too long wondering what the original author meant with the following line of code: char xorValue = ((char*)(&decryptState))[idx&3]; where decryptState was defined as int decryptState = 0xdeadcafe;. Also, please explain WHY you do things and not only what you are doing. I know I may not be perfect at that either, but still.
- When you create a long value in Python 2, Python automatically appends 'L' to the end of the value's representation. You can quickly see where this is going: we had to manually strip that character from the string to get it to do what we needed.
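To make the integer overflow point from the list concrete, here is a minimal sketch of emulating uint32_t wraparound by masking results to 32 bits. The helper names add_u32 and mul_u32 are made up for this illustration and are not taken from ttm_unpack_py.

```python
MASK32 = 0xFFFFFFFF  # 2**32 - 1, the largest value a uint32_t can hold


def add_u32(a, b):
    """Add two integers and wrap the result the way a C uint32_t would."""
    return (a + b) & MASK32


def mul_u32(a, b):
    """Multiply two integers and keep only the low 32 bits."""
    return (a * b) & MASK32


# Python integers never wrap on their own, so the mask does the work.
print(add_u32(0xFFFFFFFF, 1))                # wraps around to 0
print(hex(add_u32(0xDEADCAFE, 0x30000000)))  # the carry out of bit 31 is dropped
```

Masking after every arithmetic step is tedious, but it keeps each intermediate value in the same range the C code would have seen.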
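The bin() and int() round trip mentioned in the list looks roughly like this. The value 0xCAFE is just an arbitrary example, not something from the original tool.

```python
value = 0xCAFE

bits = bin(value)         # a str with a '0b' prefix, not a number
restored = int(bits, 2)   # base 2 must be given explicitly

# Without the base, int() assumes decimal and rejects the string.
try:
    int(bits)
    failed = False
except ValueError:
    failed = True

# format() gives the same digits without the '0b' prefix, zero-padded to 16 bits.
padded = format(value, '016b')

print(bits, restored, failed, padded)
```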
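The struct.unpack() point from the list can be sketched as follows. The file layout here (a little-endian 32-bit length followed by the name bytes) and the function name read_name are assumptions made up for the example; the real datafile format may well differ.

```python
import io
import struct


def read_name(pfile):
    """Read a length-prefixed byte string from a binary file object.

    Hypothetical layout: 4-byte little-endian length, then that many bytes.
    """
    (fname_len,) = struct.unpack('<I', pfile.read(4))
    # The count goes in front of 's'; a bare 's' means exactly one byte.
    (raw_name,) = struct.unpack('%ds' % fname_len, pfile.read(fname_len))
    return raw_name


# Build a small in-memory "file" to try it out.
sample = io.BytesIO(struct.pack('<I', 8) + b'data.bin' + b'rest')
name = read_name(sample)
print(name)
print(struct.calcsize('s'), struct.calcsize('8s'))
```

Note that struct.unpack() always returns a tuple, even for a single field, hence the one-element unpacking on the left-hand side.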
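As for the mysterious xorValue line in the list: the cast treats the four bytes of decryptState as a char array, and idx&3 selects one of them. A rough Python equivalent, assuming a little-endian machine (as on x86) and ignoring the signedness of C's char, might look like this. The function name xor_byte is made up for the sketch.

```python
import struct

DECRYPT_STATE = 0xDEADCAFE


def xor_byte(state, idx):
    """Return the byte of a 32-bit state that idx & 3 selects.

    Assumes little-endian byte order, so index 0 is the lowest byte.
    """
    state_bytes = bytearray(struct.pack('<I', state & 0xFFFFFFFF))
    return state_bytes[idx & 3]


# The selected byte cycles through 0xfe, 0xca, 0xad, 0xde as idx grows.
for idx in range(6):
    print(idx, hex(xor_byte(DECRYPT_STATE, idx)))
```

The same byte can be had with shifts, as (state >> (8 * (idx & 3))) & 0xFF, which avoids the struct round trip.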
So, there we have it. Python is more than well suited for some things, but for others I'd rather use better tools or even existing tools. Sometimes, though, you simply don't have a choice. Until next time!