Presenting: django-simple-wiki

It was bothering me that all the wikis I tried, all had either errors, feature lacks, too many dependencies or were simply unmaintained. So I decided to create yet another one. Curiously, the third hit when googling ‘django wiki’ is Create a wiki in 20 minutes. Luckily that’s not really true, so all the PHP guys and MediaWiki can continue breathing. This took me several days.

Google Code project page

Demo website

Hierarchy and relations
First of all, as in the Trac wiki system, I chose to create a system for hierarchy, meaning that it’s possible to create an article and then create sub-articles. The hierarchy does not support multiple inheritance, because it needs to be basis for the permission system. That’s where the relation system comes in place: All articles can contain symmetrical relations to any other articles in the hierarchy.

Parsing
Python and Django supports Markdown pretty much out of the box, so it’s an obvious choice to use this for parsing. The HTML features of normal Markdown have been removed, so all HTML is escaped in django-simple-wiki. And parsing is static, so every time a revision is created, the contents are passed and stored. This means that the contents of the article itself are not supposed to be dynamic. On the other hand, it is desirable to avoid parsing contents for every page hit. The parsing area of the application is only a few lines of code, and can be expanded if further parsing needs to be done, or someone wants to replace Markdown completely. For instance, if no parsing is done and HTML escaping is disabled, the wiki becomes a very simple CMS.

Curious issues
There are a few out standing problems:

  • Permission system is related to User entries in the Django auth system. But maybe this is too much of an annoyance, if the project already has groups setup in the existing auth system. On the other hand, other users would be bothered to setup both wiki groups and user groups, if the permission system was linked to user groups. And directly linking articles to user groups would require wiki-related groups to be created directly in the auth system.
  • Since relations are symmetrical, what should happen if one article is locked, but a user modifies it’s relations by deleting them from related articles?
  • Title editing: The title can only be created once, since it is coupled to the ‘slug’ of the article. A user can deliberately create a completely different title, which is fine, but should subsequent editing be allowed, which would add complexity to the revision system?
  • Article deletion: When an article is deleted from the backend it shouldn’t worry anyone. But if the feature is added to the frontend, we would want to handle maliciousness etc. But should we really store all these files and revisions? Should we alert admins, so they can do the final cleanup?
Posted in Django, Python | Tagged , , , , | 1 Comment

Getting started with Data Parallel Haskell

DPH is a work-in-progress, and not everything is yet easy to realize. But you should still prepare yourself as a Haskell programmer, because this is really where Haskell can set off into uncharted areas of popularity and innovation.

As an exercise for a programming course, I developed a spell checker, that runs in parallel and was motivated to write this short howto on getting to grips with DPH. First of all, DPH comes as a builtin library with GHC 6.10.1+, so it’s easy to get started. This howto is compatible with 6.10.1. But the DPH team has, as of March 2009, warned that significant fixes are added in the GHC trunk, and you should run the following command before installing (ie. recompiling GHC), so I advice you to do so for your own benefit.

$ ./darcs-all --dph get

Now, what you need to know is: DPH consists of two things, a compiler module that transforms your code for parallelism (‘vectorization’) and some special libraries that you should use in your code, as they are compatible with the vectorizer.

DPH can help you achieve two kinds of parallelism: Normal (unlifted) data parallelism and nested data parallelism. What the vectorizer works on is the latter, and for simple one-dimentional parallelism, you don’t need to know much, so let’s just get that done in a few lines:

module SpellCheckUnlifted where
 
import Types
import Trie
import Data.Array.Parallel.Unlifted
import GHC.PArr
 
type DocumentU = Array Word
 
spellCheck :: Dictionary -> Word -> Bool
spellCheck dic w = lookupTrie dic w
 
spellCheckU :: Dictionary -> Document -> [Word]
spellCheckU dic doc = 
  fromP ( filterP (\s -> not (spellCheck dic s)) (toP doc) )

As you can see, it’s fairly simple: Import Data.Array.Parallel.Unlifted and GHC.PArr. Then use functions toP and fromP to transform a list into a parallel array and back. The filterP function will execute functions in parallel on your array.

Download all source code of spell checker example.

Nested parallel example

The strong feature of DPH is nested parallelism, and you should think your problem through before engaging in this. In my approach, I start out with a non-parallel problem, and I think this is probably the classic example; we model our problems non-parallel and then consider how to transform them. My example is not suitable for nested parallelism, as it obviously has too little complexity and too much overhead for parallelism, but nonetheless there’s some interesting points to be gained:

  • The problem set (a list of words) is easily transformed
  • The sub-problems (looking up in a Trie data structure) is nested
  • The data types are non-trivial and need some convertion
{-# LANGUAGE PArr #-}
{-# OPTIONS -fvectorise #-}
module SpellCheckDPH where
 
import qualified Prelude
import Data.Array.Parallel.Prelude
import Data.Array.Parallel.Prelude.Word8
import qualified Data.Array.Parallel.Prelude.Int as I
 
type WordP = [:Word8:]
type DocumentP = [:WordP:]
type DictionaryP = [:TrieP:]
data TrieP = NodeP Word8 DictionaryP | LeafP Word8
 
isNode :: Word8 -> TrieP -> Bool
isNode w t = case t of
  (NodeP w' dic) -> w == w'
  _ -> False
 
isLeaf :: Word8 -> TrieP -> Bool
isLeaf w t = case t of
  (LeafP w') -> w == w'
  _ -> False
 
lookupTrieP :: DictionaryP -> I.Int -> WordP -> Bool
lookupTrieP dic i w =
    let len = lengthP w
        in if (I.<) ((I.+) i 1) len
              then let node = filterP (isNode (w!:i)) dic
                       in if (I.==) (lengthP node) 0
                             then False
                             else case node!:0 of
                                    (NodeP _ d) -> lookupTrieP d ((I.+) i 1) w
                                    _ -> False
              else (I.==) (lengthP (filterP (isLeaf (w!:i)) dic)) 1
 
spellCheckP :: DictionaryP -> DocumentP -> DocumentP
spellCheckP dic doc = filterP (\s -> not (lookupTrieP dic 0 s)) doc

Notice the first two lines. These tell GHC to add some syntax to handle the parallel arrays ([:a:]) and to let the vectorizer handle the code. Secondly, we import the DPH prelude and hide the normal Prelude. You should keep normal GHC library functions separate from DPH! spellCheckP does not convert a normal datatype into a parallel array, because we don’t want anything that cannot be vectorized in this module.

lookupTrieP is not a pretty function, but it has to work with arrays and a limited number of utility functions in the DPH Prelude. And BEWARE! Not all functions in GHC.Parr have been implemented in the DPH Prelude, which is because they simply don’t work with nested parallel arrays, yet, so you simly shouldn’t import this module.

You’ll also notice how I avoided polymorphism. Since DPH only supports Ints, Doubles and Word8 (unsigned 8-bit int), it isn’t of any use. When you transform your datatypes into nested parallel types, you will loose generality. But you can still use data types with different constructors etc. And that’s the power of DPH!

Now we only have to transform our nested data structure and problem into parallel arrays. This is done in a seperate module (SpellCheck.hs in the example).

c2w :: Char -> W.Word8
c2w c = fromIntegral (ord c)
w2c :: W.Word8 -> Char
w2c w = chr (fromIntegral w)
wArr2Str :: [W.Word8] -> String
wArr2Str wArr = map w2c wArr
 
t2P :: Trie Char -> TrieP
t2P (Leaf c) = LeafP (c2w c)
t2P (Node c d) = NodeP (c2w c) (dic2P d)
 
dic2P :: Dictionary -> DictionaryP
dic2P ds = toP $ map t2P ds
 
doc2P :: Document -> DocumentP
doc2P doc = let doc' = filter (\s -> not $ s=="") doc
              in toP $ map (toP.(\s -> map c2w s)) doc'

Download all source code of spell checker example.

Compiling and testing

When you compile, you have to use ‘-fdph-par’ for parallel libraries and ‘-fpar-seq’ for a similar, but sequential library. Also, you have to add ‘-Odph’ to instruct GHC to use the parallel compiler module, ‘-threaded’ because we’re using threads and ‘-fcpr-off’ to turn off the CPR optimizer (a trade-off, but I don’t know how severe).

$ ghc --make -fdph-par -Odph -threaded SpellCheck

Now, let’s test it and see the results. To run with to cores, we add ‘+RTS -N2 -RTS’, and to get some diagnostic output, we add ‘+RTS -N2 -sstderr -RTS’. My results are not very good, because I only have one core :)

$ ./SpellCheck +RTS -N2 -sstderr -RTS /usr/share/dict/words document.txt 1
   1,024,145,348 bytes allocated in the heap
     580,407,632 bytes copied during GC
      33,537,816 bytes maximum residency (16 sample(s))
      15,817,400 bytes maximum slop
              97 MB total memory in use (1 MB lost due to fragmentation)

  Generation 0:  1945 collections,    25 parallel,  3.92s,  4.47s elapsed
  Generation 1:    16 collections,    13 parallel,  1.98s,  2.38s elapsed

  Parallel GC work balance: 1.57 (79357576 / 50670026, ideal 2)

  Task  0 (worker) :  MUT time:   7.88s  (  2.43s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  1 (worker) :  MUT time:   7.88s  (  2.47s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  Task  2 (worker) :  MUT time:   1.99s  (  2.47s elapsed)
                      GC  time:   5.90s  (  6.85s elapsed)

  Task  3 (worker) :  MUT time:   7.88s  (  2.47s elapsed)
                      GC  time:   0.00s  (  0.00s elapsed)

  INIT  time    0.00s  (  0.00s elapsed)
  MUT   time    1.99s  (  2.47s elapsed)
  GC    time    5.90s  (  6.85s elapsed)
  EXIT  time    0.00s  (  0.00s elapsed)
  Total time    7.88s  (  9.32s elapsed)

  %GC time      74.8%  (73.5% elapsed)

  Alloc rate    515,130,747 bytes per MUT second

  Productivity  25.2% of total user, 21.3% of total elapsed
Posted in Computers | Tagged , | Leave a comment

GTK and scrolling without scrollbars

There’s still some work to do on the full screen plugin for Rhythmbox, but the current version is very usable indeed. The latest addition is scrolling by hovering the track list.

I changed the display from a normal fixed table with 3 tracks to a gtk.Layout with a gtk.VBox containing n tracks. In a Layout widget it’s possible to freely place and move child widgets, so by detecting motion events on the edges of the Layout you can emulate scrolling by moving a child widget accordingly with Layout.move(widget, x, y). Unfortunately I get a rather nasty blinking effect when scrolling too fast, and I don’t have an explanation for this, so I’d be glad to hear from anyone who can help.

def track_layout_scroll(self, widget, event):
    time_step = 10 #msecs
    ycoord = event.y
    accel_factor = 10 #how many pixels to scroll at the edge
    edge_distance = 100.0 #pixels
    layout_size = widget.get_size()
    top_dist = edge_distance - ycoord
    bot_dist = edge_distance - layout_size[1] + ycoord
 
    if top_dist > 0:
        accel = -1 - (top_dist / edge_distance) * accel_factor
    elif bot_dist > 0:
        accel =  1 + (bot_dist / edge_distance) * accel_factor
    else:
        accel = 0.0
 
    if self.scroll_event_id:
        gobject.source_remove(self.scroll_event_id)
 
    if not accel == 0.0:
        self.scroll_event_id = gobject.timeout_add(time_step, self.do_scrolling, accel, widget)
 
def do_scrolling(self, accel, layout_widget):
    step = int(1*accel)
    if step == 0:
        return
    vbox_size = self.vbox.size_request()
    layout_size = layout_widget.get_size()
    scroll_height = vbox_size[1]-layout_size[1]
    if self.scroll_y + step < 0:
        self.scroll_y = 0
    elif self.scroll_y + step > scroll_height:
        self.scroll_y = scroll_height
    else:
        self.scroll_y += step
 
    self.track_layout.move(self.vbox, 0, -self.scroll_y)
    return self.scroll_y > 0 and self.scroll_y < scroll_height

The code is related to my Rhythmbox plugin, but I’m sure you get the idea. Also please note, that track_layout_scroll has to receive notify_motion_event from the Layout widget, and that you have to set a size for the Layout widget with set_size().

Update: By playing around with time_step (lowering it to be more exact) and slowing down the acceleration, I managed to almost make the white flashes disappear.

Posted in Python | Tagged , , , , , | Leave a comment

Nullable Object must have a value

I’ve been coding a lot of ASP.NET lately using the whole Microsoft portfolio of dev tools, and I’m in total shock. I had expected to hate it, but not from rational reasoning, but rather because they made it. I thought it was sleek enterprise level stuff. But I was wrong… here are some obvious rational reasons to hate it:

  • Runtime error messages are cryptic and un-helpful (see the title of this post). Often they’re just summarized into texts like “error in database constraint… could be null, foreign-key, primary key… please check all the code you’ve ever written”.
  • System.Collections don’t support basic stuff like tuples and sets.
  • DataTables can’t handle null values. I mean: They call this DRO!?
  • I tried to edit some SQL in a Table Adapter. It suddenly switched the parameters for the auto-generated method without warning me – silently breaking the application and giving me hours of debugging.
  • The design view for aspx files in Visual Studio is useless since it always requires manual editing afterwards
  • I wanted to copy a database. Since the copy-function was broken in Enterprise Manager (You get an error message and using the “copy error message” function renders an “error copying error message”) I made a backup and wanted to restore it in a new database. The dialog for this operation silently switched the target for the restore several times – even the database files: to overwrite the production database. Pretty lucky to notice this, since it was hidden in another tab.
  • Visual Sourcesafe isn’t a proper versioning system. You might as well just use a network share and depend on ntfs file locking. Where’s the collaboration!?
  • There’s only a very limited amount of free or open applications available for it. And especially Umbraco which is the flagship of .NET CMS systems is full of errors and tells you to pay up as soon as you’d like the pro stuff – like say a proper development/deployment system.
  • Umbraco crashed our IIS – all we did was use the copy-content function!? How exactly is it that a webpage is allowed to take down a whole server?

I could probably ramble on about this. The bottom line is simply: I would never choose their technology over stuff like Subversion, Apache, Java or Python. And the whole ASP.NET framework is far from Ruby and Django – but of course it’s a lot older… it’s so 2003. I would have filed a few bug reports, but I guess these guys don’t have a bugzilla. Too bad.. I’ll just have to leave my feedback on this public blog.

Posted in Computers, Debate | Tagged , , , | Leave a comment

Django tip: Translating your application names (app_label)

This MUST be a common issue for international developers: We use some geeky English name for our application and afterwards find that the translated admin index looks a little silly in the eyes of our native speaking users. Apparently a patch has been accepted in the Django dev version, but it’s not yet in the trunk. Here’s what to do: In your applications’ __init__.py files put:

1
2
from django.utils.translation import gettext_noop
gettext_noop("AppName")

When your run manage.py makemessages -a there will be entries for these. Make sure to remove lines saying #, fuzzy. Now all you have to do is to customize the default admin template called index.html so it will actually do the translation of the application names.

18
        <caption>{% trans app.name %}</caption>
Posted in Django | Tagged , , , | 2 Comments