
Sorting out Pixels or How I Didn't Plan to Spend My Sunday

I wasted, well, maybe "wasted" is too strong a word; I used way more time than I should have on Sunday playing with Pixel Bender, trying to figure out how to make it sort values. In keeping with my trend of making Pixel Bender do stuff besides manipulate pixels, I thought a fun mental exercise would be to implement a parallel sort. Now, I've not come up with any good use for this yet, nor considered whether this is the best performing parallel sort, yadda, yadda, but it works, which is what I really wanted to get out of this experiment.

While quicksort and mergesort are efficient given their O(n*log(n)) runtimes, they require iteration, which isn't what Pixel Bender is good at. I ran across one paper that outlined an algorithm for a parallel sort on a linear array of cellular automata, but its running time was 2*n-3 cycles, which is O(n). For a reasonably large array I was concerned about exceeding the size of an image that I could feed into Pixel Bender.

As always, Knuth had an answer. On page 111 of his Sorting and Searching tome, he outlines Batcher's sorting scheme, which is characterized as a "merge exchange sort". The key piece that makes the algorithm attractive for what I wanted to do is that step "M3. [Loop on i.]" can be done for all relevant i in any order, even simultaneously [Knuth 111]. It also scales nicely: the number of steps required to sort is 1/2 * ceil(log_2(n)) * (ceil(log_2(n)) + 1), which is O(log(n)^2). At 4096 elements 78 steps are needed, while at 262,144 elements only 171 steps are needed. With one image row per step and one column per element, I'm going to exceed the width much more quickly than the height.
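To check the step counts quoted above, here is a small Python sketch (Python rather than ActionScript only so it is easy to run anywhere). The function name batcher_steps is made up for this illustration; it counts how many times step M3 is entered, which is 1 + 2 + ... + t for t = ceil(log_2(n)).

```python
def batcher_steps(n: int) -> int:
    """Number of compare/exchange rounds (passes through step M3) for n elements."""
    if n < 2:
        return 0
    # t = ceil(log2(n)), computed with integers to avoid floating point rounding
    t = (n - 1).bit_length()
    # p takes the values 2^(t-1), ..., 2, 1 and the inner q loop runs 1, 2, ..., t times
    return t * (t + 1) // 2


for n in (16, 4096, 262144):
    print(n, batcher_steps(n))
```

The step count grows with the square of the logarithm of n, which is why the image height stays small even when the width gets large.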

Doing a rough implementation of the algorithm in ActionScript was straightforward:

/**
* Merge Exchange Sort using Batcher's Method
* Knuth, The Art of Computer Programming,
* Vol 3. Sorting and Searching, Page 111.
* @param data The data to be sorted, will be modified in place
*/
private function mergeExchangeSort(data:Array):void {
    // Sanity check
    var n:int = data.length;
    if (n < 2) {
        return;
    }
    // M1. [Initialize p.]
    var t:int = Math.ceil(Math.log(n)/Math.log(2));
    var p:int = Math.pow(2, t - 1);
    do {
        // M2. [Initialize q, r, d.]
        var q:int = Math.pow(2, t - 1);
        var r:int = 0;
        var d:int = p;
        var loopOnQ:Boolean = true;

        do {
            // M3. [Loop on i.]
            for (var i:int = 0; i < n - d; i++) {
                if ((i & p) == r) {
                    // M4. [Compare/exchange R_(i+1):R_(i+d+1)]
                    if (data[i] > data[i + d]) {
                        var temp:Object = data[i];
                        data[i] = data[i + d];
                        data[i + d] = temp;
                    }
                }
            }
            // M5. [Loop on q.]
            if (q != p) {
                d = q - p;
                q = q / 2;
                r = p;
            } else {
                loopOnQ = false;
            }
        }
        while (loopOnQ);

        // M6. [Loop on p.]
        p = Math.floor(p / 2);
    }
    while (p > 0);
}

The variables p, q, r, and d are bookkeeping, and my thought was that I would pass these in as values for each row in the pipeline, while the M3 and M4 piece would be the actual parallelized sorting piece. I quickly hit my first problem when I discovered that it doesn't look like I can use bitwise operators in Pixel Bender. The ((i & p) == r) test is vital: it restricts the possible exchanges to specific indexes for a particular step. It is in fact what gives the algorithm its complexity and beauty.

My next thought was to manually calculate the bitwise AND using what functions I had available in Pixel Bender. In the back of my mind I think there is some elegant way to do bitwise operations that I've forgotten since college (and some quick Google searches didn't refresh my memory). Just wanting to get something working, I hacked together a quick little routine that used only operations available in Pixel Bender. Thankfully Pixel Bender does support modulus. This is the ActionScript version of the code:

private function bitwiseAnd(a:int, b:int):int {
    var result:int = 0;
    var n:int = 1;
    while ((a > 0) && (b > 0)) {
        if (((a % 2) == 1) && ((b % 2) == 1)) {
            result += n;
        }
        a = a / 2;
        b = b / 2;
        n = n * 2;
    }
    return result;
}

I had forgotten that Pixel Bender, when exporting for Flash, doesn't support loops :( I unrolled the loop for the largest values that I thought I'd test with, yay cut and paste. Then I ran into another subtle aspect of the code that I'd skimmed over: when looping over i, it swaps two elements. Given that I'd be running in parallel, with each output pixel computing only its own value, I really had to calculate not only whether the element at i was bigger than the one at i+d but also the opposite case, from the point of view of the element at i+d, at the same time. That would mean another unrolled bitwise AND operation. This was turning into more of a pain than I had hoped and I was far exceeding whatever playtime limit I'd set aside for this little project.
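To make those two problems concrete, here is a rough Python sketch, not the code I actually ran. The names bitwise_and_unrolled and parallel_step and the 16-bit width are assumptions for illustration. The first function performs a fixed number of iterations with no data-dependent exit, so it could be pasted out by hand; the second shows each slot deciding for itself whether it is the left or the right end of a compare/exchange pair.

```python
def bitwise_and_unrolled(a: int, b: int, bits: int = 16) -> int:
    """AND of two non-negative ints using only %, //, + and *.

    The loop always runs `bits` times, so it can be unrolled by hand
    in an environment without loops.
    """
    result = 0
    weight = 1
    for _ in range(bits):
        result += weight * (a % 2) * (b % 2)
        a //= 2
        b //= 2
        weight *= 2
    return result


def parallel_step(data, p, r, d):
    """One M3/M4 round where every output slot computes only its own value.

    Each slot reads from the unmodified input list, so the slots are
    independent of one another and could all be evaluated at once.
    """
    n = len(data)
    out = list(data)
    for i in range(n):  # conceptually, every i runs simultaneously
        if i + d < n and bitwise_and_unrolled(i, p) == r:
            # slot i is the left end of a pair: keep the smaller value
            out[i] = min(data[i], data[i + d])
        elif i - d >= 0 and bitwise_and_unrolled(i - d, p) == r:
            # slot i is the right end of a pair: keep the larger value
            out[i] = max(data[i], data[i - d])
    return out


print(parallel_step([7, 6, 5, 4, 3, 2, 1, 0], p=4, r=0, d=4))
```

The second branch is the extra unrolled AND mentioned above: the right-hand element has to rerun the test on behalf of its partner at i - d.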

Alas, for better or worse, as Brian has aptly pointed out, I have a tenacious personality. I did take a break to run some errands, eat some more of the yummy chicken stew I made last night, and catch up on some saved TV shows. I came back, didn't have any insight, and figured I'd just write up this blog post. Halfway through writing it I had an insight. The p, r, q, d, and i nonsense is just bookkeeping overhead. For a fixed size n, the values of p, r, q, d, and which swaps need to be made can be calculated once and reused as part of the pipeline.

Pixel Bender supports multiple source images. One could be my input data and the other could be the bookkeeping junk. (Insert long pause while I try this out). Success! I've now used way too much time putting this together. I only hope I haven't forgotten something else that I was supposed to be doing...

There are a couple of changes. Before M3 in the algorithm mentioned above, I store the value of d in my bookkeeping bitmap:

// Pass along the value for d
step++;
bitmapData.setPixel(0, step + Y_OFFSET, d);

Then I replace M4 with the following change:

// M4. [Compare/exchange R_(i+1):R_(i+d+1)]
// Store the fact that we need to make this comparison
bitmapData.setPixel(X_OFFSET + i, step + Y_OFFSET, RIGHT);
bitmapData.setPixel(X_OFFSET + i + d, step + Y_OFFSET, LEFT);

This stores either a compare-to-the-right or a compare-to-the-left marker in the bookkeeping bitmap.
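As a rough model of what ends up in the bookkeeping bitmap, the Python sketch below builds one list per sorting step instead of one bitmap row. The RIGHT and LEFT values are assumptions chosen to match the kernel's tests on the red and green channels, X_OFFSET = 1 is assumed because column 0 holds d, and build_bookkeeping is a made-up name.

```python
RIGHT = 0xFF0000  # assumed marker: compare with the element d slots to the right
LEFT = 0x00FF00   # assumed marker: compare with the element d slots to the left
X_OFFSET = 1      # assumed: column 0 is reserved for the value of d


def build_bookkeeping(n):
    """Return one row per sorting step: row[0] holds d, the rest hold markers."""
    rows = []
    if n < 2:
        return rows
    t = (n - 1).bit_length()          # ceil(log2(n))
    p = 1 << (t - 1)                  # M1
    while p > 0:
        q, r, d = 1 << (t - 1), 0, p  # M2
        while True:
            row = [0] * (X_OFFSET + n)
            row[0] = d                # pass along the value for d
            for i in range(n - d):    # M3
                if (i & p) == r:
                    # M4 replaced: record the comparison instead of doing it
                    row[X_OFFSET + i] = RIGHT
                    row[X_OFFSET + i + d] = LEFT
            rows.append(row)
            if q == p:                # M5
                break
            d, q, r = q - p, q // 2, p
        p //= 2                       # M6
    return rows


for row in build_bookkeeping(4):
    print(row[0], ["R" if c == RIGHT else "L" if c == LEFT else "." for c in row[X_OFFSET:]])
```

Within a single step no slot is marked twice, which is what makes it safe to evaluate every slot of a row at the same time.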

Lastly, to get the pipelining working, the Pixel Bender code operates offset by one row. That is, when working on row 1 it is pulling source pixels from row 0. This way, each time the code runs the source row can be changed, but the previous rounds' values will continue to work their way down the sorting steps. You can feed new data into the system every step and then, after the minimum number of steps needed, start pulling sorted data off the other end. The cheesy UI I put together to test the code demonstrates this pipelining effect.
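The pipelining can be simulated without Pixel Bender at all. The Python sketch below is an illustration under assumptions, not the application's code: schedule and tick are made-up names, the grid is a list of lists standing in for the image, and min/max stand in for the kernel's compare logic. Each call to tick plays the role of one run of the kernel over the whole image.

```python
def schedule(n):
    """Per step: (d, side) where side[i] is 1 (look right), -1 (look left) or 0."""
    steps = []
    t = (n - 1).bit_length()
    p = 1 << (t - 1)
    while p > 0:
        q, r, d = 1 << (t - 1), 0, p
        while True:
            side = [0] * n
            for i in range(n - d):
                if (i & p) == r:
                    side[i], side[i + d] = 1, -1
            steps.append((d, side))
            if q == p:
                break
            d, q, r = q - p, q // 2, p
        p //= 2
    return steps


def tick(grid, steps, new_input):
    """Advance the pipeline one pass: row k is computed from row k-1 of the old grid."""
    new_grid = [list(new_input)]  # row 0 is where fresh data enters
    for k, (d, side) in enumerate(steps, start=1):
        prev = grid[k - 1]
        row = []
        for i, s in enumerate(side):
            if s == 1:
                row.append(min(prev[i], prev[i + d]))
            elif s == -1:
                row.append(max(prev[i], prev[i - d]))
            else:
                row.append(prev[i])
        new_grid.append(row)
    return new_grid


n = 8
steps = schedule(n)
grid = [[0] * n for _ in range(len(steps) + 1)]
batches = [[5, 3, 8, 1, 9, 2, 7, 4], [8, 7, 6, 5, 4, 3, 2, 1], [1, 1, 0, 0, 9, 9, 3, 3]]
results = []
for j in range(len(batches) + len(steps)):
    feed = batches[j] if j < len(batches) else [0] * n
    grid = tick(grid, steps, feed)
    if j >= len(steps):
        # the batch fed len(steps) passes ago has now reached the last row
        results.append(grid[-1])
print(results)
```

After the initial latency of one pass per sorting step, one sorted batch comes off the bottom row on every pass.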

The Pixel Bender code ended up being straightforward once I introduced the bookkeeping bitmap. The only funky thing is doing the comparisons. Since I'm storing values as pixels, each value arrives in Pixel Bender chopped up into RGB channels, which means a little special handling. Also, I've had some issues with float versus integer comparisons, so don't be surprised if there are some stray oddities in the code. This then is the core of the algorithm on the Pixel Bender side:

// grab the data to figure out which direction to compare with, if any
pixel4 bookdata = sampleNearest(book, outCoord());
// grab the d value to use, which is always stored at x = 0 for this row
pixel4 dData = sampleNearest(book, float2(0.0, outCoord().y));
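// convert it into a real value, only need GB since we would create images too large otherwise
// although I don't know what the size limit is...
// assuming channels arrive normalized to 0.0-1.0 (byte / 255), scale by 255 to recover each byte
float d = floor(dData.g * 255.0 * 256.0 + dData.b * 255.0 + 0.5);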
// grab the previous step's data, doing this makes for easier pipelining
pixel4 me = sampleNearest(src, outCoord() + float2(0.0, -1.0));
// default to doing nothing
dst = me;
// always skip x = 0 as that holds bookkeeping information
// and skip y = 0 as that is the start of the pipeline
// we read the previous row and write in this row
if ((int(outCoord().x) != 0) && (int(outCoord().y) != 0)) {
    // need to compare to the value on our right at me.x + d
    if (bookdata.r == 1.0) {
        pixel4 right = sampleNearest(src, outCoord() + float2(d, -1.0));
        // simplistic way to make the RGB compare like a number, can probably be optimized
        if ((me.r > right.r) || ((me.r == right.r) && ((me.g > right.g) || ((me.g == right.g) && (me.b > right.b))))) {
            dst = right;
        }
        // need to compare to the value on our left at me.x - d
    } else if (bookdata.g == 1.0) {
        pixel4 left = sampleNearest(src, outCoord() + float2(-d, -1.0));
        // see note above
        if ((left.r > me.r) || ((left.r == me.r) && ((left.g > me.g) || ((left.g == me.g) && (left.b > me.b))))) {
            dst = left;
        }
    }
}
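The nested RGB test in the kernel is a lexicographic comparison, red first, then green, then blue, which orders pixels the same way as the 24-bit integers they were created from. Here is a quick Python check of that claim; split_rgb and rgb_greater are names made up for this sketch, and it works on byte values rather than the floats the kernel sees.

```python
def split_rgb(value):
    """Split a 24-bit integer into (r, g, b) bytes, as setPixel would store it."""
    return ((value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF)


def rgb_greater(a, b):
    """Same nested test as the kernel, applied to (r, g, b) tuples."""
    return (a[0] > b[0]) or (
        a[0] == b[0] and ((a[1] > b[1]) or (a[1] == b[1] and a[2] > b[2]))
    )


samples = [0, 1, 255, 256, 257, 0x00FFFF, 0x010000, 0xFF0000, 0xFFFFFF]
agree = all(
    rgb_greater(split_rgb(x), split_rgb(y)) == (x > y)
    for x in samples
    for y in samples
)
print(agree)
```

Because the channels are compared from most significant to least significant, ties in red fall through to green and ties in green fall through to blue, just like comparing digits of a number.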

Give the finished application a try. Flash Player 10 required. It currently limits the values to 0xFFFF and the number of elements to 16 so they fit in the grid. View the full source if you want to play with it some more.
