RVMbinary Released

RVMbinary has now been released on the Comprehensive R Archive Network (CRAN).

The webpage is located here

RVMbinary is an implementation of the Relevance Vector Machine (RVM) for binary classification. It is based on the method described in this paper: Tipping, M. E. and A. C. Faul (2003). Fast marginal likelihood maximisation for sparse Bayesian models. In C. M. Bishop and B. J. Frey (Eds.), Proceedings of the Ninth International Workshop on Artificial Intelligence and Statistics, Key West, FL, Jan 3-6.

Timing

Recently I needed to compare the running times of two different C programs. Unfortunately, I could not use a timing method inside the C code itself because I couldn’t edit the source code of one of the programs, so my first attempt was to use Python.

Python’s standard library has a module called time, as well as one called timeit.

timeit is designed to help you avoid the traps of timing an algorithm, and it appears there are many, many traps. First, you must decide which time to measure. There are a couple of options, such as wall time and CPU time. Wall time is the real time taken to run the algorithm, i.e. the elapsed time you actually experience. CPU time instead measures how long the processor spends executing your program’s instructions. In theory CPU time should be a more reliable basis for comparison, and hopefully independent of anything else that happens to be running.
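To see the difference between the two kinds of time, here is a minimal Python sketch (the names measure, busy and idle are hypothetical, made up for illustration) that times a single call with time.perf_counter() for wall time and time.process_time() for CPU time. A sleeping function should show wall time but almost no CPU time, while a pure computation should show similar values for both.

```python
import time


def measure(func):
    """Return (wall_seconds, cpu_seconds) for one call of func."""
    wall_start = time.perf_counter()   # wall-clock (elapsed) time
    cpu_start = time.process_time()    # user + system CPU time of this process
    func()
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    return wall, cpu


def busy():
    # Pure computation: CPU time should be close to wall time.
    total = 0
    for i in range(1_000_000):
        total += i * i
    return total


def idle():
    # Sleeping: wall time passes, but the CPU is barely used.
    time.sleep(0.2)


if __name__ == "__main__":
    for name, f in [("busy", busy), ("idle", idle)]:
        wall, cpu = measure(f)
        print(f"{name}: wall={wall:.3f}s cpu={cpu:.3f}s")
```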

Unfortunately, by default timeit uses wall time, and it runs your code many times over (timeit.timeit() executes the statement 1,000,000 times unless told otherwise). The usual advice is to repeat the measurement and take the best (shortest) time as your runtime. The reason is that the shortest time is a better measure than the average, since the average also includes runs where the operating system got in the way.
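A minimal sketch of the repeat-and-take-the-minimum approach with timeit.repeat() follows. The statement being timed and the repeat/number values are arbitrary choices for illustration. The timer argument is how you can swap the default wall-clock timer for CPU time (time.process_time) in Python 3.

```python
import time
import timeit

STMT = "sum(i * i for i in range(10_000))"  # arbitrary example workload
NUMBER = 100   # executions per measurement
REPEAT = 5     # number of measurements

# Default timer (wall clock): each entry is the total time for NUMBER runs.
wall_runs = timeit.repeat(STMT, repeat=REPEAT, number=NUMBER)

# Same measurement, but using CPU time as the timer.
cpu_runs = timeit.repeat(STMT, repeat=REPEAT, number=NUMBER,
                         timer=time.process_time)

# Take the best (minimum) measurement and convert to time per execution.
best_wall = min(wall_runs) / NUMBER
best_cpu = min(cpu_runs) / NUMBER

if __name__ == "__main__":
    print(f"best wall time per run: {best_wall:.6f}s")
    print(f"best CPU time per run:  {best_cpu:.6f}s")
```

timeit.repeat() returns the total time for each batch of NUMBER executions, so dividing the minimum by NUMBER gives a per-execution estimate.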

However, this didn’t seem to give me times that were consistent with theory, so I tried CPU time instead. Eventually I came across the Linux time command. It reports three times: real (wall) time, user time and sys (kernel) time. The user time is the CPU time I needed, and it seemed to be consistent with theory.
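Since the programs being compared are external executables, one way to get a similar real/user/sys breakdown from inside Python is to run the program with subprocess and read the accumulated CPU usage of child processes with resource.getrusage(resource.RUSAGE_CHILDREN). This is a sketch assuming a Unix-like system (the resource module is not available on Windows); time_command is a hypothetical helper name, and the Python one-liner used as the workload is only a stand-in for the real C programs.

```python
import resource
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd (a list of strings) and return (real, user, sys) seconds,
    roughly like the shell's time command. Unix only."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, check=True)  # waits for (and reaps) the child
    real = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    # RUSAGE_CHILDREN accumulates over all waited-for children, so take
    # the difference to isolate this one command.
    user = after.ru_utime - before.ru_utime
    sys_t = after.ru_stime - before.ru_stime
    return real, user, sys_t


if __name__ == "__main__":
    # Stand-in workload; for a real comparison pass e.g. ["./program_a"]
    # (a hypothetical path to one of the compiled C programs).
    workload = [sys.executable, "-c", "sum(i * i for i in range(3_000_000))"]
    real, user, sys_t = time_command(workload)
    print(f"real {real:.3f}s  user {user:.3f}s  sys {sys_t:.3f}s")
```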

Essentially, it all boils down to this: “Timing is not scientific.”

New Year New Style

It seems to be a custom that with every new year I take stock of all the things I promised to keep doing. Of course, promises to yourself are easily broken, and a lot more work seems to be happening in my PhD now. Anyway, I am going to try really hard to keep this new, fresh blog a bit more up to date with little bits of information.

Over the next few months this will probably include a lot to do with R (http://www.r-project.org/).

Anyway, here is the new simplified site. I promise to add something very soon.

Matthews Correlation Coefficient

Recently I have been looking into various classification metrics for a binary classification problem. One of the most useful single-value metrics I have come across is the Matthews Correlation Coefficient (MCC). Its main advantage over accuracy (the percentage of correct predictions) is its ability to handle unbalanced classes.
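For reference, the standard definition of the MCC in terms of true/false positives and negatives is given below, together with a small made-up example (the counts are hypothetical, chosen only for illustration) showing how accuracy can look good on unbalanced classes while the MCC does not. Since NP = TP + FN and NN = TN + FP, fixing NP and NN leaves TP and TN as the only free variables, so FN = NP - TP and FP = NN - TN.

```latex
\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}
                    {\sqrt{(TP + FP)\,(TP + FN)\,(TN + FP)\,(TN + FN)}}

% Hypothetical unbalanced example: NP = 5, NN = 95,
% TP = 1, FN = 4, TN = 94, FP = 1.
\text{Accuracy} = \frac{TP + TN}{NP + NN} = \frac{1 + 94}{100} = 0.95

\mathrm{MCC} = \frac{1 \cdot 94 - 1 \cdot 4}{\sqrt{2 \cdot 5 \cdot 95 \cdot 98}}
             = \frac{90}{\sqrt{93100}} \approx 0.29
```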

When looking into this, I decided it might be nice to see a 3D representation of the MCC and how its shape is affected by changing the ratio of the classes. So here is a Flash animation of the MCC. The numbers of true positives (TP) and true negatives (TN) are plotted along the two axes. The Number of Positives (NP) and Number of Negatives (NN) are varied from 1 to 100. Here 1 - MCC is plotted, so that 0 is perfect prediction, 2 is perfect anti-prediction and 1 is equivalent to random guessing.
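The plotted quantity can be computed directly. Below is a minimal Python sketch (function names hypothetical) that builds the grid of 1 - MCC values over TP and TN for one choice of NP and NN; an animation like the one described would presumably repeat this while NP and NN vary. Where the MCC denominator is zero the MCC is undefined, and this sketch assumes the common convention of returning 0 in that case. Plotting is left out.

```python
import math


def mcc(tp, tn, n_pos, n_neg):
    """Matthews correlation coefficient from TP, TN and the class sizes.

    FN and FP follow from the class sizes: FN = NP - TP, FP = NN - TN.
    Returns 0.0 when the denominator is zero (an assumed convention).
    """
    fn = n_pos - tp
    fp = n_neg - tn
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def one_minus_mcc_surface(n_pos, n_neg):
    """Grid of 1 - MCC values: rows are TP = 0..NP, columns are TN = 0..NN."""
    return [[1.0 - mcc(tp, tn, n_pos, n_neg) for tn in range(n_neg + 1)]
            for tp in range(n_pos + 1)]


if __name__ == "__main__":
    surface = one_minus_mcc_surface(n_pos=10, n_neg=90)
    print(surface[10][90])  # perfect prediction
    print(surface[0][0])    # perfect anti-prediction
    print(surface[5][45])   # predicting positive half the time at random
```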



Word Counting Fun

There has been a real lack of updates to this site recently. Initially it was because of some cool stuff I had been working on, which seemed to be working back in February. I then found a problem and have been trying to figure out what is going on ever since. I have made some kind of computer monster which is confusing me. One day it will be nice, but as soon as I think I have tamed it, BAM, it bites me in the ass!! Then my first-year report writing got in the way, and nothing really exciting has happened since.

Today, however, I found a very cool little thing built into Mac OS X. Windows users can ignore this; Linux users should probably read on, as it is probably available on Linux as well. If, like me, you write all your stuff in LaTeX, then you probably (if you’re silly like me) copy and paste your whole PDF into Word/Pages to count the number of words. There’s no need anymore. In Terminal just write:

ps2ascii report.pdf | wc -w

where report.pdf is the PDF whose words you want to count, and whoop, it does it all for you! Pretty cool, hey? Also, for all you Mac lovers, Snow Leopard for $29 (not sure if they have announced a price in pounds yet) is amazing. I can’t wait to get my hands on it. Look at Grand Central Dispatch, which promises an easy way to write multi-core programs (http://www.apple.com/macosx/technology/).
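The ps2ascii | wc -w pipeline can also be scripted, for example to count several reports at once. This is a rough Python sketch, assuming ps2ascii (part of Ghostscript) is installed and on the PATH; count_words and count_pdf_words are hypothetical helper names. str.split() with no arguments splits on runs of whitespace, which approximates how wc -w counts words.

```python
import subprocess
import sys


def count_words(text):
    """Count whitespace-separated words, approximating wc -w."""
    return len(text.split())


def count_pdf_words(pdf_path):
    """Extract text with ps2ascii (assumed installed) and count the words."""
    result = subprocess.run(
        ["ps2ascii", pdf_path],
        capture_output=True,  # ps2ascii writes the extracted text to stdout
        text=True,
        errors="replace",     # tolerate bytes that are not valid UTF-8
        check=True,
    )
    return count_words(result.stdout)


if __name__ == "__main__":
    # Usage: python count_pdf_words.py report.pdf other.pdf ...
    for path in sys.argv[1:]:
        print(path, count_pdf_words(path))
```

As with the shell version, the count covers all extracted text, including page numbers and headers.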