Category Archives: Computational Biology

My Erdős Number is 4

Related to my previous post, I now have an Erdős number of 4. Another thing I’ve always wanted! Here are the details and an explanation of Erdős numbers for those who aren’t familiar with them.

I’ve posted previously about the mathematician Paul Erdős. Among other things, Erdős was insanely prolific and published 1,475 papers with 511 collaborators. Since one of his many areas of interest was graphs, it’s not surprising that a collaboration graph of his co-authors, and their co-authors, and so on…should be of interest. Courtesy of Wikipedia:

The Erdős number…describes the “collaborative distance” between a person and mathematician Paul Erdős, as measured by authorship of mathematical papers. It was created by friends as a humorous tribute to the enormous output of Erdős, one of the most prolific modern writers of mathematical papers, and has become well-known in scientific circles as a tongue-in-cheek measurement of mathematical prominence.

The Erdős collaboration graph is too huge to visualize, sadly, but the Erdős Number Project site has some interesting facts about the graph. Unfortunately, I think this information is skewed because it is based only on papers published in mathematical journals, while the high degree of interdisciplinary collaboration means that many people outside of mathematics have finite Erdős numbers. Anyway, according to this information, about 83,642 other people have Erdős number 4 (probably a gross underestimate).

My relationship to Erdős comes from the fact that one of my co-authors, Michael Brudno, was a collaborator with at least two authors with Erdős number 2: Serafim Batzoglou and Lior Pachter. Each of those authors is a co-author with Daniel J. Kleitman, who not only has Erdős number 1, but has the lowest known Erdős-Bacon number: 3.

It’s conceivable that through one of Mike Brudno’s other collaborators, his number could in fact be 2, making mine 3, but confirming or disconfirming that would be too laborious. I’m more than satisfied with 4, which is slightly lower than the mean, especially considering that I never dreamed I’d have an Erdős number at all!
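
For the curious, the chain described above can be checked with a breadth-first search over a tiny co-authorship graph. This is only a sketch: the edge list contains just the collaborations named in this post, "me" is a placeholder for the author of this blog, and the function names are made up for illustration. Because the graph is so incomplete, the result is an upper bound on the real Erdős number, which is exactly why a shorter path through some other collaborator can't be ruled out.

```python
from collections import deque

# Co-authorship links mentioned in the post; "me" is a placeholder node.
EDGES = [
    ("me", "Michael Brudno"),
    ("Michael Brudno", "Serafim Batzoglou"),
    ("Michael Brudno", "Lior Pachter"),
    ("Serafim Batzoglou", "Daniel J. Kleitman"),
    ("Lior Pachter", "Daniel J. Kleitman"),
    ("Daniel J. Kleitman", "Paul Erdős"),
]


def build_graph(edges):
    """Turn a list of co-author pairs into an undirected adjacency map."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def erdos_numbers(graph, root="Paul Erdős"):
    """Breadth-first search from the root; distance to the root is the Erdős number."""
    distances = {root: 0}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbour in graph[current]:
            if neighbour not in distances:
                distances[neighbour] = distances[current] + 1
                queue.append(neighbour)
    return distances


numbers = erdos_numbers(build_graph(EDGES))
print(numbers["me"])
```

Adding more known collaborations to `EDGES` can only keep a number the same or lower it, never raise it.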

Savant: Genome Browser for High-Throughput Sequencing Data

I forgot to mention I now have a peer-reviewed publication! I don’t have a “bucket list” as such, but this is something I’ve always wanted. Not being an academic has made that unlikely, but because of my job in a research lab, it’s finally happened. Yay! Here are the details:

Savant: genome browser for high-throughput sequencing data
Marc Fiume, Vanessa Williams, Andrew Brook, Michael Brudno
Bioinformatics 2010; 26:1938-1944, August 15, 2010
doi: 10.1093/bioinformatics/btq332.

Read the abstract or the full paper (PDF). It has pretty pictures 😉

Cryptanalysis and genomics

For some reason it occurred to me that these two things should go together (while reading about Schrödinger’s brilliant notion about “the stuff of the gene” being some kind of aperiodic crystal). Anyway, while searching for stuff on this topic, I came across this great bit: Craig Venter’s synthetic bacterium contains coded “watermarks” in its DNA. One of these watermarks actually contains a web page, complete with a link. Others include quotations by James Joyce and Richard Feynman.

It sounds like science fiction, doesn’t it? Seriously cool–and slightly creepy. Imagine this kind of thing being introduced into humans via gene therapy!

I also found this paper on using cryptanalytic techniques to predict introns and exons. Sadly, that was all I could find. Perhaps it is not a fruitful avenue of research. Or perhaps it is just new and/or obscure. Time will tell.

New Direction: Computational Biology

After an absurdly long job search, I’ve finally found myself a comfortable place in a computational biology lab. I’ve been here a bit more than a month and thought I should mention something about what I’m doing.

I’m working for Dr. Michael Brudno in the Computational Biology Lab at the University of Toronto. At the moment, I’m developing an application for visualization and analysis of biological sequence and annotation data with a graduate student named Marc Fiume. (We just chose a name for our project today: SAVANT. I like it.) I’m also sitting in on a graduate seminar on analysis of high throughput sequencing data and attending the occasional presentation on related research at The Centre for Applied Genomics. I’ll be spending one day a week at Sick Kids hospital, in order to interact with biologists and bioinformaticians who are among the target users of SAVANT.

I’m having a great time.

This is all a huge change from the enterprise web development that is more or less what I’ve been doing since 1996. A huge change that I really needed. Sometimes you just need to start over, you know? It was getting to the point where I honestly couldn’t picture myself actually taking any of the jobs I was applying for. I couldn’t face the same-old, same-old any longer.

I’m not sure where this is all going to lead, but I’m kind of hoping to make a career in this relatively young field. I believe that my many years of experience in commercial software engineering will be useful here. I think I can have fun and make a difference. The territory is huge; the problem space practically inexhaustible. I can’t imagine getting bored any time soon. Heading off in a new direction feels exactly right. So work-wise right now, it’s all good. :)