Archive for the 'Distributed Computing' Category

The Web Bores Me

I have to face the facts: I’m completely bored of writing web apps. I’m not bored by the architecture of the Web, which I believe should be leveraged more than it currently is; but sometimes I really don’t think I can face the grind and hassle of assembling what should be a simple web application or web service. Let’s face it: it’s so freaking dull. And so much harder than it should be. I read somewhere (unfortunately, I can’t remember where) that the most complex aspect of any enterprise web development project is AJAX. I can believe it easily. Add a pile of enterprise middleware suckage plus associated crappy tools and it’s ten times more disheartening.

So I’m learning Cocoa. Gonna see what I can do with desktop apps that speak Web. At least it’ll be different. And cross-platform capability be damned (for the moment, at least). Then who knows? Maybe it’s time to take distributed computing to the iPhone :)

I suppose it’s no wonder I’m more comfortable writing middleware. It’s hard, but at least it’s not hard and terminally boring. (YMMV.)

Amazon Web Services Redux

It seems my earlier post “The Long Tail of Web Services” is getting some traffic from links here and here. At least someone is willing to put their money where my mouth is :)

Since that original post, Amazon has come out with yet another service (still in limited beta) called Amazon SimpleDB. This is a simple but apparently powerful service to query structured data. Although I note some complaining from the database community about it not really being a database, that’s just a semantic issue. If they renamed it, the complaints would probably go away. I think I would describe SimpleDB as something like a content-addressable DHT.

BTW, I can’t help wondering if this new service is related to Amazon’s Dynamo. (This is total speculation on my part. Perhaps closer inspection will tell. Or maybe Amazon will, eventually…)

Super-Peer Architectures

I came across this presentation on Super-Peer Architectures today. I haven’t had time to read it in full, but it looks stuffed with useful data. It will especially appeal to anyone interested in the architecture of Skype. A list of the authors’ other publications can be found here and here.

oponia networks

Since “What are you up to these days?” is becoming a FAQ, here’s the short answer: I co-founded a technology company called oponia networks. We’ve got some money, some super-smart people, and a cool project on the go. Our lawyers won’t let me say much about what we’re doing just yet, but I can say that we’re developing distributed applications for the consumer and enterprise markets. We think we’re going to change the way people use the web—but then doesn’t everybody say that these days? Heh.

Stay tuned for some more info on the alpha in a few days…

P2P Multicast Feed Distribution with FeedTree

I’ve been interested in P2P approaches to syndication for a while, and FeedTree is a research project offering one solution to the problem. This poster provides a good overview of how the system works. A more detailed technical description is available here.

In a nutshell, an HTTP proxy on each reader’s machine becomes a node in a Pastry overlay network. The node then joins a multicast tree for each subscribed feed, and updates are pushed to all subscribers using the Scribe group messaging protocol. Feeds are either directly published into the network by FeedTree-aware publishers, or are polled by some subset of interested nodes on behalf of all the others. A digital signature can be added to the feed by the publisher to prevent spoofing. Configuring your feed reader to use the proxy is quite simple, in most cases, and then you’re set. Publishing takes a bit more work to set up, at present, but I don’t see why you couldn’t eventually wrap a simple GUI/wizard around this process to make it painless.
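To make the "push down a multicast tree" idea concrete, here is a toy sketch in Python. It is not FeedTree, Pastry, or Scribe code: the `Node` class and its methods are names I made up for illustration, and the tree is wired by hand rather than built from overlay routing. The only point is that the publisher hands an update to a couple of children and the tree does the rest.

```python
from typing import List


class Node:
    """Hypothetical subscriber node in a per-feed multicast tree."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []
        self.received: List[str] = []

    def attach(self, child: "Node") -> None:
        # In a real overlay the tree shape comes from routing; here we wire it by hand.
        self.children.append(child)

    def deliver(self, update: str) -> int:
        """Record the update, forward it to each child, and return the
        total number of forwards made in this subtree."""
        self.received.append(update)
        forwards = 0
        for child in self.children:
            forwards += 1 + child.deliver(update)
        return forwards


# A root (the publisher or a polling volunteer) with two children,
# each of which has two children of its own: seven nodes in total.
root = Node("root")
leaves = []
for i in range(2):
    mid = Node(f"mid-{i}")
    root.attach(mid)
    for j in range(2):
        leaf = Node(f"leaf-{i}-{j}")
        mid.attach(leaf)
        leaves.append(leaf)

total_forwards = root.deliver("feed A: new entry")
```

The root only sends two messages itself (one per direct child), yet all seven nodes end up with the update. That fan-out is what takes the load off the feed's origin server.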

I had every intention of giving FeedTree a go before posting this, but here’s the rub: “Step 2: Be sure that your computer is accessible from the Internet on port 29690.” Well, I could have done so, but it would have required re-configuring my entire home network first. I couldn’t justify the effort just to try out one piece of software. I suspect most software end-users either wouldn’t have the permissions necessary to meet this requirement; wouldn’t know how to go about it; or, like me, wouldn’t be sufficiently motivated to take the trouble. Alas, methinks this is a showstopper if FeedTree is to become anything more than a research project. That’s a shame.

The “not even on the napkin” approach I had in my head was based on JXTA, of course. For one thing, a JXTA-based solution would remove the need to hack your network, making it end-user friendly. Rather than pushing updates to subscribers, I envisioned peers using the Discovery service to advertise and search for feeds. To wit: rather than polling feed A every hour, the peer would first try to discover a copy of the feed less than one hour old. If none could be found, only then would it poll feed A’s source, and publish an appropriately time-stamped advertisement for other peers to discover. The FeedTree “push” approach provides for more timely updates than this if the publisher injects the feed directly into the FeedTree network. Also, I can see the potential in my scheme to end up with “too many” peers polling for a feed, if their desired polling frequencies are too far out of sync. This requires considerably more thought. FeedTree also allows multiple “volunteers” to poll a feed in order to maintain a certain level of timeliness, but the overall polling frequency is better controlled.
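For what it’s worth, the discover-before-you-poll idea can be sketched in a few lines of Python. Everything here is an assumption for illustration: `DiscoveryService` is an in-memory stand-in for a peer discovery service (it is not the JXTA API), and `Advertisement`, `get_feed`, and `poll_source` are hypothetical names.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Advertisement:
    feed_url: str
    content: str
    fetched_at: float  # when the source was actually polled (seconds)


class DiscoveryService:
    """Hypothetical in-memory stand-in for a peer discovery service."""

    def __init__(self) -> None:
        self._ads: Dict[str, Advertisement] = {}

    def publish(self, ad: Advertisement) -> None:
        # Keep only the freshest advertisement per feed.
        current = self._ads.get(ad.feed_url)
        if current is None or ad.fetched_at > current.fetched_at:
            self._ads[ad.feed_url] = ad

    def discover(self, feed_url: str) -> Optional[Advertisement]:
        return self._ads.get(feed_url)


def get_feed(
    feed_url: str,
    discovery: DiscoveryService,
    poll_source: Callable[[str], str],
    max_age: float = 3600.0,
    now: Optional[float] = None,
) -> str:
    """Return a copy of the feed, polling the source only if no peer has
    advertised a copy younger than max_age seconds."""
    now = time.time() if now is None else now
    ad = discovery.discover(feed_url)
    if ad is not None and now - ad.fetched_at < max_age:
        return ad.content  # a fresh-enough copy is already out there
    content = poll_source(feed_url)  # fall back to the origin server
    discovery.publish(Advertisement(feed_url, content, now))
    return content
```

With a single shared `DiscoveryService`, a second peer asking for the same feed half an hour after the first gets the advertised copy and never touches the source. The sketch does nothing about the “too many pollers” problem: two peers whose copies expire at the same moment will both hit the origin.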

Anyway, I’m glad to see someone doing real work on alternatives to uncontrolled HTTP polling for feeds. Ultimately, though, I don’t see anything gaining wide usage that isn’t dead-simple to install and use, without regard to network topology. (Imagine that you did get FeedTree installed on your laptop at home. As soon as you take the machine to the office, or to school or wherever…you’re hosed. *sigh*) And without mass adoption and seamless mobile usage, no solution will be able to make any real dent in the scaling problem.

Network Applications of Bloom Filters

“A Bloom filter is a simple space-efficient randomized data-structure for representing a set in order to support membership queries.” The survey paper Network Applications of Bloom Filters is a great technical overview of what they are and what you might want to use them for. Examples include distributed caching, distributed hash tables (DHTs), resource routing, more efficient multicast, and traffic measurement. For a quickstart, see this helpful tutorial on using Bloom filters in place of lookup hashes in Perl. The author also describes how Bloom filters may be applied in social software systems to allow people to share information about their networks without revealing who their friends are to the world or to a central authority.
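If you’d rather see the idea than read about it, here is a minimal Bloom filter sketch in Python (not the Perl from the tutorial). The class name, the use of SHA-256 with double hashing, and the parameter names are my own choices for illustration; the sizing uses the standard formulas for an expected item count and a target false-positive rate.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter: each item sets k bit positions in an m-bit array."""

    def __init__(self, expected_items: int, false_positive_rate: float = 0.01) -> None:
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 and k = (m / n) * ln 2.
        n, p = expected_items, false_positive_rate
        self.num_bits = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k positions from two slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so it is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "probably present"; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


friends = BloomFilter(expected_items=1000)
friends.add("alice@example.org")
print("alice@example.org" in friends)  # True: a Bloom filter has no false negatives
```

Note the asymmetry: an item that was added is always reported as present, while an item that was never added is occasionally reported as present too. The filter stores no items, only bits, which is what makes the “share your network without revealing your friends” trick plausible.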