Category Archives: Distributed Computing


oponia networks

Since “What are you up to these days?” is becoming a FAQ, here’s the short answer: I co-founded a technology company called oponia networks. We’ve got some money, some super-smart people, and a cool project on the go. Our lawyers won’t let me say much about what we’re doing just yet, but I can say that we’re developing distributed applications for the consumer and enterprise markets. We think we’re going to change the way people use the web—but then doesn’t everybody say that these days? Heh.

Stay tuned for some more info on the alpha in a few days…

P2P Multicast Feed Distribution with FeedTree

I’ve been interested in P2P approaches to syndication for a while, and FeedTree is a research project offering one solution to the problem. This poster provides a good overview of how the system works. A more detailed technical description is available here.

In a nutshell, an HTTP proxy on each reader’s machine becomes a node in a Pastry overlay network. The node then joins a multicast tree for each subscribed feed, and updates are pushed to all subscribers using the Scribe group messaging protocol. Feeds are either directly published into the network by FeedTree-aware publishers, or are polled by some subset of interested nodes on behalf of all the others. A digital signature can be added to the feed by the publisher to prevent spoofing. Configuring your feed reader to use the proxy is quite simple, in most cases, and then you’re set. Publishing takes a bit more work to set up, at present, but I don’t see why you couldn’t eventually wrap a simple GUI/wizard around this process to make it painless.
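To make the "push down a multicast tree" idea a bit more concrete, here is a toy sketch in Python. It is emphatically not FeedTree, Pastry, or Scribe code: the `Node` class and its `deliver` method are names I made up, the tree is wired by hand rather than derived from overlay routing, and there is no signature checking. It only shows why the publisher sends to a single root and lets the subscribers relay to each other.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical subscriber node in a hand-built multicast tree."""
    name: str
    children: List["Node"] = field(default_factory=list)
    inbox: List[str] = field(default_factory=list)

    def deliver(self, update: str) -> int:
        """Store the update, forward it to every child, and return the
        number of node-to-node messages sent below this node."""
        self.inbox.append(update)
        sent = 0
        for child in self.children:
            sent += 1 + child.deliver(update)
        return sent


# Assumed topology for illustration: root -> (a, b), a -> c
c = Node("c")
a = Node("a", children=[c])
b = Node("b")
root = Node("root", children=[a, b])

# The publisher hands the update to the root only; subscribers do the rest.
messages = root.deliver("feed A: new entry")
```

Each subscriber receives the update exactly once, and the forwarding work is spread across the tree instead of landing on the publisher's web server.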

I had every intention of giving FeedTree a go before posting this, but here’s the rub: “Step 2: Be sure that your computer is accessible from the Internet on port 29690.” Well, I could have done so, but it would have required re-configuring my entire home network first. I couldn’t justify the effort just to try out one piece of software. I suspect most end users wouldn’t have the permissions necessary to meet this requirement, wouldn’t know how to go about it, or, like me, wouldn’t be sufficiently motivated to take the trouble. Alas, methinks this is a showstopper if FeedTree is to become anything more than a research project. That’s a shame.

The “not even on the napkin” approach I had in my head was based on JXTA, of course. (For one thing, a JXTA-based solution would remove the need to hack your network, making it end-user friendly.) Rather than pushing updates to subscribers, I envisioned peers using the Discovery service to advertise and search for feeds. To wit: rather than polling feed A every hour, the peer would first try to discover a copy of the feed less than one hour old. If none could be found, only then would it poll feed A’s source, and publish an appropriately time-stamped advertisement for other peers to discover. The FeedTree “push” approach provides for more timely updates than this if the publisher injects the feed directly into the FeedTree network. Also, I can see the potential in my scheme to end up with “too many” peers polling for a feed if their desired polling frequencies are too far out of sync. This requires considerably more thought. FeedTree also allows multiple “volunteers” to poll a feed in order to maintain a certain level of timeliness, but the overall polling frequency is better controlled.
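For what it's worth, the discover-first, poll-second idea can be sketched in a few lines of Python. This is only a rough illustration under my own assumptions: `DiscoveryBoard`, `Advertisement`, and `get_feed` are hypothetical names, the "discovery service" is just an in-memory dictionary rather than anything from the JXTA API, and `poll_source` stands in for the actual HTTP fetch.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Advertisement:
    """A time-stamped notice that some peer holds a copy of a feed."""
    feed_url: str
    content: str
    fetched_at: float  # seconds since the epoch


class DiscoveryBoard:
    """Hypothetical in-memory stand-in for a peer discovery service."""

    def __init__(self) -> None:
        self._ads: Dict[str, Advertisement] = {}

    def publish(self, ad: Advertisement) -> None:
        # Keep only the freshest advertisement for each feed.
        current = self._ads.get(ad.feed_url)
        if current is None or ad.fetched_at > current.fetched_at:
            self._ads[ad.feed_url] = ad

    def discover(self, feed_url: str, max_age: float,
                 now: float) -> Optional[Advertisement]:
        # Return a copy only if it is strictly younger than max_age.
        ad = self._ads.get(feed_url)
        if ad is not None and now - ad.fetched_at < max_age:
            return ad
        return None


def get_feed(feed_url: str, board: DiscoveryBoard,
             poll_source: Callable[[str], str],
             max_age: float = 3600.0,
             now: Optional[float] = None) -> str:
    """Try discovery first; poll the source only if no fresh copy exists."""
    now = time.time() if now is None else now
    ad = board.discover(feed_url, max_age, now)
    if ad is not None:
        return ad.content
    content = poll_source(feed_url)
    board.publish(Advertisement(feed_url, content, now))
    return content
```

Note that this sketch does nothing about the “too many pollers” problem: two peers that both miss on discovery at the same moment will both hit the source.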

Anyway, I’m glad to see someone doing real work on alternatives to uncontrolled HTTP polling for feeds. Ultimately, though, I don’t see anything gaining wide usage that isn’t dead-simple to install and use, without regard to network topology. (Imagine that you did get FeedTree installed on your laptop at home. As soon as you take the machine to the office, or to school or wherever…you’re hosed. *sigh*) And without mass adoption and seamless mobile usage, no solution will be able to make any real dent in the scaling problem.

Network Applications of Bloom Filters

“A Bloom filter is a simple space-efficient randomized data-structure for representing a set in order to support membership queries.” The survey paper Network Applications of Bloom Filters is a great technical overview of what they are and what you might want to use them for. Examples include distributed caching, distributed hash tables (DHTs), resource routing, more efficient multicast, and traffic measurement. For a quickstart, see this helpful tutorial on using Bloom filters in place of lookup hashes in Perl. The author also describes how Bloom filters may be applied in social software systems to allow people to share information about their networks without revealing who their friends are to the world or to a central authority.
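If you'd rather see the idea than read about it, here is a minimal Bloom filter sketch in Python (not the Perl from the tutorial). The class name, the default sizes, and the trick of deriving several hash positions from salted SHA-256 digests are all my own assumptions for illustration; a real deployment would size the bit array and the number of hashes to hit a target false-positive rate.

```python
import hashlib
from typing import Iterator


class BloomFilter:
    """Toy Bloom filter: a bit array plus several hash positions per item."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 4) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the item with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, item: str) -> bool:
        # True means "possibly in the set"; False means "definitely not".
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(item))


friends = BloomFilter()
friends.add("alice@example.org")
friends.add("bob@example.org")
```

A membership test such as `"alice@example.org" in friends` never gives a false negative, but it can give a false positive, and the filter itself doesn't list its members. That combination is what makes the "share your network without revealing your friends" application possible.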