Category Archives: Syndication


Is FeedBurner Having Scaling Problems?

The last few days, when I log in to check my feed usage, FeedBurner tells me how many subscribers I have but claims I had no item views and no item clicks the previous day. Some indeterminate time later in the day the real stats magically appear. Are they having trouble completing the “big crunch” in the time between midnight (I assume PST) and my EST morning?

Maybe I’m just paranoid, but… it seems every time Google buys someone, they simultaneously start having scaling difficulties. The question is: is it a chicken thing, or an egg thing? Do companies that get to a size where scaling is becoming a problem become natural targets for Google acquisition? Or does getting bought by Google bring you the kiss of death, scaling-wise (just look at Blogger)?

File this under baseless rumours, if you like. We’ll see what happens as time goes on.

Update: I actually received an email from someone at Google about this (sorry for the delay in posting the update, but I don’t often read that email account). I won’t repeat it in full, but suffice it to say that there is a “big crunch” (the writer calls it “nightly roll-ups”), which begins at midnight Central time and lasts a few hours. It has apparently always been thus. However, I still don’t see my stats until about 10-11 am EST, which is different from before, so I guess I’ll have to report that as… an issue. Though not, I assume, anything to do with Google. :-)

P2P Multicast Feed Distribution with FeedTree

I’ve been interested in P2P approaches to syndication for a while, and FeedTree is a research project offering one solution to the problem. This poster provides a good overview of how the system works. A more detailed technical description is available here.

In a nutshell, an HTTP proxy on each reader’s machine becomes a node in a Pastry overlay network. The node then joins a multicast tree for each subscribed feed, and updates are pushed to all subscribers using the Scribe group messaging protocol. Feeds are either directly published into the network by FeedTree-aware publishers, or are polled by some subset of interested nodes on behalf of all the others. A digital signature can be added to the feed by the publisher to prevent spoofing. Configuring your feed reader to use the proxy is quite simple, in most cases, and then you’re set. Publishing takes a bit more work to set up, at present, but I don’t see why you couldn’t eventually wrap a simple GUI/wizard around this process to make it painless.
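To make the “push” idea concrete, here is a toy sketch of a per-feed multicast tree. It is not FeedTree, Pastry, or Scribe code; the `Node` class and its methods are names I made up for illustration, and there is no networking, tree maintenance, or signature checking. All it shows is that the publisher hands an update to the root once, and each node forwards it only to its own children.

```python
from typing import List


class Node:
    """Hypothetical subscriber node in a per-feed multicast tree."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.children: List["Node"] = []
        self.received: List[str] = []

    def add_child(self, child: "Node") -> "Node":
        # Attach a subscriber below this node and return it for chaining.
        self.children.append(child)
        return child

    def push(self, item: str) -> int:
        """Record the update locally, forward it to each child, and
        return the number of nodes reached in this subtree."""
        self.received.append(item)
        return 1 + sum(child.push(item) for child in self.children)


# Build a small tree: root -> (a -> c), b
root = Node("root")
a = root.add_child(Node("a"))
b = root.add_child(Node("b"))
c = a.add_child(Node("c"))

reached = root.push("new entry in feed A")
```

No node in this sketch polls the feed source, and no node sends the update to more than its direct children, which is where the load-spreading comes from.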

I had every intention of giving FeedTree a go before posting this, but here’s the rub: “Step 2: Be sure that your computer is accessible from the Internet on port 29690.” Well, I could have done so, but it would have required re-configuring my entire home network first. I couldn’t justify the effort just to try out one piece of software. I suspect most end-users wouldn’t have the permissions necessary to meet this requirement, wouldn’t know how to go about it, or, like me, wouldn’t be sufficiently motivated to take the trouble. Alas, methinks this is a showstopper if FeedTree is to become anything more than a research project. That’s a shame.

The “not even on the napkin” approach I had in my head was based on JXTA, of course. (For one thing, a JXTA-based solution would remove the need to hack your network, making it end-user friendly.) Rather than pushing updates to subscribers, I envisioned peers using the Discovery service to advertise and search for feeds. To wit: rather than polling feed A every hour, the peer would first try to discover a copy of the feed less than one hour old. If none could be found, only then would it poll feed A’s source, and publish an appropriately time-stamped advertisement for other peers to discover. The FeedTree “push” approach provides for more timely updates than this if the publisher injects the feed directly into the FeedTree network. Also, I can see the potential in my scheme to end up with “too many” peers polling for a feed, if their desired polling frequencies are too far out of sync. This requires considerably more thought. FeedTree also allows multiple “volunteers” to poll a feed in order to maintain a certain level of timeliness, but the overall polling frequency is better controlled.
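For what it’s worth, the discover-first scheme can be sketched in a few lines. This is only a rough illustration under my own assumptions: `DiscoveryService`, `Advertisement`, and `get_feed` are hypothetical names, not the JXTA API, and the “network” is just an in-memory dictionary. The caller supplies the clock value and the function that actually polls the source.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional

MAX_AGE_SECONDS = 3600.0  # the one-hour polling interval from the text


@dataclass
class Advertisement:
    feed_url: str
    fetched_at: float  # time at which the copy was polled from the source
    content: str


class DiscoveryService:
    """Hypothetical stand-in for a peer discovery service (in-memory only)."""

    def __init__(self) -> None:
        self._ads: Dict[str, Advertisement] = {}

    def publish(self, ad: Advertisement) -> None:
        # Keep only the newest advertisement for each feed.
        current = self._ads.get(ad.feed_url)
        if current is None or ad.fetched_at > current.fetched_at:
            self._ads[ad.feed_url] = ad

    def discover(self, feed_url: str, now: float,
                 max_age: float) -> Optional[Advertisement]:
        # Return an advertised copy only if it is younger than max_age.
        ad = self._ads.get(feed_url)
        if ad is not None and now - ad.fetched_at < max_age:
            return ad
        return None


def get_feed(feed_url: str, discovery: DiscoveryService,
             poll_source: Callable[[str], str], now: float,
             max_age: float = MAX_AGE_SECONDS) -> str:
    """Use a fresh copy advertised by a peer if one exists; otherwise
    poll the source and advertise the result for other peers."""
    ad = discovery.discover(feed_url, now, max_age)
    if ad is not None:
        return ad.content
    content = poll_source(feed_url)
    discovery.publish(Advertisement(feed_url, now, content))
    return content
```

Note that the sketch does nothing about the “too many peers polling” problem: two peers that miss in `discover` at the same moment will both hit the source.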

Anyway, I’m glad to see someone doing real work on alternatives to uncontrolled HTTP polling for feeds. Ultimately, though, I don’t see anything gaining wide usage that isn’t dead-simple to install and use, without regard to network topology. (Imagine that you did get FeedTree installed on your laptop at home. As soon as you take the machine to the office, or to school or wherever…you’re hosed. *sigh*) And without mass adoption and seamless mobile usage, no solution will be able to make any real dent in the scaling problem.

All Feeds Lead to Rome

There’s probably no completely painless way to deal programmatically with the tangled mess of syndication formats, but the Rome project is looking promising. Unlike the Jakarta Commons FeedParser, Rome not only parses feeds but also provides:

  • RSS-specific, Atom-specific, and generalized object models (handy if you want to persist feeds after you’ve parsed them);
  • generators for all syndication formats;
  • conversion from any format to any other.

Apparently, there’s some co-operation brewing between the FeedParser and Rome folks, which will no doubt be good news for Java feed hackers everywhere.