Trying to figure out the category tag in RSS2.0
I’m building out a module for PressForward to create an outgoing RSS feed.
Eventually the hope is to allow users to make such feeds programmatically, generating them based on packages of feeds, selected feeds, perhaps even selected items. But for now, in order to prepare for integration with the machine learning program other members of the project are working on, I’m just setting it up to output the ‘All Feed’, or the set of all items aggregated by the plugin.
I want the RSS feed to use the fullest set of tags possible. With Google Reader dead, every site that decides to support a tag Reader ignored (or to better support a tag it underused) will drive further support for and use of that tag. So I want not only to incorporate the fullest possible set of tags, but also to use them as accurately as possible.
One of the interesting things about the RSS spec is that it supplies a number of methods for automated taxonomic organization of channels and associative packaging. The blogChannel tags in the spec’s RSS2.0 example could all be used to auto-create subscription packages, or to suggest packaging intelligently.
It could be the same with the category tag for the channel, if only it were a little bit clearer. Apparently, the current use of the spec prescribes a category tag whose domain attribute identifies a system or site that lists feeds, and whose value is the number under which the feed is listed.
The sample feed lists Syndic8 as the domain and gives the feed’s unique ID on that site. But this is clearly not helpful for auto-generation (as a feed would have to be registered first, created second). It also misses an opportunity for greater taxonomy info in the feeds. And limiting the attributes to domain quashes the possibility of leveraging an API to help machines understand the relationships between feeds, and between their topics and other potential topics.
Let’s take Freebase as an example. Here’s a common and well-updated set of related topics and taxonomies. It’s the perfect thing for determining a feed’s taxonomic terms and even forms those terms in the term/subterm forward-slash format prescribed in the RSS spec.
<category xmlns="http://www.freebase.com/internet/website_category" domain="Freebase" title="name">Aggregator</category>
<category domain="Freebase" title="mid">/m/075x5v</category>
<category domain="Freebase" title="id">/en/aggregator</category>
In the channel tags here, I use category descriptions in a way pretty similar to the spec. There are a few modifications.
- I describe the category namespace using the Freebase notable type for my category. This is pretty much how the W3C talks about using namespaces in XML. This puts the category item relative to its parent type in the Freebase taxonomy.
- I describe all domains using the taxonomy system I’m basing them on. This is pretty much the method described in the RSS2.0 spec, as far as I can tell, but I haven’t seen anyone using external sites for taxonomy categories as opposed to listings.
- I use a title attribute to describe how these items could be used to form a Freebase MQL query.
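Here’s a minimal sketch of how tags in this format could be generated. It’s Python for illustration only, not the plugin’s own code, and category_tag is a hypothetical helper I made up for this example rather than part of any library.

```python
from xml.sax.saxutils import escape, quoteattr

# Assumption: this namespace URI is the one used in the example tags above.
FREEBASE_TYPE_NS = "http://www.freebase.com/internet/website_category"


def category_tag(value, title, domain="Freebase", xmlns=None):
    """Build one <category> element as a string (hypothetical helper)."""
    attrs = []
    if xmlns:
        attrs.append(("xmlns", xmlns))
    attrs.append(("domain", domain))
    attrs.append(("title", title))
    # quoteattr() escapes the value and wraps it in quotes.
    attr_text = "".join(f" {name}={quoteattr(val)}" for name, val in attrs)
    # escape() protects &, < and > inside the element text.
    return f"<category{attr_text}>{escape(value)}</category>"


tags = [
    category_tag("Aggregator", "name", xmlns=FREEBASE_TYPE_NS),
    category_tag("/m/075x5v", "mid"),
    category_tag("/en/aggregator", "id"),
]
print("\n".join(tags))
```

Building the strings by hand keeps the xmlns declaration on the one category element that needs it, instead of letting a serializer hoist it up to the channel.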
The idea for the basis of this format is that a machine could read the channel tags, recognize the domain as a query-ready item and then take the items within, using the title attribute, to form a query via the Freebase API.
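To make that concrete, here’s a rough sketch of the reading side, again in Python for illustration. The channel snippet, the freebase_query function, and the exact shape of the query are my assumptions; the sketch only builds a query object and doesn’t call any API. The Syndic8 value is a placeholder, not a real listing ID.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical channel fragment using the category format described above.
CHANNEL_XML = """<channel>
  <title>Example feed</title>
  <category xmlns="http://www.freebase.com/internet/website_category" domain="Freebase" title="name">Aggregator</category>
  <category domain="Freebase" title="mid">/m/075x5v</category>
  <category domain="Freebase" title="id">/en/aggregator</category>
  <category domain="Syndic8">0000</category>
</channel>"""


def local_name(tag):
    """Strip a '{namespace}' prefix that ElementTree adds to namespaced tags."""
    return tag.rsplit("}", 1)[-1]


def freebase_query(channel_xml):
    """Collect Freebase category tags into one MQL-style query (assumed shape)."""
    channel = ET.fromstring(channel_xml)
    query = {}
    for child in channel:
        if local_name(child.tag) != "category":
            continue
        if child.get("domain") != "Freebase":
            continue  # skip listing-style categories such as Syndic8
        key = child.get("title")
        if key:
            query[key] = (child.text or "").strip()
    # Assumption: an empty list asks the service to fill in the topic's types.
    query["type"] = []
    return [query]


print(json.dumps(freebase_query(CHANNEL_XML), indent=2))
```

One thing the sketch surfaces: because the first category tag declares its own namespace, a namespace-aware parser sees it under a different qualified name than the other two, so the reader has to compare local names to find all three.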
Seems like a good idea to me, though I’m far more familiar with reading XML than writing it. What do you think?
EDIT: Made some changes, here they are.