February 22, 2006

Shel Israel, on his Naked Conversations blog, has an interesting comment about Technorati’s “filter by authority” feature. The feature seems to have been inspired by Google, in the sense that the blogs considered authoritative are the ones most widely linked to, an approach to which Shel responds:

Big numbers do NOT necessarily mean you have the biggest influence. If you have a political blog and you have only three readers but they happen to be the US, Russian and Chinese heads of state, your numbers may suck at Technorati and PubSub, but you most assuredly have great influence. [...] It has been using the word “authority” where it should be using the word “popularity” pretty much since it started.

The question, then, is how to extract actual authority, rather than popularity, from usage. This has to be algorithmic: if you ask ten randomly selected people which sources have the most authority, their lists are likely to differ. So we need to standardise on what authority actually means, and then somehow turn that into a function that we can apply in real time to actual weblog usage data.

The problem, of course, is that authority is necessarily subjective, because everybody’s needs differ. The best search engine would then be one that takes your requirements into account and tailors results accordingly, but for a wide variety of technical reasons this isn’t realistic – can you imagine building separate results pages for 20,000,000 users? Although a certain amount of customisation per user is possible, the main trunk of results for a wide-reaching search engine needs to be a best fit for all users. Somehow you need to build a base that roughly orders everything so it can be searched easily, and then allow each user to fine-tune the results to their liking. Unless you have a relatively small number of sources, it’s impractical for a near-instantaneous search to poll the entire sphere of all possible resources for your requirements each time.
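To make the "shared base, personal fine-tuning" idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the source names, the scores, and the idea of expressing a user's preferences as simple multipliers are all hypothetical, and a real search engine would be far more involved.

```python
# Hypothetical base scores, computed once (offline) for every source
# and shared by all users.
BASE_SCORES = {"blog-a": 120.0, "blog-b": 45.0, "blog-c": 8.0}


def personalised_ranking(base_scores, user_boosts, top_n=10):
    """Re-rank the shared base ordering using one user's multipliers.

    Sources the user has expressed no preference for keep a boost of 1.0,
    so a user with no preferences simply sees the base ordering.
    """
    adjusted = {
        source: score * user_boosts.get(source, 1.0)
        for source, score in base_scores.items()
    }
    # Highest adjusted score first.
    return sorted(adjusted, key=adjusted.get, reverse=True)[:top_n]


# A user with no preferences gets the shared ordering.
default_view = personalised_ranking(BASE_SCORES, {})

# A user who trusts "blog-c" heavily sees it promoted.
tuned_view = personalised_ranking(BASE_SCORES, {"blog-c": 20.0})
```

The point of the split is that the expensive part (scoring everything) happens once, while the per-user part is a cheap adjustment applied at query time.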

A personal search engine is different: you take the sources you trust and manually add them to the database. The resources tab in Elgg can be used like this, as can any RSS aggregator, although depending on how you’re searching, it’s best to use one with full-text search capabilities across your subscriptions (which will probably appear on the next Elgg roadmap). Perhaps we could use this data to provide a two-tier search engine: users subscribe to content they’re interested in, and then this usage data is combined with the content links to provide a richer, dynamic idea of how important a source is. But this is flawed too – are all subscribers equal? Should we apply more weight to the US, Russian and Chinese heads of state – or professors at research institutions, for example – than to John Doe in Islington? Do we give people the option to flip between both models? Google’s link-based model works – or gives the appearance of working – because so many websites are static and informational. The blogosphere is fluid and almost entirely opinion-based; how do you go about modelling that?
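As a rough illustration of the two-tier idea, the sketch below combines an inbound link count with a weighted subscriber count. It is not how Technorati, Google or Elgg score anything; the function name, the linear formula and all of the numbers are assumptions made up for this example.

```python
def authority_score(inbound_links, subscribers, subscriber_weights=None,
                    link_weight=1.0, subscription_weight=1.0):
    """Combine link popularity with (optionally weighted) subscriptions.

    subscribers: iterable of subscriber identifiers.
    subscriber_weights: hypothetical mapping of subscriber -> weight;
        anyone not listed counts as 1.0, i.e. "all subscribers are equal".
    """
    weights = subscriber_weights or {}
    weighted_subscribers = sum(weights.get(s, 1.0) for s in subscribers)
    return link_weight * inbound_links + subscription_weight * weighted_subscribers


# Shel's example: a political blog with three very influential readers.
heads_of_state = ["us", "russia", "china"]

# Treating every subscriber equally, the blog barely registers...
equal_model = authority_score(inbound_links=2, subscribers=heads_of_state)

# ...but weighting those readers (here, an arbitrary 1000 each) changes that.
weighted_model = authority_score(
    inbound_links=2,
    subscribers=heads_of_state,
    subscriber_weights={"us": 1000.0, "russia": 1000.0, "china": 1000.0},
)

# A widely linked blog with 200 ordinary subscribers, for comparison.
popular_blog = authority_score(inbound_links=500, subscribers=range(200))
```

Flipping between the two models is just a matter of passing the weights or not. The hard part, which the code does nothing to solve, is deciding who gets which weight and who gets to decide.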

We’re back to the question of how to define authority. Findability has a thread of ideas; Clay Shirky has a post about the nature of authority as it relates to Wikipedia. Perhaps some of you have different ideas?

