Open data at data.gov.uk

Ben Werdmuller — January 21, 2010

The British equivalent to Obama’s data.gov opened today. Over at ReadWriteWeb, Marshall Kirkpatrick points out the scale of the ambition involved:

At launch, Data.gov.uk has nearly 3,000 data sets available for developers to build mashups with. The U.S. site, Data.gov, has less than 1,000 data sets today.

[…][Unlike the US equivalent, the site] includes 22 military data sets at launch, including one called Suicide and Open Verdict Deaths in the U.K. Regular Armed Forces.

However, these are raw datasets. As Paul Clarke points out, the site only pays lip service to openness until someone comes along and turns these sets into useful reports and applications:

The only test of real success is: use. Not usefulness. Not theoretical use. Real use. Getting beyond the novelty application, the demonstrator, and the hobby lies at the heart of really untapping the potential of data.gov.uk.

Indeed, the figures that TechCrunch Europe reports suggest that turning this data into something useful may be harder than it sounds:

So far over 2,400 developers have registered to test the site and provide feedback, [while] 10 applications have been created.

I left a comment on Paul Clarke’s post pointing out some potential pitfalls that may inhibit innovation, including the government’s insistence on licensing the data under Crown Copyright and their impartiality regarding Twitter. There’s also been some criticism around the lack of a common data format for each feed (although the RDF triple proudly displayed on the front page suggests this is likely to change).

Nonetheless, I believe this represents a huge step forward. Turning raw materials into useful, compelling applications that improve the users’ quality of life requires a huge amount of creativity and talent, and providing the data feeds in the first place is a crucial first step.

You can list all the available datasets here.

The death of newspapers, and why it matters

Ben Werdmuller — January 4, 2010

The Internet has, undeniably, changed our culture.

For most of the 20th century, we paid for our news, entertainment, art and literature. We allowed businesses to act as gatekeepers for this content, and accepted that the media landscape would be dictated by decisions made in the boardroom. Publishers, movie studio bosses, broadcasters and record company executives dictated what we read, saw and heard, based on financial projections. Their opinions about what was commercially viable regulated supply. Content had a price.

This situation was dictated by economic scarcity. That is to say, not only did an original work, such as a novel or a movie, cost money to produce, but each item used to distribute it, such as a book or a DVD, had its own individual cost of production. To make money, a publishing house or a movie studio needed to recoup its initial production costs for the original work, as well as the per-item cost for each book or DVD. The exception to this in the media landscape was broadcast media – television and radio – which anyone could watch for free, in exchange for a regular advertising break. However, in both distributed and broadcast media, the content needed to be commercial enough to attract either buyers or advertisers. In order to recoup the production cost, the companies involved controlled what was released according to what they thought would sell. As a result, the market for content was led by supply – what the content companies deemed worthy of release – rather than consumer demand.

The first continuously published American newspaper launched in April 1704. Since then, newspapers’ philosophy of objective journalism has played an important part in American culture. For democracy to function, a citizen must understand the facts surrounding an issue, so they can vote on it in an informed way: access to impartial information is key. One New York resident remarked in the 1840s that “one thing is certain – nowhere will you find better informed people – that is, those who better understand all the principal movements of the day, whether political, moral or religious, than the readers of a country newspaper”. As the primary method for disseminating facts and information to the public, newspapers have been fundamental to democracy.

In the first decade of the 21st century, the model for distributing newspaper content changed. In 2008, newspaper circulation in the US dropped by 4.6% on weekdays and 4.8% on Sundays. Meanwhile, visits to the top fifty news-related websites, all of which are free to access, increased by 27%. Correspondingly, the first quarter of 2009 was the worst ever for newspapers, with sales plunging by $2.9 billion.

The seeds of the Internet were sown in 1969. However, it wasn’t until Tim Berners-Lee invented the World Wide Web in 1989 that its effects on the media began to be felt. While content had been made available on the network for twenty years, it had been purely text-based, required a level of technological knowhow to operate, and needed to be accessed through specialist communications software. The Web was based on hypertext, a more accessible way of joining documents and articles together through linked topics and phrases. Most importantly, though, it brought with it the Web browser, a single portal for accessing all content, and allowed the use of embedded images, movies and sound.

In 1992, the Internet was opened for commercial access, and online services like AOL, Prodigy and Delphi began offering connectivity. Anyone could run a site on the Web, which was now accessible to millions of people worldwide. In 1993, Global Network Navigator became the first online publication to support itself with interactive advertising banners, and the path forward was clear: newspapers could make their content available for free to anyone in the world with Internet access, and pay for it with advertising. Due to the nature of the network, once a piece of content had been produced, the cost of disseminating it indefinitely was negligible. The barrier to entry had also been dramatically lowered: anyone could publish news without having to establish a distribution network. Other advertising-supported sites like the Drudge Report, the Huffington Post and opinion-orientated “Web logs” like DailyKos began to spring up. The former media gatekeepers were no longer an effective part of the news ecosystem.

These events moved newspaper content beyond the scarcity model. Wikipedia says this about scarcity: “Goods that are scarce are called economic goods. […] Other goods are called free goods if they are desired but in such abundance that they are not scarce, such as air and seawater”. Thanks to the Internet, content became like air and seawater: almost infinitely abundant, and free. The possibilities provided by Internet advertising seemed to have heralded a new era.

Internet advertising has a major benefit over its printed cousin: it can be targeted towards its audience, and statistics about advertisement effectiveness and reader engagement can be captured in real time. Advertisers know exactly how many people have responded to an advertisement, and can tailor it to a particular viewing demographic. Contrast that with the print medium, where by necessity everyone must see the same advertisements, and advertisers must make inferences from the newspaper’s readership statistics and their own sales to determine an advertisement’s effectiveness. It should be no surprise that in addition to its $2.9 billion in lost sales, print advertising sales in American newspapers declined by $7.5 billion in 2008.

Given its theoretical superiority, the loss of newspaper advertising revenue in print should have been made up for online. However, this is not the case. Scarcity provided a captive market: often there were only one or two newspapers available in any particular location. Suddenly, with the advent of the Web, there were tens of thousands of titles available everywhere. As a result, what had previously been a supply-constrained readership that read a relatively small number of sources fragmented into a demand-driven one that read articles in the way most convenient to them, from whichever source was most conveniently available. Competition for readers had become fierce, and the abundance of publications willing to host advertising meant that prices were much lower.

Furthermore, a lot of advertising that had traditionally been placed in newspapers was now being cannibalized by new, specialized websites like Craigslist and Monster.com. As New York University’s Clay Shirky notes, these new companies “all have the logic that if you want to list a job or sell a bike, you don’t go to the place that’s printing news from Antananarivo and the crossword puzzle. You go to the place that’s good for listing jobs and selling bikes.” Newspapers, or even their associated websites, were no longer hubs for local information. People were visiting specialized sources for each kind of information they needed.

Shirky also points out that the alignment of advertising and journalism was always going to be short-lived: “the commercial success of newspapers and their linking of that to accountability journalism wasn’t a deep truth about reality. Best Buy was not willing to support the Baghdad bureau because Best Buy cared about news from Baghdad. They just didn’t have any other good choices.” In other words, the advertising attention they received was because they were the only, rather than best, option. As soon as the Internet opened up more efficient avenues, the money flowed away.

To fill this vacuum, some newspapermen are attempting to rebuild a captive audience through other means. Rupert Murdoch, the head of News Corporation (the multinational news corporation that owns the Fox News Channel, the Wall Street Journal and the New York Post, among others), announced in the summer of 2009 that he would begin charging for access to all of his newspapers’ online content, from the Wall Street Journal right down to the Sun. With it, they will also ban readers from electronically sharing content with their friends, which is a kind of social word-of-mouth marketing that has driven readership levels in recent years. As Chase Carey, News Corporation’s Chief Operating Officer, puts it: “we believe customers value quality journalism. We need to get paid for our product as it shifts to the digital world.”

Murdoch’s announcement sent a strong signal to the rest of the newspaper industry, and split commentators down the middle. Consumers, after all, were now used to getting their content for free. Both the music and movie industries had been having a very difficult time convincing their customers to purchase rather than pirate their wares. On the other hand, it was clear that making content free and advertising-supported was not delivering the revenue that publishers had been expecting. Variety, the entertainment trade newspaper, had experimentally made all its content available for free online in 2006. Although their website’s readership flourished, advertising dollars did not appreciably increase. On December 17, 2009, the “pay wall,” as website pages demanding payment for content are known, was re-established.

Indeed, a recent decision by the Dallas Morning News to bring its editorial department under the control of its advertising sales division (brought to my attention by Paul Adrian of Press for the People) would seem to support the idea that news content should be directly paid for. The old supply-driven model allowed editorial departments to maintain journalistic integrity: companies might have been ticked off by a newspaper article, but where else could they place their advertising? However, in today’s multi-source media, the loss of a valuable advertising contract is a very real possibility. The situation at the Dallas Morning News may help ensure the newspaper’s longevity, but it results in subjective journalism that is at the whim of overriding commercial concerns. Arguably, the only way forward for objective journalism is to charge the people who value it.

However, serious questions are being asked about the viability of this route. In particular, how willing will people be to pay for content, even from a trusted newspaper, now that there are thousands of competitors giving it away for free online? “When we look at why people quit buying the newspaper, it’s overwhelmingly because ‘I can get it for free online,’” notes William Dean Singleton, the CEO of the fourth-largest American newspaper company, MediaNews. It may not be possible to force an artificial scarcity in news reporting without all newspapers charging for it at the same time – something that would require widespread collusion in the industry. With the exception of reporting niches like finance, where, according to Shirky, “data is valuable in inverse proportion to its availability (unlike editorials, say, or political reporting),” most consumers prefer to receive their content for free. In the mainstream, Shirky suggests, “the key questions for the average publisher contemplating pay walls are: How serious will that competition be? How many users will you lose? Will banning sharing create a defensible advantage? And the answers are: crushing, most, and no.”

How, then, will objective journalism survive? One emerging suggestion is that we must de-couple journalism from newspapers. We may have to accept that the latter may become extinct in order to save the former. After all, it’s the factual reporting and analysis that are valuable to our society, rather than the bundles of low-grade paper they are printed on. I would argue that those things, when provided in a thoughtful way that makes full use of current technology, are worth paying for.

As O’Reilly Publishing’s online editor Kurt Cagle puts it: “When a previously thriving industry seems to be dying, it is most likely because the services that it initially provided are becoming obsolete. It is better in this situation to rethink what such services should provide, then build a niche for it. Otherwise, you’re just wasting money.”

It’s an open question, and one I intend to help address in 2010.

Public IT project hell: let’s make government work for us

Ben Werdmuller — December 3, 2009

Why does it cost $235 million to integrate a few IT systems?

Johannes Ernst contrasts the Yahoo/Facebook deep integration announcement with the US government’s announcement that they will spend $235 million on integrating incompatible healthcare IT systems, and asks some pertinent questions:

I assume we all agree that an environment in which leading-edge companies innovate on their own to the benefit of their customers is better than one in which the government has to spend large amounts of money to drag along kicking and screaming “participants” — as it is so common in health IT. How do we turn US healthcare IT from the latter to the former?

One might equally substitute education, or local councils, or law enforcement. It’s a widely-accepted truth that public IT endeavors suck, and that enforcing data standards across disparate public bodies is like herding confused, angry cats into a very wet bag. It’s also true that commercial web services have been very good at integrating for the good of their customers, often without any money (let alone $235 million) changing hands.

I do think there’s a false distinction that’s been made here: public bodies and government departments tend to be swamped in a sea of bureaucracy that prevents them from moving or changing as nimbly as many commercial companies. (Of course, as companies begin to become institutionalized through age and size, they also become less nimble: take Microsoft and IBM.) Many of these restrictions are necessary for the simple reason that they’re using our money, and some regulation is required to ensure tax funds are being spent wisely and benefit the wider public good. We don’t want people to just walk off with it.

Our tax dollars at play

It’s also widely accepted that our tax dollars are not spent wisely, and often don’t benefit the wider public good. Public bodies are full of inefficiencies, in part because of the bureaucracy involved. I’ve certainly worked within university environments where entire departments of people could reasonably be described as incompetent, but had integrated themselves so well into the system that they had become a required port of call in the bureaucratic workflow. I’ve also seen fully private companies formed using university money and resources earmarked for public research, and government grants essentially spent on beer and travel. These are the kinds of inefficiencies and sanctioned fraud that must be stamped out.

Public bodies and private companies are different in one major respect: their stakeholders. It is a legal requirement for shareholders in a company to have access to the company returns, board minutes and so on (although a wider cloak of privacy is often necessary). In a public body, the stakeholders are the public, yet we often don’t have access to details like financial statements, minutes and decision-making rationale. In Britain, an attempt to get government departments to work like commercial companies has resulted in a ridiculous system where departments must pay each other and the British taxpayer often doesn’t have a legal right to the information they produce.

The public is the board

Ultimately, in a democracy, the public should be the board of directors. Genuine public oversight hasn’t been possible before, but transparency and accountability are now possible via the Internet. We don’t need political parties and administrations to be our eyes and ears any more; we need them to be our hands, and act on our behalf. We need to be able to see the inner workings of public bodies: not just the numbers, but the actual internals and decisions. With genuine public oversight in a way that ensures the bodies know they’re being watched, and governments obligated to maintain these bodies for direct public benefit in a way that’s responsive to the public, costs should go down. It’s not perfect – and Switzerland has recently shown us the dangers of having frequent public referendums – but given the spending, inefficiency and fraud inherent in the system, we can no longer trust the government to do this on our behalf.

Net neutrality

Ben Werdmuller — October 29, 2009

Please take five minutes to watch FreeForm’s video about the open Internet, and then share it with as many people as you can:

Then, if you’re a US citizen, head over to the Save the Internet Campaign and sign their petition to Congress to preserve net neutrality. Also see FreePress’s links on how some telecoms companies are artificially skewing the debate and presenting the false impression of a grassroots movement against net neutrality.

(Still confused about what net neutrality is? Wikipedia has a great overview.)

Learning on the social web

Ben Werdmuller — June 18, 2009

ScienceBlog reports that on Saturday, Carl Whithaus will announce the preliminary results from a California Department of Education study into increasing academic achievement using computers in 4th grade classrooms (emphasis mine):

During the first year of the two-year study, student achievement increased 27.5 percent, according to Whithaus, who is principal investigator of a study to evaluate the project’s effectiveness.

Computer use – and particularly, online community engagement – increases engagement with formal learning, which is great news for the e-learning software market. But I’m particularly interested in the effect of networks on informal learning – specifically, learning from our activities on the web.

Learning happens when two sets of experiences and assumptions are exposed to each other – in other words, when we communicate. The web is the most globally efficient communications method the world has ever seen, and as a result, I believe, may rapidly transform our world culture for the better.

Last month, I met with J. Nathan Matias from the World University Project, a project that aims to evolve higher education by shedding light on how people learn and teach around the world. His intent is to highlight experiences that people in the west have largely not been exposed to, and in so doing advance mutual understanding between our academic systems. It’s a brilliant idea, which takes advantage of the potential of a universally accessible global communications network.

Recently, the Iranian election swamped Twitter, to the point where they rescheduled maintenance in order to minimize the effect on dissidents in the country. Suddenly, because Iranian dissidents were online and conversing with people from the west, Iran seemed less like a scary, far-off country filled with terrorists and more like – gasp – a country filled with actual human beings. Clay Shirky had this to say:

I’ve been thinking a lot about the Chicago demonstrations of 1968 where they chanted “the whole world is watching.” Really, that wasn’t true then. But this time it’s true … and people throughout the world are not only listening but responding. They’re engaging with individual participants, they’re passing on their messages to their friends, and they’re even providing detailed instructions to enable web proxies allowing Internet access that the authorities can’t immediately censor. That kind of participation is really extraordinary.

On a smaller scale, we’re now interacting with people from other walks of life, with markedly different sets of skills and interests, on a daily basis. The opportunity available to us is not just to get our message out on an unprecedented scale – but to get other people’s messages in, and in the process make ourselves more educated and informed than we’ve ever been. On a personal level, it can help us with our fourth grade homework; on a societal level, it’s a revolution.

Supporting freedom of speech

Ben Werdmuller — May 13, 2009

OutMap is sponsoring BarCamp Transparency by donating a portion of my time to developing the website (for which I’d already provided the copy), as well as providing Twitter walls and projectors on the day. If you’re in the UK and interested in open government, cyber activism or social media ethics, I highly recommend you keep the 26th of July free for a trip to Oxford. Some very high profile people are attending, and the discussions promise to be amazing. And, hey, if that’s not enough for you, mention that you found out about the event from this blog and I’ll buy you a beer.

On a not-entirely-unrelated note, I want to make you aware of GlobalVoices Advocacy, which aims to create a global anti-censorship network of bloggers and online activists in the developing world. This is important work; one of the really exciting aspects of the web is the way information can spread and undermine oppressive legislation. It’s also dangerous, as blogging in places where freedom of speech is not protected can have severe consequences. They provide tutorials on blogging anonymously, as well as blogging effectively for a cause.

Zemanta, a blogging tool that suggests content to include as you type, is offering a small funding award to the charitable cause that gets the most posts as part of their ‘blogging for a cause’ promotion. It’s a good idea, and if you like what GlobalVoices Advocacy do, maybe you could write about them too – or any other good cause that you think is deserving.

I vote for Global Voices Advocacy because freedom of speech and the fight against censorship is one of the most important fronts in the fight for human rights around the world. This is a fight that we can all participate in, without having to go through governments, and GlobalVoices Advocacy is one organization that shows us how.

This blog post is part of Zemanta’s "Blogging For a Cause" campaign to raise awareness and funds for worthy causes that bloggers care about.

There shouldn’t need to be an OpenStreetMap

Ben Werdmuller — April 9, 2009

OpenStreetMap is a project whose aim is to make a free map of the world. It’s extremely impressive: as well as letting you search the map in the normal way, it makes the data exportable as XML, PNG, JPEG, SVG and more, under a Creative Commons Attribution-Share Alike license.

But it shouldn’t need to exist.

In the US, federal government-created maps (and other data) are considered to be public information, and released freely. In the UK, such maps are subject to Crown Copyright, and the Ordnance Survey has been set up as a trading organisation that legally must make money from its efforts.

This was an archaic idea at its inception, but makes even less sense now. The economy is in dire straits, and what the government should be doing is providing taxpayer-funded data for use by companies; this kind of data in particular could give British businesses a flying start. Instead, it chooses to make money from them, and web services are left to turn to projects like OpenStreetMap, as well as US businesses like Google, in order to source information.

The Guardian’s Data Store is one British attempt to rectify the situation, but ideally all data in the public interest should be released in a format that is easily consumable by third-party applications. As well as helping entrepreneurs and small businesses, it’ll allow for a deeper understanding of, and participation in, how our country is run. Which can’t be a bad thing – can it?
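To make "easily consumable by third-party applications" concrete, here is a minimal Python sketch of reading map data in the general shape of OpenStreetMap's XML export. The sample document, its IDs, coordinates and tags are invented for illustration, and the `named_nodes` helper is hypothetical; a real export would be far larger and contain ways and relations as well as nodes.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in the general layout of an OSM XML export.
# The IDs, coordinates and tag values are made up for this example.
SAMPLE_OSM_XML = """<osm version="0.6" generator="example">
  <node id="1" lat="51.7520" lon="-1.2577">
    <tag k="name" v="Example Cafe"/>
    <tag k="amenity" v="cafe"/>
  </node>
  <node id="2" lat="51.7534" lon="-1.2540"/>
</osm>
"""


def named_nodes(osm_xml):
    """Return (name, lat, lon) for every node that carries a 'name' tag."""
    root = ET.fromstring(osm_xml)
    results = []
    for node in root.iter("node"):
        # Each <tag k="..." v="..."/> child is a key/value pair on the node.
        tags = {t.get("k"): t.get("v") for t in node.findall("tag")}
        if "name" in tags:
            results.append(
                (tags["name"], float(node.get("lat")), float(node.get("lon")))
            )
    return results


print(named_nodes(SAMPLE_OSM_XML))
```

A few lines of standard-library code are enough to pull named places out of the data, which is the kind of low barrier that openly licensed, machine-readable public data makes possible.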

BarCamp Transparency

Ben Werdmuller — April 7, 2009

One of the outcomes of BarCamp Oxford has been the organisation of a new BarCamp about transparency and ethics – a mix of social media, open government and cyber-activism.

It’s in its early planning stages, but it’ll take place sometime over the summer here in Oxford. If you’re interested, I suggest you take a look at the BarCamp Transparency wiki and throw your name into the ring. I was asked if I’d help organise, and while I can’t provide as much time as I’d like to due to prior commitments, I’ve volunteered to discuss openness in social media, provide web resources and help out with the event itself.

Transparency is hugely important, and becoming more so. As citizens we have more and more demands upon us to surrender our privacy and aspects of our civil liberties, but the government and politicians on all sides have been reluctant to provide more oversight into their activities. Meanwhile, social technologies have the power to enable us to find and share public information, organise ourselves into groups, and have more say in how our country is run.

This is a vital event that already sounds very promising indeed.

Facebook has no need for deleting data

Ben Werdmuller — April 6, 2009

Niall Kennedy has written an interesting post about Facebook’s data storage. They’ve written a proprietary filesystem to store photos in order to cut costs (up to now they’ve apparently been adding a $2 million NetApp storage system every week).

It turns out they’ve decided they don’t need all the features you’d find in a traditional file system (emphasis mine):

Traditional file systems are governed by the POSIX standard governing metadata and access methods for each file. These file systems are designed for access control and accountability within a shared system. An Internet storage system written once and never deleted, with access granted to the world, has little need for such overhead.

It would be nice if someone from Facebook could confirm that they do, in fact, have the ability to physically delete a photo or other items of data, and that this does, in fact, happen on the back end if you ask it to.

From what we understand of Facebook’s architecture, it probably doesn’t. When you post something, it gets copied and broadcast to your friends’ feeds; the data is out there forever. Even when you delete an account, your details aren’t fully removed. Surely, if nothing else, this is a legal minefield for the company?

Gender differences on the new frontier

Ben Werdmuller — March 10, 2009

It’s a commonly accepted fact that computing is a male-dominated industry, but I was shocked by the scale of the inequality. Okay, this is kind of unscientific, but take a look at these statistics:

  • Female population of the world: 49.8%
  • Female population of Facebook: 55%
  • Female population of social networks as a whole: 54.7%
  • Percentage of people awarded undergraduate computer science degrees by PhD-granting institutions in the US and Canada in 2006-7 who were women: 12%

While social media usage is skewed ever so slightly towards women, a whopping 88% of the people who study to learn the skills to build these tools are men. This is at a time when, in science generally, women receiving undergraduate degrees are increasing as a percentage year on year.
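The arithmetic behind that comparison can be sketched in a few lines of Python. The percentages are the ones quoted in the list above; the "representation ratio" is just one illustrative way of expressing the gap, not a figure from any of the sources.

```python
# Percent female, as quoted in the list above.
world_population = 49.8
social_network_users = 54.7
cs_degrees = 12.0

# The share of computer science degrees going to men: the "88%" in the text.
male_cs_degrees = 100 - cs_degrees

# Women's share of CS degrees relative to their share of social network
# users. A value of 1.0 would mean parity between the two groups.
representation_ratio = cs_degrees / social_network_users

print(f"Men among CS graduates: {male_cs_degrees:.0f}%")
print(f"Women's share of CS degrees relative to social network users: "
      f"{representation_ratio:.2f}")
```

On these numbers, women are represented among computer science graduates at roughly a fifth of the rate at which they appear among social network users.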

Some of the reasons for this have been covered a lot over the past year. This 2007 interview with Aaron Swartz (who worked on Creative Commons and is now behind the awesome government site Watchdog.net) contains some interesting thoughts on discrimination on the basis of both gender and race:

If you talk to any woman in the tech community, it won’t be long before they start telling you stories about disgusting, sexist things guys have said to them. It freaks them out; and rightly so. As a result, the only women you see in tech are those who are willing to put up with all the abuse.

[...] The denial about this in the tech community is so great that sometimes I despair of it ever getting fixed. [...] It’s an institutional problem, not a personal one.

Last year, Chris Messina called out a BusinessWeek article for disproportionately featuring the male participants at Web2Open, a Web 2.0 technology unconference Tara Hunt had predominantly organized. He followed it up this month with another post about the Future of Web Apps as a white boys’ club:

Turns out, white men also don’t have the monopoly on the best speakers – even in the tech industry – yet their ilk continue to make up a highly disproportionate number of the folks who end up on stage. And that means that good content and good ideas and important perspectives aren’t making it into the mix that should be, and as a result, audiences are getting short-changed.

This isn’t just about technology, and it isn’t just about the commercial web. We’re in an era where everything is going online; Barack Obama would arguably not be President of the United States without his engagement with grassroots social media technologies, and he is certainly continuing to embrace them into his Presidency. Yet if those technologies are effectively controlled by a minority of the population, that population’s biases and predispositions seep into how they’re designed, how they’re built, and ultimately how they work in practice.

I’ve picked out gender here, but the same is doubtless true regarding race and sexuality discrimination in the tech sector, although the numbers haven’t been as widely published. As computing becomes more and more important in society as a whole, it becomes more and more important to ensure the people who help shape it are selected fairly and represent a cross-section of the people it serves.

Update: Lots of really interesting links in the comments, including Katie Piatt’s recommendation of Ada Lovelace Day, which encourages people to blog about women in tech.

Meitar Moscovitz points me to Will the Semantic Web Have a Gender?, a ReadWriteWeb article from last year about the possibility that the semantic web will reflect a predominantly male attitude to the world.

Image by mouton.rebelle and released under a CC-Attribution-Noncommercial license.

Except where stated otherwise, all posts in this weblog are licenced under a Creative Commons Licence.