Data portability and the open data movement (“the open web” for simplicity’s sake) revolve around the idea that you should be able to take your data from one service to another without restriction, as well as control who gets to see it and how. Very simply, it’s your data, so you should have the ability to do what you like with it. That means that, for example, if you want to take your WordPress blog posts and import them into MovableType (WordPress’s competitor), you should be able to. Or you should be able to take your activity from Facebook and include it in your personal website, or export your Gmail contacts for backup or transfer to a rival email service.
You can do this on your desktop: for example, you can open a Word document in hundreds of wordprocessors, and Macs will happily talk to Windows machines on a network. Allowing this sort of data transport is good for the web in the same way it’s good for offline software: it forces companies to compete on features rather than the number of people they can lock into their services. It also ensures that if a service provider goes out of business, a user’s data on that service doesn’t have to disappear with it.
In 2007, before the open web hit most peoples’ radars, Marc Canter organised the first Data Sharing Summit, which was a communal discussion between all the major Silicon Valley players, as well as many outside companies who flew in specially to participate (I attended, representing Elgg). One of the major outcomes was the importance of central control: the user owns their data. Marc, Joseph Smarr, Robert Scoble and Michael Arrington co-signed a Bill of Rights for the Social Web which laid these out. It wasn’t all roses: most of the large companies present took issue with the Bill of Rights, and as I noted in my write-up for ZDNet at the time, preferred the term “data control” rather than “data ownership”. The implication was simple: users didn’t own the data they added to those services.
Since then, the open web has been accelerating as both an idea and a practical reality. Initiatives like Chris Saad’s Dataportability.org, Marc Canter’s Open Mesh treatise, as well as useful blunders like Facebook’s recent Terms of Service mis-step, have drawn public attention its importance. Facebook in particular force you to license your content to them indefinitely, and disable (rather than delete) your account details when you choose to leave the site. Once you enter something into Facebook, you should assume it’s there forever, no matter what you do. This has been in place for some time to little complaint, but when they overreached with their licensing terms, it made international headlines across the mainstream press: control over your data is now a mainstream issue.
Meanwhile, technology has been improving, and approaches have been consolidated. The Open Stack is a collection of real-world technologies that can be applied to web services in order to provide a base level of openness today, and developments are rapidly emerging. Chris Messina is leading development around activity streams portability, which will allow you to subscribe to friends on other services and see what they’re up to. The data portability aspect of the open web is rapidly becoming a reality: you will be able to share and copy your data.
Your data will be out there. So, what happens next?
The same emerging open web technologies which allow you to explicitly share your data from one service to another will also allow tools to be constructed cheaply out of functionality provided by more than one provider. Even today, a web tool might have a front end that connects behind the scenes to Google (perhaps for search or positioning information), Amazon (for storage or database facilities), and maybe three other services. This is going to drive innovation over the next few years, but let’s say a user on that conglomerated service wants to delete their account. Can they reliably assume that all the component services will respect his or her wishes and remove the data as requested?
As web tools become more sophisticated, access control also becomes an issue. When you publish on the web, you might not want the entire world to read your content; you could be uploading a document that you’d like to restrict to your company or some other group. How do these access restrictions persist on component services?
One solution could be some kind of licensing, but this veers dangerously close to Digital Rights Manamgent, the hated technology that has crippled most online music services and players for so long and inhibited innovation in the sector. Dare Obasanjo, who works for Microsoft and is usually a good source for intelligent analysis, recently had this to say:
[..] I’ve finally switched over to agreeing that once you’ve shared something it’s out there. The problem with [allowing content to be deleted] is that it is disrespectful of the person(s) you’ve shared the content with. Looking back at the Outlook email recall feature, it actually doesn’t delete a mail if the person has already read it. This is probably for technical reasons but it also has the side effect of not deleting a message from someone’s inbox that they have read and filed away. [..] Outlook has respected an important boundary by not allowing a sender to arbitrarily delete content from a recipient’s inbox with no recourse on the part of the recipient.
The trouble is that many services make money by selling data about you, either directly or indirectly, and these are unlikely to relinquish your data (or information derived from it) without some kind of pressure. I agree with Dare completely on the social level, with content that has been shared explicity. Certainly, this model has worked very well for email, and people like Plaxo’s John McCrea are hailing the fall of ‘social DRM’. However, content that is shared behind the scenes via APIs, and content that is shared inadvertently when agreeing to perform an action over something like OAuth or OpenID, need to obey a different model.
The only real difference between data shared as a deliberate act and data shared behind the scenes is user interface. Everyone wants the user to have control over data sharing via a clear user interface. Should they also be able to enforce what’s done with that data once it transfers to a third-party service, or should they trust that the service is going to do the right thing?
The open web isn’t just for trivial information. It’s one thing to control what happens to my Dopplr information, or my blog posts, or my Flickr photographs. I really don’t mind too much about where those things go, and I’d imagine that most people would agree (although some won’t). Those aren’t, however, the only things the web is being used for: there are support communities for medical disorders, academic resources, bill management services, managed intranets and more out there on the web, and these will begin to also harness the benefits of the open web. All of them need to be careful of their data. Some of them need to do so for legal reasons; some of them need to do so for ethical reasons. Nonetheless, they could all benefit from securely being able to share data in a controlled way.
To aid discussion, I propose the following two categories of shared data:
- Explicit shares – information that a user asks specifically to share with another person or service.
- Atomic objects like blog posts, contacts or messages
- Collections like activity streams
- Implicit shares – information that is shared behind the scenes as a result of an explicit share, or to provide some kind of federated functionality.
- User information or shadow accounts transferred or created as a result of an OpenID or OAuth login
- User settings
- User contact details, friend lists, or identifiers
For the open web to work, both clearly need to be allowed. At a very base level, though, I think that users need to be aware of implicit shares, in a clear, non-technical way. (OpenID and OAuth both allow the user to grant and revoke access to functionality, but they don’t control what happens to the data when access is granted once, which is likely to be kept.) They also need to provide a facility for reliably controlling this data. Just as I can Creative Commons license a photograph and allow it to be shared while restricting anyone’s ability to use it for commercial gain, I need to be able to say that services can only use my data for a limited time, or for limited purposes. I’m not calling for DRM, but rather a published best practice that services would adhere to and publicly declare their allegiance to.
Without this, the usefulness of the open web will be limited to certain kinds of use cases – which is a shame, because if it’s allowed to reach its full potential, it could provide a new kind of social computing that will almost certainly change the world.