Danger in the cloud: a proposal

October 12, 2009 | 8 comments

In response to recent events, I’d like to propose a different kind of web service that overcomes the privacy and reliability issues with cloud web applications, while providing a solid business model for both application developers and service providers, as well as a seamless, easy-to-use experience for end users.

The T-Mobile storm

Over the weekend there’s been a storm surrounding the T-Mobile Sidekick, which is produced by Microsoft’s Danger subsidiary. It turns out the device stores the primary copy of data like calendar and address book information in the cloud rather than on each device; perhaps a fair proposition if you knew you could trust Microsoft’s servers. Unfortunately, said servers went down last week, and Microsoft didn’t have a working backup. Sidekick users suddenly found themselves without their personal information.

Is cloud computing safe?

Understandably, this has sparked a wider conversation about computing in the cloud. AppleInsider summed it up:

More immediate types of cloud services take away users’ control in managing their own data.

While Ina Fried over at CNet noted:

The Danger outage comes just a month before Microsoft is expected to launch its operating system in the cloud–Windows Azure. That announcement is expected at November’s Professional Developer Conference. One of the characteristics of Azure is that programs written for it can be run only via Microsoft’s data centers and not on a company’s own servers.

The issues surrounding cloud computing have been discussed for a while, and aren’t limited to these sorts of accidents; here’s a post I wrote in 2007 about the rights we ought to have over our cloud data. Partly because of these reliability risks, and the risk of data leaking, some kinds of organizations and enterprises simply can’t use cloud computing services. (In the UK, for example, check out the requirements imposed by the Data Protection Act.) At the same time, the Sidekick debacle shows there are clear risks to end-user consumers too.

Despite this, the benefits of cloud computing are obvious, not least for the organizations that currently can’t use it: device-independent applications and data we can access and use from anywhere.

Can we have the best of both worlds?

The personal computing model is relatively secure: you install applications on your computer, and they sit on your local hard drive, along with your data. Assuming there hasn’t been a security breach, and you haven’t explicitly provided access to your data over a network or through a direct action like emailing it, it’s safe.

On the other hand, because your applications and data are locked away on your hard drive, you generally need direct access to that machine in order to use them. There are remote desktop solutions like VNC, but these are clunky and fairly useless over a low-bandwidth connection.

Web applications that store their data in the cloud overcome this obstacle, but lose the security of sitting on your own computer.

What if there were a halfway house between these two situations?

The personal web server that works

Theoretically, anyone can run their own web server right now, install web applications on it in a more secure, controlled environment, and access them from anywhere. But there are some very good reasons why they don’t:

  • You need system administrator skills, usually on top of Linux skills, to do it.
  • Web applications – even relatively easy-to-install ones like WordPress or Elgg – are fiddly. There are configuration files, directory permissions and (potentially) source repositories to contend with.
  • The web applications you can install on your own server are often not as good as the ones you can get in the cloud.
  • When something breaks, it’s your own responsibility to fix it.
  • Servers are expensive.

What if we could fix all of these things at once? Enterprises, organizations and individuals could have their own, more secure environment that would allow them to use the cloud applications they needed with fewer security risks, while enjoying the ease of use and immediacy that the cloud provides.

One of the reasons everyone’s leaping to copy the iPhone’s app store business model is that it just works. Sure, you’re forced to delegate root control of the phone to iTunes, and the operating system places some seemingly arbitrary restrictions on what applications can and can’t do. But the handset works, and installing software is easier than on any other platform. The truth is, most ordinary users don’t care about those restrictions. Hell, I’m a computer scientist / software developer / entrepreneur / power user, and I’m just happy the thing works. (Context: my previous phone ran Windows Mobile, which doesn’t.)

Imagine if you could get your own server environment that was as easy to use as the iPhone. It would look something like this:

Front end & business model

  • You sign up for the service, possibly for a small monthly fee, possibly for free (depending on the service provider). Alternatively, if you’re more technical / an enterprise / an organization, you install it on your own infrastructure. The platform is available for free and could be open source.
  • From a secure web-based admin panel, you can add and remove users (although the platform optionally also supports Active Directory and similar standards, as well as OpenID), and install / uninstall applications from a centralized app store with the usual features: ratings, search, similar apps, etc. Installation is one-click, and upgrades are similarly seamless. (That WordPress “what, I have to upgrade again?” problem: solved.)
  • Much like the iTunes app store, applications may be free, or may cost a small amount. Applications may impose licensing restrictions based on number of users: for example, the app costs $4.99 for up to 5 users, $19.99 for up to 25, and so on (see the sketch after this list).
  • As with the iTunes app store, the application store provider takes a cut – and so does the service provider. This creates a strong incentive for multiple vendors to provide hosted services for little cost. It also effectively creates a discount for enterprise, organizational and technical users, who can bypass a service provider. The payment to the web application developer also, for the first time, creates a solid commercial marketplace for high quality web application products, while the free option allows open source vendors to distribute as normal.
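
To make the per-seat licensing idea concrete, here’s a minimal sketch in Python. The listing structure, tier boundaries and prices are entirely illustrative assumptions of mine, not part of any real store API; the point is only how a store record might encode the “up to N users” pricing described in the list above.

```python
# Hypothetical sketch of a tiered, per-seat licence on a store listing.
# All names, tiers and prices are illustrative, not a real API.

LISTING = {
    "app": "example-calendar",
    "vendor": "Example Co.",
    "tiers": [          # (max_users, price_in_usd); None means unlimited
        (5, 4.99),
        (25, 19.99),
        (None, 49.99),
    ],
}


def price_for(listing, user_count):
    """Return the price of the cheapest tier covering `user_count` seats."""
    for max_users, price in listing["tiers"]:
        if max_users is None or user_count <= max_users:
            return price
    raise ValueError("no tier covers this many users")


if __name__ == "__main__":
    print(price_for(LISTING, 3))    # 4.99
    print(price_for(LISTING, 12))   # 19.99
```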

Technology

  • Behind the scenes, the server runs existing open source technology: Apache, Tomcat, PHP, Perl, Python, Ruby on Rails, MySQL, Postgres, etc. However, there are restrictions on how applications must be structured, behave and share their data. This allows the one-click install and upgrades to function correctly. Importantly, though, users of the system need never worry about the underlying framework.
  • The platform has a central data store that all applications may access via an API. This data store is fully exportable, allowing (for example) a datastore stored with a service provider to be moved to an internal setup as an organization expands. As with the iTunes app store, applications are linked to a store account rather than a physical machine, so the application licenses are portable too.
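
As a thought experiment, here is a minimal sketch of what the restricted application structure and the central data store might look like from an application’s point of view. The manifest fields and the `DataStore` class are assumptions of mine rather than any existing platform’s API; the point is that a declared manifest plus a shared, exportable store is what makes one-click install, seamless upgrades and migration off a service provider possible.

```python
import json

# Hypothetical application manifest: the structure the platform would require
# so it can install, upgrade and wire up an app without manual configuration.
MANIFEST = {
    "name": "example-notes",
    "version": "1.2.0",
    "entry_point": "notes.app",
    "collections": ["notes"],          # data the app keeps in the platform store
    "permissions": ["read:contacts"],  # data from other apps it may read
}


class DataStore:
    """Sketch of the platform's central, exportable data store."""

    def __init__(self):
        self._collections = {}  # collection name -> list of records

    def put(self, collection, record):
        self._collections.setdefault(collection, []).append(record)

    def get(self, collection):
        return list(self._collections.get(collection, []))

    def export(self):
        """Dump everything, so a hosted account can move in-house later."""
        return json.dumps(self._collections, indent=2)


if __name__ == "__main__":
    store = DataStore()
    store.put("notes", {"title": "Hello", "body": "First note"})
    print(store.export())
```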

Of course, this wouldn’t replace standard web servers. What it does provide, however, is a simple cloud operating system that works in a more secure, dependable way than existing services, is more acceptable to many organizational users, and provides a genuine business model for web application developers.

The web is now an end-user application platform, but it still behaves like a lightweight document store. To obtain the level of software customization we all enjoy on our home PCs, a much higher level of technical competence is required. I strongly believe that this situation must change for the web to be a viable commercial application framework.

Making the most of the web, right now

June 10, 2009 | 1 comment

I believe a truly decentralized social web is required to fulfill the web’s potential as a platform for business collaboration, and I’m very interested in helping to push the technical and conceptual boundaries in that direction. I spend a lot of time on this blog writing about that, but I think it’s also important to remember that a huge amount is possible using the technologies, standards and ideas that we can currently pick up and use.

Creating a new web tool, or adapting one for your own use, can be a bit like pitching a movie: a lot of people come to me and say things like, “it’s like Delicious meets YouTube, but for the iPhone”. That’s great, and can result in some very interesting ideas, but I think it’s always best to go back to first principles and ask why you need the tool to begin with. My post The Internet is People addressed some key points on this:

  • Your tool must plug into an existing network of users, or be useful for user 1 (the first user to sign up). Delicious lets you save your bookmarks into the cloud; Flickr lets you easily upload photos so other people can see them. Both services come into their own when you connect with other users, but the core of the site is useful before you’ve done so. Facebook is different, but it had the Harvard real-world social network to plug into – and it now acts as a useful aggregation of your other activity on the web, which arguably is useful for user 1.
  • You can’t build a site and assume people will come and use it. It’s a lot of hard work, even when the technology is ready for launch; you need to lead by example, constantly adding content and using the site as you would like it to be used. Not to mention the hours you have to put in promoting it elsewhere.

The feature set itself should be tightly focused:

As each tool should focus on one particular network, or at least type of network, I’d argue that the exact feature set should be dictated by the needs of that network. Educational social networks might need some coursework delivery tools; a network for bakers might need a way to share bread recipes. The one common feature in any social network is people; even profiles may not be entirely necessary.

I mention at the end of the post that these principles were the guiding ideas behind the design of the Elgg architecture. They’re now the principles behind the tools and strategy I develop for my clients.

In this blog you’ll find lots of talk about new technologies, innovative approaches and the ethics of social media. These allow us to build interesting new tools, but they always sit on a firm foundation: the Internet is just people connecting and sharing with each other, and the purpose of web tools is to make that as easy as possible.

Social networking: beyond the silo

June 8, 2009 | 1 comment

  1. The rise of social networking
  2. Monetization vs. collaboration
  3. The open web
  4. Fluid collaboration

The rise of social networking

Social forces have been driving application innovation on the web. Whereas previously we might have looked to advances in computer science for new directions, now some of the most impactful applications are lightweight, simple, and technologically unimpressive. The best new web applications have centered around collaboration, sharing and discovery with other people.

Correspondingly, enterprises have been relatively quick to pick up on this trend, and software vendors have moved quickly to grab the market. In an Intranet Journal article earlier this year, Kara Pernice, managing director at the Nielsen Norman Group, had this to say about the rise of social technology on the intranet:

"In the 9 years [the Intranet Design Annual, which highlights the ten best-designed intranets of the year] has been coming out (since 2001), I’ve never seen a change quite as great as this one."

On the Internet at large, social network use is growing at ten times the rate of other online activities, accounts for 10% of all online time, and is now more popular than email, according to Nielsen Online in this March 2009 report (PDF). Jeremiah Owyang has a list of further relevant statistics over on this digest blog post. Executive summary: social networks are big, transformative in terms of how we communicate and share information, and growing at an enormous rate.

Monetization vs. collaboration

Wikipedia defines a “walled garden”, in software terms, as being:

[..] A closed set or exclusive set of information services provided for users (a method of creating a monopoly or securing an information system).

In other words, a walled garden is a system where data cannot easily be imported or exported. These are often also called data silos, after the sealed storage buildings that keep their contents isolated.

Facebook, the #1 social networking site in most western countries, has over 200 million users, including over 30 million who update their profiles at least once a day. The network is free to use, yet its revenue for 2008 has been estimated at around $265 million, despite a decidedly “in progress” revenue strategy.

This has traditionally required a walled garden strategy: the content users put into Facebook has not been easy to export or view in other interfaces, in order to preserve revenue from advertising (and – although this is a hunch – revenue from statistical analysis of users’ data). It’s only in the light of some extremely negative publicity (for example this February 2008 New York Times article) that they have begun to relax this policy and embrace the open direction that much of the rest of the web is heading in.

Speaking personally, I get more enquiries from people wanting to build something “Facebook-like” than anything else, presumably because of its phenomenal popularity. However, this kind of walled garden approach is not conducive to true collaboration; generally, people who ask for this lack a full understanding of the processes involved in social networking.

According to Nielsen, there are almost 1.6 billion people online. While Facebook’s 200 million sounds like a lot, it’s actually a drop in the digital ocean – so what happens if I want to share a Facebook conversation with someone who hasn’t signed up? The only way is currently to email them a link and force them to register for the service. Facebook would love me to do this, of course, because they get more eyeballs to view their ads and more people to fill in profiles. But what’s the point of even being on the web if you can’t make use of the decentralized communication features that form its backbone?

If I want to collaborate effectively online centering around a resource (which could be a file, a discussion or a pointer to something external), I need to be able to:

  • Share that resource with the people who need to see it
  • Grant access for them to edit it if required
  • Notify them that it’s been shared with them
  • Restrict access from everyone else

Furthermore, I need to do this with the lowest possible barrier to entry. My aim is to collaborate, not to get people to use a particular piece of software. By restricting this process, the Facebook model hinders collaboration.
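
As a rough illustration, the four requirements above fit into a very small model. This is a hedged sketch rather than a real system: the `Resource` class, its method names and the print-based notification are all hypothetical, and access defaults to denied for anyone who hasn’t been explicitly added.

```python
class Resource:
    """Hypothetical shared resource: a file, a discussion, or a pointer."""

    def __init__(self, owner, content):
        self.owner = owner
        self.content = content
        self.viewers = {owner}   # restricted to the owner by default
        self.editors = {owner}

    def share_with(self, person, can_edit=False, notify=print):
        """Share the resource, optionally grant editing, and notify the person."""
        self.viewers.add(person)
        if can_edit:
            self.editors.add(person)
        notify(f"{person}: {self.owner} shared a resource with you")

    def can_view(self, person):
        return person in self.viewers   # everyone else is denied


if __name__ == "__main__":
    doc = Resource("ben", "Draft proposal")
    doc.share_with("alice", can_edit=True)
    print(doc.can_view("alice"), doc.can_view("eve"))   # True False
```

The core model really is that small; the hard part, and the one the walled gardens don’t solve, is making it work across service boundaries with the lowest possible barrier to entry.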

The open web

The web was designed to be an open system, and adheres to principles (notably “every object addressable”, ensuring that every resource on the web has a unique reference address) set out by Doug Engelbart for open hypertext systems generally. Because web pages are interoperable, and all use the same basic standards, any page on the web is allowed to link to any other page on the web, no matter who wrote it or where it is hosted. In many ways that’s the key to why the platform is successful: despite being fragmented across millions of computers throughout the world, it navigates like a cohesive whole and can be viewed using a single piece of browsing software. (The downside to this is that the whole platform lives or dies depending on the capabilities of the browser you use: the sad fact is that Internet Explorer users, who often don’t have a choice because of policy decisions in their working environment, are at a disadvantage.)

While the original web was content-based, the social web is collaborative and centered around live data. However, because web applications are each developed separately using different sets of back-end infrastructure, their data does not adhere to the principle of interoperability – their user interfaces all use the same basic standards and can be viewed in a browser, but the underlying applications and data models tend not to work with each other. When social networks emerged, for example, there was no way to get LiveJournal and Friendster, two of the pioneers in the space, to speak the same language; you still can’t add someone as a friend on one social network from another. More recently, this has become apparent in the walled garden approaches of Facebook and others.

Not only does this situation constrain application design and run contrary to the underlying principles that made the web a success, it’s also a bottleneck to better collaboration. As Tim Berners-Lee, the web’s inventor, put it recently in this essential TED talk, data needs to be linked and interoperable in the same way pages are now. Beyond that, because walled garden services are making money out of the private information we’re loading onto them, there’s a human issue regarding the overall control of that data. Marc Canter, Joseph Smarr and others codified this into a Bill of Rights for users of the social web back in 2007. Though the issue has moved on since then, the underlying principles set out there are essential for open, collaborative, social tools on the web.

While the World Wide Web Consortium works on academically-developed standards for linked data in the form of the semantic web, developers have been getting their game on trying to solve the problems of interoperability between their applications and user control over their data. Application Programming Interfaces (APIs) – published sets of instructions for programmatically querying and extending web applications – have become popular, but in a very walled garden kind of way. Arguably the most successful has been Twitter’s API, which has led to a number of high-profile third-party applications like TweetDeck and Tweetie that collectively eclipse Twitter’s own website interface in volume of usage. But these APIs are their own form of walled garden: an application written for Twitter will only work with Twitter, for example. The APIs are not generalized between applications, and as such are not truly open; in many ways they’re a means for services to get more functionality and reach for free.
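
To illustrate the difference between a service-specific API and a generalized one, here is a hedged sketch in Python. The shared `StatusService` interface and the client classes are hypothetical; no such common standard existed at the time, which is exactly the problem being described, and the “calls” below are stand-ins rather than real endpoints.

```python
from abc import ABC, abstractmethod


class StatusService(ABC):
    """Hypothetical generalized interface that any service could implement."""

    @abstractmethod
    def post_status(self, text):
        ...


class TwitterClient(StatusService):
    # In reality a client like this is written against Twitter's own,
    # Twitter-only API; the body below is a placeholder, not a real call.
    def post_status(self, text):
        print(f"[twitter] {text}")


class IdenticaClient(StatusService):
    # A second service needs its own adapter, with different endpoints.
    def post_status(self, text):
        print(f"[identi.ca] {text}")


def broadcast(services, text):
    """With a shared interface, one application could target many services."""
    for service in services:
        service.post_status(text)


if __name__ == "__main__":
    broadcast([TwitterClient(), IdenticaClient()], "Hello, open web")
```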

One of the first attempts to publicize the benefits of truly open data was Marc Canter’s Data Sharing Summit, which I wrote about at the time for ZDNet. Chris Saad’s DataPortability.org attempted (largely successfully) to brand it, and latterly the Open Web Foundation has attracted some of the web’s leading lights in order to create a single organization overseeing a set of open web application standards. Many of these standards make up the Open Stack, which I’ve written about before; more generally, Chris Messina has written a very thoughtful overview of the topic.

Fluid collaboration

It used to be that to use the web, you would need to sit down at your computer and log on. Those days are over; the web is becoming more and more ubiquitous, thanks to devices like the iPhone. It’s also being integrated into software that wasn’t previously connected – it’s as easy, for example, to paste the URL of an image into the ‘Insert Image’ dialog box in most word processors as it is to pick an image from your own hard disk. The open, generalized API standards being created by groups like the Open Web Foundation bring us closer to enjoying that level of integration with collaborative social technologies.

The Internet is people, not technology: tools on the web (or anywhere else) facilitate social networks, but are not the networks themselves. Currently these tools are destination sites, like Facebook, LinkedIn or Twitter – places that you explicitly have to visit in order to collaborate or share. This is the currently fashionable model, but it’s a necessarily limited view of how collaboration can take place: all of these sites thrive on the walled garden model and are designed around keeping participation within their walls.

Not everything on the Internet works this way. Email, and increasingly instant messaging, are two technologies that generally do not: messages sent over email, Jabber and (to a much lesser extent) Skype are peer-to-peer and do not pass through a single central service:

  1. You select the people you wish to collaborate with (in this case, by email or chat). Nobody but the listed recipients will be able to see the content you share with them, and it doesn’t matter whether they’re using the same service as you; you don’t have to invite them to join email in the same way you have to invite people to join Facebook.
  2. You write your content.
  3. You send it.
  4. They (hopefully) send content back.
  5. The collaborative exchange lasts only as long as it’s useful, and then disappears (but is archived for reference).
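
As a concrete illustration of that flow, here is a minimal sketch using Python’s standard library. The addresses and SMTP host are placeholders; the relevant point is that nothing requires the recipient to hold an account on the sender’s service – any mail server that speaks the protocol will do.

```python
import smtplib
from email.message import EmailMessage

# Placeholder addresses and server: illustrative only.
msg = EmailMessage()
msg["From"] = "me@example.org"
msg["To"] = "colleague@another-domain.example"  # any provider, no sign-up needed
msg["Subject"] = "Draft for review"
msg.set_content("Here's the document we discussed.")

# The sender's own (or their provider's) SMTP server relays the message;
# there is no single central service that both parties must belong to.
with smtplib.SMTP("smtp.example.org") as server:
    server.send_message(msg)
```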

Recently, Google announced Wave, a decentralized pairing of protocol and open source web application that took email and IM as its inspirations and set out to redefine how collaborative social technologies could work. Questions have been raised about how a decentralized tool like this can work with the corporate data policies present in most large enterprises and public sector organizations, but in some ways these questions miss the point: Google Wave is best thought of as a proof of concept for how decentralized, transient communities can work in a standard way on the web. In short, websites are a kind of walled garden in themselves: what we will return to is the idea of the web as an open patchwork of people, data and information that links together to form a whole much stronger than the sum of its parts.

Predicting the future of social networking on the web is hard. However, I believe that as general open social technologies develop and become more commonplace, the “social networking site” will shrink in importance – instead, social network facilitators will become more and more ingrained in all the software you use. This will dramatically increase the types of content and communication that can be used, and present opportunities for much wider, more fluid and – most importantly – more productive collaboration as a whole.

User control on the open web

February 21, 2009 | 9 comments

Data portability and the open data movement (“the open web” for simplicity’s sake) revolve around the idea that you should be able to take your data from one service to another without restriction, as well as control who gets to see it and how. Very simply, it’s your data, so you should have the ability to do what you like with it. That means that, for example, if you want to take your WordPress blog posts and import them into MovableType (WordPress’s competitor), you should be able to. Or you should be able to take your activity from Facebook and include it in your personal website, or export your Gmail contacts for backup or transfer to a rival email service.

You can do this on your desktop: for example, you can open a Word document in hundreds of word processors, and Macs will happily talk to Windows machines on a network. Allowing this sort of data transport is good for the web in the same way it’s good for offline software: it forces companies to compete on features rather than on the number of people they can lock into their services. It also ensures that if a service provider goes out of business, a user’s data on that service doesn’t have to disappear with it.

In 2007, before the open web hit most people’s radars, Marc Canter organised the first Data Sharing Summit, a communal discussion among all the major Silicon Valley players, as well as many outside companies who flew in specially to participate (I attended, representing Elgg). One of the major outcomes was a central principle: the user owns their data. Marc, Joseph Smarr, Robert Scoble and Michael Arrington co-signed a Bill of Rights for the Social Web which laid this out. It wasn’t all roses: most of the large companies present took issue with the Bill of Rights, and as I noted in my write-up for ZDNet at the time, preferred the term “data control” to “data ownership”. The implication was simple: users didn’t own the data they added to those services.

Since then, the open web has been accelerating as both an idea and a practical reality. Initiatives like Chris Saad’s DataPortability.org and Marc Canter’s Open Mesh treatise, as well as useful blunders like Facebook’s recent Terms of Service mis-step, have drawn public attention to its importance. Facebook in particular force you to license your content to them indefinitely, and disable (rather than delete) your account details when you choose to leave the site. Once you enter something into Facebook, you should assume it’s there forever, no matter what you do. This has been in place for some time to little complaint, but when they overreached with their licensing terms, it made international headlines across the mainstream press: control over your data is now a mainstream issue.

Meanwhile, technology has been improving, and approaches have been consolidated. The Open Stack is a collection of real-world technologies that can be applied to web services to provide a base level of openness today, and new developments are emerging quickly. Chris Messina is leading development around activity streams portability, which will allow you to subscribe to friends on other services and see what they’re up to. The data portability aspect of the open web is rapidly becoming a reality: you will be able to share and copy your data.
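
To give a feel for what activity streams portability implies, here is an illustrative entry expressed as a Python dictionary. The field names are loosely modelled on the activity stream formats under discussion at the time, but they are my own simplification, not the actual specification.

```python
# Illustrative only: a simplified activity entry, not the real spec.
activity = {
    "actor": {
        "name": "Alice Example",
        "url": "https://alice.example.org",  # her own site, not a silo account
    },
    "verb": "post",
    "object": {
        "type": "photo",
        "url": "https://photos.example.org/alice/1234",
        "title": "Sunset over the bay",
    },
    "published": "2009-02-21T18:30:00Z",
}

# A subscriber on a completely different service could fetch entries like this
# from Alice's feed and render them, without either side sharing a provider.
print(f"{activity['actor']['name']} {activity['verb']}ed a {activity['object']['type']}")
```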

Your data will be out there. So, what happens next?

The same emerging open web technologies that allow you to explicitly share your data from one service to another will also allow tools to be constructed cheaply out of functionality provided by more than one provider. Even today, a web tool might have a front end that connects behind the scenes to Google (perhaps for search or positioning information), Amazon (for storage or database facilities), and maybe three other services. This is going to drive innovation over the next few years, but let’s say a user on that conglomerated service wants to delete their account. Can they reliably assume that all the component services will respect their wishes and remove the data as requested?

As web tools become more sophisticated, access control also becomes an issue. When you publish on the web, you might not want the entire world to read your content; you could be uploading a document that you’d like to restrict to your company or some other group. How do these access restrictions persist on component services?

One solution could be some kind of licensing, but this veers dangerously close to Digital Rights Management, the hated technology that has crippled most online music services and players for so long and inhibited innovation in the sector. Dare Obasanjo, who works for Microsoft and is usually a good source of intelligent analysis, recently had this to say:

[..] I’ve finally switched over to agreeing that once you’ve shared something it’s out there. The problem with [allowing content to be deleted] is that it is disrespectful of the person(s) you’ve shared the content with. Looking back at the Outlook email recall feature, it actually doesn’t delete a mail if the person has already read it. This is probably for technical reasons but it also has the side effect of not deleting a message from someone’s inbox that they have read and filed away. [..] Outlook has respected an important boundary by not allowing a sender to arbitrarily delete content from a recipient’s inbox with no recourse on the part of the recipient.

The trouble is that many services make money by selling data about you, either directly or indirectly, and these are unlikely to relinquish your data (or information derived from it) without some kind of pressure. I agree with Dare completely on the social level, with content that has been shared explicitly. Certainly, this model has worked very well for email, and people like Plaxo’s John McCrea are hailing the fall of ‘social DRM’. However, content that is shared behind the scenes via APIs, and content that is shared inadvertently when you authorize an action via something like OAuth or OpenID, need to obey a different model.

The only real difference between data shared as a deliberate act and data shared behind the scenes is user interface. Everyone wants the user to have control over data sharing via a clear user interface. Should they also be able to enforce what’s done with that data once it transfers to a third-party service, or should they trust that the service is going to do the right thing?

The open web isn’t just for trivial information. It’s one thing to control what happens to my Dopplr information, or my blog posts, or my Flickr photographs: I really don’t mind too much where those things go, and I’d imagine most people would agree (although some won’t). Those aren’t, however, the only things the web is being used for: there are support communities for medical disorders, academic resources, bill management services, managed intranets and more out there on the web, and these too will begin to harness the benefits of the open web. All of them need to be careful with their data, some for legal reasons and some for ethical ones. Nonetheless, they could all benefit from being able to share data securely, in a controlled way.

To aid discussion, I propose the following two categories of shared data:

  • Explicit shares – information that a user asks specifically to share with another person or service.

    Examples:

    • Atomic objects like blog posts, contacts or messages
    • Collections like activity streams
  • Implicit shares – information that is shared behind the scenes as a result of an explicit share, or to provide some kind of federated functionality.

    Examples:

    • User information or shadow accounts transferred or created as a result of an OpenID or OAuth login
    • User settings
    • User contact details, friend lists, or identifiers

For the open web to work, both clearly need to be allowed. At a very basic level, though, I think users need to be made aware of implicit shares in a clear, non-technical way. (OpenID and OAuth both allow the user to grant and revoke access to functionality, but they don’t control what happens to the data once access has been granted; data that has already been transferred is likely to be kept.) Services also need to provide a facility for users to reliably control this data. Just as I can Creative Commons license a photograph and allow it to be shared while restricting anyone’s ability to use it for commercial gain, I need to be able to say that services can only use my data for a limited time, or for limited purposes. I’m not calling for DRM, but rather for a published best practice that services would adhere to and publicly declare their support for.
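
Here is one way such a best practice might be expressed, sketched as a Python structure attached to an implicit share. The field names and the checking function are hypothetical (nothing like this is standardized); the idea is simply that a receiving service declares, in machine-readable form, what it will do with the data and for how long, so the claim can be audited.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical: a data-use declaration a receiving service publishes alongside
# an implicit share (e.g. a shadow account created after an OpenID login).
POLICY = {
    "share_type": "implicit",
    "purposes": ["authentication", "display_name"],  # permitted uses of the data
    "retention": timedelta(days=30),                 # how long it may be kept
    "resellable": False,                             # no onward sale or profiling
}


def use_is_permitted(policy, purpose, received_at, now):
    """Check a proposed use of shared data against the declared policy."""
    within_retention = now - received_at <= policy["retention"]
    return within_retention and purpose in policy["purposes"]


if __name__ == "__main__":
    received = datetime(2009, 2, 1, tzinfo=timezone.utc)
    now = datetime(2009, 2, 15, tzinfo=timezone.utc)
    print(use_is_permitted(POLICY, "authentication", received, now))  # True
    print(use_is_permitted(POLICY, "advertising", received, now))     # False
```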

Without this, the usefulness of the open web will be limited to certain kinds of use cases – which is a shame, because if it’s allowed to reach its full potential, it could provide a new kind of social computing that will almost certainly change the world.
