According to Wikipedia, Alan Kay introduced the desktop metaphor in 1970 when he was working at Xerox PARC. I think it has served us well because it allowed novice computer users to approach computers in ways that were familiar to them, using old tools: trash cans, file folders, filing cabinets, and pieces of paper. Programs themselves occupied screen real estate in the same way a piece of paper occupies space on a desk; programs appeared in windows that could be moved around and could overlap, just like their tree-based counterparts. Although we’ve all worked in offices that had all those things, I’d venture a guess that no one under 40 today has ever worked in an office without a computer. I assert that it’s time to retire the desktop metaphor in modern computing.
In some ways, this is already happening, but not in the way I’d like. Every major operating system is undergoing major changes to acknowledge the growing prevalence of touch-based devices like smartphones and tablets. I think those devices are nice, but in many cases they are little more than socially acceptable ways for adults to carry around the modern equivalent of a Game Boy. Nevertheless, the major companies that shape the user experience on modern computing devices are focusing on building user interfaces that are consistent across touch devices and laptop/desktop experiences. Apple, Microsoft and Canonical (makers of Ubuntu) are the companies that immediately jump to mind. I’m excluding Google because they don’t seem to be pushing very hard to turn Android into a desktop operating system, though products like the Asus Transformer are certainly going in that direction. Also, Google develops at least three operating systems that I know of: their internal server OS, Android, and Chrome OS, which is a decidedly laptop-driven OS for the time being. In that respect, Google may be adopting a wise approach that doesn’t attempt to have one operating system suit both keyboard-based and touch-based computing needs.
I’m unconvinced that we’ve developed anything more effective than the keyboard for general-purpose computing, despite the inroads made in the last six or so years with touch-based computing. I believe it is worthwhile to mention unconventional (or unpopular) ways of organizing user interfaces that seem to be more powerful and more efficient for the accomplished user. After all, we will all end up using computers enough to make an investment of a few hours up front worthwhile in the long run. Why not optimize for the experienced user?
One problem is that all modern operating systems, whether mobile or on the desktop, tightly couple programs and documents to ever-present user interface elements. At first, this approach is appealing, because it gives new users visual indications of how to access the resources of the computer. But after some thought, it becomes clear that the approach doesn’t scale. The memory of our computers is growing much more quickly than our screen real estate, and screen real estate has to be reserved for each element of the computer’s memory the user wants ready access to. We run more programs and keep more browser tabs open, which means more documents are being written and referenced at once. This leads to task bars, docks, tab bars and alt-tab switchers that are jam-packed with entries, all begging for the user’s attention all the time. The clutter has become so prominent that there is a new wave of “distraction free” apps that do little more than force all the clutter out of the way so that users can reclaim their attention (and screen space!).
One of the longest-standing problems with user interfaces today is that users are left to manage window positioning and placement on their own. Essentially, windows pop up in arbitrary locations with arbitrary sizes, and users are left to manage the resulting mess by hand. In an age where mobile devices can listen to us talk and execute queries that provide us with answers to complex questions in real time, these kinds of limitations are embarrassing. What’s the alternative?
Tiling window managers have been around for a long time, and there are lots of them. Windows 7 wisely adopted a limited form of tiling, in which users can automate the placement of windows quickly via mouse gestures or keyboard shortcuts. On the OS X side, there are more than a few tools to automate the placement of windows, most of which, like Windows 7, require user intervention to initiate the placement.
As is usual in such cases, open source had these kinds of systems for years before they appeared on the commercial operating systems, leaving Linux with dozens of tiling window managers on X. Some are simple tools like those found on Windows and OS X that allow the user to invoke commands to place windows, but others represent a revamp of the windowing system from the ground up, all but precluding the idea of “floating” windows entirely. In these more advanced implementations, the moment a window is created, it is sized and placed in a specific slot on the screen, and usually doesn’t even have borders. All window manipulation is handled through keyboard bindings, freeing up the screen real estate usually reserved for title bars, minimize buttons, maximize buttons and menu buttons for user-relevant content. The end result is a highly efficient, clean user experience that completely eliminates the need for users to spend even one second dragging or resizing windows to see the data they want.
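To make the "sized and placed in a specific slot" idea concrete, here is a minimal sketch of one common tiling scheme: a master pane on the left and the remaining windows stacked on the right. It is an illustration only, not the code of any particular window manager; the names `Rect`, `tile` and `master_ratio` are my own assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Rect:
    """A window slot in screen pixels (hypothetical helper type)."""
    x: int
    y: int
    width: int
    height: int


def tile(screen: Rect, window_count: int, master_ratio: float = 0.5) -> List[Rect]:
    """Return one slot per window: a master pane on the left and the
    remaining windows stacked in equal rows on the right."""
    if window_count <= 0:
        return []
    if window_count == 1:
        return [screen]  # a lone window fills the whole screen

    master_width = int(screen.width * master_ratio)
    stack_count = window_count - 1
    slots = [Rect(screen.x, screen.y, master_width, screen.height)]

    for i in range(stack_count):
        # Integer arithmetic on the row edges keeps the rows gap-free
        # even when the height does not divide evenly.
        top = screen.y + (screen.height * i) // stack_count
        bottom = screen.y + (screen.height * (i + 1)) // stack_count
        slots.append(Rect(screen.x + master_width, top,
                          screen.width - master_width, bottom - top))
    return slots
```

Calling `tile(Rect(0, 0, 1920, 1080), 3)` yields a 960-pixel-wide master pane and two 960x540 slots beside it. The layout is recomputed whenever a window opens or closes, which is why the user never has to drag or resize anything.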
Why isn’t this approach used universally? I think it’s purely an issue of shell shock. If any commercial OS were to switch to such a system, the user backlash would be immense. The only viable option would be to provide a setting somewhere that could turn on this sort of functionality and users could adopt it as they liked. Hopefully, over the course of 10 or 20 years, the number of users would grow large enough that the old interface could be retired completely.
Interestingly, we’re already seeing this happen on mobile devices. There is no major mobile OS that puts the burden of window management onto the user. Rather, every app runs full screen and without pesky window management borders. In essence, the operating system is handling all the window management for the user, and surprisingly, no one seems to mind. It would be nice to achieve the same thing on the desktop.
Lack of tiling may be a major shortcoming in many environments, but poor handling of window management in multi-monitor setups is a plague that needs to be eradicated. The general problem is that when a user adds a monitor to a system, it effectively doubles the number of places any given application window might reside. This is daunting for a user whose four virtual desktops suddenly double to eight when a new display is plugged in. Every operating system’s default window management system handles this situation poorly. I didn’t even realize how bad it was until I started using XMonad, which introduces a single new abstraction, and, in exchange, vastly simplifies the multi-monitor situation. So, how does it work?
The portal concept introduces a single new abstraction to window management: the portal. A portal can be thought of as an imaginary (or abstract) display: it is a rectangular array of pixels that displays a set of tiled windows. In the most basic case, it shows just one window, the background, and nothing else. In a more complex setup, it may show a few different application windows tiled in a particular way. But that’s it. A portal is just a virtual screen that can show windows.
A user’s setup can have as many portals as the user needs. On a laptop, for example, a user might define four different portals: one for web browsing, one for chat, one for editing documents and one for their music application. At any given time, the laptop’s display would be showing any one of these four portals, and the user could switch among them as needed to get his or her work done.
Now, let’s suppose the user plugged in an external display. In a “normal” setup, each portal (sometimes called a “virtual desktop” or “space”) would suddenly have two parts to it: one would be the laptop’s display, and the other would be the external display. This would immediately result in four new empty screens appearing on the user’s desktop. If the user wanted to actually use the new display that was plugged in, he or she would have to stop whatever the current task was and start reorganizing windows. This failing is particularly egregious in public settings, when a user has to plug a laptop into a large screen or projector immediately prior to a presentation. Despite this system’s failings, it is used almost universally.
How can we do better?
The biggest problem with the design of the system is that when a new display is plugged in, it unexpectedly introduces a new abstraction: a virtual screen that spans multiple physical screens. This can be useful in the (relatively rare) case that a user really wants to have a single application window span displays, but provides terrible usability for almost every other scenario. A portal-based system treats new displays as a simple extension of the single-display system. With a single display, the user simply chooses which of a number of portals is shown on that display. With two physical screens, the user simply selects which portal to display on each screen.
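The portal idea is small enough to model in a few lines. The sketch below is a toy model, not XMonad's implementation: the class `PortalManager` and its methods are hypothetical names, and the rule that asking for a portal already visible elsewhere swaps the two screens is one possible policy I am assuming for illustration.

```python
class PortalManager:
    """Toy model: every physical screen shows exactly one portal."""

    def __init__(self, portals, screens):
        if len(portals) < len(screens):
            raise ValueError("need at least one portal per screen")
        self.portals = list(portals)
        # Initially, the i-th screen shows the i-th portal.
        self.visible = dict(zip(screens, portals))

    def show(self, screen, portal):
        """Display `portal` on `screen`. If the portal is already visible
        on another screen, the two screens swap portals (assumed policy)."""
        if screen not in self.visible:
            raise ValueError(f"unknown screen: {screen}")
        if portal not in self.portals:
            raise ValueError(f"unknown portal: {portal}")
        for other, shown in self.visible.items():
            if other != screen and shown == portal:
                self.visible[other] = self.visible[screen]
                break
        self.visible[screen] = portal

    def add_screen(self, screen):
        """Plugging in a display just reveals the first hidden portal.
        No window moves and no existing screen changes."""
        hidden = [p for p in self.portals if p not in self.visible.values()]
        if not hidden:
            raise ValueError("no hidden portal available for the new screen")
        self.visible[screen] = hidden[0]

    def remove_screen(self, screen):
        """Unplugging a display hides its portal; the windows stay put."""
        self.visible.pop(screen)


manager = PortalManager(["web", "chat", "docs", "music"], ["laptop"])
manager.add_screen("external")    # external display now shows "chat"
manager.show("external", "docs")  # put the document portal on the big screen
```

Note that the number of portals never changes when a display is added or removed. Only the mapping from screens to portals does, which is why nothing needs to be reorganized before a presentation.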
This approach scales quite nicely, as it turns out. While managing three or four physical displays using the traditional system can become unwieldy almost immediately, a portal-based system provides an easy-to-understand abstraction for four or more outputs. Since the user is already managing four, six or even eight portals on a single-display system, it is fairly simple to apply those existing portals to multiple outputs.
This approach also eliminates the disorientation that comes with the traditional virtual desktop system.
So, why isn’t the tile-based and portal-based system more prevalent? I honestly don’t know. I can only hope that by discussing the design of these kinds of systems and increasing user awareness of them, power users will start to demand them. Hopefully companies will listen. In the meantime, anyone can have these features today on any Linux distribution. If you use Debian or Ubuntu, you can get started simply:
sudo apt-get install xmonad