
Plan 9 - Continued

Plan 9 did much to bring the clunky and complex Unix environment into the modern era twenty years ago. It proved in a variety of ways that you can actually get more with less by applying equal measures of simplicity and consistency. One example was the adoption of the file system as the common method of communication between processes, the operating system kernel and even machines across networks. This helped to redefine Plan 9 from a conventional operating system for a single computer into an operating system for an entire network of systems. Because the system is highly consistent, it is easier to be a generalist: the behaviour and code of much of the system follow the same design and conventions from kernel to compiler to user applications.
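
For instance, dialing a TCP connection on Plan 9 is nothing more than a short sequence of file operations on the /net filesystem. The sketch below shows the idea in Go (which runs natively on Plan 9); the address is only a placeholder:

    package main

    import (
        "fmt"
        "log"
        "os"
        "strings"
    )

    func main() {
        // Opening the clone file reserves a new connection; reading it
        // returns the connection's directory number under /net/tcp.
        clone, err := os.OpenFile("/net/tcp/clone", os.O_RDWR, 0)
        if err != nil {
            log.Fatal(err)
        }
        defer clone.Close()

        buf := make([]byte, 32)
        n, err := clone.Read(buf)
        if err != nil {
            log.Fatal(err)
        }
        dir := strings.TrimSpace(string(buf[:n]))

        // The clone file descriptor doubles as the connection's ctl file;
        // writing a control message establishes the connection.
        if _, err := clone.WriteString("connect 192.168.1.10!564"); err != nil {
            log.Fatal(err)
        }

        // The data file now carries the byte stream for the connection.
        data, err := os.OpenFile("/net/tcp/"+dir+"/data", os.O_RDWR, 0)
        if err != nil {
            log.Fatal(err)
        }
        defer data.Close()

        if _, err := data.WriteString("hello\n"); err != nil {
            log.Fatal(err)
        }
        fmt.Println("connected via /net/tcp/" + dir)
    }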

A number of factors have changed or evolved since Plan 9's original inception in the 1980s. In some cases the new developments are extensions of the ones that drove the original design. Computers are still networked together to provide common services as a single logical entity, but those entities are increasingly consolidated into vast, globally distributed data centres. Access to those systems is not always reliable or stable, such as over a mobile connection or when moving between Wi-Fi hotspots with your laptop. Meanwhile, computing power and storage have continued their steady expansion over the years.

These changes have occurred on top of an ancient software base (i.e. Unix/Linux/BSD and Microsoft Windows) whose origins date back at least forty years. Those bases were so lacking and unappealing that large software stacks erupted on top of them to drive the current cloud revolution and build magnificent user interfaces. Instead of simple, small and consistent, the picture looks much the opposite. It also means that your smartphone and smartwatch carry relic teletype-handling code, waiting for something to exploit it.

If the current computing landscape was built on an outdated base, imagine what could be accomplished by lifting that weight off with a relatively modern alternative. Further adaptation and modernization would still be required, but certainly far less than the effort that has been, and continues to be, spent on the legacy systems. By way of contrast, imagine how much more limiting and costly it would have been had we continued to push MS-DOS and Multics forward.

Let's consider how the computing environment has changed since Plan 9 was under active development and discuss how it can adapt to the new circumstances while retaining the core benefits of the system. It may still have a unique role to play.

Filesystems and unstable connections

Within the confines of the corporate networks of the 1980s, network connections were relatively stable. Machines were rarely disconnected, since they were physically wired to the network, and they rarely moved because they were expensive pieces of equipment whose movement was limited by physical security policies. Since workstations were shared resources, their IP addresses rarely changed, or at least their host names were stable enough to be relied on. This allowed persistent connections to be made, for the most part, with little disruption.

As a result, Plan 9 doesn't handle stale network connections particularly well in its core, because it assumes a level of stability. The current world of wireless access points, dynamic IP address assignment and user-serviced equipment makes things much more dynamic. The web has solved this problem through a stateless protocol, HTTP, and the use of client-side caches. However, adopting HTTP or porting the current web plumbing, such as web browsers and APIs, is antithetical to the simple and consistent underpinnings of Plan 9. It would add another set of incompatible tools to the base system and another layer of complexity for the rest of the system to work around. Existing web support in the system is often confusing to new users since it is not feature-complete and is inconsistent with the rest of the system, inhibiting reuse and learning.

Can the 9P protocol, which underpins the Plan 9 filesystem, be used or adapted somehow to work under these conditions? It is stateful and relies on a persistent connection, with timeouts in case of failure. One could presumably live with the problem in a number of ways. Carefully mounting and unmounting remote filesystems as needed is an option, but it would be easy to forget a filesystem mount in one of your windows, or even to accidentally bind some other filesystem or directory on top of it.

Perhaps you could keep everything on your home network and (re)boot your device each time from a boot image served by that network. This is similar to the historical Plan 9 topology of CPU servers, file servers and user terminals that boot from the network, except that the terminal can now be on a remote network. If you build your profile and update it frequently, then it could bring you to a workable state quickly, with all of your persisted state saved remotely. One trouble would be situations where some resources are accessible while others are not. This is certainly a possible scenario if you use a mix of services, each with varying availability. Imagine a case where your home network is available but a cloud service that you use frequently is not. And what happens if you don't have a home network and rely only on your mobile device and cloud services?

9P itself requires a constant stateful connection to a filesystem, but what if there were a 9P proxy that could connect, reconnect and cache information for a remote filesystem? It may be possible to give the proxy the commands needed to establish the connection to the unreliable remote filesystem so that it can manage the connection itself. In cases of total loss of connectivity it could also serve reads from its cache and queue up modifications to replay once the connection is re-established. All that would be required on the user's part is to mount any remote filesystems through this mechanism. However, depending on the type of filesystem, the coherence of the cache/queueing mechanism can be a problem.
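
As a rough sketch of the reconnection half of such a proxy, the Go program below accepts a local connection from a 9P client and shuttles bytes to a remote 9P server, redialing with a delay while the remote end is unreachable. The caching and replay described above would have to be layered on top by parsing the 9P messages themselves, and the addresses here are placeholders:

    package main

    import (
        "io"
        "log"
        "net"
        "time"
    )

    const (
        listenAddr = "127.0.0.1:5640"     // local address a 9P client mounts
        remoteAddr = "fs.example.com:564" // unreliable remote 9P server
    )

    // dialRemote keeps trying until the remote file server is reachable.
    func dialRemote() net.Conn {
        for {
            c, err := net.DialTimeout("tcp", remoteAddr, 5*time.Second)
            if err == nil {
                return c
            }
            log.Printf("remote unavailable (%v); retrying", err)
            time.Sleep(2 * time.Second)
        }
    }

    // serve shuttles bytes between one local client and the remote server.
    // When either side drops, the session ends and the client can remount;
    // resuming a live 9P session transparently would require tracking the
    // protocol's tags and fids here instead of copying raw bytes.
    func serve(local net.Conn) {
        defer local.Close()
        remote := dialRemote()
        defer remote.Close()

        done := make(chan struct{}, 2)
        go func() { io.Copy(remote, local); done <- struct{}{} }()
        go func() { io.Copy(local, remote); done <- struct{}{} }()
        <-done
    }

    func main() {
        l, err := net.Listen("tcp", listenAddr)
        if err != nil {
            log.Fatal(err)
        }
        for {
            conn, err := l.Accept()
            if err != nil {
                log.Fatal(err)
            }
            go serve(conn)
        }
    }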

Service discovery

Plan 9 was conceived in a more closed computing environment than today's. Machines were often physically connected, with statically assigned IP addresses, on a stable network. The research group knew the names of the CPU servers and file servers, and standard filesystem mounts were shared amongst the group. There was little need for discovery services to detect available computers on the network and query their capabilities.

Computer networks are now often much more ad hoc, even within a large data centre. Home networks have computers that start up, shut down and sleep at a variety of times depending on their type and usage patterns. Data centres are constantly swapping computing equipment in and out to increase capacity and manage the lifecycle of the machines. In both cases there is a need for more automated discovery of the services available to perform a task, whether it is storage, CPU, network or specific data. As devices become smaller, more numerous and more specialized, this trend will only increase over time.

There are a variety of approaches to service discovery in the current computing environment, ranging from plain DNS to services such as etcd or Kubernetes. Plan 9 needs something equivalent to match the current and future need, except done in the usual pragmatic, simple and composable way.

One problem in service discovery is simply communicating with other devices on the network without knowing their host names or IP addresses. Probing IP address ranges to find other machines isn't practical: the address spaces can be prohibitively large, especially with IPv6, and scanning can take significant time every time the user wants to run a task. DNS is appealing here, but it requires oversight and careful design to make it work. Data centres sometimes make use of this approach, but it is unclear how well it can work in ad-hoc home environments.

The IPv6 specification now requires that each interface have a link-local IP address and be capable of handling multicast traffic. There are pre-allocated multicast addresses devoted to all nodes on the link, to routers and to other specific types of service. Plan 9 can make use of these to automatically discover other nodes on the local network; a command-line tool or filesystem that lists their addresses would be a natural starting point. If those nodes are also Plan 9 systems then, in theory, 9P connections can be made to them, enabling further discovery of their capabilities through the files they serve, provided an authentication scheme can be found to support it.
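
A sketch of what such a discovery probe could look like, written in Go against the all-nodes link-local multicast group ff02::1. The port, interface name and message format are invented for illustration; a real Plan 9 tool would presumably present the results as a filesystem:

    package main

    import (
        "flag"
        "fmt"
        "log"
        "net"
        "time"
    )

    const port = 9564 // arbitrary discovery port for this sketch

    var (
        respond = flag.Bool("respond", false, "answer discovery probes")
        ifname  = flag.String("i", "eth0", "interface to probe from")
    )

    // responder joins the all-nodes group and answers any probe it hears.
    func responder() {
        group := &net.UDPAddr{IP: net.ParseIP("ff02::1"), Port: port}
        conn, err := net.ListenMulticastUDP("udp6", nil, group)
        if err != nil {
            log.Fatal(err)
        }
        buf := make([]byte, 64)
        for {
            n, from, err := conn.ReadFromUDP(buf)
            if err != nil {
                log.Fatal(err)
            }
            if string(buf[:n]) == "9probe?" {
                // Reply directly to the prober's unicast address.
                if reply, err := net.DialUDP("udp6", nil, from); err == nil {
                    reply.Write([]byte("9p here"))
                    reply.Close()
                }
            }
        }
    }

    // probe multicasts a question out of one interface and prints the replies.
    func probe() {
        conn, err := net.ListenUDP("udp6", &net.UDPAddr{IP: net.IPv6unspecified})
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        group := &net.UDPAddr{IP: net.ParseIP("ff02::1"), Port: port, Zone: *ifname}
        if _, err := conn.WriteToUDP([]byte("9probe?"), group); err != nil {
            log.Fatal(err)
        }

        conn.SetReadDeadline(time.Now().Add(2 * time.Second))
        buf := make([]byte, 64)
        for {
            n, from, err := conn.ReadFromUDP(buf)
            if err != nil {
                return // deadline reached; done collecting replies
            }
            fmt.Printf("%v answered: %s\n", from.IP, buf[:n])
        }
    }

    func main() {
        flag.Parse()
        if *respond {
            responder()
        } else {
            probe()
        }
    }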

Free software and Open Source

Linux proved that opening up the source code is the best way to gain widespread adoption. Open source means that people can take full control of their own systems. It eliminates the need for all patches and fixes to come from a single provider, allowing for more collaboration. Debugging problems becomes much easier when you have the original source code. From a software engineering perspective it also improves confidence in the system, because you can reason about how the system works by consulting the code directly. Any ugliness or complexity is quite transparent.

While the source code was likely available to Bell Labs engineers, in the early days the business disallowed anyone outside from having access to it. It took many years for the system to be open sourced. Luckily, all of the system's source code is now available so that everyone can benefit from it.

Networks within the machine

There is a growing need to carve up idle computing resources and run software with them. Sometimes that software was originally written assuming it has full access to a machine. Other times the software can't be fully trusted with access to the host system, because it was written by a third party or the source code is not available. Current systems have developed a variety of tools to handle these cases, such as virtual machines, containers and jails/sandboxes. The net effect is that each program appears to run in isolation, within what amounts to a virtual network inside one physical machine. Multiple CPU cores, hardware hyperthreading and vast amounts of RAM have helped sustain this growth.

In the virtualization space things are quite complicated. There is little consistency between virtual machines, containers and jails in how you deploy them. Consider also that each major operating system (and some of the less common ones) has its own way of handling each, with its own caveats. The picture is sprawling, requiring a great deal of specialization, and many specialists, to maintain systems.

Plan 9 was able to take a more general approach due to its simple and consistent design. Each process has a "namespace," which represents its view of the filesystem. The vast majority of the interactions any process makes with the system, including network connections, are done by accessing files. Plan 9 allows each process's namespace to be customized: a process can mount filesystems into its namespace that are visible only to it and its child processes. As a result, it is straightforward to mock or virtualize network connections and other interprocess communication. Namespaces can also be constrained, by pruning or permissions, to restrict the areas of the system a process can access. Finally, there is a special flag on rfork (RFNOMNT) that prevents a process tree from breaking out of its namespace, making it a complete jail within the limits of the hardware and of OS bugs.
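
A sketch of these mechanisms from a Go program running natively on Plan 9, assuming Go's Plan 9 syscall package exposes Bind, the bind flags and an Rfork field on SysProcAttr (the scratch directory path is hypothetical):

    package main

    import (
        "log"
        "os"
        "os/exec"
        "syscall"
    )

    func main() {
        // Replace /tmp in this process's namespace with a private directory
        // (hypothetical path). The change is visible only to this process
        // and to the children that inherit or copy its namespace.
        if err := syscall.Bind("/usr/glenda/scratch", "/tmp", syscall.MREPL); err != nil {
            log.Fatal(err)
        }

        cmd := exec.Command("/bin/ls", "-l", "/tmp")
        cmd.Stdout = os.Stdout
        cmd.Stderr = os.Stderr
        // RFNAMEG: the child gets a copy of this namespace instead of sharing it.
        // RFNOMNT: the child (and its descendants) cannot mount or bind anything
        // further, so it cannot break out of the namespace it was given.
        cmd.SysProcAttr = &syscall.SysProcAttr{
            Rfork: syscall.RFNAMEG | syscall.RFNOMNT,
        }
        if err := cmd.Run(); err != nil {
            log.Fatal(err)
        }
    }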

While the Plan 9 approach covers what is currently considered containers and jails, it has only recently begun to support virtual machines. One reason for this could be that the underlying system already supports a wide variety of process isolation and customization options, as described above, alleviating the need for full machine virtualization. Another could be that the idea of running truly untrusted or legacy programs on your machine is something to be avoided: recent hardware exploits have demonstrated the extreme difficulty of maintaining absolute isolation between processes. In the end it may be that legacy or untrusted applications should be physically isolated as much as possible, making virtual machines unnecessary.

Automated testing

There is considerably more emphasis on automated testing in modern software development than in decades past, when the emphasis was on coding for correctness. Plan 9's coding style leans towards readability, through conventions and specific changes to its C dialect, so that problems are more apparent to someone reading the code. While these are generally good practices, mistakes still creep in, and in some cases they have severe consequences in areas such as security.

To combat mistakes in code, unit testing frameworks have emerged for virtually every major programming language to encourage validating the correctness of code. Even Go, Plan 9's spiritual successor language, has unit testing baked into its core library and tools. This is noticeably missing as a first-class construct from the Plan 9 base system. It is possible to create a test target in your mkfile that executes tests, but there is no established precedent or convention for it. Since Plan 9 C doesn't manage memory automatically, there would also need to be some way to detect unsafe memory use while running the tests, along the lines of valgrind or tcmalloc.
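
For comparison, this is all that Go's built-in convention requires: a file whose name ends in _test.go containing functions named TestXxx, which the go test command finds and runs without any extra framework (the add function here is just a stand-in):

    package add

    import "testing"

    func add(a, b int) int { return a + b }

    // TestAdd is discovered and run automatically by the go test command.
    func TestAdd(t *testing.T) {
        if got := add(2, 3); got != 5 {
            t.Errorf("add(2, 3) = %d, want 5", got)
        }
    }

An analogous Plan 9 convention might be as simple as an agreed-upon virtual test target in every mkfile that builds and runs such checks.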

Authentication

Historically, authentication was achieved by being assigned a user name for a system and creating a unique but memorable password that you would perhaps change every couple of months. This approach worked well when the systems you used were limited in number and were often physically isolated or firewalled from the broader internet. This is no longer the case. The average user has accounts on a dizzying number of services, making it difficult not to reuse the same password across several of them. There are many high-profile cases of password list theft. Attempts to unify authentication systems have either failed to gain traction or are provided by large technology companies in an effort to lock users into their technology. Password vault systems are popular, but they are single points of failure for future hacks.

In an effort to unify access to a network of machines, Plan 9 provides a single sign-on mechanism built around the factotum agent. Users of the system authenticate once against a trusted authentication server, which issues an access token that can be used for any 9P connection made in the network. To access foreign resources, factotum can be configured to hold additional credentials and access tokens that become available to you once you have authenticated. The service is built directly into the core of the operating system, making it largely transparent to the end user.

This approach requires that every system the user authenticates against know and trust the user's authentication server to validate their identity. Passwords, meanwhile, have repeatedly been shown to be misused, leading to additional security problems. There must be a better way.

Instead of relying on authentication servers, Plan 9 could conceivably make use of hardware security modules, asymmetric cryptography and trust chains. If, when a 9P connection is established, the client's private key signs a server-generated token and the corresponding public key is trusted, then the connection is authenticated as that user. Ideally, the hardware security module would employ an additional factor such as biometrics or a short personal identification number. Special-purpose hardware could perform all cryptography in the device itself, as well as log all data encrypted or signed on behalf of the user and maintain the list of trusted keys.
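
A minimal sketch of that challenge and response using crypto/ed25519 from Go's standard library: the server issues a random token, the holder of the private key signs it, and the server accepts the connection only if the signature verifies against a public key it already trusts. In practice the private key and the signing step would live inside the hardware security module rather than in program memory:

    package main

    import (
        "crypto/ed25519"
        "crypto/rand"
        "fmt"
        "log"
    )

    func main() {
        // Key pair; on real hardware the private half never leaves the device.
        pub, priv, err := ed25519.GenerateKey(rand.Reader)
        if err != nil {
            log.Fatal(err)
        }

        // Server side: generate a fresh token for this 9P connection attempt.
        token := make([]byte, 32)
        if _, err := rand.Read(token); err != nil {
            log.Fatal(err)
        }

        // Client side (ideally inside the security module): sign the token.
        sig := ed25519.Sign(priv, token)

        // Server side: the connection is authenticated as the key's owner
        // only if the signature checks out against the trusted public key.
        if ed25519.Verify(pub, token, sig) {
            fmt.Println("connection authenticated")
        } else {
            fmt.Println("authentication failed")
        }
    }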

The changing hardware landscape

Computers used to be expensive pieces of equipment and so they were carefully managed, even given special names that people would remember. Now there are machines that are so cheap that they can be easily discarded and replaced. Home networks are full of names like "PS3," "Sarah's MacBook Pro" and "Kitchen Chromebook." Large data centres use names that better reflect the box's physical location to help with maintenance.

The rise of cheap machines has been made possible in part by economies of scale and improved manufacturing processes. What makes them cheaper still is major technology companies selling machines at a loss in order to fuel the rest of their business and increase vendor lock-in. Smartphones are a prominent example of this, but not the only one. As a result, what are otherwise general-purpose computers have become special-purpose, in the sense that the hardware only works well with the intended software stack and use. The device drivers are written for only one particular OS (often Linux), sometimes as binary blobs that are difficult to decode, for hardware without any good documentation that would allow alternative drivers to be written. The same is true of the firmware.

You have tons of cheap machines everywhere, but each with different limitations on what you can do with it. This creates an environment where it is difficult for an alternative such as Plan 9 to establish itself, because it faces an uphill battle every time. Interestingly, this sort of problem was solved fairly recently by a company that was once the underdog: Apple.

Apple wanted very much to provide a fully integrated yet capable system to compete against the dominant Windows platform. One way it accomplished this was to intentionally limit the hardware configurations it supported, while making those configurations good quality and suited to the kinds of workloads that users would want. As a result, the software no longer had to compete with Windows on a level playing field. Instead, it could make assumptions and dramatically limit the development and testing effort required.

By way of analogy, Plan 9 could attempt a similar approach: support a limited set of hardware configurations, but pick ones that are sufficiently affordable, open, capable and of good quality. This helps to alleviate pressure on the system to support vast arrays of hardware, pressure that can force a system to become overly complicated, Windows and Linux being good examples. Since many computers today are systems on a chip (SoC) with on-board processor, memory, buses, video and audio, there is already less variety in configurations than there was with the PCs of decades past.

Support for standard sets of hardware also helps to improve the user experience. Tutorials for new users can be much more focused and reproducible, and advanced users are better able to share relevant information and experience with others.

Picking the standard hardware for Plan 9 will certainly be a challenge. One suggestion would be to pick one or two variants for each class: server, desktop and mobile. Server-grade equipment would favour performance and extensibility, such as memory and storage, and would be capable of being integrated into high-speed networks and of supporting long uptimes. In the original Plan 9 parlance these would become your CPU servers and/or file servers. In modern home networking terms this could be your internet router as well as the persistent server for your home.

For desktops the need for extensibility and performance is lower, but there should be better graphics capabilities and support for various input devices. This fits roughly the original idea of the Plan 9 terminal. In a modern context these devices should probably be capable of running on any network, whether at home or remote. Certain immovable desktop devices, such as ones connected to a television, could benefit from network boot so that they update their OS automatically on (re)boot without needing local storage.

Mobile devices are a newer and interesting category. They are truly general-purpose hardware requiring a little of everything, sometimes serving the needs of a desktop plus additional capabilities such as photo capture, audio capture and location tracking, all in a small form factor with limited power. Mobile devices also fit roughly into the Plan 9 concept of a terminal, but as a more power-efficient and integrated version of the desktop system mentioned above. Ideally, a mobile device could also be used as a low-end desktop system by plugging in power and peripherals such as a keyboard, mouse and display.

These categories should be sufficient to handle most aspects of computing that users expect today. There is another popular category of hardware: the embedded device. Embedded devices often trade away some of the features needed to run a multi-user, concurrent OS in favour of even smaller size and power requirements. While Plan 9 could not reasonably run on them, there is a need for it to interoperate with them. Support for popular embedded communication methods such as SPI, I2C, Bluetooth and Zigbee would be needed for Plan 9 to interact. Also, to develop software for these platforms there should be an embedded language (e.g. C), a compiler and an interface to program the devices.

A variant of the embedded device is the software-programmable processor, such as the FPGA (Field Programmable Gate Array), which allows one to design and run special-purpose hardware. While the power requirements of such devices are higher than those of most mobile devices, they permit the development of custom hardware to perform specialized computation. Plan 9 could support programming some of these devices using an HDL (hardware description language) along with the toolchain to program them. Again, support for one or two devices would be sufficient to unlock this area.

The same kind of minimalist but broad category support can also be applied to buses (USB, Thunderbolt, PCIe, ...), graphics (Intel, Nvidia, ...) and other areas of computing. The key is to pick the subset to support out of the box and encourage its use unless there are good reasons to deviate. Picking the subset will be challenging but could be very rewarding in the long run. Hopefully open hardware movements such as RISC-V, and the increasing library of freely available hardware designs, will help make it easier to find open, cheap and well-built hardware to support. Prices remain higher for these items, but perhaps that will change over the next few years if there is sufficient demand.

Inseparability of computer systems, business, culture and society

It used to be that computers were a niche aspect of society, helping to drive the military, science, mathematics and business. This was still true even twenty years ago. At present it is difficult to identify any aspect of human society that hasn't been impacted by computers. In fact, most of humanity is now highly dependent on them for daily life, whether it is keeping up with news, communicating with others, paying bills or even voting. Apple started as a company building computers and is now involved in the music business as well as making other content, such as movies. Google is an advertising distributor as well as a search engine and cloud provider. Warfare is increasingly focused on attacking other nations' critical computer systems that control their infrastructure. Issues within the realm of computer systems are therefore becoming issues for everybody, whether they realize it or not.

Modern computer systems are highly complicated, rivaling the most complex mechanical or electronic systems made in the past. They are also extremely opaque, in that most users have very little idea of what goes on inside. Even computer experts tend to be highly specialized, making it difficult for them to know how other areas of the systems work. The complications have been hidden by amazing advances in hardware performance and capacity as equipment becomes smaller, faster and able to store more data. Major technology companies have also developed monstrous business models to fuel the development and maintenance of the technology, a burden that would otherwise be difficult for public or academic efforts to bear. As a result, computer systems are often distorted to suit one business model or another, and none of these companies is interested in truly lowering the bar so that users can understand what goes on inside.

Since users are now very much at the mercy of the technology, they should be afforded more control over it, and by extension over their own lives. For this to happen the technology needs to be as open and approachable as it can be. The source code and design should be open for inspection, and simple enough that generalists can understand them if they want to. From an OS perspective Plan 9 is well positioned to satisfy this, since it was designed to be open, simple and consistent from the beginning while doing everything a modern operating system should be capable of. The interfaces to the various subsystems are centred around files and filesystems; once you understand those concepts you unlock access to most of the system. The code is written in a similar style, using the same libraries throughout, from compilers to kernels and even user programs.

An operating system is not enough, though. It constitutes only a portion of any computer system, albeit an important one. It does, however, have the potential to establish and drive standards that apply to the applications and services written on top of it. Arguably, its model of simplicity and consistency might be applied at the hardware level as well. This amounts to a kind of network effect that, with sufficient momentum, can drive significant change.

The business side is often overlooked by computer experts, but it is just as important as the technology. Software requires effort to improve and maintain, and people need to be paid directly or indirectly for their efforts. If users do not pay enough to cover the expense of maintaining the systems, then businesses turn to techniques that extract value from their users, such as violating their privacy, peddling influence or firmly establishing various kinds of vendor lock-in or monopoly. One way to deal with this problem is to lower the cost of maintaining the system by simplifying it.

Another way to better align business with users' interests is to create direct revenue streams. It may be possible to employ a subscription model of some form, perhaps bundled with internet access, news media and content production, since those industries are in a similar situation, struggling to secure the revenue needed to continue their efforts ethically and sustainably.

Efforts to support such business models could also come from public resources. As mentioned previously, the technology has a profound influence on people's lives, and it is in the public interest for it to be managed ethically.

Do you have any issues with this document? If so, you can raise them at github.com/sirnewton01/plan9-contd/issues.