LINUX EXAMPLES for PERSISTENT MEMORY PROGRAMMING This is the top-level README for the linux-examples repository. PLEASE NOTE: THIS REPOSITORY IS PROBABLY NOT WHAT YOU WANT! These examples were created as a way to explore the programming issues that come up with persistent memory. Since these examples were first written, the NVM Library has been developed as a tested and documented way to support persistent memory. You want NVML instead of the files here (this repo is just here for historical interest). GO HERE: https://github.com/pmem/nvml OR READ MORE ABOUT PMEM PROGRAMMING HERE: http://pmem.io This repository contains some exploratory ideas, used to illustrate the challenges of Persistent Memory programming. The goal is to build up a collection of simple examples to help people understand how Persistent Memory is exposed to applications. These are NOT PRODUCTION-QUALITY examples. They are educational snippets of code to illustrate some basic concepts. See the REFERENCES section below for more mature frameworks. These examples are built on the Linux Persistent Memory API described in: LINUX_PMEM_API.txt PERSISTENT MEMORY Persistent Memory is an emerging technology providing a new tier between traditional storage and volatile memory with attributes from both. The two main qualities of Persistent Memory are that it appears byte-addressable to applications (allowing direct load/store access rather than the block-based access of traditional storage) and that it is persistent. The term Persistent Memory (PM), and the related term Storage Class Memory (SCM), may not have formal definitions that everyone agrees on, but these terms usually refer to memory that an application can access directly, without demand-paging blocks in and out of a storage device. The performance of PM is typically such that a processor load instruction can reasonably wait for data from PM (instead of context switching while a page fault is serviced). Therefore technologies like hard disks and NAND Flash are not considered fast enough by themselves. A PM product built on those technologies would have some additional hardware built in to maintain the performance expected by the processor during direct access. A number of Non-Volatile Memory (NVM) technologies emerging, like Phase Change Memory, Memristor, Spin-Toque, and others, may bring us into an era of very large PM capacities with performance more like DRAM than traditional storage. This is appealing for applications requiring access to large data sets, without the huge DRAM footprint and non-deterministic access times seen when demand-paging is used. Once you imagine applications storing information in PM, you quickly realize the need for a way to name regions of PM so that applications can re-connect to their data on start-up. Next, the need for a permission model and an administrative model for managing PM comes up (creating, deleting, backing it up, etc.). The industry direction for exposing PM to applications is for PM regions to be exposed via the native file APIs on an OS, or something very similar (see the SNIA information in the REFERENCES section below). That's the approach taken with the examples found here. An example of a Persistent-Memory-Aware file system that provides PM via native Linux file APIs is PMFS, available at: https://github.com/linux-pmfs So when PM is accessed by memory mapping files, how is that different from traditional memory mapped files that have been a feature of many operating systems for decades? If one writes a program that memory-maps a normal file using the Linux mmap(2) system call, accesses it with loads and stores, and makes changes durable using the Linux msync(2) system call, will that program work without modification on a PM file? Yes. But with a traditional file system, access to the memory-mapped file is actually paging data in and out of DRAM; mapping a file and changing a single byte involves a page fault (including a context switch or two), where a 4k block is read from storage into memory and a subsequent write of that 4k block to make the change durable. With PM, it is possible to change that byte directly from the program, without waiting for page faults or using any DRAM buffers. On PM the impact of changing a byte of data is likely much less than a page (probably more like a cache-line read so the change can happen in the processor cache, followed by a cache-line flush to make the change durable). EXAMPLES Each example occupies a sub-directory here and the README in each sub-directory tells you why the example was written. The README also described how to run the example and what unit tests are available. At this top level, "make all" builds every example, "make clean" clears out intermediate files for all examples, and "make clobber" removes anything that is recreatable for all examples. Since these are for educational use, no attempt has been made to optimize them for performance, or even make then run on anything but Linux x86_64 (they were all tested on a 64-bit Debian-based distro running the 3.2 kernel and gcc 4.7.2 initially). Most of the examples are not MT-safe (unless making something MT-safe was part of the example). All the examples are written in C and depend on nothing but libc and the code in this repo. This is to keep things simple and illustrate the low-level primitives for Persistent Memory. Hopefully over time some higher-level language support will become commonplace and we'll add additional examples as they become available. See the TODO section below for some ideas if you're looking to contribute. As mentioned in the previous section, calling msync(2) to make stores to PM durable will work but since PM is directly-accessible, there may be more optimal ways available with some PM products. For the purposes of these examples, a library, libpmem, has been provided and it includes this function which can be used instead of msync: pmem_persist(addr, len, flags) See libpmem/README for more details on the methods available and the assumptions each method makes about the platform. What you'll find here: LICENSE The BSD license under which this code is available. LINUX_PMEM_API.txt The Linux Persistent Memory API, which is primarily the decades-old memory-mapped file API (part of POSIX) with a few optional things added to it. FAQ.txt Frequently asked questions about Persistent Memory and about these examples. trivial A simple, self-contained, trivial example. If you don't know how to use mmap(2)/msync(2) or if you need a reminder how they are used, start here. It is only a small step above just typing "man mmap" but the example is explained, runnable, and includes a trivial test you can play with. basic A very basic example. It can store a few strings to PM and it can read the strings already stored there. This is the simplest example based on libpmem, so it is a good place to start to understand how that library is used. See basic/README for details on how to run & test this example, and how to use it with the "icount" fault injection test mechanism. libpmem This library provides an optional part of the Linux Persistent Memory API. It is "optional" because it is perfectly possible to use PM without this library (as shown by the "trivial" example described above). But with this library, PM programming is more convenient and some operations are optimized. If you're looking for examples of PM programming, reading this code won't help you -- look at examples like trivial, basic, and binarytree. libpmemalloc A simple PM malloc-like library. If you're looking for some basic ideas to get you going managing data structures in PM in a way that keeps them consistent across crashes, this might be a good place to start. libpmemalloc/README explains how the malloc-like library is used and tested, and the file libpmemalloc/design.txt explains the simple design that went into making it resilient against crashes. Don't forget to read about much more complete PM memory allocation solutions in the REFERENCES section below. binarytree Building on the libpmemalloc example, this is a small example of a binary tree used to count the frequency of variable length strings. util Some common routines for error reporting and debug prints. These have nothing to do with Persistent Memory. icount Routines to count instructions and simulate crashes. These are used for the fault injection runs where we run a test, count the instructions executed by the code snippet under test, and then re-run the test again and again, simulating a crash between every pair of instructions in the code snippet. See icount/README for details or you can see this is action by running "make allcounts" in the basic sub-directory. CODING CONVENTIONS Some of the examples store relative pointers in PM data structures so that those pointers remain valid even if the PM gets mapped into a different virtual address on subsequent runs (which is almost always the case). There are multiple discussions on how to deal with this in papers mentioned in REFERENCES below. For these examples, we take a very basic approach, making it the programmer's responsibility to keep track of which pointers are relative and which are absolute. In the examples, you often see pointers with a trailing underscore, like this: sp->root_ the convention is that the trailing underscore is a reminder to the programmer that you cannot dereference that pointer without first adding the "base" of the PM to it. The libpmemalloc example creates a macro to do this called PMEM(). Compilers like the Microsoft C++ compiler have a nifty "based pointer" mechanism so that the compiler largely takes care of this for the programmer. But gcc has no such feature, so we're forced to do it by hand. At least until languages begin to grow more mature PM programming features. Another small convention is that global variables have their first letter capitalized. Beyond that, the code is basically K&R-style. TESTING For many years, file system implementers have carefully designed their data structures to recover from unexpected system crashes (power failures, kernel crashes, hardware failures). This means designing in enough persistent information so that recovery after a crash can return the on-disk format to a consistent state. The emergence of Persistent Memory brings all that logic into application space. A traditional program might allocate memory like this: struct node *np; np = malloc(sizeof(*np)); /* ... fill in *np ... */ parent->next = np; With Persistent Memory, using a malloc-style interface as shown above would lead to a memory leak of the Persistent Memory between the point when malloc() removes the chunk from the free pool and the point where a persistent pointer is set to point at the newly-allocated memory. Many strategies for handling this issue are available, including logging and reference counting. But these algorithms can be difficult to test because it is difficult to simulate failures at the most vulnerable points in the algorithms. Most of the examples include test programs. To validate that PM-resident data structures are managed in a way that is resilient to crashes, the icount code is used. icount (described more in icount/README) provides a way to count the instructions executed in a test snippet of code, and then re-run that code again and again, simulating a crash between every possible pair of instructions in the test snippet. Run "make allcounts" in the basic sub-directory for a demo. Using the icount mechanism can be very, very time consuming. Note the difference between a "make allcounts" run for the basic example and a "make allcounts" run for the tree example. In the basic example, the program just stores to PM and the simulated crashes leave the PM in random states, showing partial strings and other garbage. Compare that to the tree example, which uses libpmemalloc to make sure things stay consistent across crashes. In that example, when a tree_insert() is interrupted by a crash, the string being inserted either makes it completely into the tree or it doesn't. And in the case where it doesn't, no Persistent Memory is leaked. That's because the tree example is built on the atomic mechanisms in libpmemalloc, and that's the point of these examples. CONTACTS If you'd like to contribute more Persistent Memory programming examples (and we really hope you will!), please fork this project on github, add your example, and send a pull request. For more information, questions or comments, contact: andy.rudoff@intel.com REFERENCES Some very interesting research has been happening in the area of Persistent Memory and more is emerging. While the examples provided here are meant as introductory and simple, some publications cover the topic in much more depth and include complete transaction systems, compiler/language enhancements, etc. Here are some of the most important publications in this space (please send links to more and I'll include them). - A Persistent Memory aware file system for Linux. Under development at: https://github.com/linux-pmfs - One of the most impressive bodies of work in this area is Mnemosyne. Read the paper: Haris Volos, Andres Jaan Tack, Michael M. Swift: Mnemosyne: Lightweight Persistent Memory, The 16th ACM Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2011), March 2011, Newport Beach, California. and explore the website, which includes full source and documentation: http://research.cs.wisc.edu/sonar/projects/mnemosyne/ - Another very relevant project is NV-Heaps. Read the paper: J. Coburn, et al.:NV-Heaps: Making Persistent Objects Fast and Safe with Next Generation, Non-Volatile Memories, The 16th ACM Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS 2011), March 2011, Newport Beach, Ca. - The June 2013 issue of USENIX ;login: contains an article that discusses the NVM Programming Model, including a discussion of Persistent Memory. https://www.usenix.org/publications/login/june-2013-volume-38-number-3 - The NVM Programming Technical Work Group (TWG) in SNIA has over 35 companies working together on NVM programming models, including Persistent Memory. You can read more about the TWG here: http://www.snia.org/tech_activities/work/twgs and see a draft of their specification here: http://www.snia.org/tech_activities/publicreview - Viking Technology has an excellent FAQ on their NVDIMMs, providing an excellent overview of the platform requirements including a description of the ADR feature assumed by some of these example programs: http://www.vikingtechnology.com/nvdimm-faq - The basics of NVDIMM at Wikipedia, with lots of additional links: http://en.wikipedia.org/wiki/NVDIMM TODO Here are some ideas of examples we'd like to add to this collection next: - A transaction-based example, using an undo or redo log for consistency - An example using many files, perhaps with pointers between them - MT-safe versions of some the examples - Shared memory examples (multi-process sharing in addition to multi-thread) - An example of a Persistent cache built on PM - Bindings for other languages like Java, Python, etc. - Abstractions that fit into other languages (like Java Persistence API)