
Contains the source code examples described in the "Intel® 64 and IA-32 Architectures Optimization Reference Manual"

Primary language: Assembly. License: BSD Zero Clause License (0BSD).

Intel® 64 and IA-32 Architectures Optimization Reference Manual Code Samples

This repository contains buildable versions of the example source files in the Intel Optimization Manual available here (https://software.intel.com/en-us/articles/intel-sdm). Assembly source code is provided for GCC, Clang and MSVC, using the Intel syntax. Unit tests are also provided for each of the samples.

Building on Linux and macOS

To build and run the unit tests:

  1. cd to the root folder of this project
  2. mkdir build
  3. cd build
  4. cmake ..
  5. make && make test

GCC 8.1 (or clang 12 on macOS) or higher is required to build the unit tests. However, many of the newer examples, e.g., those that use AMX or AVX-512 FP16 instructions, require newer compilers to build: GCC 12 or clang 14. No errors are reported at build time, but examples built with toolchains that do not support the instructions they test will simply report an error and exit when run.

The unit tests are compiled with -march=haswell, so a fourth-generation Intel® Core™ (Haswell) CPU or later is required to run them. Tests that execute instructions not present on Haswell are skipped if the CPU on which they run does not support those instructions.

The code samples can also be compiled with clang:

  1. cd to the root folder of this project
  2. mkdir clang-build
  3. cd clang-build
  4. CC=clang CXX=clang++ cmake ..
  5. make && make test

Building on Windows

To run the tests on a Windows machine (requires Visual Studio 2022):

  1. cd to the root folder of this project
  2. mkdir bld
  3. cd bld
  4. From an x64 Native Tools Command Prompt, run cmake -G "Visual Studio 17 2022" .. to generate the Visual Studio solution files, then open optimization.sln in Visual Studio.
  5. To build, build the ALL_BUILD project.
  6. To run the tests, build the RUN_TESTS project.

Building the Benchmarks

Benchmark code is supplied for some of the code samples. These benchmarks are built using Google's Benchmark project. If Benchmark is installed and discoverable by CMake, the benchmarks for the code samples will be automatically built when you type make.

On Windows, ensure that you build the benchmark code with the same build type (Release/Debug) as Google's Benchmark to prevent debug-level mismatch errors at link time.

CPU Requirements

The code samples assume that they are being run on a fourth-generation Intel® Core™ (Haswell) processor or later and do not perform runtime checks for the Haswell instructions they use, for example, FMA or AVX2. Some of the code samples may therefore crash if they are run on a CPU that does not support these instructions.

The code samples do, however, check for post-Haswell instruction sets such as AVX-512 and VNNI before running, and tests are skipped if the instructions they need are not present. Some of the newest examples use instructions found only in seventh-generation Intel® Core™ (Skylake-X) or later processors. If you have an older CPU, you may find that everything builds but that, when run, some of the tests are skipped, or crash if the CPU lacks AVX2. In this case, to run all the tests, you need to run them under the Intel® Software Development Emulator (SDE):

https://software.intel.com/en-us/articles/intel-software-development-emulator

Code Sample Constraints

Many of the code samples in the Optimization Manual are code snippets: they contain the minimum amount of code needed to illustrate a particular concept discussed in the manual. The code samples typically make assumptions about the data they process. These assumptions are often not documented in the manual, but they are documented in this repository. Each code sample is implemented as a function, and each of these functions is accompanied by a wrapper function that documents and enforces the assumptions of the code sample. For example, two functions are defined for Chapter 18, Example 22:

void lookup128_novbmi(const uint8_t *in, uint8_t *dict, uint8_t *out,
		      size_t len);
bool lookup128_novbmi_check(const uint8_t *in, uint8_t *dict, uint8_t *out,
			    size_t len);

lookup128_novbmi corresponds to the code in the Optimization Manual and lookup128_novbmi_check is a wrapper function that checks the validity of its parameters and then calls lookup128_novbmi. The code for lookup128_novbmi_check is as follows.

bool lookup128_novbmi_check(const uint8_t *in, uint8_t *dict, uint8_t *out,
			    size_t len)
{
	/*
	 * in, dict and out must be non-NULL.  dict must contain at least 128
	 * bytes.
	 */

	if (!in || !dict || !out)
		return false;

	/*
	 * len must be > 0 and a multiple of 32.
	 */

	if (len == 0 || len % 32 != 0)
		return false;

	lookup128_novbmi(in, dict, out, len);

	return true;
}

Note how the input constraints are documented and, where possible, enforced.

Register usage

The assembly language code samples in the .s files, which are designed to be built by gcc or clang on Linux, contain almost exact copies of the code snippets that appear in the manual. The core of each of these functions uses the same registers as the corresponding example in the manual. Sometimes the code samples in the repository contain additional setup code that ensures the registers are initialized the way the snippets in the manual expect. This setup code is kept to a minimum by carefully choosing the order of the parameters in the prototypes for the code samples, which is why the parameter ordering may seem odd and inconsistent from one example to the next.

The MASM versions of the code samples in the .asm files use the same prototypes as the samples in the .s files, but Windows uses a different calling convention from Linux. Large amounts of setup code would therefore be needed for the MASM versions to use the same registers as the snippets in the manual and the .s files. Consequently, the MASM versions tend to use different registers to keep the setup code to a minimum.