/GPBB

Graphics Programming Black Book by Michael Abrash.

Primary LanguageAssembly

GBPP

Graphics Programming Black Book by Michael Abrash.

This repository contains a mirrored copy of the book in its entirety, as well as the original source code written by the author himself. All rights are reserved by the author, Michael Abrash.

Table of Contents

  1. The Best Optimizer is between Your Ears
  • The Human Element of Code Optimization
  • Understanding High Performance
    • When Fast Isn't Fast
  • Rules for Building High-Performance Code
    • Know Where You're Going
    • Make a Big Map
    • Make Lots of Little Maps
    • Know the Territory
    • Know When It Matters
    • Always Consider the Alternatives
    • Know How to Turn On the Juice
  1. A World Apart
  • The Unique Nature of Assembly Language Optimization
  • Instructions: The Individual versus the Collective
  • Assembly Is Fundamentally Different
    • Transformation Inefficiencies
    • Self-Reliance
    • Knowledge
  • The Flexible Mind
    • Where to Begin?
  1. Assume Nothing
  • Understanding and Using the Zen Timer
  • The Costs of Ignorance
  • The Zen Timer
    • The Zen Timer Is a Means, Not an End
    • Starting the Zen Timer
  • Time and the PC
  • Stopping the Zen Timer
  • Reporting Timing Results
  • Notes on the Zen Timer
  • A Sample Use of the Zen Timer
  • The Long-Period Zen Timer
    • Stopping the Clock
  • Example Use of the Long-Period Zen Timer
  • Using the Zen Timer from C
    • Watch Out for Optimizing Assemblers!
    • Further Reading
    • Armed with the Zen Timq Onward and Upward
  1. In the Lair of the Cycle-Eaters
  • How the PC Hardware Devours Code Performance
  • Cycle-Eaters
  • The Nature of Cycle-Eaters
    • The 8088's Ancestral Cycle-Eaters
  • The 8-Bit Bus Cycle-Eater
    • The Impact of the 8-Bit Bus Cycle-Eater
    • What to Do about the 8-Bit Bus Cycle-Eater?
  • The Prefetch Queue Cycle-Eater*
    • Official Execution Times Are Only Part of the Story
    • There Is No Such Beast as a True Instruction Execution Time
    • Approximating Overall Execution Times
    • What to Do about the Prefetch Queue Cycle-Eater?
    • Holding Up the 8088
  • Dynamic RAM Refresh: The Invisible Hand
    • How DRAM Refresh Works in the PC
    • The Impact of DRAM Refresh
    • What to Do About the DRAM Refresh Cycle-Eater?
  • Wait States
  • The Display Adapter Cycle-Eater
    • The Impact of the Display Adapter Cycle-Eater
    • What to Do about the Display Adapter Cycle-Eater?
    • Cycle-Eaters: A Summary
    • What Does It All Mean?
  1. Crossing the Border
  • Searching Files with Restartable Blocks
    • Searching for Text
  • Avoiding the String Trap
  • Brute-Force Techniques
  • Using memchr()
    • Making a Search Restartable
  • Interpreting Where the Cycles Go
    • Knowing When Assembly Is Pointless
  • Always Look Where Execution Is Going
  1. Looking Past Face Value
  • How Machine Instructions May Do More Than You Think
    • Memory Addressing and Arithmetic
  • Math via Memory Addressing
    • The Wonders of LEA on the 386
  • Multiplication with LEA Using Non-Powers of Two
  1. Local Optimization
  • Optimizing Halfway between Algorithms and Cycle Counting
    • When LOOP Is a Bad Idea -The Lessons of LOOP and JCXZ
    • Avoiding LOOPS of Any Stripe
  • Local Optimization
  • Unrolling Loops
    • Rotating and Shifting with Tables
    • NOT Flips Bits -- Not Flags
    • Incrementing with and without Carry
  1. Speeding Up C with Assembly Language
  • Jumping Languages When You Know It'll Help
    • Billy, Don't Be a Compiler
  • Don't Call Your Functions on Me, Baby
  • Stack Frames Slow So Much
  • Torn Between Two Segments
    • Why Speeding Up Is Hard to Do
  • Taking It to the Limit
    • A C-to-Assembly Case Study
  1. Hints My Readers Gave Me
  • Optimization Odds and Ends from the Field
    • Another Look at LEA
    • The Kennedy Portfolio
    • Speeding Up Multiplication
    • Optimizing Optimized Searching
    • Short Sorts
    • Full 32-Bit Division
    • Sweet Spot Revisited
    • Hard-core Cycle Counting
    • Hardwired Far Jumps
    • Setting 32-Bit Registers: Time versus Space
  1. Patient Coding, Faster Code
  • How Working Quickly Can Bring Execution to a Crawl
    • The Case for Delayed Gratification
  • The Brute-Force Syndrome
    • Wasted Breakthroughs
  • Recursion
    • Patient Optimization
  1. Pushing the 286 and 386
  • New Registers, NewInstructions, New Timings, New Complications
  • Family Matters
  • Crossing the Gulf to the 286 and the 386
  • In the Lair of the Cycle-Eaters, Part II
    • System Wait States
    • Data Alignment
  • Code Alignment
    • Alignment and the 386
    • Alignment and the Stack
    • The DRAM Refresh Cycle-Eater: Still an Act of God
    • The Display Adapter Cycle-Eater
  • New Instructions and Features: The 286
  • New Instructions and Features: The 386
    • Optimization Rules: The More Things Change...
    • Detailed Optimization
  • popf and the 286
  1. Pushing the 486
  • It's Not Just a Bigger 386
    • Enter the 486
  • Rules to Optimize By
    • The Hazards of Indexed Addressing
    • Calculate Memory Pointers Ahead of Time
  • Caveat Programmor
    • Stack Addressing and Address Pipelining
    • Problems with Byte Registers
    • More Fun with Byte Registers
    • Timing Your Own 486 Code
  • The Story Continues
  1. Aiming the 486
  • Pipelines and Other Hazards of the High End
    • 486 Pipeline Optimization
  • BSWAP: More Useful Than You Might Think
  • Pushing and Popping Memory
  • Optimal 1-Bit Shifts and Rotates
  • 32-Bit Addressing Modes
  1. Boyer-Moore String Searching
  • Optimizing a Pretty Optimum Search Algorithm
  • String Searching Refresher
  • The Boyer-Moore Algorithm
  • Boyer-Moore: The Good and the Bad
  • Further Optimization of Boyer-Moore
    • Know What You Know
  1. Linked Lists and Unintended Challenges
  • Unfamiliar Problems with Familiar Data Structures
  • Linked Lists
  • Dummies and Sentinels
  • Circular Lists
  • Hi/Lo in 24 Bytes
  1. There Ain't No Such Thing as the Fastest Code
  • Lessons Learned in the Pursuit of the Ultimate Word Counter
  • Counting Words in a Hurry
    • Which Way to Go from Here?
  • Challenges and Hazards
    • Blinding Yourself to a Better Approach
    • Watch Out for Luggable Assumptions!
  • The Astonishment of Right-Brain Optimization
  • Levels of Optimization
    • Optimization Level 1: Good Code
  • Level 2: A New Perspective
    • Level 3: Breakthrough
    • Enough Word Counting Already!
  1. The Game of Life
  • The Triumph of Algorithmic Optimization in a Cellular Automata Game
  • Conway's Game
    • The Rules of the Game
  • Where Does the Time Go?
  • The Hazards and Advantages of Abstraction
  • Heavy-Duty C++ Optimization
  • Bringing In the Right Brain
    • Re-Examining the Task
    • Acting on What We Know
    • The Challenge That Ate My Life
  1. It's a Wonderful Life
  • Optimization beyond the Pale
  • Breaking the Rules
  • Table-Driven Magic
  • Keeping Track of Change with a Change List
    • A Layperson's Overview of QLIFE
  1. Pentium: Not the Same Old Song
  • Learning a Whole Different Set of Optimization Rules
  • The Return of Optimization as Art
  • The Pentium: An Overview
    • Crossing Cache Lines
    • Cache Organization
  • Faster Addressing and More
  • Branch Prediction
  • Miscellaneous Pentium Topics
    • 486 versus Pentium Optimization
    • Going Superscalar
  1. Pentium Rules
  • How Your Carbon-Based Optimizer Can Put the "Super" in Superscalar
  • An Instruction in Every Pipe
  • V-Pipe-Capable Instructions
  • Lockstep Execution
  • Superscalar Notes
    • Register Starvation
  1. Unleashing the Pentium's V-pipe
  • Focusing on Keeping Both Pentium Pipes Full
  • Address Generation Interlocks
  • Register Contention
    • Exceptions to Register Contention
  • Who's in First?
  • Pentium Optimization in Action
    • A Quick Note on the 386 and 486
  1. Zenning and the Flexible Mind
  • Taking a Spin through What You've Learned
  • Zenning
  1. Bones and Sinew
  • At the Very Heart of Standard PC Graphics
  • The VGA
  • An Introduction to VGA Programming
  • At the Core
    • Linear Planes and True VGA Modes
    • Smooth Panning
    • Color Plane Manipulation
    • Page Flipping
  • The Hazards of VGA Clones
  • Just the Beginning
  • The Macro Assembler
  1. Parallel Processing with the VGA
  • Taking on Graphics Memory Four Bytes at a Time
  • VGA Programming: ALUs and Latches
  • Notes on the ALU/Latch Demo Program
  1. VGA Data Machinery
  • The Barrel Shifter, Bit Mask, and Set/Reset Mechanisms
  • VGA Data Rotation
  • The BitMask
  • The VGA's Set/Reset Circuitry
    • Setting All Planes to a Single Color
    • Manipulating Planes Individually
  • Notes on Set/Reset
  • A Brief Note on Word OUTs
  1. VGA Write Mode 3
  • The Write Mode That Grows on You
  • A Mode Born in Strangeness
  • A Note on Preserving Register Bits
  1. Yet Another VGA Write Mode
  • Write Mode 2, Chunky Bitmaps, and Text-Graphics Coexistence
  • Write Mode 2 and Set/Reset
    • A Byte's Progress in Write Mode 2
    • Copying Chunky Bitmaps to VGA Memory Using Write Mode 2
    • Drawing Color-Patterned Lines Using Write Mode 2
  • When to Use Write Mode 2 and When to Use Set/Reset
  • Mode 13H--320x200 with 256 Colors
  • Flipping Pages from Text to Graphics and Back
  1. Reading VGA Memory
  • Read Modes 0 and 1, and the Color Don't Care Register
  • Read Mode 0
  • Read Mode 1
  • When all Planes "Don't Care"
  1. Saving Screens and Other VGA Mysteries
  • Useful Nuggets from the VGA Zen File
  • Saving and Restoring EGA and VGA Screens
  • 16 Colors out of 64
  • Overscan
  • A Bonus Blanker
  • Modifying VGA Registers
  1. Video Est Omnis Divisa
  • The Joys and Galling Problems of Using Split Screens on the EGA and VGA
  • How the Split Screen Works
    • The Split Screen in Action
    • VGA and EGA Split-Screen Operation Don't Mix
  • Setting the Split-Screen-Related Registers
  • The Problem with the EGA Split Screen
  • Split Screen and Panning
    • The Split Screen and Horizontal Panning: An Example
  • Notes on Setting and Reading Registers
  • Split Screens in Other Modes
  • How Safe?
  1. Higher 256-Color Resolution on the VGA
  • When Is 320x200 Really 320x400?
  • Why 320x200? Only IBM Knows for Sure
  • 320x400 256-Color Mode
    • Display Memory Organization in 320x400 Mode
    • Reading and Writing Pixels
  • Two 256-Color Pages
  • Something to Think About
  1. Be It Resolved: 360x480
  • Taking 256-Color Modes About as Far as the Standard VGA Can Take Them
  • Extended 256-Color Modes: What's Not to Like?
  • 360x480 256-Color Mode
  • How 360x480 256-Color Mode Works
    • 480 Scan Lines per Screen: A Little Slower, But No Big Deal
    • 360 Pixels per Scan Line: No Mean Feat
    • Accessing Display Memory in 360x480 256-Color Mode
  1. Yogi Bear and Eurythmics Confront VGA Colors
  • The Basics of VGA Color Generation
  • VGA Color Basics
    • The Palette RAM
    • The DAC
    • Color Paging with the Color Select Register
    • 256-color Mode
    • Setting the Palette RAM
    • Setting the DAC
  • If You Can't Call the BIOS, Who Ya Gonna Call?
  • An Example of Setting the DAC
  1. Changing Colors without Writing Pixels
  • Special Effects through Realtime Manipulation of DAC Colors
  • Color Cycling
  • The Heart of the Problem
    • Loading the DAC via the BIOS
    • Loading the DAC Directly
  • A Test Program for Color Cycling
  • Color Cycling Approaches that Work
  • Odds and Ends
    • The DAC Mask
    • Reading the DAC
    • Cycling Down
  1. Bresenham Is Fast, and Fast Is Good
  • Implementing and Optimizing Bresenham's Line-Drawing Algorithm
  • The Task at Hand
  • Bresenham's Line-Drawing Algorithm
    • Strengths and Weaknesses
  • An Implementation in C
    • Looking at EVGALine
    • Drawing Each Line
    • Drawing Each Pixel
  • Comments on the C Implementation
  • Bresenham's Algorithm in Assembly
  1. The Good, the Bad, and the Run-Sliced
  • Faster Bresenham Lines with Run-Length Slice Line Drawing
  • Run-Length Slice Fundamentals
  • Run-Length Slice Implementation
  • Run-Length Slice Details
  1. Dead Cats and Lightning Lines
  • Optimizing Run-Length Slice Line Drawing in a Major Way
  • Fast Run-Length Slice Line Drawing
    • How Fast Is Fast?
    • Further Optimizations
  1. The Polygon Primeval
  • Drawing Polygons Efficiently and Quickly
  • Filled Polygons
    • Which Side Is Inside?
  • How Do You Fit Polygons Together?
  • Filling Non-Overlapping Convex Polygons
  • Oddball Cases
  1. Fast Convex Polygons
  • Filling Polygons in a Hurry
  • Fast Convex Polygon Filling
    • Fast Drawing
    • Fast Edge Tracing
  • The Finishing Touch: Assembly Language
    • Maximizing REP STOS
  • Faster Edge Tracing
  1. Of Songs, Taxes, and the Simplicity of Complex Polygons
  • Dealing with Irregular Polygonal Areas
  • Filling Arbitrary Polygons
    • Active Edges
  • Complex Polygon Filling: An Implementation
    • More on Active Edges
    • Performance Considerations
  • Nonconvex Polygons
    • Details, Details
  1. Those Way-Down Polygon Nomenclature Blues
  • Names Do Matter when You Conceptualize a Data Structure
  • Nomenclature in Action
  1. Wu'ed in Haste; Fried, Stewed at Leisure
  • Fast Antialiased Lines Using Wu's Algorithm
  • Wu Antialiasing
  • Tracing and Intensity in One
  • Sample Wu Antialiasing
    • Notes on Wu Antialiasing
  1. Bit-Plane Animation
  • A Simple and Extremely Fast Animation Method for Limited Color
  • Bit-Planes: The Basics
    • Stacking the Palette Regsters
  • Bit-Plane Animation in Action
  • Limitations of Bit-Plane Animation
  • Shearing and Page Flipping
  • Beating the Odds in the Jaw-Dropping Contest
  1. Split Screens Save the Page-Flipped Day
  • 640x480 Page Flipped Animation in 64K...Almost
  • A Plethora of Challenges
  • A Page Flipping Animation Demonstration
    • Write Mode 3
    • Drawing Text
    • Page Flipping
    • Knowing When to Flip
  • Enter the Split Screen
  1. Dog Hair and Dirty Rectangles
  • Different Angles on Animation
  • Plus Ça Change
  • VGA Access Times
  • Dirty-Rectangle Animation
    • So Why Not Use Page Flipping?
  • Dirty Rectangles in Action
  • Hi-Res VGA Page Flipping
  • Another Interesting Twist on Page Flipping
  1. Who Was that Masked Image?
  • Optimizing Dirty-Rectangle Animation
  • Dirty-Rectangle Animation, Continued
  • Masked Images
  • Internal Animation
    • Dirty-Rectangle Management
  • Drawing Order and Visual Quality
  1. Mode X: 256-Color VGA Magic
  • Introducing the VGA's Undocumented "Animation-Optimal" Mode
  • What Makes Mode X Special?
  • Selecting 320x240 256-Color Mode
  • Designing from a Mode X Perspective
  • Hardware Assist from an Unexpected Quarter
  1. Mode X Marks the Latch
  • The Internals of Animation's Best Video Display Mode
  • Allocating Memory in Mode X
  • Copying Pixel Blocks within Display Memory
    • Copying to Display Memory
  • Who Was that Masked Image Copier?
  1. Mode X 256-Color Animation
  • How to Make the VGA Really Get up and Dance
  • Masked Copying
    • Faster Masked Copying
    • Notes on Masked Copying
  • Animation
  • Mode X Animation in Action
  • Works Fast, Looks Great
  1. Adding a Dimension
  • 3-D Animation Using Mode X
  • References on 3-D Drawing
  • The 3-D Drawing Pipeline
    • Projection
    • Translation
    • Rotation
  • A Simple 3-D Example
    • Notes on the 3-D Animation Example
  • An Ongoing Journey
  1. Sneakers in Space
  • Using Backface Removal to Eliminate Hidden Surfaces
  • One-sided Polygons: Backface Removal
    • Backface Removal in Action
  • Incremental Transformation
  • A Note on Rounding Negative Numbers
  • Object Representation
  1. Fast 3-D Animation: Meet X-Sharp
  • The First Iteration of a Generalized 3-D Animation Package
  • This Chapter's Demo Program
  • A New Animation Framework: X-Sharp
  • Three Keys to Realtime Animation Performance
    • Drawbacks
    • Where the Time Goes
  1. Raw Speed and More
  • The Naked Truth About Speed in 3-D Animation
  • Raw Speed, Part 1: Assembly Language
  • Raw Speed, Part 11: Look it Up
    • Hidden Surfaces
    • Rounding
  • Having a Ball
  1. 3-D Shading
  • Putting Realistic Surfaces on Animated 3-D Objects
  • Support for Older Processors
  • Shading
    • Ambient Shading
    • Diffuse Shading
  • Shading: Implementation Details
  1. Color Modeling in 256-Color Mode
  • Pondering X-Sharp's Color Model in an RGB State of Mind
  • A Color Model
  • A Bonus from the BitMan
  1. Pooh and the Space Station
  • Using Fast Texture Mapping to Place Pooh on a Polygon
  • Principles of Quick-and-Dirty Texture Mapping
    • Mapping Textures Made Easy
    • Notes on DDA Texture Mapping
  • Fast Texture Mapping: An Implementation
  1. 10,000 Freshly Sheared Sheep on the Screen
  • The Critical Role of Experience in Implementing Fast, Smooth Texture Mapping
  • Visual Quality: A Black Hole ... Er, Art
  • Fixed-Point Arithmetic, Redux
  • Texture Mapping: Orientation Independence
  • Mapping Textures across Multiple Polygons
    • Fast Texture Mapping
  1. Heinlein's Crystal Ball, Spock's Brain, and the 9-Cycle Dare
  • Using the Whole-Brain Approach to Accelerate Texture Mapping
  • Texture Mapping Redux
    • Left-Brain Optimization
    • A 90-Degree Shift in Perspective
  • That's Nice--But it Sure as Heck Ain't 9 Cycles
    • Don't Stop Thinking about Those Cycles
  • Texture Mapping Notes
  1. The Idea of BSP Trees
  • What BSP Trees Are and How to Walk Them
  • BSP Trees
    • Visibility Determination
    • Limitations of BSP Trees
  • Building a BSP Tree
    • Visibility Ordering
  • Inorder Walks of BSP Trees
    • Know It Cold
    • Measure and Learn
  • Surfing Amidst the Trees
    • Related Reading
  1. Compiling BSP Trees
  • Taking BSP Trees from Concept to Reality
  • Compiling BSP Trees
    • Parametric Lines
    • Parametric Line Clipping
    • The BSP Compiler
  • Optimizing the BSP Tree
  • BSP Optimization: an Undiscovered Country
  1. Frames of Reference
  • The Fundamentals of the Math behind 3-D Graphics
    • 3-D Math
    • Foundation Definitions
  • The Dot Product
    • Dot Products of Unit Vectors
  • Cross Products and the Generation of Polygon Normals
  • Using the Sign of the Dot Product
  • Using the Dot Product for Projection
    • Rotation by Projection
  1. One Story, Two Rules, and a BSP Renderer
  • Taking a Compiled BSP Tree from Logical to Visual Reality
  • BSP-based Rendering
  • The Rendering Pipeline
    • Moving the Viewer
    • Transformation into Viewspace
    • Clipping
    • Projection to Screenspace
    • Walking the Tree, Backface Culling and Drawing
  • Notes on the BSP Renderer
  1. Floating-Point for Real-Time 3-D
  • Knowing When to Hurl Conventional Math Wisdom Out the Window
  • Not Your Father's Floating-point
  • Pentium Floating-point Optimization
    • Pipelining, Latency, and Throughput
    • FXCH
  • The Dot Product
  • The Cross Product
  • Transformation
  • Projection
  • Rounding Control
  • A Farewell to 3-D Fixed-point
  1. Quake's Visible-Surface Determination
  • The Challenge of Separating All Things Seen from All Things Unseen
  • VSD: The Toughest 3-D Challenge of All
  • The Structure of Quake Levels
  • Culling and Visible Surface Determination
    • Nodes Inside and Outside the View Frustum
  • Overdraw
  • The Beam Tree
  • 3-D Engine du Jour
    • Subdividing Raycast
    • Vertex-Free Surfaces
    • The Draw-Buffer
    • Span-Based Drawing
    • Portals
  • Breakthrough!
  • Simplify, and Keep on Trying New Things
  • Learn Now, Pay Forward
  • References
  1. 3-D Clipping and Other Thoughts
  • Determining What's Inside Your Field of View
  • 3-D Clipping Basics
    • Intersecting a Line Segment with a Plane
  • Polygon Clipping
    • Clipping to the Frustum
    • The Lessons of Listing 65.3
  • Advantages of Viewspace Clipping
  • Further Reading
  1. Quake's Hidden-Surface Removal
  • Struggling with Z-Order Solutions to the Hidden Surface Problem
  • Creative Flux and Hidden Surfaces
    • Drawing Moving Objects
    • Performance Impact
    • Leveling and Improving Performance
  • Sorted Spans
    • Edges versus Spans
  • Edge-Sorting Keys
    • Where That l/Z Equation Comes From
    • Quake and Z-Sorting
    • Decisions Deferred
  1. Sorted Spans in Action
  • Implementing Independent Span Sorting for Rendering without Overdraw
  • Quake and Sorted Spans
  • Types of l/z Span Sorting
    • Intersecting Span Sorting
    • Abutting Span Sorting
    • Independent Span Sorting
  • l/z Span Sorting in Action
    • Implementation Notes
  1. Quake's Lighting Model
  • A Radically Different Approach to Lighting Polygons
    • Problems with Gouraud Shading
    • Perspective Correctness
    • Decoupling Lighting from Rasterization
    • Size and Speed
    • Mipmapping To The Rescue
    • Two Final Notes on Surface Caching
  1. Surface Caching and Quake's Triangle Models
    • Letting the Graphics Card Build the Textures
    • The Light Map as Alpha Texture
    • Drawing Triangle Models Fast
    • Trading Subpixel Precision for Speed
    • An Idea that Didn't Work
    • An Idea that Did Work
    • More Ideas that Might Work
  1. Quake: A Post-Mortem and a Glimpse into the Future
    • Lighting
    • Dynamic Lighting
    • BSP Models
    • Polygon Models and ZBuffering
    • The Subdivision Rasterizer
    • Sprites
    • Particles
  • How We Spent Our Summer Vacation: After Shipping Quake
    • Verite Quake
    • GLQuake
    • WinQuake
    • Quakeworld
    • Quake 2
  • Afterword
  • Index