simias/rustation

Specification of the OpenGL renderer architecture


Overview of the PlayStation GPU

GPU Rasterizer

The GPU uses 1 megabyte of video RAM organized as a framebuffer of 512 lines of 2048 bytes. The CPU can upload textures to this buffer using GPU commands (it's not directly memory mapped in the CPU address space), it can also read back portions of the framebuffer using other commands.

The GPU also contains a relatively simple 2D rasterizer capable of drawing lines and triangles (and "quads" which are really just two triangles side-by-side). It supports solid colors, textures (truecolor and paletted), several transparency modes and Gouraud shading. It can also apply a dithering pattern before outputting the 16bit color. The GPU has a small texture cache to speed up rendering of textured primitives.

The rasterizer always outputs 16bits per pixel (1555 RGB, where the MSB is the "mask" bit) so as far as it's concerned the VRAM is a framebuffer of 1024x512 pixels. Therefore all the draw commands use this system of coordinates.

Note that the GPU is fully 2D, it's not capable of 3D projection and therefore has no notion of depth (so no depth buffer or anything like that). The PlayStation does 3D projection on the CPU using the Geometry Transform Engine (GTE) coprocessor. That means for instance that the GPU cannot do perspective-correct texture mapping, which is the source of some easily recognizable PlayStation graphical artifacts.

The coordinate system used by the GPU is simply the 16bit per pixel coordinate in the video RAM, so (0, 0) is the top-left of the framebuffer while (1023, 511) is the bottom right. You can see a list of the GPU draw commands in the No$ specs.

GPU video output

Once a scene has been rendered/uploaded to the framebuffer it needs to be displayed on the TV through the NTSC/PAL analog video output. In order to do this the GPU video output can be configured to select a rectangle in the framebuffer and stream it to the TV.

The size of this window depends on the video timings used. For NTSC it ranges from roughly 256x240 to 640x480 while for PAL it's from 256x288 to 640x576. I say roughly because, since it's an analog output, you can tweak the timings in many ways, so you can actually "overscan" the output to increase the resolution or crop it further depending on what you're trying to do.

Interestingly even though the rasterizer only outputs 16bits per pixel the video output can be configured to use 24bits per pixel. That's of course mostly useless for graphics generated by the rasterizer but it can be used to display pre-rendered 24bit images, for instance videos decoded with the console's MDEC and uploaded to the GPU VRAM. Another application could be 24bit images dumped directly from the disc and used as static load screens.

Design of the emulated OpenGL renderer

Features

First and foremost I think accuracy should be the main focus. Of course if it was the only objective a software renderer would be better suited, but I think with modern OpenGL and its programmable pipeline it should be possible to reach a decent level of accuracy (except maybe for the GPU cache, see below).

OpenGL would also make it easier to implement certain enhancements to the game's graphics compared to the original console, for instance increased internal resolution, texture replacement, normal maps etc...

Later on we could even attempt to salvage the raw 3D coordinates from the GTE and use them to render the 3D scene directly with OpenGL. That would allow us to have higher precision for our vertex coordinates, perspective correct mapping and many other things only possible with a fully 3D scene.

I think it's important to keep those features in mind when designing the basic renderer architecture so that we don't end up breaking everything when we try to implement one of them.

Potential difficulties

As you can see from the previous sections the PlayStation GPU is extremely simple compared to a modern graphics card; however, it features some quirks and "exotic" modes which don't fit the OpenGL pipeline very well as far as I can tell.

Textures and palettes

At first I thought the most obvious approach to emulate the GPU video memory would be to use a single 1024x512 texture/FBO (or bigger if we increase the internal resolution) and use it as our Video RAM. There are a few potential issues with that approach however.

Upscaling and filtering

When a game wants to upload textures and palettes to the video RAM it must use one of the "Copy Rectangle" commands saying where the data should end up in the framebuffer (always using the same 1024x512 coordinate system) and then send the data 32bits at a time.

At this point there's no easy way to know what the data contains: it could be a 24bit RGB image from a video, a 16bit "truecolor" texture, a paletted texture, a palette, or several of those things at once. We'll only know how to interpret the data when the GPU actually uses it, either through a textured draw command or through the video output configuration if it's just meant to be dumped on the TV screen without further processing.

This seriously limits what we can do with the raw framebuffer data if we don't want to break anything.

For instance, suppose we have a single big FBO representing the entire framebuffer at an increased resolution (say, 2048x1024). When the CPU attempts to upload data to the GPU we could upscale it, optionally filter it and store it in our big buffer. Easy.

Except of course upscaling and filtering palettes and paletted textures won't work as intended: the intermediate pixel values will be meaningless since we're basically dealing with a non-linear color space. We cannot risk destroying palettes, therefore we can't really mess with the uploaded data until we know what it's going to be used for. Or at least whatever we do must be reversible if we need to go back to the original value later on.

I'm not sure what the best way to deal with this is. Maybe we could have two framebuffers instead: one at the native 1024x512 resolution containing the raw framebuffer data with no fancy enhancements, used for paletted textures, and a bigger framebuffer containing the rasterizer's output at increased resolution. Keeping the two coherent will be a challenge however, and I don't know where 24bit images fit in there. Maybe we could use a completely different rendering mode when the video output is set to 24bit mode so that we can ignore it the rest of the time.

If we want to implement texture replacement we also need to figure out when it should take place. That's a complex subject however, maybe we can leave it for later.

OpenGL texture sampling

Another potential issue if we use a single texture/FBO for the entire video RAM is that we need to be able to render into it while we sample a texture in another location of the same buffer. So we would be rendering to an FBO while it's also bound as a texture.

As far as I know this kind of configuration is not well supported by OpenGL and can quickly lead us into undefined behaviour territory.

I believe that this should be achievable using the GL_ARB_texture_barrier extension which is part of OpenGL 4.5 but maybe we can work around it.

Otherwise we could maybe use two framebuffers and "ping pong" between them on each frame instead: we would write to the current FBO while we use the previous one for input. That could be inaccurate if a game decides to use a polygon rendered during the current frame to texture a subsequent one; I know some games use similar features to create some fancy visual effects.
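To make the ping-pong idea a bit more concrete, here's a rough sketch (assuming the raw `gl` bindings crate; the `VramBuffer`/`PingPong` names and handles are purely illustrative):

```rust
// Sketch of "ping-pong" VRAM framebuffers: draw into one FBO while the
// previous frame's texture is bound for sampling, then swap.
// Assumes the raw `gl` bindings crate; the handles would be created elsewhere.
struct VramBuffer {
    fbo: gl::types::GLuint,
    texture: gl::types::GLuint,
}

struct PingPong {
    buffers: [VramBuffer; 2],
    current: usize, // index of the buffer we render into this frame
}

impl PingPong {
    fn begin_frame(&mut self) {
        let draw = &self.buffers[self.current];
        let sample = &self.buffers[1 - self.current];

        unsafe {
            // Render into the current VRAM buffer...
            gl::BindFramebuffer(gl::DRAW_FRAMEBUFFER, draw.fbo);
            // ...while textured primitives sample the previous one.
            gl::ActiveTexture(gl::TEXTURE0);
            gl::BindTexture(gl::TEXTURE_2D, sample.texture);
        }
    }

    fn end_frame(&mut self) {
        // Swap roles for the next frame.
        self.current = 1 - self.current;
    }
}
```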

Semi-transparency

The PlayStation GPU rasterizer has several pixel blending modes used for semi-transparent primitives (copied from the No$ specs):

  B=Back  (the old pixel read from the image in the frame buffer)
  F=Front (the new halftransparent pixel)
  * 0.5 x B + 0.5 x F    ;aka B/2+F/2
  * 1.0 x B + 1.0 x F    ;aka B+F
  * 1.0 x B - 1.0 x F    ;aka B-F
  * 1.0 x B + 0.25 x F   ;aka B+F/4

Unfortunately I don't think the OpenGL blending fixed function is flexible enough to accommodate all these modes without a significant number of hacks. Besides, for accuracy's sake we might want to handle the blending calculations in our own shader code to make sure we don't have weird rounding and saturation discrepancies (if we want to be bit accurate with the real hardware).

For this reason I think it would be better to handle the blending in the fragment shader. Once again this is not generally how things are done in OpenGL as far as I know, but it should be possible at least using the same OpenGL 4.5 GL_ARB_texture_barrier extension mentioned before or by "ping ponging" the buffers.
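For illustration, here's roughly what that could look like in GLSL, as a sketch only: it assumes the destination framebuffer can be sampled through a hypothetical `fb_texture` sampler (i.e. a texture barrier or ping-pong copy makes the read legal), and it uses normalized floats whereas a bit-accurate implementation would use 5-bit integer math:

```rust
// Sketch of the four PlayStation semi-transparency modes computed in the
// fragment shader instead of the fixed-function blender. `fb_texture` is a
// hypothetical sampler bound to the VRAM framebuffer being rendered to.
const BLEND_FRAG_SRC: &str = r#"
#version 330 core

uniform sampler2D fb_texture;   // destination framebuffer
uniform int blend_mode;         // 0: B/2+F/2, 1: B+F, 2: B-F, 3: B+F/4

in vec3 frag_color;             // "front" color F
out vec4 out_color;

void main() {
    // "Back" pixel currently in the framebuffer.
    vec3 back = texelFetch(fb_texture, ivec2(gl_FragCoord.xy), 0).rgb;
    vec3 front = frag_color;
    vec3 blended;

    if (blend_mode == 0) {
        blended = 0.5 * back + 0.5 * front;
    } else if (blend_mode == 1) {
        blended = back + front;
    } else if (blend_mode == 2) {
        blended = back - front;
    } else {
        blended = back + 0.25 * front;
    }

    // Saturate like the real rasterizer does.
    out_color = vec4(clamp(blended, 0.0, 1.0), 1.0);
}
"#;
```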

Masking

The MSB of the 16bit pixel is used to store a "mask" field. When the GPU renders a primitive it can be configured to set this bit to either zero or one. Another configuration flag can tell the GPU to treat framebuffer pixels with the mask bit set as "read only" and refuse to overwrite them. It effectively works like a simplified stencil test in OpenGL.

The problem if we decide to use a stencil buffer to emulate this masking feature is that we'd potentially need to update the stencil buffer after each primitive, since it's possible for any primitive draw to set the mask bit if the GPU is configured to do so. I don't know if it's possible to meaningfully modify the stencil buffer in an OpenGL fragment shader. Barring that we won't be able to use the stencil test to accurately emulate the masking.

Alternatively we could use the same trick I proposed to handle the semi-transparency modes above: we fetch the target FBO pixel in the fragment shader and if masking is enabled and its MSB is set we don't change its value. If we already handle transparency that way it might be relatively straightforward to add, theoretically.
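A minimal sketch of that idea, again assuming the destination can be read back in the fragment shader and that the mask bit is stored in the alpha channel (`fb_texture` and `check_mask` are hypothetical names):

```rust
// Sketch: emulate the "mask test" by discarding fragments whose destination
// pixel has the mask/MSB bit set. Assumes the destination framebuffer can be
// sampled (texture barrier or ping-pong) and the mask is stored in alpha.
const MASK_TEST_FRAG_SRC: &str = r#"
#version 330 core

uniform sampler2D fb_texture;  // destination framebuffer, mask bit in alpha
uniform bool check_mask;       // GPU "mask test" flag

out vec4 out_color;

void main() {
    vec4 dst = texelFetch(fb_texture, ivec2(gl_FragCoord.xy), 0);

    // Destination pixel is "read only": leave it untouched.
    if (check_mask && dst.a > 0.5) {
        discard;
    }

    out_color = vec4(1.0, 0.0, 0.0, 1.0); // placeholder color
}
"#;
```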

Video output

So far I've only talked about rendering things inside the framebuffer; we also have to implement the video output to display the relevant part of the framebuffer on the screen.

The PlayStation video output streams the pixels from the framebuffer directly to the screen without any kind of intermediate buffer. In other words the display is "continuous": if a game wants to implement double buffering it does it by rendering to two different parts of the framebuffer and swapping the video output source position and GPU rasterizer draw offset when it wants to "flip" the buffers.

In order to emulate this properly we'd have to basically render one pixel at a time, keeping track of the output video pixel clock. I don't think that can be done very efficiently in OpenGL, nor does it sound necessary to emulate correctly the vast majority of games.

Instead we could display the entire visible portion of the framebuffer at once at the end of every frame, or maybe at the beginning of the next one. If the game uses double buffering we want to do it right after it swaps its buffers to reduce latency. I'm guessing the swapping would generally take place during the vertical blanking to reduce tearing artifacts, so maybe we could do the frame rendering at the end of the vertical blanking. I need to do more tests to make sure that's really how it works in practice though.

24bit mode

As explained before while the PlayStation rasterizer can only output 16bit RGB to the framebuffer the video output is capable of treating it as 24bits per pixel to display pre-rendered graphics.

When displaying in 24bit mode we'll have to find a way to take our 16bit RGB framebuffer and display it as a 24bit image. I don't know how difficult that is with OpenGL. I guess at worst we could sample two 16bit pixels in the fragment shader and reconstitute the correct 24bit value there. We could also implement 24bit image upscaling and filtering there to avoid having to handle it in the 16bit code. After all 24bit mode is a very limited special case on the original hardware.
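Here's a rough sketch of the "two 16bit pixels" reconstruction, assuming the VRAM is exposed to the shader as an unsigned 16-bit texture and assuming an R, G, B byte order (the uniform and varying names are illustrative):

```rust
// Sketch: reconstruct a 24-bit pixel from the 16-bit VRAM buffer in the
// fragment shader. Assumes the VRAM is an unsigned 16-bit texture (`vram`)
// and `display_start` is the top-left of the 24-bit display area in VRAM
// halfword coordinates (hypothetical names). Byte order assumed to be R, G, B.
const DISPLAY_24BPP_FRAG_SRC: &str = r#"
#version 330 core

uniform usampler2D vram;        // 1024x512, one 16-bit halfword per texel
uniform ivec2 display_start;    // top-left of the display area in VRAM

in vec2 screen_pos;             // pixel coordinates within the display area
out vec4 out_color;

uint vram_halfword(int x, int y) {
    return texelFetch(vram, ivec2(display_start.x + x, display_start.y + y), 0).r;
}

void main() {
    int x = int(screen_pos.x);
    int y = int(screen_pos.y);

    // Each 24-bit pixel spans 3 bytes, i.e. 1.5 halfwords in VRAM.
    int byte_offset = x * 3;
    int hw = byte_offset / 2;

    uint w0 = vram_halfword(hw, y);
    uint w1 = vram_halfword(hw + 1, y);

    uint r, g, b;
    if ((byte_offset & 1) == 0) {
        r = w0 & 0xffu;
        g = w0 >> 8;
        b = w1 & 0xffu;
    } else {
        r = w0 >> 8;
        g = w1 & 0xffu;
        b = w1 >> 8;
    }

    out_color = vec4(float(r), float(g), float(b), 255.0) / 255.0;
}
"#;
```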

Texture cache

The PlayStation GPU has a small texture cache used to speed up the rendering of textured primitives. Normally it only affects the speed at which a primitive is rendered, however if the cache becomes "dirty" (for instance by overwriting a texture while some of it is in the cache) it could potentially change the aspect of the resulting triangle.

I have no idea how to emulate this particular feature in OpenGL. As far as I can tell the only way to emulate it accurately would be to draw each primitive pixel-by-pixel, updating the cache state in the process but that goes against the spirit of the massively parallel GPUs we have today.

Fortunately I believe that the vast majority of games can be emulated correctly without this cache, so maybe we can leave it out entirely or at least put it very low on our priority list.

I was about to open an issue about OpenGL versions supported - I can't get rustation to work on x86 with OpenGL 1.x, nor on arm (OpenGL 1.x via glshim) or even using Mesa (OpenGL 2.0).

Does this mean OpenGL 3 is mandatory?

@petevine Currently yes. You can use Pete's OpenGL 1 and 2 plugins, however such ancient OpenGL versions are not good for accuracy.

Yeah, I'm still not sure which version of OpenGL we'll end up targeting but OpenGL < 3 seem out of the question. It's simply lacking too many features to emulate what we need accurately.

Maybe a very accurate software renderer (based on mednafen's code?) would be nice to have at some point though.

MaskBit can be emulated by a combination of the stencil buffer and destination alpha http://arek.bdmonkeys.net/SW/pcsx/
devmiyax/yabause@c03e903
@simias
By the way, can the output be configured to display 32 bits per pixel for images and pre-rendered graphics instead of 24bit?

A pity though, even some not-too-ancient x86_64 laptops only have hardware OpenGL 2.x.

On a tangent, I've noticed some completely trivial programs like minesweeper default to v3 but that's probably down to the commonly used Rust GL libs.

I honestly didn't know OpenGL 3 and later was still so poorly supported. I mean, 3.0 was released in 2008, 3.3 in 2010...

I expected that, since rustation already has pretty high CPU requirements due to the lack of a dynarec, OpenGL 3.3 would be a good "baseline" to target. Looks like I was wrong...

@ADormant no, I believe there's only 16bit (really 15bit since the MSB is the mask bit) and 24bit output on the original console. But the more I think about it the more I think 24bit handling should be a special case: since it can only be used to display pre-rendered images, increasing the internal resolution would be pretty pointless. In 24bit mode we could just take the images at the original resolution and upscale/filter them before displaying them, like it's done with 2D consoles.

And regarding the stencil you just made me discover glStencilOp, indeed it seems like the right way to emulate the mask bit without shader trickery.

Alternatively if we're sampling the destination pixel for "manual blending" we could have a stencil buffer always set completely to 0 and a glStencilFunc set to test GL_EQUAL 0. In the fragment shader we extract the mask bit from the target pixel and use it to set gl_FragStencilRefARB, which is used for the stencil test. The advantage would be that we won't ever have to touch the stencil buffer or worry about maintaining coherence between the pixel's MSB and the stencil buffer (which might be important if the CPU wants to read back a portion of the framebuffer and expects the mask bits to be set properly). Not sure if that's a good idea though.
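Something like this, as a very tentative sketch (it relies on GL_ARB_shader_stencil_export; the `fb_texture` sampler and the host-side setup are illustrative):

```rust
// Sketch of the gl_FragStencilRefARB idea: the stencil buffer stays all
// zeroes, the stencil test passes only when the reference equals 0, and the
// fragment shader bumps the reference to 1 when the destination pixel's
// mask/MSB bit is set. Requires GL_ARB_shader_stencil_export.
const STENCIL_REF_FRAG_SRC: &str = r#"
#version 330 core
#extension GL_ARB_shader_stencil_export : require

uniform sampler2D fb_texture;  // destination framebuffer, mask bit in alpha

out vec4 out_color;

void main() {
    vec4 dst = texelFetch(fb_texture, ivec2(gl_FragCoord.xy), 0);

    // Masked destination pixel: make the (GL_EQUAL, 0) stencil test fail.
    gl_FragStencilRefARB = (dst.a > 0.5) ? 1 : 0;

    out_color = vec4(1.0); // placeholder
}
"#;

// Host side (assuming the raw `gl` bindings crate): stencil buffer cleared
// to 0, test passes only while the per-fragment reference is still 0.
fn setup_mask_stencil() {
    unsafe {
        gl::Enable(gl::STENCIL_TEST);
        gl::ClearStencil(0);
        gl::Clear(gl::STENCIL_BUFFER_BIT);
        gl::StencilFunc(gl::EQUAL, 0, 0xff);
        // Never modify the stencil buffer itself.
        gl::StencilOp(gl::KEEP, gl::KEEP, gl::KEEP);
    }
}
```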

Do you mean 24bit for pre-rendered backgrounds only or for everything? Because I believe Pete's plugins can use 32bit color depth. Probably emulating it through shaders would be more accurate, or maybe the best option would be a combination of stencil buffering and shader blending?

Well, I assume that 32bit color depth is really 24bit RGB + 8bit alpha or something similar, 8bit per component (~16.8 million colors). Few computer monitors are able to display more than that anyway, it sounds completely overkill for PSX graphics.

That being said as far as the emulated renderer is concerned it's possible to increase the color depth arbitrarily, although you'll always be limited by the color depth of the original textures unless you replace them. That's probably what Pete's plugin is referring to with "32bit graphics": it just means that it doesn't accurately truncate the rasterizer's colors to 16bits with dithering like the real console but outputs at a greater color depth. That results in a sharper, less noisy image (for better or worse).

For instance currently rustation's output is using whatever color depth the SDL window is using, so probably RGB888 or similar. The SCE logo gradient as displayed by rustation looks much smoother than the one displayed by a real console (or mednafen PSX) because of the increased color depth.

I think increasing the color depth has basically the same caveats as increasing the internal resolution, we need to be very careful with palettes and paletted textures but it should mostly "just work" with the rest. At least in theory...

Definitely, when bladesoft increases the color depth it stops using dithering since it's no longer needed. I guess Pete's and bladesoft's plugins use 8 bits per channel, although 10 bits per channel (1.07 billion colors) should be doable with newer monitors too, especially 4K ones.

It's unfortunately pretty common to say "32bits" when you actually mean 24bit RGB. For instance Windows' display settings used "32bit" to mean 24bit RGB until at least Windows XP.

Of course if you can increase the color depth to 24bits you can go as high as your hardware will allow, but I doubt you'll see a significant difference above 24bits for PlayStation graphics unless you replace the textures or manage to hack in things like HDR.

I think we should still support 16bit+dithering because some games tend to look very "flat" without the dithering noise IMO. Especially untextured polygons.

The so-called 32bit is a variant called the RGBA color space. These plugins indeed seem to have 24bit + 8bit alpha https://en.wikipedia.org/wiki/Color_depth#True_color_.2824-bit.29
By the way is there any problem with implementing MSAA and SSAA in a PSX emulator?

Yeah but there's no meaningful alpha on a computer screen, it's only useful for blending so it's really only 24 meaningful bits.

I think multi/supersampling is analogous to increasing the internal resolution, if one works the others should work as well, at least as far as I can see.

I wonder about hardware accelerated framebuffer emulation and framebuffer effects, is that a problem like with N64 emulation? @simias
Also I wonder if it's possible to emulate GTE and MDEC on a GPU? Maybe even merge them into one?

I don't know the N64 very well but I believe one of the issues with hardware renderers on that console is that the video memory is shared with the CPU, therefore you have to be super careful about interactions between the CPU and GPU.

On the PSX the VRAM is dedicated to the GPU and the CPU has to go through the GPU registers to access it so the situation is very different. Basically the CPU can't go "behind your back" and mess with the framebuffer without getting notified in the GPU code. I also believe that the N64 GPU is much, much more complicated than the PSX GPU.

Emulating the GTE alone on the GPU would be tricky since it's many small atomic operations, given the cost of sending the data to the GPU and bringing it back it would probably end up being very slow and I don't see how you could batch the commands without assuming a lot of things about what the game is doing. I think it would be better to try and implement it using SIMD instructions to speed it up.

However if GTE "accuracy" (real 3D rendering on the GPU with full precision) is implemented then it should be possible to replace the GTE code with a simple placeholder since the real work (projection, shading, zbuffer etc...) will be done on the GPU. It might harm compatibility however.

Regarding the MDEC I honestly don't know but since it's not generally used "interactively" I'm not sure if it's really worth it. I don't really see what we would gain from a GPU based MDEC except slightly lower CPU consumption while playing videos.

Well some games seem to use MDEC for textures and perhaps it'd allow for things like FMV dumping/replacement or applying shaders to FMVs.

MDEC-decoded images are then uploaded to the GPU just like any other textures, the replacement/filtering could take place there without any MDEC-specific code.

Doing it at the MDEC's output would be annoying because you would have to match the original resolution otherwise it will mess up the DMA-to-RAM transfer. It's better to handle all that on the GPU side IMO.

I'm not really familiar with the Saturn but guessing from the name and those commits it seems more related to how different layers of sprites are rendered (some sprites can have higher "priority" than others and overlap them). The PlayStation doesn't even have real sprites so the concept doesn't carry over (although you could say the mask bit is kind of a priority bit).

For the rest the Saturn architecture is so different from the PSX that I don't really know if there are things from yabause that could be carried over.

What do you mean by zbuffer exactly? The PlayStation doesn't have a z-buffer.

Just to be sure: the GPU has 1M of ram (+ cache). A (fixed?) part of it is used as a framebuffer in which the GPU renders. It then shows the framebuffer on the screen. Am I right?

Mostly, but nothing is really fixed. Basically the game has 1MB of video memory to hold everything: textures and framebuffers. This video memory is addressed as a 2D buffer of 1024x512, 16 bits per pixel. All the render commands (draw triangle, draw quad, draw line...) use this system of coordinates. You can give an offset to the GPU that will be added to the vertex coordinates (currently implemented as the offset uniform) and define a drawing area outside of which the GPU won't render (currently implemented with the scissor box).
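For illustration, here's a rough sketch of how the drawing offset and drawing area can map onto a vertex shader uniform and the scissor box (assuming the raw `gl` bindings crate; the exact mapping conventions in rustation may differ):

```rust
// Sketch: convert VRAM coordinates to clip space with the GPU drawing offset,
// and implement the GPU drawing area with the scissor test.
const VERTEX_SRC: &str = r#"
#version 330 core

in ivec2 vertex_position;   // VRAM coordinates from the draw command
uniform ivec2 offset;       // GPU drawing offset, added to every vertex

void main() {
    ivec2 pos = vertex_position + offset;

    // Map the 1024x512 VRAM coordinate system onto clip space. VRAM row 0
    // maps to window row 0 here; the display pass decides which way up the
    // image ends up on screen.
    float x = float(pos.x) / 512.0 - 1.0;
    float y = float(pos.y) / 256.0 - 1.0;

    gl_Position = vec4(x, y, 0.0, 1.0);
}
"#;

/// GPU "drawing area": pixels outside this rectangle must not be touched.
fn set_drawing_area(left: i32, top: i32, right: i32, bottom: i32) {
    unsafe {
        gl::Enable(gl::SCISSOR_TEST);
        gl::Scissor(left, top, right - left + 1, bottom - top + 1);
    }
}
```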

Here's a VRAM dump of my real console displaying the PAL version of Crash Bandicoot:

crash-pal

Here's what it looks like on rustation today, running the japanese (NTSC) version:

crash-opengl

You can see that the game uses dual buffering so you have two framebuffers at the top of the image. Once it's done rendering one buffer it configures the video output "vram start" coordinate to display this part of the VRAM, then it changes the offset and display area to switch to the other buffer and draw the next frame.

The rest of the VRAM holds textures, most of them look weird in this image because, in order to save space, many (most?) games use 4 or 8 bit paletted textures. You can see that for now I completely ignore texture uploads so this part of the VRAM remains blank in rustation.

But note that nothing here is fixed, the VRAM organization is left to the game. For instance here's a dump of the VRAM for Spyro (PAL):

spyro-pal

You can see that the developers decided to stack their framebuffers on the left of the VRAM.

One last example, Metal Gear Solid (PAL):

mgs-pal

We can see that the game uses a smaller horizontal resolution in order to stuff more textures in the VRAM. The PlayStation video output supports several horizontal resolutions by changing the speed at which the pixels are sent to the analog output (see https://github.com/simias/rustation/blob/master/src/gpu/mod.rs#L1148-L1178)

Idea: instead of copying the framebuffer, why not copy the textures currently used? It would solve the synchronisation problem between the two buffers. It may not be possible to batch commands, as we'd have to copy the texture each time. We could use a monotonically increasing Z coordinate and reorder the commands, but I'm not sure how it would play with transparency.

@simias I've got a question: is it possible to implement a more universal widescreen hack? The current one used in PS1 emulators doesn't work very well for games with mixed 2D/3D and pre-rendered backgrounds.
By the way something about implementing perspective-correct rendering is here
http://problemkaputt.de/psx-spx.htm#gpumisc

widescreen correct
widescreen hack bugs

Heh, that does look pretty bad. The problem is: what would be the right way to render this in widescreen? Scale the background and have black bars on the sides? I guess that could be doable if we could figure out a reliable heuristic to detect backgrounds.

I'm not sure how the widescreen hacks are implemented in other emulators but my first guess would be to modify the GTE to force a different aspect ratio. If that's the case then the 2D graphics that don't go through the GTE remain untouched.

Even for 3D games I'm pretty sure it won't work well 100% of the time, here's a quick hack I made by scaling the X coordinates in the GTE:

Normal "accurate" rendering:

crash

Rendering with the X coordinate scaled by 2/3 in the GTE (effectively rendering with an ultra-wide 16:8 aspect ratio):

crash-ws

You can see that the aspect ratio is definitely wider, however the game doesn't expect to render like that so we're seeing missing polygons on the sides.

I don't know if there are better ways to do this but it seems to be a common problem with widescreen hacks:

https://www.youtube.com/watch?v=BKRWonevCmM

Maybe there's a better way to do it but I can't think of any generic way to fix this. Of course if you're willing to use game-specific hacks it should be possible to trick the specific game engine to render at a different aspect ratio, but that's a whole other problem...

By the way, I wonder if trilinear and anisotropic filtering are possible without a depth buffer? Currently existing PSX emulators have problems like black outlines and boxes with even basic bilinear texture filtering and need things like multi-pass alpha blending and alpha testing to mitigate those problems.
http://www.emulation64.com/guides/17/04/
http://www.emulation64.com/guides/17/06/psx-plugins-lewpy-s-glide-gpu.html/
http://www.emulation64.com/guides/17/03/psx-plugins-lewpy-s-glide-gpu.html/
http://ngemu.com/threads/alpha-multipass-in-petes-ogl-driver.2110/
http://www.fpsece.net/forum2/viewtopic.php?t=3615
https://docs.google.com/spreadsheets/d/1gsjK1WvLV4xpSD5jRupuBiM8XBnfz9C8VNcUp71HP_w/edit#gid=0
About dithering I think it should be possible to emulate it with a fragment shader but there should be an option to disable it.
Some examples of bilinear filtering and transparency/multi-pass alpha blending bugs from Pete's plugins and FPSE(Texture Barrier and shader blending should be good for these):
black borders
black lines 2
black lines 2png
black lines
boxes
bug outlines
lines bug
2uetugi lines

@simias

I'm not sure what causes those problems, I've seen things like that while upscaling textures with an alpha channel (where the alpha ends up "leaking" through the filter) but I don't see why that would be a problem on the PlayStation since there's no real alpha channel. I guess we'll have to try and fix the issues as they come up.

The tearing seen on some of your screenshots (like the Grandia start screen) could be due to rounding errors though, I've seen similar glitches when playing with weird resolutions with my current renderer.

This is what I found about anisotropic/trilinear filtering and perspective-correction (which works, as shown by Edgbla) for PS1.
http://ngemu.com/threads/perspective-correction-whats-up-with-that.21080/
@simias

Perspective correct mapping and anisotropic filtering "simply" requires getting the Z coordinates from the GTE somehow. It might be easier to implement it alongside the GTE "accuracy" hack since it's also about getting more data from the GTE to the GPU. Basically what Lewpy says in your link.

I'm starting to think that the most efficient way to do that might be to completely bypass the GTE and send the raw vertex coordinates directly to the renderer. This is easier said than done though.

There's also an interesting bit of information in your link regarding the black halos you mentioned earlier:

Bilinear filtering does not require any more information than the PSX GPU already gets passed, so it can be enabled. BUT, bilinear filtering does require careful layout of textures in VRAM, which most PSX games do not do. This means there are glitches when enabling bilinear filtering, such as black halos (due to chroma-keying issues).

I'm not entirely sure I understand what he means by that though.

I'm not entirely sure I understand what he means by that though.

Pete's plugins have an alpha multi-pass option to mitigate those filtering bugs whilst Lewpy's plugin has an alpha testing option.
By the way, do you know how to dump and replace textures in the PSX GPU? Bladesoft can already do it but has a very low limit on the size of textures.
@simias
https://www.opengl.org/discussion_boards/showthread.php/128738-Blending-and-alpha-black-border
https://www.opengl.org/discussion_boards/showthread.php/176060-Border-of-Alpha-Blended-Textures-get-black
https://www.opengl.org/discussion_boards/showthread.php/172705-Blending-textures-hav-a-shadow-on-the-border-Why
http://blender.stackexchange.com/questions/31420/how-to-get-rid-of-out-of-frame-black-borders-around-a-scaled-down-movie-overlay

@simias I found explanation about this whole black border problem.

http://www.razyboard.com/system/morethread-an-idea-for-texture-filtering-without-black-borders-pete_bernert-41709-1143926-0.html

http://www.razyboard.com/system/morethread-an-idea-for-texture-filtering-without-black-borders-pete_bernert-41709-1143926-10.html

http://www.razyboard.com/system/morethread-an-idea-for-texture-filtering-without-black-borders-pete_bernert-41709-1143926-20.html

http://nehe.gamedev.net/tutorial/masking/15006/

http://www.razyboard.com/system/morethread-problem-with-flickering-border-fix-pete_bernert-41709-457403-0.html

While the basic idea ("how to do something like an alpha-test without texture alpha values" ) is nice (and, of course, it has certain disadvantages), it will not help very much with the "black border" problem.

First, there are two general "border" issues when texture filtering is enabled in psx emulation:

  1. the general color interpolation problem:

example:

the original (not filtered) psx texture is something like that:

BBBBBBBBB
BBBBXBBBB
BBBXXXBBB
BBXXXXXBB
BXXXXXXXB
BBBBBBBBB

B=Black (in texture, transparent while drawn), X=some color

now you activate filtering, and the gfx card hardware will interpolate the texel colors while drawing:

BBBBBBBBB
BBBBGBBBB
BBBGXGBBB
BBGXXXGBB
BGGGGGGGB
BBBBBBBBB

Those "G"'s are the interpolated colors of the real texture color (the one you want to see) and the surrounding Black (transparent) colors. The "G"s will not be exactly black, but in the final drawing they will appear as some kind of dark color.

  2. the background masking problem

Some games, like FF7, are having "background" gfx, and some "front" gfx, for example a table (front) before a kitchen (back). The main character can walk between the table and the background scene.

How is the background scene and the table done?

background (kitchen), A=some colors, B=solid black (not transparent!)

AAAAAAAAA
AAABBBAAA
AAAABAAAA
AAAABAAAA
AAAAAAAAA

Foreground table, X=Some color, B=Black (transparent)

XXX
BXB
BXB

Now the table will be drawn on top of the background, without filtering no problem:

AAAAAAAAA
AAAXXXAAA
AAAAXAAAA
AAAAXAAAA
AAAAAAAAA

But if you do filtering, the shapes will be not 100% correct anymore, the solid black part of the background will be interpolated in the other bkg colors... tada, a black border around the table.

And both problems will also happen with the "mask transparency" trick

Ah thank you! That makes sense. If I understand correctly it's because the GPU doesn't understand that black texels are supposed to be transparent and interpolates using them, resulting in non-transparent dark values.

Now of course we could (and probably should?) skip the basic OpenGL texture filter and implement it in the fragment shader instead. This way we could special-case black pixels. It would also let us use more clever filters, although without a texture cache we'll have to do it every frame. I don't know if that's a problem for modern graphics cards given the average number of textured polygons on the PlayStation.

The problem of superposing several textures (like the table in the kitchen in his example) will need special care though because we can't have any seam between the two bitmaps otherwise the background will leak through.

The good news is that I can now kind-of start a few games so I have a bigger sample size to see how games typically use the GPU.

crash-japan

Another thing I'm not sure how to handle is lines. It's the third type of primitive used by the GPU after quads and triangles. How should we draw them? If we draw them as lines what happens when we increase the internal resolution, do we just make them thicker? I wonder how the other emulators/plugins handle that.

A good way to test it could be the intro screen for MediEvil:

medievil-mednafen

The raindrops are shaded lines.

I hadn't considered geometry shaders for that but it's quite clever. One situation I wasn't sure how to handle was non-standard aspect ratio (like wide screen hacks) where vertical and horizontal lines would have to be drawn with different thickness depending on the angle. I guess that could be solved by converting them to quadrilaterals in the geometry shader.

A potential annoyance is that line primitives would have to be rendered with a different draw call (since as far as I know you can't render two types of primitives in a single draw call?), so every time we switch from one primitive type to another we'll have to insert a new draw call. In a worst case scenario (a series of alternating triangles and lines) that could probably be quite bad.

For opaque primitives this could be solved by enabling the Z-buffer and then drawing the primitives in whichever order we want. This way we could also render everything in the opposite order used by the PlayStation GPU: we receive the primitives from farthest to closest, but in order to limit overdraw we want to render them the other way around (if a primitive is hidden by another there's no need to bother rendering it). Of course semi-transparent primitives will have to be rendered afterwards in the right order to display properly anyway.

Regarding wireframe I agree that it's useful (and cool looking) and I already play with it using PolygonMode (not the geometry shader):

spyro-moon-bg-wf

The difficulty is to figure out when to clear the buffer, otherwise the wireframe of each successive frame is drawn on top of the previous one and it becomes messy real fast. Designing the right heuristic to figure out when the image should be erased is the tricky part. For the image of spyro's background above I just bound a key to clear the framebuffer manually when I wanted to get a fresh image...

@simias For debugging GL KHR Debug extension is good https://github.com/citra-emu/citra/pull/1196
Dolphin's pulls related to aspect ratio and fullscreen:
dolphin-emu/dolphin#1231
dolphin-emu/dolphin#2769
dolphin-emu/dolphin#2765
dolphin-emu/dolphin#2796
dolphin-emu/dolphin#2791
dolphin-emu/dolphin#506
dolphin-emu/dolphin#726
dolphin-emu/dolphin#1688
dolphin-emu/dolphin#1764

Quadrilaterals are a good idea. Quad Rasterizer in gpubladesoft fixed the majority of warping.

megaman legends quad
need for speed quad
spyro quad
spyro quad 2
test drive quad
threads of fate quad
tomba 2 quad rendering
twisted metal 2 quad
twisted metal 3 quad

Not sure, but I think that with glium we don't need GL_KHR_debug. @tomaka

Multi-primitive graphics rendering in one draw call seems doable but you will probably need newer OpenGL than 3.3 @simias

http://www.songho.ca/opengl/gl_vbo.html
http://www.songho.ca/opengl/gl_vertexarray.html
http://www.openglsuperbible.com/2013/12/09/vertex-array-performance/
http://in2gpu.com/2014/09/24/render-to-texture-in-opengl/
http://www.cs.kent.edu/~zhao/gpu/lectures/OpenGL_FrameBuffer_Object.pdf
http://www.songho.ca/opengl/gl_fbo.html
https://www.opengl.org/registry/specs/ARB/pixel_buffer_object.txt
https://www.opengl.org/registry/specs/ARB/texture_rg.txt
https://www.opengl.org/registry/specs/ARB/wgl_render_texture.txt
http://www.google.com/patents/US20140098117
http://www.informit.com/articles/article.aspx?p=2033340&seqNum=4
http://stackoverflow.com/questions/27946183/draw-multiple-shapes-in-one-vbo
https://www.opengl.org/wiki/Vertex_Rendering
https://www.opengl.org/wiki/Vertex_Specification
https://www.opengl.org/wiki/Primitive
https://www.opengl.org/registry/specs/NV/bindless_multi_draw_indirect.txt
https://www.opengl.org/registry/specs/NV/vertex_buffer_unified_memory.txt
https://www.opengl.org/registry/specs/ARB/indirect_parameters.txt
https://www.opengl.org/registry/specs/ARB/multi_draw_indirect.txt
https://www.opengl.org/registry/specs/ARB/draw_indirect.txt
https://www.opengl.org/registry/specs/ARB/multi_bind.txt
http://www.g-truc.net/post-0642.html
https://www.opengl.org/registry/specs/ARB/bindless_texture.txt
https://www.opengl.org/registry/specs/NV/bindless_texture.txt
https://www.opengl.org/registry/specs/ARB/sparse_texture.txt
https://www.opengl.org/registry/specs/ARB/sample_shading.txt
https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_NV_fragment_shader_interlock.txt
https://www.opengl.org/registry/specs/ARB/stencil_texturing.txt
https://www.opengl.org/registry/specs/ARB/uniform_buffer_object.txt
https://www.opengl.org/registry/specs/ARB/gpu_shader5.txt
https://www.opengl.org/registry/specs/ARB/gpu_shader_fp64.txt
https://developer.nvidia.com/sites/default/files/akamai/opengl/specs/GL_EXT_raster_multisample.txt
https://www.opengl.org/registry/specs/NV/conservative_raster.txt
https://www.khronos.org/registry/gles/extensions/EXT/EXT_primitive_bounding_box.txt
This extension sounds good for quad rendering: https://www.opengl.org/registry/specs/NV/fill_rectangle.txt
https://www.opengl.org/registry/specs/ARB/occlusion_query2.txt
https://www.opengl.org/registry/specs/EXT/polygon_offset.txt
https://www.opengl.org/registry/specs/EXT/polygon_offset_clamp.txt
https://www.opengl.org/registry/specs/ARB/texture_non_power_of_two.txt
https://www.opengl.org/registry/specs/ARB/seamless_cubemap_per_texture.txt
https://www.opengl.org/registry/specs/ARB/texture_cube_map_array.txt
https://www.opengl.org/registry/specs/ARB/texture_gather.txt
https://www.opengl.org/registry/specs/ARB/arrays_of_arrays.txt
https://www.opengl.org/registry/specs/ARB/instanced_arrays.txt
https://www.opengl.org/registry/specs/ARB/texture_query_lod.txt
https://www.opengl.org/registry/specs/EXT/texture_lod_bias.txt
https://www.khronos.org/registry/gles/extensions/EXT/EXT_shader_texture_lod.txt
https://www.opengl.org/registry/specs/ARB/texture_query_levels.txt
https://www.opengl.org/registry/specs/ARB/invalidate_subdata.txt
https://www.opengl.org/registry/specs/ARB/copy_image.txt
https://www.opengl.org/registry/specs/ARB/clear_texture.txt
https://www.opengl.org/registry/specs/ARB/texture_view.txt
https://www.opengl.org/registry/specs/ARB/texture_storage.txt
https://code.google.com/p/glextensions/wiki/GL_EXT_timer_query
https://www.opengl.org/registry/specs/ARB/debug_output.txt
https://www.opengl.org/registry/specs/ARB/shader_precision.txt
https://www.opengl.org/registry/specs/ARB/shading_language_420pack.txt
https://www.opengl.org/registry/specs/ARB/texture_rectangle.txt
https://www.opengl.org/registry/specs/ARB/texture_non_power_of_two.txt
https://www.opengl.org/registry/specs/ARB/shader_image_size.txt
https://www.khronos.org/registry/gles/extensions/EXT/EXT_shader_io_blocks.txt
https://www.opengl.org/registry/specs/ARB/shader_image_load_store.txt
https://www.opengl.org/registry/specs/EXT/shader_image_load_formatted.txt
https://www.opengl.org/registry/specs/ARB/shader_storage_buffer_object.txt
https://www.opengl.org/registry/specs/ARB/enhanced_layouts.txt
https://www.opengl.org/registry/specs/ARB/transform_feedback3.txt
https://www.opengl.org/registry/specs/ARB/shader_texture_image_samples.txt
https://www.opengl.org/registry/specs/ARB/draw_buffers_blend.txt
https://www.opengl.org/registry/specs/ARB/blend_func_extended.txt
https://www.opengl.org/registry/specs/KHR/blend_equation_advanced.txt
https://www.khronos.org/registry/gles/extensions/EXT/EXT_discard_framebuffer.txt
https://www.opengl.org/registry/specs/ARB/shader_subroutine.txt
https://www.opengl.org/registry/specs/ARB/shader_draw_parameters.txt
https://www.khronos.org/registry/gles/extensions/KHR/texture_compression_astc_hdr.txt
https://www.opengl.org/registry/specs/EXT/shader_integer_mix.txt
https://www.khronos.org/registry/gles/extensions/EXT/EXT_shader_pixel_local_storage.txt
https://www.opengl.org/registry/specs/ARB/compressed_texture_pixel_storage.txt
https://www.opengl.org/registry/specs/ARB/framebuffer_no_attachments.txt
https://www.opengl.org/registry/specs/ARB/explicit_uniform_location.txt
https://www.opengl.org/registry/specs/ARB/texture_storage.txt
https://www.opengl.org/registry/specs/ARB/texture_storage_multisample.txt
https://www.khronos.org/registry/gles/extensions/EXT/EXT_multisampled_render_to_texture.txt
https://www.opengl.org/registry/specs/ARB/sampler_objects.txt
https://www.opengl.org/registry/specs/ARB/texture_buffer_object.txt
https://www.opengl.org/registry/specs/ARB/texture_buffer_object_rgb32.txt
https://www.opengl.org/registry/specs/INTEL/fragment_shader_ordering.txt
https://www.opengl.org/registry/specs/ARB/shader_atomic_counters.txt

Heh, you don't have to link the entire OpenGL spec! Also you don't have to mention me every time, I receive a notification every time something is posted anyway.

I think an alternative to quads could be perspective correct rendering with triangles. If we can retrieve the Z coordinate from the GTE then I think we can just map the scene properly without too much texture warping. Maybe quad rendering could be a simpler hack though. Well, first we need to have triangle texturing...

How would the GPU memory frame and texture uploads interact with a hypothetical widescreen hack support? I know it breaks games in general (without other per-game hacks for menus etc), but if you can increase the view camera width, increase the size of the memory 'region' that holds the frames and hopefully convert coordinates of the texture memory zone to the new enlarged frame zone it might work.

Or it might not be like that. I don't know how mednafen is doing it (I heard it now has widescreen support like PCSX2).

edit: nvm, I didn't see the discussion above. I guess a generic method works fine for primitive renderers without culling that just use the camera, but with the typical 2D-optimized screen rendering that occurs on the PSX, games mostly stop drawing where they expect the image to end. It would need per-game hacks to enlarge the game area I guess.

With regard to quad rasterization and perspective-correction, the best option would be to have both at the same time, like gpubladesoft is supposed to, because warping caused by affine texture mapping occurs mostly during movement as shown here https://www.youtube.com/watch?v=inFqJvEGGYc whereas triangle warping in PSX games is permanent, as shown in the clips above.

Here's how the widescreen hack is implemented in beetle psx (a fork of mednafen PSX):

https://github.com/libretro/beetle-psx-libretro/blob/master/mednafen/psx/gte.cpp#L1035

It's a simple modification to the aspect ratio of the screen projection in the GTE.
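In very rough terms the idea looks like this (a hedged sketch of the principle, not beetle's actual fixed-point code; the function and parameter names are illustrative):

```rust
/// Sketch of the GTE widescreen hack: squeeze the projected X span by 3/4
/// so that the 4:3 image keeps its intended proportions once the display is
/// stretched to 16:9. This mirrors the idea of the beetle psx hack, not its
/// exact implementation.
fn project_x(ofx: i64, ir1: i64, h_over_sz3: i64, widescreen_hack: bool) -> i64 {
    // Standard GTE projection: screen X = OFX + IR1 * (H / SZ3)
    let mut span = ir1 * h_over_sz3;

    if widescreen_hack {
        // 16:9 is 4/3 wider than 4:3, so squeeze the projected span by 3/4.
        span = span * 3 / 4;
    }

    ofx + span
}
```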

Here's what it looks like in rustation today, first without the hack (Spyro french disc):

spyro-pal

And with the GTE widescreen hack:

spyro-pal-widescreen

For this game it seems to render quite well, there doesn't appear to be much missing geometry on the sides of the screen, in the regions that wouldn't normally be displayed. Crash Bandicoot doesn't fare so well:

crash-widescreen

Anyway, if you look at the two Spyro screenshots you can see that the actual resolution of the image in VRAM doesn't change, you have to rescale the image to get the desired aspect ratio in the backend.

You can also see that the size of the big gray rectangle doesn't change between the two images, that's because it's the 2D "Spyro" logo and it probably doesn't go through the GTE at all, it's drawn on top of the 3D image. It's similar to the backgrounds in RE, I assume. If you display the resulting image in widescreen the logo will look stretched.

Would it be possible to increase the image resolution in VRAM instead? Well that would be tricky. Remember that Rustation doesn't handle textures and the framebuffer is really supposed to look like this:

spyro-pal

With the right side holding all the texture data. Obviously if you widen the rendered image you will overwrite the textures which is not... ideal. With some very HLE tricks it might be possible to get it to work but it sounds tricky. It's probably not worth the hassle.

A better way might be to identify draw commands that didn't go through the GTE and scale those in the renderer instead.

I agree that quad rendering and perspective correct mapping should probably both be implemented, quad rendering is probably much more straightforward (no need to mess with the GTE).

Do we really want to emulate real interlacing? Interlaced video is always a pain...

As Pete says in one of your links:

the Peops soft gpu plugin emulates interlaced gfx in a very simply way: exactly as non-interlaced gfx (well, in interlaced mode the screen will get updated on every emulated vsync, but that's all).
There is no need to do some semi-clever frame mixing, etc... the PC can display a native height of 512 (or 480) pixels without problems, eh?

That's basically what I had in mind, I don't really think it's worth going beyond that unless a game relies on interlaced video for some weird visual effect. Other than that interlaced video is just a pain to deal with in general, you'll probably have to deinterlace it later on anyway unless you're outputting it to a real TV screen.

Although as far as the actual rendering is concerned this bit is interesting:

I wrote some test code, ran it on my SCPH 7002 unit, and had a look at the resulting VRAM dumps. Looks like the PS1 uses the odd/even bit in the status to decide which rows to render to, and that bit is only valid during the active area of the display (i.e. outside VSYNC area) as I learned some week ago while messing with the root counters.

I assume that's when you disable the "render to display" in the GPU config. It sounds annoying to emulate (and probably not really necessary?) but it's worth keeping in mind.

PSX doesn't render quads but triangle strips with 4 vertices

Yes that's how it's currently implemented although I don't use OpenGL strips to render them so I duplicate the shared vertex:

https://github.com/simias/rustation/blob/master/src/gpu/opengl/mod.rs#L174-L177

This diagram is nice, I wonder where it comes from:

psx-fb

Mmh, you just made me realize that when the mask bit is not forced to 1 for draw commands then its value is taken from the texture (if the poly is textured, obviously). So we cannot know its new value until we're in the fragment shader.

This is annoying because I'm not sure it's possible to set the stencil value in the fragment shader. In your link I see that arekkusu says he uses the stencil + the alpha but that sounds lame. And Pete abuses the Z-buffer for that but I really don't think it's worth bothering with that (we have greater plans for the Z-buffer anyway).

So maybe the stencil is not the right way to go for that, maybe we could only use the alpha and discard the fragments in the shader depending on the mode. That might not be great for performance but at least it should be relatively straightforward.

https://www.opengl.org/wiki/Early_Fragment_Test
https://www.opengl.org/wiki/Per-Sample_Processing
https://www.opengl.org/wiki/Rendering_Pipeline_Overview

Fragment shaders are not able to set the stencil data for a fragment, but they do have control over the color and depth values.

Yeah I think it might be simpler to store the framebuffer as GL_RGB5_A1 and use the alpha bit as a mask bit. Or something like that. We want to use non-normalized integer operations as much as possible for accuracy, I'm not sure if GL_RGB5_A1 can be used non-normalized.

Alternatively we could use something like GL_R16UI to have a single "raw" 16bit per pixel and we would do all the color handling in the shaders. That's pretty much what we'll do anyway. Might require more special-casing if we want to support increased color depth though.
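For reference, a sketch of what the "raw" 16bit approach could look like, assuming the raw `gl` bindings crate: the VRAM texture is created as GL_R16UI and the 1555 decoding happens in the shaders.

```rust
use std::ptr;

/// Sketch: store the whole VRAM as a single-channel unsigned 16-bit texture
/// and decode the 1555 format in the shaders.
fn create_vram_texture() -> gl::types::GLuint {
    let mut tex = 0;

    unsafe {
        gl::GenTextures(1, &mut tex);
        gl::BindTexture(gl::TEXTURE_2D, tex);
        // One 16-bit halfword per VRAM pixel, no filtering, no normalization:
        // the shaders see the raw integer value.
        gl::TexImage2D(gl::TEXTURE_2D,
                       0,
                       gl::R16UI as i32,
                       1024, 512,
                       0,
                       gl::RED_INTEGER,
                       gl::UNSIGNED_SHORT,
                       ptr::null());
        gl::TexParameteri(gl::TEXTURE_2D, gl::TEXTURE_MIN_FILTER, gl::NEAREST as i32);
        gl::TexParameteri(gl::TEXTURE_2D, gl::TEXTURE_MAG_FILTER, gl::NEAREST as i32);
    }

    tex
}

// GLSL side: unpack a 1555 halfword (red in the low 5 bits, mask in the MSB)
// into an RGB color plus the mask bit.
const UNPACK_1555: &str = r#"
vec4 unpack_1555(uint texel) {
    float r = float( texel        & 0x1fu) / 31.0;
    float g = float((texel >>  5) & 0x1fu) / 31.0;
    float b = float((texel >> 10) & 0x1fu) / 31.0;
    float mask = float(texel >> 15);

    return vec4(r, g, b, mask);
}
"#;
```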

That would require a texture cache though, right?

As a first approach I thought the filtering/depth conversion could be done on the fly in the fragment shader for each render (wasteful, but for simple nearest/bilinear filtering probably not too bad) and then an actual texture cache would come later and allow things like texture replacement and fancier shaders.

Probably yes, but do you mean a PS1-specific texture cache or just a general OpenGL texture cache? A general texture cache should greatly improve performance as well. Seems like the PSX can use triple-buffering.
http://www.razyboard.com/system/morethread-native-resolution-pete_bernert-41709-4996857-0.htm

Mmm... I will try an easy explanation: you can imagine the whole PSX framebuffer RAM as a 1024x512 rectangle area (with 15 bit color depth).

Now in this area the PSX can define smaller rectangles, which will be used as backbuffer, frontbuffer, or even for triple-buffering (and
everything which is not used as such a "display area" will be filled with texture data and color table data).

The "front buffer" rectangle is the one you will see on your TV screen, while the gpu is rendering in the "back buffer" (or triple) area.

This screen area rectangle typically has a width of 256, 320, 368, 384, 512 or 640 pixel. The height can be anything up to 512 pixel.

Multiple buffering shouldn't change much in this case since I want to manipulate the entire VRAM as a single texture. Things like offscreen rendering should Just Work unless I'm missing something.

A good texture cache would improve the performance without a doubt but it's pretty tricky to get right. That's why I'd rather start without a cache, just sampling the VRAM buffer for textures and palettes.

I've added line drawing support in f6514ed

I added a queue for draw commands, now we can queue triangles and lines (and possibly other things later) and it will use as many draw calls as necessary to render all the primitives in the scene. Of course if a game interleaves many triangle and line drawing commands we'll end up using many draw calls to render the entire scene which will hurt performance.

When the internal resolution is increased I change the OpenGL LineWidth so that the lines don't appear smaller than they ought to be. Of course that only works if the horizontal and vertical upscaling factors are the same (otherwise the line width depends on the angle). I can't see much use for that anyway besides widescreen hacks.
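The line width adjustment itself is trivial (a sketch, assuming the raw `gl` bindings crate and a hypothetical `upscale` factor):

```rust
/// Sketch: scale the OpenGL line width with the internal upscaling factor so
/// that PSX 1-pixel lines keep their apparent thickness. Only meaningful when
/// the horizontal and vertical scaling factors match.
fn set_line_width(upscale: u32) {
    unsafe {
        gl::LineWidth(upscale as f32);
    }
}
```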

Now the rain is rendered in the Medievil start screen (using shaded lines):

medievil-rain-upscale

Here's the BIOS using lines to draw its UI:

psx-bios-upscale

I've added line drawing

Was it done with geometry shaders?

No, just using the builtin OpenGL line primitive: https://github.com/simias/rustation/blob/master/src/gpu/opengl/mod.rs#L190-L193

Geometry shaders might be useful if we want to emulate the PlayStation line drawing algorithm exactly but I don't know if it's really significant. For horizontal and vertical lines used in menus it shouldn't change anything at least.

For lines with other angles there might be a difference unless OpenGL happens to use the exact same line drawing algorithm as the PSX GPU (unlikely). It remains to be seen whether it causes problems in practice. I doubt people will be able to tell the difference in Medievil's rain!

I've started implementing the two pass renderer where the commands are drawn to a framebuffer texture before they're displayed to the screen. It's in the gpu_rewrite branch.

Here's how it looks in Medievil:

medievil

At the bottom right I overlay the entire framebuffer texture for debugging purposes.

By default I use a 16bit ARGB1555 texture since that's what the real console uses, that's why you can see this extreme banding in the gradients (I don't implement dithering yet).

Here's the same scene with 2x internal resolution, 32bit color depth and widescreen hack:

medievil-2x-ws

I think I've figured out how to handle semi-transparency correctly for textured polys, unfortunately it means rendering those polygons in two passes.

The tricky part is that semi-transparent polygons can contain fully opaque, semi-transparent and fully transparent pixels. I think the solution is to render those polygons twice, once with only the opaque pixels and then a second time with the blending equation set properly, rendering only the semi-transparent pixels.
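A sketch of what those two passes could look like on the host side (assuming the raw `gl` bindings crate; the `draw_semi_transparent` uniform that the fragment shader would use to discard the other class of texels is hypothetical):

```rust
/// Sketch of the two-pass idea for semi-transparent textured polygons: draw
/// the opaque texels first with blending disabled, then the semi-transparent
/// texels with the PSX blending mode enabled. The fragment shader is assumed
/// to discard texels that don't belong to the current pass based on the
/// `draw_semi_transparent` uniform.
fn draw_semi_transparent_poly(program: gl::types::GLuint,
                              semi_transparent_uniform: gl::types::GLint,
                              first_vertex: i32,
                              vertex_count: i32) {
    unsafe {
        gl::UseProgram(program);

        // Pass 1: opaque texels only, no blending.
        gl::Disable(gl::BLEND);
        gl::Uniform1i(semi_transparent_uniform, 0);
        gl::DrawArrays(gl::TRIANGLES, first_vertex, vertex_count);

        // Pass 2: semi-transparent texels only, blending enabled
        // (equation/factors set elsewhere according to the GPU mode).
        gl::Enable(gl::BLEND);
        gl::Uniform1i(semi_transparent_uniform, 1);
        gl::DrawArrays(gl::TRIANGLES, first_vertex, vertex_count);
    }
}
```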

I don't know how bad it will be performance-wise but since this is only for semi-transparent polys I'm hoping it won't be too bad.

The alternative is to handle all the blending in the fragment shader but that means relying on OpenGL extensions and probably reducing portability.

Texture uploading is implemented and seems to work fine in the few games I'm able to boot up.

Here's Spyro:

spyro-pal-vram

We can see that it's pretty close to the VRAM dump I posted above:

spyro-pal

Here's Crash Bandicoot, it uploads the loading screen directly in the displayed framebuffer so it already works as expected:

crash-textures

I've implemented a very basic texture mapping shader in the textures branch. It's quite ugly but at least it shows that doing all the work in the fragment shader seems to work. There's no support for semi-transparency or texture blending.

crash

spyro-start

einhander

Those screenshots were made with 2x internal resolution.

Fixed texture blending. Now it's starting to look decent.

crash-blending

spyro-texblend

@ADormant I tried the quad rasterization thing with mixed results.

Basically the thing works well when the quad represents something that's supposed to be rectangular in the game (since we can basically guess the perspective correction in this situation) but if the quad draws something that's supposed to be a random quadrilateral then the algorithm ends up stretching the texture weirdly.

Here's an example:

quad-mapping

If you look at the book stands the texture does look better in the corrected version, however it also appears slightly stretched. The pixels at the top look bigger than the pixels at the bottom. That's because the stand is not rectangular, it's wider at the top than at the base.

And the less rectangular the quad the more obvious it becomes. Here's an almost triangular quad at the top of a tomb in medievil:

quad-correct4

It's also visible in the snowy mountain tops in Spyro:

spyro-quad-corrected

So depending on the situation it might look better or worse but in the end the right way to do it is probably to use the actual Z coordinate used by the GTE, this way it'll work for all shapes including triangles without those aberrations.

Does that stretching occur with gpubladesoft's quad rendering too? Either way it would still be good to have this option.

No idea, I've never tried it. Maybe it uses a different algorithm.

I agree that it could still be an option, it's not a whole lot of code anyway.

Perhaps a heuristic could be implemented which detects the shape of objects to prevent stretching? After you finish quad rendering I wonder if you could copy the GTE accuracy implementation from here
https://pcsxr.codeplex.com/discussions/264234 It is reportedly better than the version implemented in PCSXR and has some sort of depth data (partial depth buffer?).
and CPU overclocking from here
SonofUgly/PCSX-Reloaded@3f11d29
https://pcsxr.codeplex.com/discussions/647809

Well the problem is that quad mapping uses the shape of the object to guess the perspective correction to apply but there's no other cue to know if an object is a rectangle seen with a perspective or just something that's not rectangular.

Here's a page that describes the algorithm I'm using: http://www.reedbeta.com/blog/2012/05/26/quadrilateral-interpolation-part-1/
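For reference, here's a rough sketch of that algorithm (the types and helper names are illustrative, and it assumes the four corners are given in perimeter order, which is not the raw PSX quad vertex order):

```rust
// Sketch of the projective ("quadrilateral") interpolation trick from the
// article above: compute the intersection of the quad's diagonals, derive a
// per-vertex q factor from the distances to that intersection, and let the
// fragment shader divide by the interpolated q.

#[derive(Copy, Clone)]
struct Vec2 { x: f32, y: f32 }

fn dist(a: Vec2, b: Vec2) -> f32 {
    ((a.x - b.x).powi(2) + (a.y - b.y).powi(2)).sqrt()
}

/// Intersection of the diagonals (p0, p2) and (p1, p3), if any.
fn diagonal_intersection(p: [Vec2; 4]) -> Option<Vec2> {
    let d1 = Vec2 { x: p[2].x - p[0].x, y: p[2].y - p[0].y };
    let d2 = Vec2 { x: p[3].x - p[1].x, y: p[3].y - p[1].y };
    let denom = d1.x * d2.y - d1.y * d2.x;

    if denom.abs() < 1e-6 {
        return None; // Degenerate quad: fall back to affine mapping.
    }

    let t = ((p[1].x - p[0].x) * d2.y - (p[1].y - p[0].y) * d2.x) / denom;
    Some(Vec2 { x: p[0].x + t * d1.x, y: p[0].y + t * d1.y })
}

/// Per-vertex (u*q, v*q, q) coordinates for the four corners of a quad.
/// `p` holds the screen positions in perimeter order, `uv` the matching
/// texture coordinates.
fn projective_uv(p: [Vec2; 4], uv: [Vec2; 4]) -> Option<[[f32; 3]; 4]> {
    let center = diagonal_intersection(p)?;
    let d: Vec<f32> = p.iter().map(|&v| dist(v, center)).collect();

    let mut out = [[0.0f32; 3]; 4];
    for i in 0..4 {
        let opposite = d[(i + 2) % 4];
        let q = if opposite > 0.0 { (d[i] + opposite) / opposite } else { 1.0 };
        out[i] = [uv[i].x * q, uv[i].y * q, q];
    }
    Some(out)
}

// Fragment shader side: divide by the interpolated q.
const QUAD_FRAG_SNIPPET: &str = r#"
in vec3 frag_uvq;                       // (u*q, v*q, q) from the vertex shader
vec2 uv = frag_uvq.xy / frag_uvq.z;     // perspective-ish texture coordinate
"#;
```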

Maybe I could reject quads that look too "un-rectangular" to avoid the extreme stretching seen in the Medievil screenshot above. I'll do more testing once I get more 3D games to work.

I think I figured out how to handle both semi-transparency and masking in all cases, I'm making a note here before I forget. There are many corner cases to consider but I really hope this covers everything:

For opaque polygons

  • If "mask test" mode is enabled we can use the alpha blending to discard masked fragments (RGB blend equation: GL_ONE_MINUS_DST_ALPHA, GL_DST_ALPHA). Otherwise we output the texel alpha as usual using GL_ONE, GL_ZERO.
  • If "mask set" mode is enabled we can update the target alpha/mask bit by using GL_ONE, GL_ZERO as alpha blend equation and forcing the texel alpha to 1.0. Otherwise we use the regular texel alpha value.

For semi-transparent polygons

This is where the fun begins.

The tricky part here is that we need to do actual alpha blending for the semi-transparency to work properly but then it means that we can't abuse it for mask testing. Instead we can use the stencil to emulate the masking.

  • We need to render semi-transparent polygons in two passes: first the opaque texels (where the mask/alpha bit is not set) are rendered like opaque polygons. We can probably use the same shader and draw them along with the opaque primitives. The semi-transparent texels are ignored (i.e. discarded in the fragment shader). Non-textured semi-transparent polygons can skip this step since they're completely semi-transparent.
  • Multi-pass rendering can be implemented using the Z-buffer, giving each primitive an arbitrary Z-value that decreases for every new primitive. This could be reused if we later implement a real Z-buffer using the GTE Z-value.
  • Then we must draw the semi-transparent texels (opaque texels are discarded this time around). We can use the following blending equations to emulate all the modes:
  PlayStation semi-transparency mode | OpenGL RGB blending equation | OpenGL blending parameters (src, dst) | Constant alpha
  dst / 2 + src / 2                  | ADD                          | CONSTANT_ALPHA, CONSTANT_ALPHA        | 0.5
  dst + src                          | ADD                          | ONE, ONE                              | Don't care
  dst - src                          | REVERSE_SUBTRACT             | ONE, ONE                              | Don't care
  dst + src / 4                      | ADD                          | CONSTANT_ALPHA, ONE                   | 0.25

The constant alpha is set with glBlendColor (a host-side sketch follows this list).

  • If "mask test" mode is enabled this time we can't use the blending equation to mask the pixels since the function is used to emulate the semi-transparency. Instead before we render the semi-transparent texels we can create a stencil buffer set for all pixels in the framebuffer where the alpha bit is set.
  • Once this stencil is built we can enable the stencil test and render the semi-transparent with the stencil test enabled
  • If the "mask set" mode is enabled we can just tell OpenGL to update the stencil value for each pixel written to the framebuffer
  • If "mask set" mode is not enabled we have to be careful since the stencil value should only be set when drawing textured polygons. Since we only draw semi-transparent texels in this pass we know that the mask bit will always be 1 for textured polygons, however for monochrome/shaded polygons it will be 0. I think the only way to handle that is to change the glStencilOpSeparate​ to either update or keep the previous value in the stencil buffer depending on the type of polygon we're about to draw. We must also write the correct 1.0 or 0.0 value to the destination alpha to make sure that the alpha/mask bit remains coherent with the stencil for subsequent commands.

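And a rough sketch of the corresponding stencil state for this pass, assuming bit 0 of the stencil buffer mirrors the mask bit and following the rules from the list above:

```rust
unsafe fn set_semi_transparent_stencil(mask_test: bool, mask_set: bool, textured: bool) {
    gl::Enable(gl::STENCIL_TEST);

    // Reference value 1 is also what REPLACE writes when the mask bit must
    // be set.
    if mask_test {
        // Only draw where the mask bit is not set yet.
        gl::StencilFunc(gl::NOTEQUAL, 1, 1);
    } else {
        gl::StencilFunc(gl::ALWAYS, 1, 1);
    }

    // In this pass only semi-transparent texels are drawn and those always
    // carry a set mask bit, so textured primitives update the stencil;
    // untextured ones only do it when "mask set" is forced (mirroring the
    // rules in the list above).
    if mask_set || textured {
        gl::StencilOp(gl::KEEP, gl::KEEP, gl::REPLACE);
    } else {
        gl::StencilOp(gl::KEEP, gl::KEEP, gl::KEEP);
    }
}
```
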
Overall that means that drawing semi-transparent polygons can turn out to be quite expensive since we potentially need two passes + a bunch of juggling with the stencil buffer and the semi-transparency modes (in the worst case they could change between each draw command).

Since semi-transparent primitives have to be drawn in-order we can't batch similar primitives together so in the worst case we might end up having to use a lot of draw commands to render the scene.

This might not be as bad as it sounds though:

  • Semi-transparent polygons also take longer to draw on the real console, so game devs had to be careful not to overuse them
  • The worst case is when drawing semi-transparent polygons with the "mask test" mode set and interleaving many monochrome and textured polygons. That sounds specific enough to be uncommon (although who knows...).
  • By enabling the Z-buffer and rendering all the opaque polygons first we might hopefully hide some of the semi-transparent polygons which will be drawn last.
  • We can also draw opaque primitives front-to-back to reduce overdraw dramatically for the entire scene, which should speed up rendering quite significantly for most games (no need to render hidden pixels; see the sketch after this list). The real console doesn't have a Z-buffer so it draws everything using the painter's algorithm, which results in a tremendous amount of overdraw.
  • However we have to be careful to draw semi-transparent primitives in the right order since the end result is order-dependent.
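
For illustration, a tiny sketch of the draw-order depth trick mentioned above (names made up; the depth value is purely an ordering index, not real geometry):

```rust
// Assign each opaque primitive a depth derived from its draw-command index so
// that later commands win over earlier ones even if we batch or reorder them.
fn order_depth(command_index: u32, total_commands: u32) -> f32 {
    // Later primitives get a smaller depth; with GL_LESS they overwrite
    // earlier ones wherever they overlap, and earlier primitives drawn
    // afterwards fail the test.
    1.0 - (command_index as f32 + 1.0) / (total_commands as f32 + 1.0)
}

unsafe fn setup_opaque_depth_test() {
    gl::Enable(gl::DEPTH_TEST);
    gl::DepthFunc(gl::LESS);
}
```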

Algorithms
http://www.scratchapixel.com/lessons/3d-basic-rendering/rasterization-practical-implementation/perspective-correct-interpolation-vertex-attributes
http://www.cs.cornell.edu/courses/cs4620/2012fa/lectures/notes.pdf
https://www.comp.nus.edu.sg/~lowkl/publications/lowk_persp_interp_techrep.pdf
http://web.cs.ucdavis.edu/~amenta/s12/perspectiveCorrect.pdf
http://www.lysator.liu.se/~mikaelk/doc/perspectivetexture/
http://www.inf.ufrgs.br/~oliveira/pubs_files/PG01_Adaptive_subdivision.pdf
http://ngemu.com/threads/edgblas-gpubladesoft.144037/page-7
https://www.particleincell.com/2012/quad-interpolation/
http://www.iquilezles.org/www/articles/ibilinear/ibilinear.htm
http://math.stackexchange.com/questions/13404/mapping-irregular-quadrilateral-to-a-rectangle
http://stackoverflow.com/questions/26332165/projective-interpolation-of-textures-in-2d-trapeziums-with-opengl
http://vcg.isti.cnr.it/publications/papers/quadrendering.pdf
https://www.inf.ethz.ch/personal/dpanozzo/papers/Demystifying-2015.pdf
http://dl.acm.org/citation.cfm?id=1058131
http://graphics.cs.williams.edu/papers/ClipJGT11/McGuire-Clipping.pdf
http://www.mathworks.com/matlabcentral/answers/222379-how-to-create-patches-of-quadrilaterals-4-vertices-1-at-a-time-and-render-them-all-at-once
http://stackoverflow.com/questions/7532867/pixel-shader-to-project-a-texture-to-an-arbitary-quadrilateral
http://www.cs.cmu.edu/afs/cs/academic/class/15462-f10/www/lec_slides/a1-jensen.pdf
http://help.autodesk.com/view/ACD/2015/ENU/?guid=GUID-253D0647-1CEF-4183-8776-9B48C7000304
http://pc2.iam.fmph.uniba.sk/amuc/_contributed/algo2005/vanecek-svitak-kolingerova-skala.pdf
ftp://ftp.sgi.com/sgi/opengl/contrib/mjk/tips/projtex/distortion.txt
https://en.wikipedia.org/wiki/Multivariate_interpolation
http://stackoverflow.com/questions/26345156/perspective-correct-shader-rendering
http://stackoverflow.com/questions/12414708/correct-glsl-affine-texture-mapping
http://web.eecs.umich.edu/~sugih/courses/eecs487/lectures/24-TextureMapping.pdf

Great! Thank you. I've been googling for an alternative quad mapping algorithm without much success. I think bilinear interpolation will solve the stretching I had; however it won't give the right perspective effect for things like wall spans. I'll give it a try.
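
For reference, here's a Rust transcription of the inverse bilinear mapping from the iquilezles.org article linked above (in the renderer this would live in the fragment shader; this is only a CPU-side sketch):

```rust
// Inverse bilinear mapping: given a point `p` inside the quad a-b-c-d
// (corners in order), find (u, v) in [0,1]² such that
// p == (1-u)(1-v)·a + u(1-v)·b + u·v·c + (1-u)·v·d.
fn inv_bilinear(p: [f32; 2], a: [f32; 2], b: [f32; 2], c: [f32; 2], d: [f32; 2])
    -> Option<(f32, f32)>
{
    let cross = |m: [f32; 2], n: [f32; 2]| m[0] * n[1] - m[1] * n[0];
    let sub = |m: [f32; 2], n: [f32; 2]| [m[0] - n[0], m[1] - n[1]];

    let e = sub(b, a);
    let f = sub(d, a);
    let g = [a[0] - b[0] + c[0] - d[0], a[1] - b[1] + c[1] - d[1]];
    let h = sub(p, a);

    let k2 = cross(g, f);
    let k1 = cross(e, f) + cross(h, g);
    let k0 = cross(h, e);

    let v = if k2.abs() < 1e-6 {
        // Parallelogram: the quadratic degenerates to a linear equation.
        -k0 / k1
    } else {
        let disc = k1 * k1 - 4.0 * k0 * k2;
        if disc < 0.0 {
            return None;
        }
        // One of the two roots lies inside the quad; the sign may need to be
        // flipped depending on the winding of the corners.
        (-k1 - disc.sqrt()) / (2.0 * k2)
    };
    let u = (h[0] - f[0] * v) / (e[0] + g[0] * v);

    if (0.0..=1.0).contains(&u) && (0.0..=1.0).contains(&v) {
        Some((u, v))
    } else {
        None
    }
}
```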

http://qtandopencv.blogspot.com/2013/10/perspective-correction-for.html
http://opencv-code.com/tutorials/automatic-perspective-correction-for-quadrilateral-objects/
http://stackoverflow.com/questions/15242507/perspective-correct-texturing-of-trapezoid-in-opengl-es-2-0
https://www.opengl.org/discussion_boards/showthread.php/177420-Texture-Perspective-Correction!
https://github.com/tilaprimera/perspective-correction
https://gist.github.com/fbatista/4756418
https://docs.coronalabs.com/guide/graphics/3D.html
https://github.com/stackgl/shader-school
https://github.com/mattdesl/gl-quad
https://github.com/Igalia/piglit
https://github.com/McNopper/OpenGL
https://github.com/progschj/OpenGL-Examples
https://github.com/tomdalling/opengl-series
https://github.com/NVIDIAGameWorks/OpenGLSamples
https://github.com/daw42/glslcookbook
https://github.com/3b/cl-opengl
https://github.com/openglsuperbible/sb6code
https://github.com/g-truc/ogl-samples
https://github.com/p3/regal
https://github.com/dbuenzli/tgls
https://github.com/cginternals/glbinding
https://github.com/alleysark/OpenGL-Tutorials
https://github.com/otaku690/sparsevoxeloctree
http://on-demand.gputechconf.com/gtc/2015/presentation/S5752-Alexey-Panteleev.pdf
http://graphics.cs.cmu.edu/courses/15869/fall2013content/lectures/26_voxelization/voxelization_slides.pdf
http://ocw.mit.edu/courses/electrical-engineering-and-computer-science/6-837-computer-graphics-fall-2012/lecture-notes/MIT6_837F12_Lec21.pdf
http://www.purdue.edu/discoverypark/vaccine/assets/pdfs/publications/pdf/Conservative%20Voxelization.pdf
http://on-demand.gputechconf.com/gtc/2015/presentation/S5442-Chris-Wyman.pdf
https://github.com/mapmapteam/mapmap
https://github.com/AgentD/swrast
Heh shader that makes graphics look like PSX graphics https://github.com/keijiro/Retro3D

Dithering:

dithering

When increasing the internal res the dithering pattern keeps the same size (pixel-wise) so it appears to be getting smaller as the resolution increases:

dithering-2x

It would be easy to scale the dithering as well but that makes it even more obvious so I'm going to leave it that way for the moment.
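
For reference, a sketch of how the dithering offsets are applied (the 4x4 offset table is the one from the No$ specs); indexing it with the native pixel coordinates rather than the upscaled ones is what keeps the pattern at its native size:

```rust
// PSX 4x4 dithering offsets (from the No$ specs), added to each 8bit color
// component before it's truncated to 5 bits.
const DITHER_OFFSETS: [[i16; 4]; 4] = [
    [-4,  0, -3,  1],
    [ 2, -2,  3, -1],
    [-3,  1, -4,  0],
    [ 3, -1,  2, -2],
];

// Dither an 8bit component for the native VRAM pixel at (x, y) and reduce it
// to the 5bit value that ends up in the framebuffer.
fn dither_component(value: u8, x: u16, y: u16) -> u8 {
    let offset = DITHER_OFFSETS[(y & 3) as usize][(x & 3) as usize];
    let dithered = (value as i16 + offset).clamp(0, 255);
    (dithered >> 3) as u8
}
```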

I've added 24bit display mode support. Spyro uses it to display the "Universal" logo at the start. Until now it looked like this, with the 24bit pixels rendered as 16bits:

spyro-no24bpp

Now it renders correctly (I hope):

spyro-24bpp

Since in 24bit mode I have to rebuild the pixel values from the 16bit framebuffer texture there's no linear filtering implemented. It's not really an issue I think; filtering of the output should be handled by the frontend anyway.
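
For illustration, here's what the reconstruction boils down to: in 24bpp mode two output pixels span three 16bit VRAM halfwords. A small Rust sketch (the byte order is my reading of the No$ specs, i.e. 3 consecutive R, G, B bytes per pixel within little-endian halfwords):

```rust
// In 24bpp display mode two consecutive output pixels span three 16bit VRAM
// halfwords (6 bytes: R0 G0 B0 R1 G1 B1). Given those three halfwords,
// rebuild the two 24bit pixels.
fn unpack_24bpp(h0: u16, h1: u16, h2: u16) -> [(u8, u8, u8); 2] {
    let p0 = (
        (h0 & 0xff) as u8, // R0
        (h0 >> 8) as u8,   // G0
        (h1 & 0xff) as u8, // B0
    );
    let p1 = (
        (h1 >> 8) as u8,   // R1
        (h2 & 0xff) as u8, // G1
        (h2 >> 8) as u8,   // B1
    );
    [p0, p1]
}
```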

Is there an option to disable dithering? Moreover, dithering should be automatically disabled in 24bit display mode.

There will be one, currently I don't have any configuration system implemented but I'm trying to code in a way that won't make it hard to add it in later.

For configuration systems, remember that people will want to change configuration per game. A two level config system that works like
[GAME_SPECIFIC_OPTION] ? return [GAME_SPECIFIC_OPTION] : [GLOBAL_OPTION]

would likely be ideal (even if at first everything is global options because there's no way to change game specific ones).
It makes sense that whatever data structure you use to represent the "two levels", you can feed either one to the same GUI (activated by different buttons), so you can share code of course.
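
As a minimal Rust sketch of that two-level lookup (all names made up):

```rust
use std::collections::HashMap;

// Minimal two-level option lookup: a per-game value overrides the global one.
struct Config {
    global: HashMap<String, String>,
    per_game: HashMap<String, String>,
}

impl Config {
    fn get(&self, key: &str) -> Option<&String> {
        self.per_game.get(key).or_else(|| self.global.get(key))
    }
}
```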

This is pretty important, especially for key configuration and key macros. For some reason not many emulators manage the controller area very well. Dolphin, for instance, has per-game options but not per-game controller config options (yet), so it's harder than it seems to unify if you don't plan for it.

Can RA manage that? Sounds like it would best be handled by the frontend.

Ehh, there are more design issues, like when to save a config change to disk and to which file, default options, which options are "safe" and which ones require a game reset (and when to apply the change), and stuff like that. I doubt there is a ready-to-use solution with a "two levels" approach in a widget system.

I don't know rust so maybe there is though.

edit: also a per game option system (with corresponding per game config files) would be useful for 'hidden' hacks too if they're supposed to apply to more than one game but not all.

edit2: Oh, you mean RetroArch by RA? Maybe so, maybe so. I don't really know how that works, but it would be weird if it didn't allow per-game options... If you want to reuse the interface they use for that, sure, but I suspect you'll end up reimplementing all of it eventually if you want your own GUI.

I plan on implementing the libretro API and then use RetroArch as the frontend. I have enough work with the core, I don't want to reinvent the frontend.

I'm using GL_R16UI for the "raw" VRAM now; this way I can do everything with integers in the shader. The "out" buffer is either GL_RGB5_A1 (accurate 16bit mode with dithering) or GL_RGBA8 (enhanced 32bit mode without dithering).
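
Roughly, the two allocations look like this with the raw `gl` bindings (just a sketch; the helper and its parameters are made up):

```rust
// Sketch of the two texture allocations.
unsafe fn alloc_textures(internal_scale: i32, accurate_16bit: bool) -> (u32, u32) {
    let mut vram = 0u32;
    gl::GenTextures(1, &mut vram);
    gl::BindTexture(gl::TEXTURE_2D, vram);
    // Raw VRAM: 1024x512 halfwords, sampled as unsigned integers in the
    // shaders (usampler2D), so filtering must be NEAREST.
    gl::TexStorage2D(gl::TEXTURE_2D, 1, gl::R16UI, 1024, 512);
    gl::TexParameteri(gl::TEXTURE_2D, gl::TEXTURE_MIN_FILTER, gl::NEAREST as i32);
    gl::TexParameteri(gl::TEXTURE_2D, gl::TEXTURE_MAG_FILTER, gl::NEAREST as i32);

    let mut out = 0u32;
    gl::GenTextures(1, &mut out);
    gl::BindTexture(gl::TEXTURE_2D, out);
    // "Out" buffer: accurate 16bit (with dithering) or enhanced 32bit,
    // scaled by the internal resolution factor.
    let format = if accurate_16bit { gl::RGB5_A1 } else { gl::RGBA8 };
    gl::TexStorage2D(gl::TEXTURE_2D, 1, format,
                     1024 * internal_scale, 512 * internal_scale);

    (vram, out)
}
```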

I'm going to implement the GTE widescreen hack soon, it's pretty straightforward. It'll only work with fully 3D games though.

I've prototyped the GTE accuracy/subpixel precision in mednafen/beetle: https://github.com/simias/beetle-psx-libretro/commits/subpixel_accuracy

subpixel

I based the code on the simple implementation from PCSX-R. Unfortunately it's a bit too simplistic to work well in all cases.

The main issue is that it works by associating the native (low precision) x/y coordinates with the extended precision coordinates in the GTE. Then when the GPU has to render a triangle it can look up the cache to find a vertex with the same x/y coordinates and use the high-precision values instead.

Unfortunately if two different vertices happen to share the same native position we can't know which one matches the extended precision data. In the gif above you can see it on the score at the bottom left: it warps when "subpixel precision" is enabled because it aligns with some of the ground vertices.

And it would be even worse if we used the z-coordinates for texture mapping because they could differ wildly for two vertices sharing the same on-screen position. Then the mapping could potentially be completely wrong.
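
To make the failure mode concrete, the PCSX-R style cache boils down to something like this (a rough Rust sketch, names made up):

```rust
use std::collections::HashMap;

// The GTE stores its high-precision result keyed by the truncated native
// coordinates, and the GPU looks the entry up again when it receives a
// vertex with the same native position.
struct SubpixelCache {
    entries: HashMap<(i16, i16), (f32, f32)>,
}

impl SubpixelCache {
    // Called when the GTE produces a screen coordinate.
    fn record(&mut self, native: (i16, i16), precise: (f32, f32)) {
        // If two different vertices truncate to the same native position the
        // second one silently overwrites the first: this is exactly the
        // ambiguity that makes the 2D score warp in the gif above.
        self.entries.insert(native, precise);
    }

    // Called by the GPU when it rasterizes a vertex.
    fn lookup(&self, native: (i16, i16)) -> Option<(f32, f32)> {
        self.entries.get(&native).copied()
    }
}
```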

I've tried implementing a more costly but hopefully more accurate solution but I don't think it will be fast enough to be usable in mednafen. I think I'm going to postpone that until Rustation is in better shape.

I wonder if ePSXe has a more clever implementation or if it does it like PCSX-R.

Yeah I've stumbled upon the same issue when implementing increased GTE precision in beetle/mednafen as I mentioned in my previous post. iCatButler has the right diagnostic, the lookup table is too hacky to work properly all the time.

A potentially better solution would be to store the increased precision data alongside the regular PSX precision data in a hidden cache and always keep them paired, from the GTE all the way to the GPU.

Of course this would be more costly since you'd have to check if there's some GTE data for each RAM and DMA access.

I tried implementing that in mednafen and almost got it to work, but then I noticed that many games don't send the GTE data straight to RAM but rather pass it through a CPU register first. In this case we have to keep the increased precision data paired with the CPU registers, and I was worried that it would slow things down too much.

Here's the patch I came up with: simias/beetle-psx-libretro@2b5c6f6

I guess I could try to finish the CPU part to see if it works in practice.

I tried the quad rasterization thing with mixed results.

@simias could you post a link to the code of your quad mapping attempt?

It's in the quad_mapping branch: 4b0b5f2

But after our discussion I think I took the wrong approach; I should have tried to implement bilinear mapping instead of guessing the perspective correction.

I've tried so many that I'm not sure which one you're talking about :)

Doesn't ring a bell and I can't find any mention of "heuristic" in my commits. Do you remember what it did? You're talking about beetle/mednafen right? Not rustation.

By upscaling you mean increasing the internal resolution, right? The only version that worked well is the one that's currently committed upstream in beetle/mednafen. It's got a heuristic for upscaling 2D elements correctly at 2x (without any seams).

That's interesting but I'm not sure I understand what it's doing exactly. It looks like it might be similar to what I'm trying to implement in the "subpixel" branch. Basically instead of forwarding subpixel data directly from the GTE to the GPU you pair it with the 32bit "native" data and pass it along to the RAM, DMA etc... Of course it adds a performance cost to all RAM access and adds an overhead to all CPU register operations so it's rather slow, but it should be less hacky and buggy than the naive hack that PCSX-R uses.

But that's in theory because I still can't get it to work at the moment even though I got all the pipeline ready, I'm currently debugging to figure out where the subpixel data gets lost.

At any rate if that's what this guy is doing I don't think it'll work without tweaking the CPU itself to be subpixel-data aware since many games seem to like to pass vertex position data through CPU registers.

@simias @Tapcio
Well perhaps you two can exchange informations about this problem.
Nucleoprotein/PeteOpenGL2Tweak#4

Sure, why not. I'll try to create a new issue with my current discoveries. But first I need to fix my debugger...

Yes, this tries to work in the way you described, but it's really hard to implement such a thing in the PSEmu Pro architecture. I also don't know why I get some incorrect data, but this may be happening because of the hackish way I get the source address of the data.

The GPU can access CPU registers? Because in PCSX-R and PEOPS OpenGL 1.78 I don't see any way for the plugin to get vertex data other than DMA transfers: all polygons are drawn in GPUwriteDataMem, which happens in GPUdmaChain.