Feedback from recent usage

Question

Closed this issue 2 years ago · 12 comments

I recently tried to do green screen (chroma key / background) removal and compositing onto another background, which is a pretty routine thing to do.

I had already done it in Evision and Nx, so I had a pretty good idea of a strategy. It turns out vips really wants to use an alpha channel for that purpose, which is the natural way to think of things from an image processing perspective.

I ended up getting two versions working thanks to @kipcole9 .

The unenlightened way.

{:ok, fore} = Image.open("/home/kevinedey/Downloads/greenscreen.jpg", access: :random)

{:ok, back} = Image.open("/home/kevinedey/Downloads/background.jpg", access: :random)

# Lower bound green
{:ok, l_green} = Image.Math.greater_than(fore, [0.0, 100.0, 0.0])
# Upper bound green
{:ok, u_green} = Image.Math.less_than(fore, [100.0, 255.0, 95.0])

{:ok, color_fore_mask} = Image.Math.boolean_and(l_green, u_green)

{:ok, fore_mask} = Vix.Vips.Operation.bandbool(color_fore_mask, :VIPS_OPERATION_BOOLEAN_AND)

{:ok, masked} = Image.Math.subtract(fore, fore_mask)

{:ok, inverted_fore_mask} = Vix.Vips.Operation.invert(fore_mask)

{:ok, masked_back} = Image.Math.subtract(back, inverted_fore_mask)

{:ok, masked_bin} = Vix.Vips.Image.write_to_buffer(masked, ".jpg")
{:ok, masked_clone} = Vix.Vips.Image.new_from_buffer(masked_bin)

{:ok, masked_back_bin} = Vix.Vips.Image.write_to_buffer(masked_back, ".jpg")
{:ok, masked_back_clone} = Image.from_binary(masked_back_bin)

{:ok, comp} = Vix.Vips.Operation.add(masked_back_clone, masked_clone)

The libvips way - note the use of bandjoin

{:ok, fore} = Image.open("/home/kevinedey/Downloads/greenscreen.jpg")

{:ok, back} = Image.open("/home/kevinedey/Downloads/background.jpg")

# Lower bound green
{:ok, l_green} = Image.Math.greater_than(fore, [0.0, 100.0, 0.0])
# Upper bound green
{:ok, u_green} = Image.Math.less_than(fore, [100.0, 255.0, 95.0])

{:ok, color_fore_mask} = Image.Math.boolean_and(l_green, u_green)

{:ok, fore_mask} = Vix.Vips.Operation.bandbool(color_fore_mask, :VIPS_OPERATION_BOOLEAN_AND)

{:ok, inverted_fore_mask} = Vix.Vips.Operation.invert(fore_mask)

{:ok, masked_person} = Vix.Vips.Operation.bandjoin([fore, inverted_fore_mask])

{:ok, comp} = Image.compose(back, masked_person)
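
For readers wondering why joining the inverted mask as an extra band is enough, here is a rough sketch of what compositing with an alpha band amounts to per pixel. This is plain Elixir on tuples and lists, not the libvips or Image implementation; the module name OverSketch and its functions are hypothetical, and it assumes 8-bit channels where alpha 255 keeps the foreground and 0 shows the background.

```elixir
defmodule OverSketch do
  # Hypothetical illustration only. Pixels are {r, g, b} tuples with
  # channels in 0..255; alpha is 0..255 (255 = fully opaque foreground).
  def over({fr, fg, fb}, alpha, {br, bg, bb}) do
    {mix(fr, br, alpha), mix(fg, bg, alpha), mix(fb, bb, alpha)}
  end

  # Weighted average of one channel, rounded to the nearest integer.
  defp mix(fore, back, alpha) do
    div(fore * alpha + back * (255 - alpha) + 127, 255)
  end

  # Images are lists of rows; all three inputs must share the same size.
  def compose(fore_rows, alpha_rows, back_rows) do
    [fore_rows, alpha_rows, back_rows]
    |> Enum.zip()
    |> Enum.map(fn {f_row, a_row, b_row} ->
      [f_row, a_row, b_row]
      |> Enum.zip()
      |> Enum.map(fn {f, a, b} -> over(f, a, b) end)
    end)
  end
end
```

With a hard 0/255 mask this simply picks one image or the other; intermediate alpha values are what make softened edges possible.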

Suggestions:

It isn't intuitive that adding an alpha channel requires a function called bandjoin. Maybe call it add_alpha, with a default value for the alpha channel, so you could cut the image's opacity to 0.5 in one step.

It really threw me when I was getting the error:

** (MatchError) no match of right hand side value: {:error, "Failed to write VipsImage to memory"}

@kipcole9 informed me it all happens in one pipeline and if you try to buffer twice you'll get that error. I think in Elixir people expect an immutable copy by default. I would open the files with access: :random by default to avoid this type of confusion. When they know what they're doing and want to optimize, they can pass in the more efficient option. Many will not make it that far.

It seems likely that people are used to Photoshop or imagemagick which just deal with images as giant matrices. Perhaps an explanation about how vips works and how to get along with it would be helpful.
In Elixir, I expect immutable by default. You should tell them how it's different with vips.

Adding a function for composing images with a matrix / tensor the size of the image with boolean values similar to Nx.select would be pretty useful for combining two images selectively without the need for an alpha channel.

For this use case it would be helpful to have access to functions that help blend the foreground and background.

This may fall outside the remit of this library.

These would include ablation and dithering.
I may be remembering the wrong term, but what I mean by ablation is stripping off the outer pixels. In the case of green screen removal, there is usually green ghosting around the clipped image. Removing the outer N layers of pixels makes it easier to blend the image.
Dithering is just blending the foreground with the background. If you have an alpha channel, this is just turning down the opacity on the border pixels. If the user doesn't have an alpha channel, maybe it makes sense for the user to choose the background dither channel. Anyone who's used gimp a lot may recognize this.
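
To make the "strip off the outer N layers" idea concrete, here is a minimal sketch on a binary mask held as a list of rows of 0/1. It is plain Elixir, not a library function; ErodeSketch is a hypothetical name, it uses a 3x3 neighbourhood, and it assumes pixels outside the image count as set so that only real mask edges shrink.

```elixir
defmodule ErodeSketch do
  # Hypothetical illustration only. A pixel survives a pass only if it and
  # all eight neighbours are 1; each pass removes one layer of edge pixels.
  def erode(mask, passes \\ 1)
  def erode(mask, 0), do: mask

  def erode(mask, passes) when passes > 0 do
    # Index every cell by {row, column} for neighbour lookups.
    cells =
      for {row, y} <- Enum.with_index(mask),
          {value, x} <- Enum.with_index(row),
          into: %{},
          do: {{y, x}, value}

    eroded =
      for {row, y} <- Enum.with_index(mask) do
        for {_value, x} <- Enum.with_index(row) do
          # Assumption: out-of-bounds neighbours are treated as set (1).
          neighbours =
            for dy <- -1..1, dx <- -1..1, do: Map.get(cells, {y + dy, x + dx}, 1)

          if Enum.all?(neighbours, &(&1 == 1)), do: 1, else: 0
        end
      end

    erode(eroded, passes - 1)
  end
end
```

Calling ErodeSketch.erode(mask, n) peels n layers, which is the knob described above for getting rid of green ghosting.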

It's a very big stretch, but it would be really cool to have a programmatic drawing lib like PIL/Pillow in Python. Even better if it was super performant. After having done the same thing in Nx, Evision and Vix, it feels like they all bring something to the table. If I were to do a home-grown Elixir solution, I'd probably start with vips and Nx.

Answer 1 · 2022-10-17T02:42:36.000Z

Thanks for the comments, feedback and suggestions - very welcome indeed. A few immediate thoughts (and later some questions to expose my ignorance):

Image.chroma_key/2

I'll add Image.chroma_key/2 which will mask the chroma. It will do the masking in LCh color space, which is a much better way to do this. I will have an option to default to common chroma like :green, :blue. I'll do some experimentation to tune this but will apply the techniques described here. There will of course be parameters to tune the behaviour too, but I always aim for sensible defaults and then tuning as required. I think this one function will make life a lot easier for the example you have been working on.

Options

:feather option to apply a gaussian blur to the mask (alpha channel) which can help smooth the composition onto a background
:between to specify a color range within which the mask is applied (in any color space)
:outside to specify a color range outside of which is masked
:image to use an image as a mask
:color to specify the color to mask (any color space, any CSS color name)
:threshold to specify the luminance threshold of :color. The combination of the two makes it easy to specify a color luminance range

Image.auto_level/2 (may change the name)

This would be used to remove a color cast and adjust levels, for example from a green screen or other lighting, from the foreground object.

Image.white_balance/2

Adjust the correlated color temperature of an image. Very helpful to adjust for lighting challenges and some kinds of color casts.

Answer 2 · 2022-10-17T02:51:05.000Z

Transparency handling

You indicated an impedance mismatch between Nx and Vix (Image). I agree - but from the other side given my predisposition. What I don't understand is how Nx represents transparency in this case. Assuming an RGB image then there are some differences in the conceptual model between Image and OpenCV for example. libvips images are always three dimensions with the last dimension being the bands (channels) of the image. OpenCV differentiates between dimensions and channels (which has tripped me up a few times). Anyway, in order to composite images there has to be some way to say "this space left intentionally blank". How is that represented in an Nx tensor?

Ablation and Dithering

I think when you say dithering, we photoshop/lightroom people would say "feathering". Basically a gaussian blur on a mask. I will implement this for Image.chroma_key/2.

For ablating (such a good word) this would be something like "increase mask" or "decrease mask". Or maybe "trim". I'll take a look at how I might achieve that.
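
As a rough picture of what feathering does to a mask, the sketch below softens an 8-bit alpha mask with a 3x3 mean filter. A box blur is swapped in here for the gaussian blur mentioned above purely to keep the example short; FeatherSketch is a hypothetical name and none of this is the Image implementation.

```elixir
defmodule FeatherSketch do
  # Hypothetical illustration only. `alpha` is a list of rows of 0..255
  # values. Each output value is the integer mean of the 3x3 window around
  # it, using only the neighbours that fall inside the image.
  def feather(alpha) do
    cells =
      for {row, y} <- Enum.with_index(alpha),
          {value, x} <- Enum.with_index(row),
          into: %{},
          do: {{y, x}, value}

    for {row, y} <- Enum.with_index(alpha) do
      for {_value, x} <- Enum.with_index(row) do
        window =
          for dy <- -1..1,
              dx <- -1..1,
              Map.has_key?(cells, {y + dy, x + dx}),
              do: Map.fetch!(cells, {y + dy, x + dx})

        div(Enum.sum(window), length(window))
      end
    end
  end
end
```

A hard edge such as 0, 0, 255, 255 becomes a ramp, so the border pixels end up partially transparent when composited.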

Programmatic Drawing Lib

I look at Livebook as the primary tool to do that. Vix 0.14.0 now includes the code to preview an image automatically which helps a bit. In addition, libvips has a companion tool called nip2. The precompiled binary doesn't run on modern Mac releases but appears fine on Linux.

Not trying to constrain the idea - just exploring alternatives to see what you have in mind.

Answer 3 · 2022-10-17T12:00:09.000Z

Added Image.chroma_key/2 and Image.chroma_mask/2

On :image master there are now two new functions (and a few others too) that greatly simplify chroma keying an image. I've not yet implemented feathering of the image mask (tomorrow's job) but feedback is most welcome. As are suggested improvements in either function or developer ergonomics.

Example

Image.chroma_key/2 documentation

Chroma key an image.

Chroma keying is the process of removing a background color
from an image resulting in a foreground image that may
be composited over another image.

If the image already has an alpha band then the
image is flattened before adding the image mask
as a new alpha band.

Arguments

image is any t:Vix.Vips.Image.t/0.
options is a keyword list of options.

Options

:greater_than is an rgb color which represents the upper
end of the color range to be masked. The color can be an
integer between 0..255, a three-element list of
integers representing an RGB color or an atom
representing a CSS color name. The default is similar to
"chroma green".
:less_than is an rgb color which represents the lower
end of the color range to be masked. The color can be an
integer between 0..255, a three-element list of
integers representing an RGB color or an atom
representing a CSS color name. The default is similar to
"chroma green".

TODO before next release

Add :feather as an option to apply a gaussian blur to the alpha mask
Add keywords for chroma blue and chroma green (defaulting to chroma green)

Answer 4 · 2022-10-17T14:58:13.000Z

I also asked this in the vips discussion and the vips author responded.
There may be some inspiration here:
libvips/libvips#3097 (reply in thread)

Notably, he's sampling part of the background to use for the removal.

It may also be useful to include a bounding box option to only consider a portion of the image. This could be done manually of course using crop but it may be useful in some use cases.

Also, he calls trimming off the extra pixels 'erosion' whereas I called it 'ablation'. I would say to adopt his term as he knows what he's talking about. :)

Answer 5 · 2022-10-17T15:18:45.000Z

What I don't understand is how Nx represents transparency in this case.

The way I was using it avoided the need for transparency as it's merging two images and choosing which image's pixel(s) to use based on the tensor which is just loaded with boolean values.

0 -> pick the first image's pixel
1 -> pick the second image's pixel

The mindset you need for Nx is pretty different as ideally whatever operation you're performing is happening on every pixel at once.
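
The selection rule described above can be sketched in plain Elixir as follows. This only illustrates the idea; it is not Nx or Image code, SelectSketch is a hypothetical name, and lists of rows stand in for tensors.

```elixir
defmodule SelectSketch do
  # Hypothetical illustration only. `mask`, `first` and `second` are lists
  # of rows of the same size. A mask value of 0 picks the pixel from
  # `first`, a value of 1 picks the pixel from `second`.
  def select(mask, first, second) do
    [mask, first, second]
    |> Enum.zip()
    |> Enum.map(fn {m_row, a_row, b_row} ->
      [m_row, a_row, b_row]
      |> Enum.zip()
      |> Enum.map(fn
        {0, a, _b} -> a
        {1, _a, b} -> b
      end)
    end)
  end
end
```

A tensor library would apply the same rule to every element in one operation rather than walking rows, which is the mindset shift mentioned above.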

StbImage allows for opening PNG and GIF so that would do the fourth channel but that's not the approach I took. If you went that way you'd have to do some special handling as Nx doesn't know what an image is unlike vips. You could take a similar approach as I did by creating a mask based on the state of the alpha. If alpha wasn't completely full or empty you'd run into extra complications and that would require figuring out how much of each pixel to take.

Other image libs I've used treat the image like a large matrix of pixels you iterate through. iirc this is the way that opencv, numpy, etc treat the image. I think numpy also has functions that access the GPU so you just have to know what's what.

Answer 6 · 2022-10-17T19:58:28.000Z

There may be some inspiration here:
libvips/libvips#3097 (reply in thread)

Oh that's so cool. John is really engaged and always helpful. I'll update my code - thanks for the pointer!

Answer 7 · 2022-10-17T19:59:08.000Z

The way I was using it avoided the need for transparency as it's merging two images and choosing which image's pixel(s) to use based on the tensor which is just loaded with boolean values.

Ah, that makes sense, thanks for the clarification.

Answer 8 · 2022-10-17T21:06:04.000Z

Updated Image.chroma_mask/2 to use @jcupitt's vastly superior strategy. The documentation now reads:

Chroma key an image.

Chroma keying is the process of removing a background color
from an image resulting in a foreground image that may
be composited over another image.

If the image already has an alpha band then the
image is flattened before adding the image mask
as a new alpha band.

Arguments

image is any t:Vix.Vips.Image.t/0.
options is a keyword list of options.

Options

:color is an RGB color which represents the
chroma key to be masked. The color can be an
integer between 0..255, a three-element list of
integers representing an RGB color or an atom
representing a CSS color name. The default is
:auto in which the average of the top left 10x10
pixels of the image is used.
:threshold is a positive integer to indicate the
threshold around :color when calculating the mask.
The default is 20.
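
One plausible reading of the threshold strategy, sketched in plain Elixir: sample a key color from the top-left corner, then mask every pixel within a distance of it. The Euclidean RGB distance is an assumption made for this sketch (the library may well measure distance differently), and ThresholdMaskSketch with its functions is a hypothetical name, not part of Image.

```elixir
defmodule ThresholdMaskSketch do
  # Hypothetical illustration only. Images are lists of rows of {r, g, b}
  # tuples with channels in 0..255.

  # Average the top-left n x n pixels to guess the key color.
  def auto_key(rows, n \\ 10) do
    sample = rows |> Enum.take(n) |> Enum.flat_map(&Enum.take(&1, n))
    count = length(sample)

    {r, g, b} =
      Enum.reduce(sample, {0, 0, 0}, fn {r, g, b}, {sr, sg, sb} ->
        {sr + r, sg + g, sb + b}
      end)

    {div(r, count), div(g, count), div(b, count)}
  end

  # 255 marks pixels close to the key color (to be removed), 0 the rest.
  def mask(rows, key, threshold \\ 20) do
    for row <- rows do
      for pixel <- row do
        if distance(pixel, key) < threshold, do: 255, else: 0
      end
    end
  end

  # Assumption: plain Euclidean distance in RGB.
  defp distance({r1, g1, b1}, {r2, g2, b2}) do
    {dr, dg, db} = {r1 - r2, g1 - g2, b1 - b2}
    :math.sqrt(dr * dr + dg * dg + db * db)
  end
end
```

Inverting this mask gives the alpha band: opaque where the pixel differs from the key, transparent where it matches.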

Answer 9 · 2022-10-17T21:31:04.000Z

Can we keep the :greater_than and :less_than options? That seems more tunable to me. I think threshold is a great basic strategy, but being able to set r, g, b independently would be really useful.

Answer 10 · 2022-10-17T22:08:18.000Z

Sounds reasonable. Will work on that tonight.

Answer 11 · 2022-10-17T23:01:15.000Z

Done. The documentation now reads:

Chroma key an image

Chroma keying is the process of removing a background color
from an image resulting in a foreground image that may
be composited over another image.

If the image already has an alpha band then the
image is flattened before adding the image mask
as a new alpha band.

Arguments

image is any t:Vix.Vips.Image.t/0.
options is a keyword list of options.

Options

There are two masking strategies available: the
thresholding strategy (default) and the color
range strategy.

Threshold strategy

:color is an RGB color which represents the
chroma key to be masked. The color can be an
integer between 0..255, a three-element list of
integers representing an RGB color or an atom
representing a CSS color name. The default is
:auto in which the average of the top left 10x10
pixels of the image is used.
:threshold is a positive integer to indicate the
threshold around :color when calculating the mask.
The default is 20.

Color range strategy

:greater_than is an RGB color which represents the upper
end of the color range to be masked. The color can be an
integer between 0..255, a three-element list of
integers representing an RGB color or an atom
representing a CSS color name.
:less_than is an RGB color which represents the lower
end of the color range to be masked. The color can be an
integer between 0..255, a three-element list of
integers representing an RGB color or an atom
representing a CSS color name.
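
The color range strategy lets each channel be bounded independently, which is what the original greater_than / less_than code in the question did. A minimal sketch in plain Elixir, assuming strict comparisons on every channel; RangeMaskSketch is a hypothetical name and this is not the Image implementation.

```elixir
defmodule RangeMaskSketch do
  # Hypothetical illustration only. Images are lists of rows of {r, g, b}
  # tuples. A pixel is masked (255) only when every channel lies strictly
  # between the matching channels of the two bounds.
  def mask(rows, {lr, lg, lb}, {ur, ug, ub}) do
    for row <- rows do
      for {r, g, b} <- row do
        in_range? =
          r > lr and r < ur and
            g > lg and g < ug and
            b > lb and b < ub

        if in_range?, do: 255, else: 0
      end
    end
  end
end
```

With the bounds from the question, {0, 100, 0} and {100, 255, 95}, a greenish pixel such as {40, 200, 30} is masked while a skin tone such as {200, 180, 160} is not. Note that strict bounds exclude values sitting exactly on a bound, so a pure {0, 255, 0} would not match.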

Answer 12 · 2023-01-28T17:11:36.000Z

With the addition of Image.dilate/2 and Image.erode/2 I think most of your original suggestions have been implemented. From your original message, Image now has:

Image.chroma_key/2
Image.if_then_else/3 to perform conditional processing (like merging images using a boolean image as a discriminant)
Image.erode/2 to erode edge pixels
Image.dilate/2 to expand edge pixels

For now, a "programmatic drawing" capability beyond the Image.Draw functions is out of scope.

I will close this issue - but by all means open a new issue(s) if you have any other suggestions, ideas or .... issues!