TencentARC/GFPGAN

Image blending problem while caching the gfpgan model

Opened this issue · 8 comments

I have built an API for Real-ESRGAN using FastAPI, and it works properly for multiple user requests. However, when I load the models (Real-ESRGAN and GFPGAN) once and cache them with lru_cache (functools) to reduce inference time, I run into the following two problems during execution.

1. Sometimes the faces from one user's request get mixed up with those from another user's request.

image

2. Some requests fail with the following error.

Traceback (most recent call last):
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/middleware/errors.py", line 164, in __call__
   await self.app(scope, receive, _send)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/middleware/exceptions.py", line 62, in __call__
   await wrap_app_handling_exceptions(self.app, conn)(scope, receive, send)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
   raise exc
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
   await app(scope, receive, sender)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/routing.py", line 758, in __call__
   await self.middleware_stack(scope, receive, send)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/routing.py", line 778, in app
   await route.handle(scope, receive, send)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/routing.py", line 299, in handle
   await self.app(scope, receive, send)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/routing.py", line 79, in app
   await wrap_app_handling_exceptions(app, request)(scope, receive, send)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/_exception_handler.py", line 64, in wrapped_app
   raise exc
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/_exception_handler.py", line 53, in wrapped_app
   await app(scope, receive, sender)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/routing.py", line 74, in app
   response = await func(request)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/fastapi/routing.py", line 299, in app
   raise e
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/fastapi/routing.py", line 294, in app
   raw_response = await run_endpoint_function(
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/fastapi/routing.py", line 193, in run_endpoint_function
   return await run_in_threadpool(dependant.call, **values)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/starlette/concurrency.py", line 42, in run_in_threadpool
   return await anyio.to_thread.run_sync(func, *args)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/anyio/to_thread.py", line 56, in run_sync
   return await get_async_backend().run_sync_in_worker_thread(
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/anyio/_backends/_asyncio.py", line 2134, in run_sync_in_worker_thread
   return await future
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/anyio/_backends/_asyncio.py", line 851, in run
   result = context.run(func, *args)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/api.py", line 102, in process_image
   intermediate_image = hd_process(img_array)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/api.py", line 58, in hd_process
   _, _, output = face_enhancer.enhance(img_array, has_aligned=False, only_center_face=False, paste_back=True)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
   return func(*args, **kwargs)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/gfpgan/utils.py", line 144, in enhance
   restored_img = self.face_helper.paste_faces_to_input_image(upsample_img=bg_img)
 File "D:/Image Super Resolution/Models/Real-ESRGAN/env/lib/site-packages/facexlib/utils/face_restoration_helper.py", line 291, in paste_faces_to_input_image
   assert len(self.restored_faces) == len(self.inverse_affine_matrices), ('length of restored_faces and affine_matrices are different.')
 AssertionError: length of restored_faces and affine_matrices are different.
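
One plausible reading of both symptoms: the traceback shows the endpoint running in a worker thread (run_in_threadpool), so two requests can be inside the same cached GFPGANer at once and share its face helper lists. The sketch below replays such an interleaving by hand. SharedHelper is a hypothetical stand-in, not the real facexlib helper, so this illustrates the suspected race rather than confirming it.

```python
class SharedHelper:
    """Hypothetical stand-in for a helper that keeps per-image lists on itself."""

    def __init__(self):
        self.restored_faces = []
        self.inverse_affine_matrices = []

    def clean_all(self):
        # Reset the per-image lists at the start of a request.
        self.restored_faces = []
        self.inverse_affine_matrices = []

    def paste(self):
        # Same kind of length check as the one that fails in the traceback.
        assert len(self.restored_faces) == len(self.inverse_affine_matrices), \
            "length of restored_faces and affine_matrices are different."
        return list(self.restored_faces)


def replay_assertion_error(helper):
    # Request A starts and stores two faces with their matrices.
    helper.clean_all()
    helper.inverse_affine_matrices.extend(["A-m0", "A-m1"])
    helper.restored_faces.extend(["A-face0", "A-face1"])
    # Request B starts before A has pasted and resets the same lists.
    helper.clean_all()
    helper.inverse_affine_matrices.append("B-m0")
    # Request A pastes now and sees B's half-built state.
    return helper.paste()


def replay_mixed_faces(helper):
    # Same interleaving, but B finishes restoring its face before A pastes.
    helper.clean_all()
    helper.inverse_affine_matrices.extend(["A-m0", "A-m1"])
    helper.restored_faces.extend(["A-face0", "A-face1"])
    helper.clean_all()
    helper.inverse_affine_matrices.append("B-m0")
    helper.restored_faces.append("B-face0")
    # Request A's paste step now receives B's face.
    return helper.paste()
```

In this replay, the first function fails the length check and the second returns a face that belongs to the other request, which matches the two reported symptoms.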

Here is the relevant code snippet from my API:

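import os
from functools import lru_cache

import cv2
import torch
from basicsr.archs.rrdbnet_arch import RRDBNet
from gfpgan import GFPGANer
from realesrgan import RealESRGANer


@lru_cache()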
def loading_model():
	real_esrgan_model_path = "D:/Image Super Resolution/Models/Real-ESRGAN/weights/RealESRGAN_x4plus.pth"
	gfpgan_model_path = "D:/Image Super Resolution/Models/Real-ESRGAN/env/Lib/site-packages/gfpgan/weights/GFPGANv1.3.pth"

	model = RRDBNet(num_in_ch=3, num_out_ch=3, num_feat=64, num_block=23, num_grow_ch=32, scale=4)
	netscale = 4

	upsampler = RealESRGANer(scale=netscale,model_path=real_esrgan_model_path,dni_weight=0.5,model=model,tile=0,tile_pad=10,pre_pad=0,half=False)
	face_enhancer = GFPGANer(model_path=gfpgan_model_path,upscale=4,arch='clean',channel_multiplier=2,bg_upsampler=upsampler)

	return face_enhancer


def hd_process(file):

	filename = file.filename.split('.')[0]
	save_path = os.path.join("temp_images", f"{filename}.jpg")

	content = file.file.read()
	with open(save_path, 'wb') as image_file:
	    image_file.write(content)

	img_array = cv2.imread(save_path, cv2.IMREAD_UNCHANGED)

	face_enhancer = loading_model()

	with torch.no_grad():
		_, _, output = face_enhancer.enhance(img_array, has_aligned=False, only_center_face=False, paste_back=True)

	output_rgb = cv2.cvtColor(output, cv2.COLOR_BGR2RGB)

	del face_enhancer
	torch.cuda.empty_cache()

	return output_rgb

While reading the GFPGAN code, I found that GFPGANer has an "enhance" function that calls the "facexlib" library for face enhancement and other face-related operations. On every execution, "enhance" clears all of facexlib's list variables by reinitializing them. The problem only appears when I cache the model; when the model is loaded fresh for each request, everything works properly. Is there any way to cache the model and still avoid this error?
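
One common way to keep a cached object while avoiding interleaved calls is to serialize access to it with a lock. The sketch below assumes the cached enhancer keeps per-call state on itself. FakeEnhancer, load_model and enhance_serialized are hypothetical names used only for illustration; in the real API the lock would wrap the call to face_enhancer.enhance.

```python
import threading
from functools import lru_cache


class FakeEnhancer:
    """Hypothetical stand-in for a model object that stores per-call state."""

    def __init__(self):
        self.restored_faces = []

    def enhance(self, faces):
        self.restored_faces = []  # reset shared state, as a helper cleanup would
        for face in faces:
            self.restored_faces.append(face.upper())
        return list(self.restored_faces)


@lru_cache(maxsize=1)
def load_model():
    # Built once; every caller receives the same instance.
    return FakeEnhancer()


_model_lock = threading.Lock()


def enhance_serialized(faces):
    # Only one thread at a time may use the shared instance.
    with _model_lock:
        return load_model().enhance(faces)
```

The trade-off is that requests are processed one after another inside the locked section, so this removes the race but not the queueing.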

Well, you can move the following lists from facexlib/face_restoration_helper.py into the enhance function in gfpgan/utils.py. That solves the problem, because every request then has its own lists and they no longer get mixed up.

self.all_landmarks_5 = []            
self.det_faces = []                  
self.affine_matrices = []            
self.inverse_affine_matrices = []    
self.cropped_faces = []              
self.restored_faces = []             
self.pad_input_imgs = []      

However, after making this change, the output quality is low when processing concurrent requests. Any ideas?
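
The suggestion above amounts to keeping per-request state out of the shared object. A minimal sketch of that idea follows, with the lists created inside the call instead of on the instance. RequestState and SharedEnhancer are hypothetical stand-ins, and the multiplication only stands in for real face restoration.

```python
from dataclasses import dataclass, field


@dataclass
class RequestState:
    """Hypothetical container for lists that would otherwise live on the helper."""
    cropped_faces: list = field(default_factory=list)
    restored_faces: list = field(default_factory=list)
    inverse_affine_matrices: list = field(default_factory=list)


class SharedEnhancer:
    """Hypothetical cached model; it holds only read-only data."""

    def __init__(self, gain):
        self.gain = gain  # stands in for model weights

    def enhance(self, faces):
        state = RequestState()  # fresh lists for every call
        for index, face in enumerate(faces):
            state.cropped_faces.append(face)
            state.inverse_affine_matrices.append(index)
            state.restored_faces.append(face * self.gain)
        # The lengths cannot drift apart, because no other call sees this state.
        assert len(state.restored_faces) == len(state.inverse_affine_matrices)
        return state.restored_faces
```

Because each call builds its own RequestState, concurrent calls on one SharedEnhancer cannot overwrite each other's lists.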

Thanks for the answer. I have already implemented this logic in my code, and I am not seeing any quality issues with concurrent requests. If possible, could you briefly tell me which API framework you are using and how you handle concurrent requests in it, so that I can get a better idea of the problem?

I am using waitress. The issue is that when I send 2 requests at the same time, the first works just fine, but the output for the second image is not good.

Check the difference between the two images: the first is the result when the image was sent simultaneously with another image, and the second is the result when I sent it alone.

image

image

First of all, I have never used waitress for API development, but here are some general points you can check:

  1. Compare the input and output images (the first case above) in terms of size and dimensions, so that you can cross-check whether the enhancement was actually applied to the image.
  2. From your description, I suspect waitress is not handling the parallel requests correctly, meaning something goes wrong while the requests are processed in parallel.
  3. Test with a variety of images (for example, a single-person image or a small image); the outputs might give you a clue.

Also, have you ever worked with FastAPI for building APIs? I used FastAPI for this model, but I am not able to achieve parallelism for multiple users. Do you have any idea about this problem?

Nope, sorry, I have never tried FastAPI. When I send requests from a single device it works perfectly, no matter how many faces are in the image. The problem only occurs when it is processing multiple images at the same time. If I send 2 concurrent images with 1 face each, it works far better. Kindly try 2 concurrent requests with multiple faces in each image, and make sure they are different images.
How did you find out it is not running concurrently? From the time it took, or something else?
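
One way to answer the question above is to record when each request starts and ends, and on which thread, then check whether the intervals overlap. The sketch below is a generic illustration; timed_call and intervals_overlap are hypothetical helper names, and the wrapped function stands in for the real inference call.

```python
import threading
import time


def timed_call(func, *args):
    """Run func and record which thread ran it and when."""
    start = time.perf_counter()
    result = func(*args)
    end = time.perf_counter()
    return {
        "thread": threading.current_thread().name,
        "start": start,
        "end": end,
        "result": result,
    }


def intervals_overlap(a, b):
    """True if the two recorded calls were in flight at the same time."""
    return a["start"] < b["end"] and b["start"] < a["end"]
```

If two requests sent together never produce overlapping intervals, the server is most likely processing them one after another; overlapping intervals on different threads suggest they really do run concurrently.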