Appl. Sci. 2019, 9, 1009; doi:10.3390/app9051009

Article

Image Shadow Removal Using End-to-End Deep Convolutional Neural Networks

Hui Fan 1,2, Meng Han 1,2 and Jinjiang Li 1,2,*

1 School of Computer Science and Technology, Shandong Technology and Business University, Yantai 264005, China
2 Co-innovation Center of Shandong Colleges and Universities: Future Intelligent Computing, Yantai 264005, China
* Correspondence:

Received: 9 January 2019; Accepted: 5 March 2019; Published: 11 March 2019

Abstract: Image degradation caused by shadows is likely to cause technical problems in image segmentation and target recognition. Existing shadow removal methods suffer from issues such as poor handling of small and fine shadows, the scarcity of end-to-end automatic methods, and the neglect of illumination and of high-level semantic information such as materials. An end-to-end deep convolutional neural network is proposed to further improve the image shadow removal effect. The network mainly consists of two models: an encoder–decoder network and a small refinement network. The former predicts the alpha shadow scale factor, and the latter refines it to obtain sharper edge information. In addition, a new image database (remove shadow database, RSDB) is constructed, and qualitative and quantitative evaluations are made on databases such as UIUC, UCF, and the newly created RSDB with various real images. Using the peak signal-to-noise ratio (PSNR) and the structural similarity (SSIM) for quantitative analysis, the algorithm achieves a marked improvement in both PSNR and SSIM over other methods. In qualitative comparisons, the network produces a clearer shadow-free image that is consistent with the original image in color and texture, and its detail processing is much better.
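The quantitative comparison in the abstract relies on PSNR. As a minimal sketch (not the authors' evaluation code), PSNR between a restored shadow-free image and its ground truth can be computed as:

```python
import numpy as np

def psnr(reference: np.ndarray, restored: np.ndarray, peak: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two same-shape images."""
    mse = np.mean((reference.astype(np.float64) - restored.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# Toy check: a ground-truth image vs. a restoration off by 2 gray levels everywhere.
gt = np.full((4, 4), 128.0)
noisy = gt + 2.0
print(round(psnr(gt, noisy), 2))  # → 42.11
```

Higher PSNR means the restored image is numerically closer to the shadow-free ground truth; SSIM complements it by comparing local structure rather than raw pixel error.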
The experimental results show that the proposed algorithm is superior to other algorithms, and it is more robust in both subjective vision and objective quantization.

Keywords: end-to-end; encoder–decoder; convolutional neural network; shadow removal

1. Introduction

Today's world is an era of rapid information development. With the continuous upgrading of multimedia technology and electronic products, human beings are exposed to more and more multimedia information such as text, images, and audio in their daily life. Most of this multimedia information is obtained through the visual system; we refer to the information obtained through this channel as images. However, when an image is acquired, it is susceptible to various conditions that generally degrade its quality. Shadows are one of them. Shadows are a quality-degradation phenomenon caused by imaging conditions: they lead to missing or interfered information reflected by the target, so the interpretation accuracy of the image is sharply reduced, which affects various quantitative analyses and applications of the image. Shadow detection and removal [1–4] is one of the most basic and challenging problems in computer graphics and computer vision. Shadow removal is an important pre-processing stage in computer vision and image enhancement. The existence of shadows not only affects the visual interpretation of the image, but also affects the analysis of the image and the subsequent processing results. For instance, darker areas (caused by shadows) introduce incorrect segments in image segmentation [5,6]; radiation changes (caused by shadows) reduce the performance of target recognition [7] systems; and the presence of shadows reduces the performance of target tracking [8] systems.
Therefore, it is necessary to perform shadow detection and analysis on the image in order to reduce or eliminate the influence of image shadows; this also increases the visual and physical authenticity of the image through editing and processing. Shadows are generated by different illumination conditions. Following [9–11], the shadow image I_s is represented as the pixel-wise product of the shadow-free image I_sf and the shadow ratio α, as shown in (1):

I_s = I_sf · α.  (1)

Shadows can be divided into two categories according to their causes [12]: one type is the self-shadow, produced on the occluded object itself where it is not illuminated by the light source; the other is the cast shadow (projection), produced when an object blocks the light source and casts a shadow on the surface of another object. The cast shadow is further divided into an umbra zone and a penumbra zone: the umbra is formed where direct light is completely blocked, and the penumbra where light is only partially blocked. Given a single shadow image, shadow removal aims to generate a high-quality, shadow-free image in which the original shadow area is restored in texture, color, and other features. Existing methods of removing shadow regions typically involve two steps: shadow detection and shadow removal. First, shadow detection is used to locate the shadow area [13,14], or the user manually marks the shadow area [9,15,16]; then a model is constructed to remove the shadows. However, shadow detection is an extremely challenging task. Traditional physics-based methods can only be applied to high-quality images, while statistical learning-based methods rely on features that users have to label manually [15,17]. With the development of neural networks, convolutional neural networks (CNNs) [14,18,19] have been used to learn features for shadow detection.
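Under the multiplicative model of Equation (1), removing a shadow amounts to dividing the shadow image by the estimated ratio α pixel by pixel. A minimal numerical sketch (the array values are illustrative, not from the paper):

```python
import numpy as np

# Shadow-free intensities and a pixel-wise shadow ratio alpha in (0, 1]
# (alpha = 1 means unshadowed; smaller alpha means a darker shadow).
I_sf = np.array([[200.0, 200.0], [100.0, 100.0]])
alpha = np.array([[1.0, 0.5], [1.0, 0.25]])

# Forward model, Equation (1): I_s = I_sf * alpha
I_s = I_sf * alpha

# Given I_s and a predicted alpha, invert the model to recover I_sf.
recovered = I_s / np.clip(alpha, 1e-6, None)  # clip guards against division by zero
print(recovered)
```

In practice the whole difficulty lies in estimating α from the shadow image alone, which is exactly what the proposed network learns.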
CNNs overcome the shortcomings of traditional methods that require high-quality images and manually annotated features. However, they are still limited to small network architectures owing to the shortage of training data. Similarly, even if the shadow area is known, removing it remains a challenge. At the same time, it is clear that the quality of shadow detection seriously affects the result of shadow removal: if shadow detection performs poorly, it is impossible to obtain a high-quality, shadow-free image in the subsequent removal process. Inspired by the classical adaptive algorithm, Calvo-Zaragoza and Gallego [20] used a convolutional auto-encoder for image binarization, an end-to-end network structure that shows good performance in image binarization. Image binarization can be formulated as a two-class classification task at the pixel level, which is similar to the shadow removal task. At the same time, one of the biggest drawbacks of current image shadow removal methods is the lack of an end-to-end pipeline. Inspired by this, the purpose of this paper is to design an end-to-end, fully automated network structure to remove image shadows. Through the pre-trained network structure, a single input shadow image can be transformed into a high-quality, shadow-free image, which provides a good basis for subsequent target detection and target tracking. Different from traditional methods, this paper applies the most prominent features of the deep encoder–decoder model framework: its end-to-end nature and its convolutional neural network (CNN) design. Given a single shadow image as input, a shadow mask is obtained that describes the global structure of the scene and the high-level semantic information of shadow characteristics. Using the relationship between the shadow ratio and the shadow image in Equation (1), an end-to-end, fully automated algorithm restores the shadow-free image.
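The encoder–decoder idea behind this design can be illustrated at the shape level with a toy sketch (not the actual architecture proposed here): the encoder repeatedly downsamples the input to a compact representation, and the decoder upsamples back to the input resolution so that a per-pixel quantity such as the mask α can be predicted.

```python
import numpy as np

def encode(x: np.ndarray, steps: int = 2) -> np.ndarray:
    """Toy encoder: halve spatial resolution with 2x2 average pooling."""
    for _ in range(steps):
        h, w = x.shape
        x = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    return x

def decode(x: np.ndarray, steps: int = 2) -> np.ndarray:
    """Toy decoder: restore resolution with nearest-neighbour upsampling."""
    for _ in range(steps):
        x = np.repeat(np.repeat(x, 2, axis=0), 2, axis=1)
    return x

image = np.arange(64, dtype=np.float64).reshape(8, 8)
code = encode(image)    # 8x8 -> 2x2 bottleneck
output = decode(code)   # 2x2 -> 8x8, same spatial size as the input
print(code.shape, output.shape)
```

A real encoder–decoder replaces the pooling and upsampling here with learned convolutions and transposed convolutions, but the shape symmetry — input resolution in, input resolution out — is the same, and it is what makes pixel-level prediction end-to-end.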
At the same time, this paper designs a small network structure for refinement in order to better handle local information, thus predicting a more accurate alpha mask and sharper edge information.

2. Related Work

In recent years, many scholars have analyzed the characteristics of shadows, established shadow generation models, and proposed a number of algorithms for shadow detection and shadow removal. Surveying these results, it is not difficult to find that most shadow removal algorithms follow the same principle: detect the shadow or mark it manually, then build a model to remove it. Gong and Cosker [15] proposed a statistical, interactive shadow removal method for shadows in extreme conditions with varying shadow softness, irregular shadow shapes, and existing dark textures. First, using a dynamic learning method, users roughly marked the shadow and non-shadow areas in the image with two strokes, so that the shadow area was highlighted and the non-shadow area suppressed, making the shadow area easy to extract. On this basis, a model was constructed for shadow removal. Gong and Cosker used rough hand-marked input to detect shadows, at the cost of finer shadows and of full autonomy from user input. Gryka [16] and Yu [21] et al. also proposed marking the shadows manually in an interactive manner to achieve shadow removal. Different from Gong and Cosker, Yu [21] proposed a color-line-based interactive shadow removal algorithm, which required users to provide shadow areas, shadow-free areas, and areas of similar texture whose brightness changes significantly. Gryka [16] used an unsupervised regression algorithm to remove shadows, taking the user-marked shadow area as input, and focused mainly on soft shadows in real scenes.
Finlayson [22] proposed a shadow removal algorithm based on Poisson's equation, which worked well on a considerable portion of images. However, it did not consider the influence of ambient illumination and material changes, resulting in poor texture restoration in shadow areas. Generally, the brightness contrast between the shadow area and the non-shadow area in an image is relatively large, while features such as texture and color are similar; therefore, the shadow area and the non-shadow area have similar gradient fields. Liu et al. [23] used this property to propose a gradient-domain image shadow removal algorithm, which solved some shortcomings of the Poisson-equation approach but performed poorly on discontinuous or small shadow regions. Xiao et al. [24] proposed a shadow removal algorithm based on subregion matching illumination transfer. This method takes into account the different reflectivity characteristics of different materials, which improves the processing of shadow regions in complex scenes. Zheng [25] proposed a cast-shadow detection and elimination algorithm based on the combination of texture features and the luminance-chrominance (YUV) color space. First, a moving object was detected using a pixel-based adaptive segmenter (PBAS) algorithm that resisted shadows, and a portion of the cast-shadow candidate area was obtained by texture detection. Then, another cast-shadow candidate area was obtained by shadow detection based on the YUV color space. Finally, the two portions of the cast-shadow candidate regions were filtered and merged by the shadow features. Murali [26] proposed a simple method for detecting and removing shadows from a single red-green-blue (RGB) image.
The shadow detection method was based on the average values of the image in the A and B planes of the CIELab (LAB) equivalent of the RGB image, and the shadow removal method was based on estimating the amount of light irradiated on the surface: the brightness of the shaded areas was increased, and the color of the surface portion was then corrected to match the bright portion of the surface. Tfy et al. [27] trained a kernel least-squares support vector machine (LSSVM) to separate shadow and non-shadow regions. It was embedded into a Markov random field (MRF) framework, and pairwise contextual cues were added to further enhance the performance of the region classifier, realizing shadow area detection; a shadow removal method based on region re-lighting was proposed on this basis. For the umbra and penumbra regions of cast shadows, researchers have proposed image shadow removal algorithms in deep learning based on convolutional neural network (CNN) structures [14,18] or on the intensity domain [9,11]. Both Khan [14] and Shen [18] used convolutional neural networks (CNNs) for learning. The former used multiple CNN structures for fusion and a conditional random field (CRF) for image shadow region prediction, while the latter applied structured label information to the classification process. The two methods also differed in how shadows were extracted and removed: the former used a Bayesian framework, while the latter applied least-squares optimization to perform the final recovery. Arbel et al. [9] used a Markov random field to determine the penumbra region, and then created a smoothing model in the shadow region to construct a shadow mask. Wu [11] extracted the shadow mask by considering features such as color shift, texture gradation, and shadow smoothness.
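The luminance-threshold idea behind simple detectors such as Murali's [26] can be sketched as follows. This is a hedged illustration, not that paper's exact procedure: the specific threshold rule (mean minus a fraction of the standard deviation) is an assumption for the example.

```python
import numpy as np

def shadow_mask(luminance: np.ndarray) -> np.ndarray:
    """Flag pixels noticeably darker than the image average as shadow.

    Assumed threshold rule for illustration: mean - std/3.
    """
    threshold = luminance.mean() - luminance.std() / 3.0
    return luminance < threshold

# Toy image: a bright surface containing a dark square "shadow" region.
img = np.full((6, 6), 200.0)
img[2:4, 2:4] = 60.0
mask = shadow_mask(img)
print(int(mask.sum()))  # → 4 pixels flagged as shadow
```

Such global thresholds are cheap but fragile — dark textures are easily mistaken for shadows — which is part of the motivation for the learned, context-aware detectors discussed next.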
Meanwhile, the texture of the shadow area was preserved during the extraction process. Analyzing and summarizing the above methods, we find that current shadow removal methods either fail to restore the texture of the shadow area effectively or neglect the influence of the environment and the material of the object. Most methods are interactive rather than fully automated, which greatly reduces their practical efficiency. Therefore, the purpose of this paper is to propose an end-to-end, fully automatic network structure to remove image shadows. With the development of artificial intelligence in recent years, some researchers have established fully automatic models for image shadow removal [28–31]. A shadow removal algorithm based on generative adversarial networks (GANs) is proposed in [30], which uses two GANs for joint learning to detect and remove shadows. Although this allows fully automatic shadow detection, the shadow removal stage relies on the detection result of the preceding generative adversarial network, and it does not perform well on small shadow areas and shadow edges. Yang [28] proposed a three-dimensional intrinsic image restoration method based on bilateral filtering and two-dimensional intrinsic images: two-dimensional intrinsic images are derived from the original image, and the shadow restoration of the three-dimensional intrinsic image is then obtained from these two-dimensional intrinsic images and bilateral filtering, realizing an automatic shadow removal method for a single image. However, since the reconstruction of the image may change non-shadow areas, a high-quality image consistent with the shadow-free image generally cannot be obtained. Therefore, this paper aims to explore an end-to-end deep convolutional neural network for image shadow removal.
An encoder–decoder neural network is used, together with a small two-layer refinement network, for training. Then, the original image (shadow image) is input into the trained network structure to realize fully automatic shadow removal. Finally, a high-quality, shadow-free image is obtained.

3. End-to-End Convolutional Neural Network

This article aims to solve the shadow removal problem using an end-to-end deep convolutional network structure, named RSnet, which learns a mapping function between the shadow image and its shadow matte. The RSnet model mainly includes two parts: an encoding–decoding stage and a refinement stage. Specifically, we used a deep convolutional encoder–decoder network that took an image as input and penalized both the prediction loss of the shadow mask factor (α) and a new composition loss. At the same time, a small refinement network was introduced in order to consider multi-context scenarios and to process local information, edge information, and so on. The details of the two networks, the training of the RSnet model, and how the shadow-free image is obtained are described in the following sections.

3.1. Encoding–Decoding Phase

The deep encoder–decoder network structure is a very common model framework in deep learning. The encoder and decoder parts can operate on any text, voice, image, or video data, and the model can use a CNN, recurrent neural network (RNN), long short-term memory (LSTM), gated recurrent unit (GRU), and so on. One of the most notable features of the encoder–decoder framework is that it is an end-to-end learning algorithm. It has been widely used in various fields and has succeeded in many computer vision tasks, such as image segmentation [32], object edge detection [33], and image restoration [34]. The encoding–decoding network structure in this paper consisted of two parts, one