A Study on Generating Stained Glass Animation from Video Clip

This paper presents a method for generating a stained glass animation from a given input video. We first obtain the low-frequency components of the input frames to remove textures that cause over-segmentation during image segmentation. We then segment the input video volume with mean-shift video segmentation. Because some segmented regions are too large to be architecturally stable, subdivision is required. To sub-divide the regions in a temporally coherent way, we build a panoramic image from each segmented region and sub-divide it with a weighted Voronoi diagram. To render the sub-divided regions as stained glass pieces, we find the best-matching piece in a database of real stained glass images and transfer its color to the region. Finally, we generate lead came at the boundaries of the regions, which results in a temporally coherent stained glass animation.


Introduction
Stained glass, i.e., colored glass, has been produced since ancient times. Initially it was crafted to fill the high windows of Gothic churches. Through changing architectural trends, stained glass not only became a prominent feature of large churches and royal palaces but also a fashionable interior decoration of the aristocracy, and it has gradually become more popular ever since.
A stained glass work connects pieces of glass of various shapes, sizes, colors, and materials into a unified whole that realizes a design idea. The resulting artwork vividly displays the interplay of light and color.
Stained glass techniques spread around the world, especially in Europe and America. However, modern technology has changed the traditional techniques: a few tools or tricks can shape products that resemble stained glass but are actually only simulations. Moreover, as highly artistic handcrafted items became industrialized, traditional stained glass techniques declined. This has led to discrepancies in how stained glass is understood in places where it is still novel.
Since research on Non-Photorealistic Rendering began, the field has been studied extensively. However, there are few studies involving stained glass, and no study on stained glass animation has been published. Therefore, in this study, we propose a method for generating stained glass animation in order to address the difficulty of creating it.
Before generating the stained glass effect on each glass piece, we need to obtain glass pieces from the input video. Obtaining glass pieces in a single frame or image is easy, but in video we must maintain the temporal coherence of the pieces across all frames. In addition, the stained glass effect itself must be temporally coherent in the animation. Both requirements make creating stained glass animation difficult, and no study solving this issue has been published yet.
To create stained glass animation that maintains the temporal coherence of glass pieces across all frames, we propose the following method: a) We first segment the input video with a mean-shift video segmentation that uses optical flow to estimate the motion of points in the video. b) From the segmented video, we divide every segment whose regions in frames are large into segments with smaller regions. To do so, for each such segment we build a panoramic scene from the layers representing the segment in all frames, divide the panoramic scene, and then map the divided regions back to each layer. c) Finally, we generate the stained glass effect on each segment obtained in the previous step by extending an image-based stained glass rendering method to animation: video decomposition is first applied to the input video, and color transfer and lighting effects are selectively applied to match the appearance of real stained glass.
We note that a preliminary version of this study was presented in [1].

Related Work
Many techniques model non-photorealistic styles, but there are few studies related to stained glass windows, and no study involving stained glass animation has been published.
The first stained glass technique using non-photorealistic rendering is a study for creating stained glass from 2D images by David Mould [2]. In that study, glass pieces are created with a segmentation technique based on the color of each area of the input image. All regions are then smoothed by the erosion and dilation operators of mathematical morphology; large regions are subdivided and very small regions are eliminated. Next, the stained glass color of each region is the heraldic color nearest to the region's average color in the input image. Finally, an image-based stained glass is achieved after computing a displacement map that represents the leading and the imperfections in the glass surface.
Stephen Brooks [3] proposed a method that creates image-based stained glass from a database of real stained glass images. In that study, segmentation is applied to both the input image and the stained glass images. For each region of the segmented input, the best-matching region in the stained glass images is found, and the input region is colorized to match it.
The stained glass filter [4] used in Photoshop is based on a Voronoi diagram to create glass pieces. A Voronoi diagram divides a space into a number of regions: Voronoi sites are first created at random, then each Voronoi cell groups the points that are closest to the same site. This method is easy to use, but the glass pieces are created without considering the image content, so they are often inappropriate.

Methods
The process proposed in this paper to generate stained glass animation mainly consists of two parts: first, it creates the layout of glass pieces that compose the stained glass from the input video (Section 3.1); then it renders them using real stained glass example images (Section 3.2).

Generating temporally coherent stained glass layout
The basic unit of stained glass is the glass piece, fixed by lead cames. These glass pieces are cut to best represent the shape of the subject. As this study aims to produce stained glass animations that maintain temporal coherence with the input video, the glass pieces, as the basic units of stained glass, must be cut and combined temporally coherently to form a stained glass layout. This section describes the method to create such a temporally coherent layout.

Video segmentation that avoids over-segmentation
Previous studies [2,3,5] that produce stained glass from images use image segmentation methods to create the glass pieces. However, if the regions obtained by segmenting the input image are used directly as glass pieces, many tiny pieces may be produced in textured areas due to over-segmentation. To avoid this, we propose segmenting the input video using only its low-frequency components. We separate the high-frequency components from the input video with the image decomposition method of [6]. Image decomposition is a technique that finds local extrema in the input image, builds envelopes from them, and captures the mean signal and the oscillation of the envelopes, separating texture from individual edges. We employ it to decompose each frame I_i ∈ I of the input video I into multiple scales as follows:

I_i(p) = Σ_{j=1}^{m} D_{i,j}(p) + M_{i,m}(p),

where D_{i,j} is the j-th finest local oscillation, M_{i,m} is the mean, and p denotes each pixel of frame I_i. We obtain M = {M_{i,m} | i ∈ n} by performing frame-wise image decomposition on every input frame. As shown in Figure 1, the texture that causes over-segmentation is eliminated in the low-frequency component image M.
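The decomposition of [6] builds envelopes from local extrema; as a simplified stand-in for illustration only, the following sketch separates oscillation layers and a low-frequency mean using Gaussian blurs of increasing scale (the blur-based split and the parameter values are our assumptions, not the paper's actual operator):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def decompose_frame(frame, sigmas=(2.0, 4.0, 8.0)):
    """Split a grayscale frame into oscillation layers D_j and a mean M.

    Each D_j holds the detail between two blur scales; the final residual
    M is the low-frequency component used for segmentation.
    """
    current = frame.astype(np.float64)
    details = []
    for sigma in sigmas:
        blurred = gaussian_filter(current, sigma)
        details.append(current - blurred)  # j-th finest oscillation D_j
        current = blurred
    return details, current  # current is M, the low-frequency mean

frame = np.random.rand(64, 64)
D, M = decompose_frame(frame)
# By construction the layers sum back to the original: frame == sum(D) + M
```

Segmenting M instead of the original frame removes fine texture before any region boundaries are computed.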
We then apply mean-shift segmentation [7] to M and obtain large regions that correspond to glass pieces. Mean-shift is a technique that seeks local maxima of density in a feature space and is widely used in image segmentation. If image-based mean-shift segmentation is applied frame-wise to the input video, incoherent regions are generated due to the lack of connectivity between frames. To prevent this, mean-shift video segmentation [8] performs segmentation in the video volume domain and consequently produces temporally coherent segments. In this paper, we obtain segmented regions from M by mean-shift video segmentation, as shown in Figure 2. Even though we decompose the video frames frame-wise without considering the connectivity between frames, this does not cause temporal coherence problems, because video segmentation is applied to the low-frequency components M.
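As a minimal frame-wise illustration of mean-shift segmentation, the sketch below clusters pixels in a joint space-color feature space with scikit-learn's MeanShift; the video method of [8] additionally extends the feature space across time, and the parameter values here are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import MeanShift

def meanshift_segment(image, spatial_weight=0.5, bandwidth=0.3):
    """Cluster pixels in a joint (x, y, color) feature space.

    A frame-wise sketch only; the video-volume version adds temporal
    features so segments stay coherent across frames.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    feats = np.column_stack([
        xs.ravel() / w * spatial_weight,    # normalized spatial coords
        ys.ravel() / h * spatial_weight,
        image.reshape(-1, image.shape[2]),  # color features
    ])
    labels = MeanShift(bandwidth=bandwidth, bin_seeding=True).fit(feats).labels_
    return labels.reshape(h, w)

# Two flat color halves should fall into different segments
img = np.zeros((20, 20, 3))
img[:, 10:] = 1.0
seg = meanshift_segment(img)
```

The bandwidth plays the role of the range/spatial radii of the mean-shift segmenter: larger values merge more pixels into one region.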

Temporally coherent subdivision of large region
Although the segmented regions we obtain are not over-segmented, some of them are too large to be used as glass pieces: in real stained glass works, each piece must be limited in size to be architecturally stable. Therefore, the large segments obtained in Section 3.1.1 need to be properly sized. A recent study on rendering stained glass [5] proposed re-segmenting such regions to obtain regular-sized regions, and we likewise sub-divide large segments. However, this subdivision must be performed while considering the temporal coherence between video frames so that the glass pieces stay coherent in the animation. If the regions were sub-divided frame-wise, the boundaries of the sub-divided regions would be incoherent between adjacent frames. Moreover, due to moving objects, regions that become occluded or reappear from occlusion can amplify the differences between sub-divided regions across frames. To overcome this, we take an approach that sub-divides a panoramic image generated by stitching the regions of all frames while considering their motion.
We first obtain a panoramic image by stitching each segmented region across frames, and then sub-divide it into several small regions if its size exceeds a user-given threshold. To generate the panoramic images, we extract feature points between the regions of two adjacent frames with the ASIFT algorithm [9], a fully affine-invariant feature detector.
A feature point f_{i,k} lies in frame I_i, and its corresponding point f_{i+1,k} lies in I_{i+1}.
After obtaining feature point pairs between every two adjacent frames, we deform frame I_{i+1} to make it similar to I_i by employing moving least squares [10], and obtain the deformed frame I'_{i+1}:

I'_{i+1} = D(I_{i+1}, F),

where D() is an image deformation function using the feature pairs F. In our work, we do not need to determine the exact color of each object position in the panoramic image from all the corresponding positions in the layers, because our goal is only to segment the panoramic image into objects of similar color. Figure 3 shows regions in frames and the synthesized panoramic image. We then sub-divide the panoramic regions whose size is larger than a predefined threshold. To achieve this, we employ a multiplicatively weighted Voronoi diagram [11], in which the distance between points is multiplied by a positive weight. First, we randomly generate a number of Voronoi sites, each with its own weight. Then we group the points that are closest, in weighted distance, to the same site. Unlike the classic Voronoi diagram, the multiplicatively weighted version creates cells with curved edges, similar to the shapes of glass pieces in real stained glass works. Figure 4(a) shows an artistically sub-divided panoramic image generated with the multiplicatively weighted Voronoi diagram. From the sub-divided panoramic scene, we obtain artistically sub-divided regions in each frame that maintain temporal coherence across the entire video, as shown in Figures 4(b)-4(d).
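The multiplicatively weighted Voronoi labeling can be sketched as follows; the site positions and weights are random placeholders rather than the paper's actual sampling scheme:

```python
import numpy as np

def weighted_voronoi(shape, sites, weights):
    """Label each pixel with the site minimizing weight * Euclidean distance.

    Unlike the classic diagram, multiplicative weights bend cell borders
    into circular arcs, closer to hand-cut glass-piece outlines.
    """
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    dists = np.stack([
        wk * np.hypot(xs - sx, ys - sy)       # weighted distance to site k
        for (sx, sy), wk in zip(sites, weights)
    ])
    return np.argmin(dists, axis=0)           # per-pixel winning site index

rng = np.random.default_rng(0)
sites = rng.uniform(0, 64, size=(6, 2))       # random Voronoi sites
weights = rng.uniform(0.5, 1.5, size=6)       # positive per-site weights
cells = weighted_voronoi((64, 64), sites, weights)
```

Sites with smaller weights claim larger cells, which gives the irregular piece sizes seen in real stained glass layouts.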

Stained Glass Rendering
Before rendering the stained glass pieces obtained in Section 3.1, we find the best-matching regions in a database containing real stained glass images (Figure 5).
Brooks [3] considered both the color and the texture of the image to find the best-matching regions. In this study, we use the color of the low-frequency component of the input video to find the best-matching region. As in [3], the methods for computing color distance and texture distance are used to find the best match, but we find a single best match for the entire video. Through this matching, real stained glass example images serve as references for transferring a region of the video into stained glass style.
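As an illustration of the retrieval step, here is a minimal sketch that matches a region to a database piece by mean color only (the actual method also uses the texture distance of [3]; the function and variable names are hypothetical):

```python
import numpy as np

def best_match(region_pixels, database):
    """Return the index of the database piece with the closest mean color.

    The region is summarized by the mean color of its low-frequency
    pixels, computed once so every frame of a segment maps to the same
    example piece.
    """
    region_mean = region_pixels.reshape(-1, 3).mean(axis=0)
    db_means = np.array([p.reshape(-1, 3).mean(axis=0) for p in database])
    return int(np.argmin(np.linalg.norm(db_means - region_mean, axis=1)))

# Hypothetical database of three flat-colored example pieces
db = [np.ones((8, 8, 3)) * c
      for c in ([0.9, 0.1, 0.1], [0.1, 0.6, 0.9], [0.9, 0.8, 0.2])]
region = np.ones((10, 10, 3)) * [0.2, 0.5, 0.8]  # bluish region
idx = best_match(region, db)                      # picks the blue piece
```

Matching once per segment, rather than per frame, is what keeps the assigned glass color temporally stable.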
We then convert the color of each region into the color of its best-matching stained glass example to mimic the color of real stained glass works. To achieve this, we employ Reinhard's color transfer method [12], which is widely used for transferring colors between images. Figure 6(a) shows the color transfer result.
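Reinhard's method matches per-channel means and standard deviations in the decorrelated lαβ color space; the sketch below applies the same statistic matching directly in RGB as a simplification:

```python
import numpy as np

def transfer_color(source, target):
    """Shift source statistics toward target via per-channel mean/std matching.

    Reinhard et al. perform this matching in l-alpha-beta space; doing it
    in RGB is a simplification for illustration.
    """
    out = source.astype(np.float64).copy()
    for c in range(3):
        s, t = out[..., c], target[..., c].astype(np.float64)
        s_std = s.std() or 1.0                 # guard flat channels
        out[..., c] = (s - s.mean()) / s_std * t.std() + t.mean()
    return np.clip(out, 0.0, 1.0)

src = np.random.rand(32, 32, 3) * 0.5          # dark region of the frame
ref = np.random.rand(32, 32, 3) * 0.5 + 0.5    # bright glass example
result = transfer_color(src, ref)              # src recolored toward ref
```

After the transfer, the region's overall brightness and color spread follow the retrieved glass example while its spatial structure is preserved.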
Even though we transfer the color of a region of real stained glass to the corresponding region of the base frames, the result still lacks the lighting effect, one of the key features that make stained glass gorgeous. To add this effect to the result frames, we employ the glass filter introduced in [3], adding Perlin noise [13] and generating many small facets of color in each region. Figure 6(b) shows the result after the lighting effect.
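A rough sketch of the lighting modulation: smooth value noise (a simple stand-in we assume here in place of true Perlin noise) brightens and darkens small facets within each region:

```python
import numpy as np

def facet_noise(shape, cell=8, seed=0):
    """Smooth value noise: a coarse random grid bilinearly upsampled.

    The low-resolution grid gives softly varying brightness facets,
    roughly mimicking the role of Perlin noise in the glass filter.
    """
    rng = np.random.default_rng(seed)
    gh, gw = shape[0] // cell + 2, shape[1] // cell + 2
    grid = rng.uniform(-1.0, 1.0, size=(gh, gw))
    ys = np.arange(shape[0]) / cell
    xs = np.arange(shape[1]) / cell
    y0, x0 = ys.astype(int), xs.astype(int)
    fy, fx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = grid[y0][:, x0] * (1 - fx) + grid[y0][:, x0 + 1] * fx
    bot = grid[y0 + 1][:, x0] * (1 - fx) + grid[y0 + 1][:, x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def add_glass_lighting(image, strength=0.15):
    """Modulate brightness with the noise to suggest uneven glass thickness."""
    noise = facet_noise(image.shape[:2])
    return np.clip(image * (1.0 + strength * noise[..., None]), 0.0, 1.0)

lit = add_glass_lighting(np.full((64, 64, 3), 0.5))
```

The strength parameter controls how pronounced the simulated thickness variation appears; larger values give a more crystalline look.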

Results and Discussion
We experimented on various input videos. Each input video was first segmented with different parameters to choose the result that best maintains the temporal coherence of the segments. Extracting the motion flow with optical flow [14] is important for estimating the movement of objects between frames, and it is used when segmenting the input video with the mean-shift algorithm. The parameters for video segmentation using optical flow and their explanations are listed in Figure 7. In our experiments, we tested various parameter settings and chose D_r = 4, D_s = 8, t = 5, and M = 80. Dividing the segmented video by dividing panoramic frames takes about 2-3 minutes for a 5-second video at a resolution of 400 × 168 pixels. For generating the stained glass effect, we used various real stained glass sources to obtain the best possible results; our implementation takes about 3-4 minutes. We used a PC with a 4.1 GHz Core i5-9400 CPU and 16 GB of memory, and the running time depends on the size of the video. Figure 8(a) shows a segmentation result that maintains temporal coherence between regions in consecutive frames. The video segmentation alone does not yield glass pieces whose shape and size match those of real stained glass pieces, so we performed the sub-division operation to divide large and complex regions into suitable glass pieces. Figure 8(b) shows the result after sub-dividing those regions into the desired shapes and sizes; the result also satisfies the requirement of temporal coherence of regions between frames. Figure 9 shows the result of the stained glass rendering process.
We first obtained segments from the input video by mean-shift segmentation using optical flow. All large segments were then divided to match the shape and size of real stained glass. Before generating the stained glass effect on each segment, video decomposition was applied to the input video to obtain base layers. Next, we performed color transfer from the real stained glass database to the base layers. Finally, we generated lead came after adding the stained glass effect. Figure 11 shows a stained glass animation produced by this study; it shows the movement of several regions (stained glass pieces) of the video, whose temporal coherence is essentially maintained. Figure 10 shows some frames of the resulting videos after generating stained glass animations from the given videos.

Conclusion
This paper presents a method for generating stained glass animation from a given video. To achieve this, we segment the video frames into coherent regions using mean-shift segmentation. We then sub-divide large regions into smaller sub-regions. To obtain temporally coherent regions, we merge each region across frames into a panoramic image and sub-divide it with a weighted Voronoi diagram. To render the sub-divided regions as stained glass pieces, we find the best-matching glass piece from an example stained glass piece database and transfer its color to the region. Finally, we draw lead came on the boundaries of the regions, consequently obtaining a temporally coherent stained glass animation of the input video.