오디오가이 (Audioguy) :: People as precise as digital and as warm as analog
Free board

IEEE. 30 (2): 328-341. doi:10.1109/TPAMI.2007.1166

Page information

Author: Britt
Date posted:

Body


Pixelwise stereo matching allows real-time calculation of disparity maps by measuring the similarity of each pixel in one stereo image to each pixel within a subset of the other stereo image. Given a rectified stereo image pair, for a pixel with coordinates $(x, y)$ the set of candidate pixels in the other image is usually selected as $\{(\hat{x}, y) \mid \hat{x} \geq x,\ \hat{x} \leq x + D\}$, where $D$ is the maximum allowed disparity shift.[1]
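As a rough illustration of the candidate set above, the following sketch builds a pixelwise cost volume from the absolute intensity difference. The function names, the use of NumPy, and the choice of an infinite cost for candidates outside the image are assumptions made for this example and do not come from the cited papers.

```python
import numpy as np

def candidate_columns(x, max_disp, width):
    """Columns x_hat in the other image with x <= x_hat <= x + D, clipped to the image."""
    return list(range(x, min(x + max_disp, width - 1) + 1))

def absdiff_cost_volume(base, match, max_disp):
    """cost[y, x, d] = |base[y, x] - match[y, x + d]| for d = 0..max_disp.

    Candidates that fall outside the image keep an infinite cost
    (a choice made for this sketch only).
    """
    h, w = base.shape
    base = base.astype(np.float64)
    match = match.astype(np.float64)
    cost = np.full((h, w, max_disp + 1), np.inf)
    for d in range(min(max_disp, w - 1) + 1):
        # Compare every base pixel with the pixel d columns to its right.
        cost[:, : w - d, d] = np.abs(base[:, : w - d] - match[:, d:])
    return cost
```

Taking the `argmin` of this volume along the last axis gives the simple, unregularised best match for each pixel.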

A simple search for the best matching pixel produces many spurious matches, and this problem can be mitigated with the addition of a regularisation term that penalises jumps in disparity between adjacent pixels, with a cost function in the form

$$E(\boldsymbol{d}) = \sum_{p} D(p, d_p) + \sum_{p, q \in \mathcal{N}} R(p, d_p, q, d_q)$$

where $D(p, d_p)$ is the pixel-wise dissimilarity cost at pixel $p$ with disparity $d_p$, and $R(p, d_p, q, d_q)$ is the regularisation cost between pixels $p$ and $q$ with disparities $d_p$ and $d_q$ respectively, for all pairs of neighbouring pixels $\mathcal{N}$. Such a constraint can be efficiently enforced on a per-scanline basis by using dynamic programming (e.g. the Viterbi algorithm), but restricting the optimisation to scanlines can still introduce streaking artefacts in the depth map, because little or no regularisation is performed across scanlines.[4]
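The per-scanline optimisation can be sketched with a small Viterbi-style dynamic program. This is only an illustration: the function name is made up, and the penalty is passed in as an arbitrary matrix `penalty[k, d]` (the cost of moving from disparity `k` to disparity `d` between neighbouring pixels), which is an assumption of the example rather than a form prescribed by the text.

```python
import numpy as np

def scanline_viterbi(cost_row, penalty):
    """Minimise dissimilarity plus neighbour penalties along one scanline.

    cost_row: array of shape (W, D) with the dissimilarity of each pixel/disparity.
    penalty:  array of shape (D, D); penalty[k, d] is charged when adjacent
              pixels take disparities k and d.
    Returns an integer array of W disparities.
    """
    w, n_disp = cost_row.shape
    total = np.empty((w, n_disp))
    back = np.zeros((w, n_disp), dtype=int)
    total[0] = cost_row[0]
    for x in range(1, w):
        # trans[k, d]: best cost so far if pixel x-1 has disparity k and pixel x has d.
        trans = total[x - 1][:, None] + penalty
        back[x] = np.argmin(trans, axis=0)
        total[x] = cost_row[x] + np.min(trans, axis=0)
    # Backtrack from the cheapest final state.
    disp = np.empty(w, dtype=int)
    disp[-1] = int(np.argmin(total[-1]))
    for x in range(w - 1, 0, -1):
        disp[x - 1] = back[x, disp[x]]
    return disp
```

With a penalty that grows with the disparity jump, an isolated pixel whose raw cost slightly favours a different disparity is pulled back to the value of its neighbours, which is the intended effect of the regularisation term.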

A possible solution is to perform global optimisation in 2D, which is, however, an NP-complete problem in the general case. For some families of cost functions (e.g. submodular functions) a solution with strong optimality properties can be found in polynomial time using graph cut optimisation, but such global methods are generally too expensive for real-time processing.[5]

Algorithm

The idea behind SGM is to perform line optimisation along multiple directions and to compute an aggregated cost $S(p, d)$ by summing the costs to reach pixel $p$ with disparity $d$ from each direction. The number of directions affects the run time of the algorithm: 16 directions usually ensure good quality, but a lower number can be used to achieve faster execution.[6] A typical 8-direction implementation of the algorithm can compute the cost in two passes: a forward pass accumulating the cost from the left, top-left, top, and top-right, and a backward pass accumulating the cost from the right, bottom-right, bottom, and bottom-left.[7] A single-pass algorithm can be implemented with only five directions.[8]
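One way to picture the two-pass split is to write each direction as an offset $r = (dy, dx)$ pointing from the predecessor $p - r$ to the pixel $p$, and to ask whether the predecessor is visited earlier in raster order. The dictionary and helper below are hypothetical names chosen for this sketch.

```python
# Each direction r = (dy, dx) points from the predecessor p - r to the pixel p,
# so "left" means the path arrives from the left neighbour.
DIRECTIONS_8 = {
    "left": (0, 1),
    "top-left": (1, 1),
    "top": (1, 0),
    "top-right": (1, -1),
    "right": (0, -1),
    "bottom-right": (-1, -1),
    "bottom": (-1, 0),
    "bottom-left": (-1, 1),
}

def pass_for(r):
    """Return 'forward' if the predecessor p - r comes earlier in raster order."""
    # Tuple comparison: true when dy > 0, or dy == 0 and dx > 0.
    return "forward" if tuple(r) > (0, 0) else "backward"

forward = [name for name, r in DIRECTIONS_8.items() if pass_for(r) == "forward"]
backward = [name for name, r in DIRECTIONS_8.items() if pass_for(r) == "backward"]
```

Here `forward` collects the four directions handled while scanning top-to-bottom and left-to-right, and `backward` the four handled in the reverse scan.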

The cost is composed of a matching term $D(p, d)$ and a binary regularisation term $R(d_p, d_q)$. The former can in principle be any local image dissimilarity measure; commonly used functions are the absolute or squared intensity difference (usually summed over a window around the pixel, after applying a high-pass filter to the images to gain some illumination invariance), Birchfield-Tomasi dissimilarity, the Hamming distance of the census transform, and Pearson correlation (normalized cross-correlation). Even mutual information can be approximated as a sum over the pixels, and thus used as a local similarity metric.[9] The regularisation term has the form
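As an example of one of the matching terms listed above, here is a minimal sketch of a 3×3 census transform with a Hamming-distance comparison. The window size, the bit ordering, and the edge-replication border handling are assumptions of this example; real implementations often use larger windows.

```python
import numpy as np

def census_3x3(img):
    """8-bit census code per pixel: a bit is set where the neighbour is darker
    than the centre pixel. Borders use edge replication (assumption of this sketch)."""
    h, w = img.shape
    padded = np.pad(img, 1, mode="edge")
    code = np.zeros((h, w), dtype=np.uint8)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            if dy == 0 and dx == 0:
                continue
            neighbour = padded[1 + dy : 1 + dy + h, 1 + dx : 1 + dx + w]
            code = (code << 1) | (neighbour < img).astype(np.uint8)
    return code

def hamming(a, b):
    """Number of differing bits between two arrays of 8-bit census codes."""
    diff = np.bitwise_xor(a, b)
    return np.unpackbits(diff[..., None], axis=-1).sum(axis=-1)
```

Because the code depends only on the ordering of intensities within the window, it is unchanged by any increasing brightness or contrast change, which is why it is popular when the two cameras are not photometrically matched.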

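$$R(d_p, d_q) = \begin{cases} 0 & d_p = d_q \\ P_1 & |d_p - d_q| = 1 \\ P_2 & |d_p - d_q| > 1 \end{cases}$$

where $P_1$ and $P_2$ are two constant parameters, with $P_1 < P_2$. To better preserve discontinuities, $P_2$ can additionally be adapted to the image intensity $I$, for example by setting $P_2 = \max\{P_1, \hat{P}_2 / |I(p) - I(q)|\}$ for each pair of pixels $p$ and $q$.[10]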
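In SGM the regularisation term is usually a three-way penalty: zero when the two disparities agree, a small penalty $P_1$ when they differ by one, and a larger penalty $P_2$ otherwise. A minimal sketch, with a hypothetical function name and arbitrary parameter values in the usage line:

```python
def regularisation(d_p, d_q, p1, p2):
    """Three-way SGM penalty between the disparities of two neighbouring pixels."""
    diff = abs(d_p - d_q)
    if diff == 0:
        return 0
    if diff == 1:
        return p1  # small penalty: allows smooth changes such as slanted surfaces
    return p2      # constant larger penalty for any bigger jump

# Example with arbitrary values P1 = 10 and P2 = 120 (assumed, not from the text).
penalties = [regularisation(3, d, 10, 120) for d in (3, 4, 7)]
```

Keeping $P_2$ constant rather than proportional to the jump size means a large depth discontinuity costs no more than a moderate one, which helps preserve object boundaries.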

The accumulated cost $S(p, d) = \sum_r L_r(p, d)$ is the sum of all costs $L_r(p, d)$ to reach pixel $p$ with disparity $d$ along each direction $r$. Each term can be expressed recursively as

$$L_r(p, d) = D(p, d) + \min\left\{ L_r(p - r, d),\ L_r(p - r, d - 1) + P_1,\ L_r(p - r, d + 1) + P_1,\ \min_i L_r(p - r, i) + P_2 \right\} - \min_k L_r(p - r, k)$$

where the minimum cost at the previous pixel, $\min_k L_r(p - r, k)$, is subtracted for numerical stability: it is constant for all values of disparity at the current pixel and therefore does not affect the optimisation.[6]
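The recursion for a single direction can be sketched as follows, assuming the usual SGM penalties $P_1$ for a disparity change of one and $P_2$ for larger changes. The function names, the `(H, W, D)` array layout, and the plain Python loops are choices made for readability in this example; the sketch also assumes finite matching costs and direction components in $\{-1, 0, 1\}$.

```python
import numpy as np

def aggregate_path(cost, r, p1, p2):
    """Path cost L_r for one direction r = (dy, dx); cost has shape (H, W, D)."""
    h, w, _ = cost.shape
    dy, dx = r
    L = np.zeros_like(cost, dtype=np.float64)
    # Visit pixels so that the predecessor p - r is always processed first.
    ys = range(h) if dy >= 0 else range(h - 1, -1, -1)
    xs = range(w) if dx >= 0 else range(w - 1, -1, -1)
    for y in ys:
        for x in xs:
            py, px = y - dy, x - dx
            if not (0 <= py < h and 0 <= px < w):
                L[y, x] = cost[y, x]  # path starts at the image border
                continue
            prev = L[py, px]
            prev_min = prev.min()
            minus = np.concatenate(([np.inf], prev[:-1])) + p1  # L_r(p - r, d - 1) + P1
            plus = np.concatenate((prev[1:], [np.inf])) + p1    # L_r(p - r, d + 1) + P1
            best = np.minimum(np.minimum(prev, minus), np.minimum(plus, prev_min + p2))
            L[y, x] = cost[y, x] + best - prev_min
    return L

def aggregate(cost, directions, p1, p2):
    """Aggregated cost S(p, d): the sum of the path costs over all directions."""
    return sum(aggregate_path(cost, r, p1, p2) for r in directions)
```

Practical implementations vectorise these loops or run them on SIMD units or GPUs; the nested loops here are only meant to mirror the formula term by term.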

The value of disparity at each pixel is given by $d^{*}(p) = \operatorname{argmin}_d S(p, d)$, and sub-pixel accuracy can be achieved by fitting a curve to the cost at $d^{*}(p)$ and its neighbouring costs and taking the minimum along the curve. Since the two images in the stereo pair are not treated symmetrically in the calculations, a consistency check can be performed by computing the disparity a second time in the opposite direction, swapping the roles of the left and right images, and invalidating the result for the pixels where it differs between the two calculations. Further post-processing techniques for the refinement of the disparity image include morphological filtering to remove outliers, intensity consistency checks to refine textureless regions, and interpolation to fill in pixels invalidated by consistency checks.[11]
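The disparity selection and the consistency check might look like the sketch below. The parabola is one common choice of fitted curve, and the tolerance of one disparity level, the use of NaN to mark invalid pixels, and the convention that the second disparity map stores non-negative magnitudes are all assumptions of this example. It expects at least three disparity levels.

```python
import numpy as np

def subpixel_disparity(S):
    """Winner-takes-all disparity, refined with a parabola through the minimum
    cost and its two neighbours. S has shape (H, W, D) with D >= 3."""
    h, w, n_disp = S.shape
    d = np.argmin(S, axis=2)
    inner = (d > 0) & (d < n_disp - 1)        # both neighbours exist
    dc = np.clip(d, 1, n_disp - 2)
    yy, xx = np.mgrid[0:h, 0:w]
    c0, c1, c2 = S[yy, xx, dc - 1], S[yy, xx, dc], S[yy, xx, dc + 1]
    denom = c0 - 2 * c1 + c2                  # curvature of the parabola
    safe = np.where(denom > 0, denom, 1)
    offset = np.where(inner & (denom > 0), (c0 - c2) / (2 * safe), 0.0)
    return d + offset

def left_right_check(disp_base, disp_match, tol=1):
    """Invalidate (NaN) pixels whose disparity disagrees with the disparity
    computed with the image roles swapped. Pixel (x, y) in the base image is
    assumed to correspond to (x + d, y) in the other image."""
    h, w = disp_base.shape
    ys, xs = np.mgrid[0:h, 0:w]
    xm = np.rint(xs + disp_base).astype(int)
    inside = (xm >= 0) & (xm < w)
    xm = np.clip(xm, 0, w - 1)
    agree = np.abs(disp_base - disp_match[ys, xm]) <= tol
    return np.where(inside & agree, disp_base, np.nan)
```

Pixels marked invalid this way are typically occlusions or mismatches, and are the ones later filled in by interpolation.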

The cost volume $C(p, d)$ of matching costs for all values of $p = (x, y)$ and $d$ can be precomputed. In an implementation of the full algorithm with $D$ possible disparity shifts and $R$ directions, each pixel is subsequently visited $R$ times, so the computational complexity of the algorithm for an image of size $W \times H$ is $O(WHD)$.[7]

Memory efficient variant

The main drawback of SGM is its memory consumption. An implementation of the two-pass 8-direction version of the algorithm requires storing $W \times H \times D + 3 \times W \times D + D$ elements, since the accumulated cost volume has a size of $W \times H \times D$, and to compute the cost for a pixel during each pass it is necessary to keep track of the $D$ path costs of its left or right neighbour along one direction and of the $W \times D$ path costs of the pixels in the row above or below along 3 directions.[7] One solution to reduce memory consumption is to compute SGM on partially overlapping image tiles, interpolating the values over the overlapping regions. This method also allows SGM to be applied to very large images that would not fit within memory in the first place.[12]

A memory-efficient approximation of SGM stores for each pixel only the costs for the disparity values that represent a minimum along some direction, instead of all possible disparity values. The true minimum is highly likely to be predicted by the minima along the eight directions, so the quality of the results is similar. The algorithm uses eight directions and three passes. During the first pass it stores for each pixel the cost for the optimal disparity along each of the four top-down directions, plus the two closest lower and higher values (for sub-pixel interpolation). Since the cost volume is stored in a sparse fashion, the four values of optimal disparity also need to be stored. In the second pass, the other four bottom-up directions are computed, completing the calculations for the four disparity values selected in the first pass, which have now been evaluated along all eight directions. An intermediate value of cost and disparity is computed from the output of the first pass and stored, and the memory of the four outputs from the first pass is replaced with the four optimal disparity values and their costs from the directions in the second pass. A third pass goes again along the same directions used in the first pass, completing the calculations for the disparity values from the second pass. The final result is then selected among the four minima from the third pass and the intermediate result computed during the second pass.[13]

In each pass four disparity values are stored, together with three cost values each (the minimum and its two closest neighbouring costs), plus the disparity and cost values of the intermediate result, for a total of eighteen values for each pixel. This makes the total memory consumption equal to $18 \times W \times H + 3 \times W \times D + D$, at the cost in time of an additional pass over the image.[13]
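To get a feel for the difference, the two element counts can be compared for a concrete image size. The 640×480 resolution and 128 disparity levels below are arbitrary values picked for this worked example, and the function names are hypothetical.

```python
def sgm_elements(w, h, d):
    """Stored elements for two-pass, 8-direction SGM: W*H*D + 3*W*D + D."""
    return w * h * d + 3 * w * d + d

def memory_efficient_elements(w, h, d):
    """Stored elements for the memory-efficient variant: 18*W*H + 3*W*D + D."""
    return 18 * w * h + 3 * w * d + d

W, H, D = 640, 480, 128  # example values, not from the cited papers
full = sgm_elements(W, H, D)                  # 39,567,488 elements
small = memory_efficient_elements(W, H, D)    # 5,775,488 elements
ratio = full / small                          # roughly 6.9 times fewer elements
```

The saving grows with the disparity range, since the dominant term no longer depends on $D$.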

See also

3D reconstruction
Computer stereo vision
Structure from motion

References

[1] Hirschmüller (2005), pp. 807-814
[2] Hirschmüller (2011), pp. 178-184
[3] Spangenberg et al. (2013), pp. 34-41
[4] Hirschmüller (2005), p. 809
[5] Hirschmüller (2005), p. 807
[6] Hirschmüller (2007), p. 331
[7] Hirschmüller et al. (2012), p. 372
[8] "OpenCV cv::StereoSGBM Class Reference". Archived from the original on 2019-10-05.
[9] Kim et al. (2003), pp. 1033-1040
[10] Hirschmüller (2007), p. 330
[11] Hirschmüller (2007), pp. 332-334
[12] Hirschmüller (2007), pp. 334-335
[13] Hirschmüller et al. (2012), p. 373

Hirschmüller, Heiko (2005). "Accurate and efficient stereo processing by semi-global matching and mutual information". IEEE Conference on Computer Vision and Pattern Recognition. pp. 807-814.
Hirschmüller, Heiko (2007). "Stereo processing by semiglobal matching and mutual information". IEEE Transactions on Pattern Analysis and Machine Intelligence. IEEE. 30 (2): 328-341. doi:10.1109/TPAMI.2007.1166. PMID 18084062.
Hirschmüller, Heiko (2011). "Semi-global matching-motivation, developments and applications". Photogrammetric Week. Vol. 11. pp. 173-184.
Hirschmüller, Heiko; Buder, Maximilian; Ernst, Ines (2012). "Memory efficient semi-global matching". ISPRS Annals of the Photogrammetry, Remote Sensing and Spatial Information Sciences. 3: 371-376. Bibcode:2012ISPAn..I3..371H. doi:10.5194/isprsannals-I-3-371-2012.
Kim, Junhwan; Kolmogorov, Vladimir; Zabih, Ramin (2003). "Visual correspondence using energy minimization and mutual information". Proceedings of the Ninth IEEE International Conference on Computer Vision. pp. 1033-1040.
Spangenberg, Robert; Langner, Tobias; Rojas, Raúl (2013). "Weighted semi-global matching and center-symmetric census transform for robust driver assistance". International Conference on Computer Analysis of Images and Patterns. pp. 34-41.

External links

"Stereo vision". Archived from the original on 2019-01-01.
