Geometric deep learning: going beyond Euclidean data
Michael M. Bronstein, Joan Bruna, Yann LeCun, Arthur Szlam, Pierre Vandergheynst
- I. INTRODUCTION
- II. GEOMETRIC LEARNING PROBLEMS
- III. DEEP LEARNING ON EUCLIDEAN DOMAINS
- IV. THE GEOMETRY OF MANIFOLDS AND GRAPHS
- V. SPECTRAL METHODS
- VI. SPECTRUM-FREE METHODS
- VII. CHARTING-BASED METHODS
- Geometric deep learning is an umbrella term for emerging techniques attempting to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds.
- Geometric deep learning: an umbrella term for emerging techniques that attempt to generalize (structured) deep neural models to non-Euclidean domains such as graphs and manifolds.
- The purpose of this paper is to overview different examples of geometric deep learning problems and present available solutions, key difficulties, applications, and future research directions in this nascent field.
- The use of convolutions has a two-fold effect.
- First, it allows extracting local features that are shared across the image domain and greatly reduces the number of parameters in the network with respect to generic deep architectures (and thus also the risk of overfitting), without sacrificing the expressive capacity of the network.
- Fewer parameters; extraction of local features shared across the image domain.
- Second, the convolutional architecture itself imposes some priors about the data, which appear especially suitable for natural images.
- Imposes priors that are well suited to natural images.
- The non-Euclidean nature of such data implies that there are no such familiar properties as global parameterization, common system of coordinates, vector space structure, or shift-invariance.
- Consequently, basic operations like convolution that are taken for granted in the Euclidean case are not even well defined on non-Euclidean domains.
- Basic operations like convolution, taken for granted in the Euclidean case, are not even well defined on non-Euclidean domains!
- The purpose of our paper is to show different methods of translating the key ingredients of successful deep learning methods such as convolutional neural networks to non-Euclidean data.
- Goal: show different ways of translating the key ingredients of successful deep learning methods, such as convolutional neural networks, to non-Euclidean data.
- Broadly speaking, we can distinguish between two classes of geometric learning problems.
- In the first class of problems, the goal is to characterize the structure of the data.
- The goal is to characterize the structure of the data.
- The second class of problems deals with analyzing functions defined on a given non-Euclidean domain.
- These two classes are related, since understanding the properties of functions defined on a domain conveys certain information about the domain, and vice-versa, the structure of the domain imposes certain properties on the functions on it.
- The two classes are related: understanding the properties of functions defined on a domain conveys certain information about the domain, and conversely, the structure of the domain imposes certain properties on the functions defined on it.
Structure of the domain:
- Many methods for nonlinear dimensionality reduction consist of two steps:
- first, they start with constructing a representation of local affinity of the data points (typically, a sparsely connected graph).
- First, construct a representation of the local affinity of the data points.
- Second, the data points are embedded into a low-dimensional space trying to preserve some criterion of the original affinity.
- Second, the data points are embedded into a low-dimensional space while preserving the original affinity.
Instead of embedding the vertices, the graph structure can be processed by decomposing it into small sub-graphs called motifs [36] or graphlets [37].
- Instead of embedding the vertices, the graph structure can be processed by decomposing it into small sub-graphs such as motifs or graphlets.
- When the data are already given as a manifold or a graph, the affinity-construction step is unnecessary.
- In network analysis applications such as computational sociology, the topological structure of the social graph representing the social relations between people carries important insights allowing, for example, to classify the vertices and detect communities [40].
- In network analysis applications, the structure of the social graph allows classifying the vertices and detecting communities.
- In natural language processing, words in a corpus can be represented by the co-occurrence graph, where two words are connected if they often appear near each other [41].
- In natural language processing, words in a corpus can be represented by a co-occurrence graph, where two words are connected if they often appear near each other.
Data on a domain
- Our second class of problems deals with analyzing functions defined on a given non-Euclidean domain.
- We can further break down such problems into two subclasses: problems where the domain is fixed and those where multiple domains are given.
- In computer graphics and vision applications, finding similarity and correspondence between shapes are examples of the second sub-class of problems: each shape is modeled as a manifold, and one has to work with multiple such domains.
- Each shape is modeled as a manifold, and one has to work with multiple such domains.
- In this setting, a generalization of convolution in the spatial domain using local charting [46], [47], [48] appears to be more appropriate.
Brief history
- The main focus of this review is on this second class of problems, namely learning functions on non-Euclidean structured domains, and in particular, attempts to generalize the popular CNNs to such settings.
- First attempts to generalize neural networks to graphs we are aware of are due to Scarselli et al. [49], who proposed a scheme combining recurrent neural networks and random walk models.
- The first attempt to generalize neural networks to graphs is due to Scarselli et al., who proposed a scheme combining recurrent neural networks and random walk models.
- The first formulation of CNNs on graphs is due to Bruna et al. [52], who used the definition of convolutions in the spectral domain.
- The first formulation of CNNs on graphs is due to Bruna et al., using the definition of convolution in the spectral domain!
- Their paper, while being of conceptual importance, came with significant computational drawbacks that fell short of a truly useful method.
- Conceptually important, but with significant computational drawbacks that kept it from being truly practical.
- These drawbacks were subsequently addressed in the followup works of Henaff et al.[44] and Defferrard et al. [45].
- In the latter paper, graph CNNs allowed achieving some state-of-the-art results.
- These drawbacks were addressed in the follow-up works.
- In a parallel effort in the computer vision and graphics community, Masci et al. [47] showed the first CNN model on meshed surfaces, resorting to a spatial definition of the convolution operation based on local intrinsic patches.
- A parallel effort: the first CNN model on meshed surfaces, resorting to a spatial definition of the convolution operation based on local intrinsic patches.
- The interest in deep learning on graphs or manifolds has exploded in the past year, resulting in numerous attempts to apply these methods in a broad spectrum of problems ranging from biochemistry [55] to recommender systems [56].
Structure of the paper
- Going to the non-Euclidean world in Section IV, we then define basic notions in differential geometry and graph theory.
- These topics are insufficiently known in the signal processing community, and to our knowledge, there is no introductory-level reference treating these so different structures in a common way.
- One of our goals is to provide an accessible overview of these models resorting as much as possible to the intuition of traditional signal processing.
- In Sections V–VIII, we overview the main geometric deep learning paradigms, emphasizing the similarities and the differences between Euclidean and non-Euclidean learning methods.
- In Section IX, we show examples of selected problems from the fields of network analysis, particle physics, recommender systems, computer vision, and graphics.
- In Section X, we draw conclusions and outline current main challenges and potential future research directions in geometric deep learning.
- To make the paper more readable, we use inserts to illustrate important concepts.
Geometric priors
- Consider a compact d-dimensional Euclidean domain $\Omega = [0, 1]^d \subset \mathbb{R}^d$ on which square-integrable functions $f \in L^2(\Omega)$ are defined.
- Consider a compact d-dimensional Euclidean domain on which square-integrable functions are defined.
- We consider a generic supervised learning setting, in which an unknown function $y : L^2(\Omega) \rightarrow \cal{Y}$ is observed on a training set
- $\{ f_i \in L^2 (\Omega), y_i = y(f_i) \}_{i \in \cal{I}}$ (1)
- In a supervised classification setting, the target space $\cal{Y}$ can be thought of as discrete with $| \cal{Y} |$ being the number of classes.
- In a multiple object recognition setting, we can replace $\cal{Y}$ by the $K$-dimensional simplex, which represents the posterior class probabilities $p(y|x)$. In regression tasks, we may consider $\cal{Y} = \mathbb{R}^m$.
Stationarity
- Let $\cal{T}_v f(x) = f(x - v)$, $x, v \in \Omega$ (2) be a translation operator acting on functions $f \in L^2 (\Omega)$.
- Our first assumption is that the function y is either invariant or equivariant with respect to translations, depending on the task.
- In the former case, we have $y(\cal{T}_v f) = y(f)$ for any $f \in L^2(\Omega)$ and $v \in \Omega$.
- This is what invariance means.
- This is typically the case in object classification tasks.
- In the latter, we have $y(\cal{T}_v f) = \cal{T}_v y(f)$, which is well-defined when the output of the model is a space in which translations can act upon (for example, in problems of object localization, semantic segmentation, or motion estimation).
- Our definition of invariance should not be confused with the traditional notion of translation invariant systems in signal processing, which corresponds to translation equivariance in our language (since the output translates whenever the input translates).
Local deformations and scale separation
- Similarly, a deformation $\cal{L}_\tau$, where $\tau : \Omega \rightarrow \Omega$ is a smooth vector field, acts on $L^2 (\Omega)$ as $\cal{L}_\tau f(x) = f(x - \tau(x))$.
- Deformations can model local translations, changes in point of view, rotations, and frequency transpositions.
- Most tasks studied in computer vision are not only translation invariant/equivariant, but also stable with respect to local deformations [57], [18].
- In tasks that are translation invariant we have
- $| y(\cal{L}_\tau f) - y(f)| \approx \| \bigtriangledown \tau \|$, (3)
- for all $f$, $\tau$ .
- Here, $\| \bigtriangledown \tau \|$ measures the smoothness of a given deformation field.
- In other words, the quantity to be predicted does not change much if the input image is slightly deformed.
- In tasks that are translation equivariant, we have
- $| y(\cal{L}_\tau f) - \cal{L}_\tau y(f)| \approx \| \bigtriangledown \tau \|$, (4)
- This property is much stronger than the previous one, since the space of local deformations has a high dimensionality, as opposed to the d-dimensional translation group.
- It follows from (3) that we can extract sufficient statistics at a lower spatial resolution by downsampling demodulated localized filter responses without losing approximation power.
- An important consequence of this is that long-range dependencies can be broken into multi-scale local interaction terms, leading to hierarchical models in which spatial resolution is progressively reduced.
- To illustrate this principle, denote by
- $Y(x_1, x_2; v) = \text{Prob}(f(u) = x_1 \text{ and } f(u + v) = x_2)$ (5)
- the joint distribution of two image pixels at an offset $v$ from each other.
- In the presence of long-range dependencies, this joint distribution will not be separable for any $v$.
- However, the deformation stability prior states that $Y (x_1, x_2; v) \approx Y (x_1, x_2; v(1 + \epsilon ))$ for small $\epsilon$.
- In other words, whereas long-range dependencies indeed exist in natural images and are critical to object recognition, they can be captured and down-sampled at different scales.
- This principle of stability to local deformations has been exploited in the computer vision community in models other than CNNs,
- for instance, deformable parts models [58].
Convolutional neural networks
- Stationarity and stability to local translations are both leveraged in convolutional neural networks
- The model is said to be deep if it comprises multiple layers, though this notion is rather vague and one can find examples of CNNs with as few as a couple and as many as hundreds of layers [11].
- A model is said to be deep if it comprises multiple layers, though the notion is rather vague; one can find CNNs with as few as a couple and as many as hundreds of layers.
- The output features enjoy translation invariance/covariance depending on whether spatial resolution is progressively lost by means of pooling or kept fixed.
- The output features are translation invariant (the output does not change when the input is shifted) or covariant, depending on whether the spatial resolution is progressively lost by pooling or kept fixed.
- Moreover, if one specifies the convolutional tensors to be complex wavelet decomposition operators and uses complex modulus as pointwise nonlinearities, one can provably obtain stability to local deformations [17].
- Although this stability is not rigorously proved for generic compactly supported convolutional tensors, it underpins the empirical success of CNN architectures across a variety of computer vision applications [1].
- A key advantage of CNNs explaining their success in numerous tasks is that the geometric priors on which CNNs are based result in a learning complexity that avoids the curse of dimensionality.
- A key advantage of CNNs explaining their success in numerous tasks: the geometric priors on which CNNs are based result in a learning complexity that avoids the curse of dimensionality.
- Thanks to the stationarity and local deformation priors, the linear operators at each layer have a constant number of parameters, independent of the input size n (number of pixels in an image).
- Thanks to the stationarity and local deformation priors, the linear operators at each layer have a constant number of parameters, independent of the input size n.
- Moreover, thanks to the multiscale hierarchical property, the number of layers grows at a rate $\cal{O}(log n)$, resulting in a total learning complexity of $\cal{O}(log n)$ parameters.
- Thanks to the multiscale hierarchical property, the number of layers grows as $\cal{O}(\log n)$, resulting in a total learning complexity of $\cal{O}(\log n)$ parameters.
Manifolds
- A manifold is a space that is locally Euclidean.
- Formally speaking, a (differentiable) d-dimensional manifold $\cal{X}$ is a topological space where each point $x$ has a neighborhood that is topologically equivalent (homeomorphic) to a d-dimensional Euclidean space, called the tangent space and denoted by $T_x \cal{X}$.
- The collection of tangent spaces at all points (more formally, their disjoint union) is referred to as the tangent bundle and denoted by $T \cal{X}$.
- On each tangent space, we define an inner product $<\cdot,\cdot>_{T_x \cal{X}} : T_x \cal{X} \times T_x \cal{X} \rightarrow \mathbb{R}$, which is additionally assumed to depend smoothly on the position $x$.
- This inner product is called a Riemannian metric in differential geometry and allows performing local measurements of angles, distances, and volumes.
- A manifold equipped with a metric is called a Riemannian manifold.
- Riemannian manifold: a smooth manifold whose tangent space at each point carries a positive-definite bilinear form, which makes it possible to measure distances between points.
- The celebrated Nash Embedding Theorem guarantees that any sufficiently smooth Riemannian manifold can be realized in a Euclidean space of sufficiently high dimension [67].
- An embedding is not necessarily unique; two different realizations of a Riemannian metric are called isometries.
- Isometries do not affect the metric structure of the manifold and consequently, preserve any quantities that can be expressed in terms of the Riemannian metric (called intrinsic).
- Conversely, properties related to the specific realization of the manifold in the Euclidean space are called extrinsic.
Calculus on manifolds
- Two objects of interest:
- A scalar field is a smooth real function $f : \cal{X} \rightarrow \mathbb{R}$ on the manifold.
- A tangent vector field $F : \cal{X} \rightarrow T\cal{X}$ is a mapping attaching a tangent vector $F(x) \in T_x \cal{X}$ to each point $x$.
- We define the Hilbert spaces of scalar and vector fields on manifolds, denoted by $L^2(\cal{X})$ and $L^2(T \cal{X})$,
- One of the big differences distinguishing classical calculus from differential geometry is a lack of vector space structure on the manifold, prohibiting us from naïvely using expressions like $f(x+dx)$.
- The conceptual leap that is required to generalize such notions to manifolds is the need to work locally in the tangent space.
- The operator $\bigtriangledown f : L^2(\cal{X}) \rightarrow L^2(T\cal{X})$ in the definition above is called the intrinsic gradient, and is similar to the classical notion of the gradient defining the direction of the steepest change of the function at a point, with the only difference that the direction is now a tangent vector.
- The Laplacian can be interpreted as the difference between the average of a function on an infinitesimal sphere around a point and the value of the function at the point itself.
- The Laplacian can be interpreted as the difference between the average of a function on an infinitesimal sphere around a point and the value of the function at the point itself.
Graphs and discrete differential operators
- For simplicity, we will consider weighted undirected graphs, formally defined as a pair $(\cal{V}, \cal{E})$, where $\cal{V} = \{1, \dots, n\}$ is the set of $n$ vertices, and $\cal{E} \subseteq \cal{V} \times \cal{V}$ is the set of edges, where the graph being undirected implies that $(i, j) \in \cal{E}$ iff $(j, i) \in \cal{E}$.
- Furthermore, we associate a weight $a_i > 0$ with each vertex $i \in \cal{V}$, and a weight $w_{ij} \ge 0$ with each edge $(i, j) \in \cal{E}$.
- Real functions $f : \cal{V} \rightarrow \mathbb{R}$ and $F : \cal{E} \rightarrow \mathbb{R}$ on the vertices and edges of the graph, respectively, are roughly the discrete analogy of continuous scalar and tangent vector fields in differential geometry.
- We can define Hilbert spaces $L^2(\cal{V})$ and $L^2(\cal{E})$ of such functions by specifying the respective inner products, $$<f, g>_{L^2(\cal{V})} = \sum_{i\in \cal{V}} a_i f_i g_i \text{ (20)}$$ $$<F, G>_{L^2(\cal{E})} = \sum_{(i,j)\in \cal{E}} w_{ij} F_{ij} G_{ij} \text{ (21)}$$
- Let $f \in L^2(\cal{V})$ and $F \in L^2(\cal{E})$ be functions on the vertices and edges of the graphs, respectively.
- We can define differential operators acting on such functions analogously to differential operators on manifolds [72].
- The graph gradient is an operator $\bigtriangledown : L^2(\cal{V}) \rightarrow L^2(\cal{E})$ mapping functions defined on vertices to functions defined on edges, $$(\bigtriangledown f)_{ij} = f_i - f_j , \text{ (22)}$$
- automatically satisfying $(\bigtriangledown f)_{ij} = -(\bigtriangledown f)_{ji}$.
- The graph divergence is an operator div : $L^2(\cal{E}) \rightarrow L^2(\cal{V})$ doing the converse, $$(div F )_i = \frac{1}{a_i} \sum_{j:(i,j)\in \cal{E}} w_{ij} F_{ij} \text{ (23)}$$
- It is easy to verify that the two operators are adjoint w.r.t. the inner products (20)–(21), $$<F, \bigtriangledown f>_{L^2(\cal{E})} = <\bigtriangledown^* F, f>_{L^2(\cal{V})} = <-\text{div} F, f>_{L^2(\cal{V})} \text{ (24)}$$
- The graph Laplacian is an operator $\bigtriangleup : L^2(\cal{V}) \rightarrow L^2(\cal{V})$ defined as $\bigtriangleup = -div \bigtriangledown$.
- Combining definitions (22)–(23), it can be expressed in the familiar form $$(\bigtriangleup f)_i = \frac{1}{a_i} \sum_{(i,j)\in \cal{E}} w_{ij} (f_i - f_j) \text{ (25)}$$
- Note that formula (25) captures the intuitive geometric interpretation of the Laplacian as the difference between the local average of a function around a point and the value of the function at the point itself.
- Denoting by $W = (w_{ij})$ the $n \times n$ matrix of edge weights (it is assumed that $w_{ij} = 0$ if $(i, j) \notin \cal{E})$, by $A = diag(a_1, \dots , a_n)$ the diagonal matrix of vertex weights, and by $D = diag(\sum_{j:j \neq i} w_{ij} )$ the degree matrix, the graph Laplacian application to a function $f \in L^2(\cal{V})$ represented as a column vector $f = (f_1, \dots ,f_n)^\top$ can be written in matrix-vector form as $$\bigtriangleup f = A^{-1} (D - W) f \text{ (26)}$$
- The choice of $A = I$ in (26) is referred to as the unnormalized graph Laplacian; another popular choice is $A = D$ producing the random walk Laplacian [73].
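- To make (25)-(26) concrete, here is a minimal NumPy sketch (not from the paper) that builds the unnormalized, random-walk, and normalized symmetric graph Laplacians from a weight matrix; the path graph and the function values are made-up toy data.

```python
import numpy as np

def graph_laplacian(W, kind="unnormalized"):
    """Builds Delta = A^{-1}(D - W) as in eq. (26) for common choices of A.

    W    : symmetric (n, n) array of edge weights, w_ij = 0 if (i, j) is not an edge
    kind : "unnormalized" (A = I), "random_walk" (A = D),
           or "normalized_symmetric" (D^{-1/2} (D - W) D^{-1/2})
    """
    d = W.sum(axis=1)                        # vertex degrees (assumes no isolated vertices)
    L = np.diag(d) - W                       # combinatorial Laplacian D - W
    if kind == "unnormalized":
        return L
    if kind == "random_walk":
        return np.diag(1.0 / d) @ L
    if kind == "normalized_symmetric":
        d_is = np.diag(1.0 / np.sqrt(d))
        return d_is @ L @ d_is
    raise ValueError(kind)

# Toy example: a path graph on 4 vertices with unit edge weights.
W = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
L = graph_laplacian(W, "unnormalized")
f = np.array([1.0, 2.0, 0.0, 3.0])
print(L @ f)                                 # (Delta f)_i = sum_j w_ij (f_i - f_j), eq. (25)
```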
Discrete manifolds
- As we mentioned, there are many practical situations in which one is given a sampling of points arising from a manifold but not the manifold itself.
- In computer graphics applications, reconstructing a correct discretization of a manifold from a point cloud is a difficult problem of its own, referred to as meshing (see insert IN2).
- In manifold learning problems, the manifold is typically approximated as a graph capturing the local affinity structure.
- We warn the reader that the term “manifold” as used in the context of generic data science is not geometrically rigorous, and can have less structure than a classical smooth manifold we have defined beforehand.
- The term “manifold” as used in generic data science is not geometrically rigorous and can have less structure than the classical smooth manifold defined above.
Fourier analysis on non-Euclidean domains
- The Laplacian operator is a self-adjoint positive-semidefinite operator, admitting on a compact domain an eigendecomposition with a discrete set of orthonormal eigenfunctions $φ_0, φ_1, \dots$ (satisfying $<φ_i, φ_j>_{L^2(\cal{X})} = δ_{ij}$) and non-negative real eigenvalues $0 = λ_0 ≤ λ_1 ≤ \dots$ (referred to as the spectrum of the Laplacian), $$\bigtriangleup \phi_i = \lambda_i \phi_i , \text{ } i=0,1,\dots \text{ (31)}$$
- The eigenfunctions are the smoothest functions in the sense of the Dirichlet energy (see insert IN3) and can be interpreted as a generalization of the standard Fourier basis (given, in fact, by the eigenfunctions of the 1D Euclidean Laplacian, $−\frac{d^2}{dx^2} e^{iωx} = ω^2 e^{iωx}$) to a non-Euclidean domain.
- It is important to emphasize that the Laplacian eigenbasis is intrinsic due to the intrinsic construction of the Laplacian itself.
- It is important that the Laplacian eigenbasis is intrinsic, because the Laplacian itself is constructed intrinsically.
- A square-integrable function $f$ on $\cal{X}$ can be decomposed into a Fourier series as $$f(x) = \sum_{i \geq 0} (f,\phi_i)_{L^2 (\cal{X})} \phi_i (x) \text{ (32)}$$
- where the projection on the basis functions producing a discrete set of Fourier coefficients $(\hat{f}_i = (f,\phi_i)_{L^2 (\cal{X})})$ generalizes the analysis (forward transform) stage in classical signal processing, and summing up the basis functions with these coefficients is the synthesis (inverse transform) stage.
- The projection onto the basis functions, which produces the discrete set of Fourier coefficients, generalizes the analysis (forward transform) stage of classical signal processing.
- Summing up the basis functions weighted by these coefficients is the synthesis (inverse transform) stage.
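- As an illustration of the analysis/synthesis stages, a short sketch continuing the toy example above (assuming the unnormalized Laplacian, whose eigenvectors are orthonormal under the standard inner product):

```python
import numpy as np

# Continuing the toy example above: with the unnormalized Laplacian (A = I) the
# eigenvectors are orthonormal w.r.t. the standard inner product, so analysis
# and synthesis are plain matrix products with Phi.
lam, Phi = np.linalg.eigh(L)     # spectrum 0 = lam_0 <= lam_1 <= ... and eigenvectors
f_hat = Phi.T @ f                # analysis: Fourier coefficients <f, phi_i>
f_rec = Phi @ f_hat              # synthesis: sum_i f_hat_i * phi_i(x)
assert np.allclose(f_rec, f)     # exact reconstruction with the full basis
```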
- A centerpiece of classical Euclidean signal processing is the property of the Fourier transform diagonalizing the convolution operator, colloquially referred to as the Convolution Theorem.
- Convolution Theorem: a centerpiece of classical Euclidean signal processing is that the Fourier transform diagonalizes the convolution operator.
- This property allows expressing the convolution $f * g$ of two functions in the spectral domain as the element-wise product of their Fourier transforms,
- Because of this property, the convolution of two functions can be expressed in the spectral domain as the element-wise product of their Fourier transforms: $$(\widehat{f * g})(\omega) = \int_{-\infty}^{\infty} f(x) e^{-i\omega x} dx \int_{-\infty}^{\infty} g(x) e^{-i\omega x} dx \text{ (33)}$$
- Unfortunately, in the non-Euclidean case we cannot even define the operation $x − x'$ on the manifold or graph, so the notion of convolution (7) does not directly extend to this case.
-
One possibility to generalize convolution to non-Euclidean domains is by using the Convolution Theorem as a definition,
- Convolution can be generalized to non-Euclidean domains by using the Convolution Theorem as a definition: $$(f * g)(x) = \sum_{i \geq 0} <f, \phi_i>_{L^2(\cal{X})} <g, \phi_i>_{L^2(\cal{X})} \phi_i(x) \text{ (34)}$$
- One of the key differences of such a construction from the classical convolution is the lack of shift-invariance.
- Lack of shift-invariance.
- In terms of signal processing, it can be interpreted as a position-dependent filter.
- In signal processing terms, this can be interpreted as a position-dependent filter.
- While parametrized by a fixed number of coefficients in the frequency domain, the spatial representation of the filter can vary dramatically at different points (see FIGS4).
- While parametrized by a fixed number of coefficients in the frequency domain, the spatial representation of the filter can vary dramatically at different points.
- The discussion above also applies to graphs instead of manifolds, where one only has to replace the inner product in equations (32) and (34) with the discrete one (20).
- The same discussion applies to graphs!
- All the sums over $i$ would become finite, as the graph Laplacian $∆$ has $n$ eigenvectors.
- All the sums become finite because the graph Laplacian has n eigenvectors.
- In matrix-vector notation, the generalized convolution $f * g$ can be expressed as $Gf = Φ diag(\hat{g})Φ^\top f$, where $\hat{g} = (\hat{g}_1, . . . , \hat{g}_n)$ is the spectral representation of the filter and $Φ = (φ_1 , . . . , φ_n)$ denotes the Laplacian eigenvectors (30).
- The lack of shift invariance results in the absence of circulant (Toeplitz) structure in the matrix $G$, which characterizes the Euclidean setting.
- Circulant (Toeplitz) structure: the entries are constant along each diagonal, so each row is a shifted copy of the previous one.
- Furthermore, it is easy to see that the convolution operation commutes with the Laplacian, $G∆f = ∆Gf$.
- The convolution operation commutes with the Laplacian!
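- A hedged sketch of the generalized convolution in matrix-vector form, $Gf = Φ \, diag(\hat{g}) \, Φ^\top f$, reusing `lam`, `Phi`, `f` from the snippets above; the low-pass response $\hat{g}_i = e^{-λ_i}$ is an arbitrary illustrative choice:

```python
import numpy as np

g_hat = np.exp(-lam)                       # arbitrary low-pass spectral response g_hat_i
G = Phi @ np.diag(g_hat) @ Phi.T           # filter matrix: not circulant/Toeplitz in general
print(G @ f)                               # generalized convolution applied to f
print(np.allclose(G @ (L @ f), L @ (G @ f)))   # G commutes with the Laplacian
```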
Uniqueness and stability
- Finally, it is important to note that the Laplacian eigenfunctions are not uniquely defined.
- To start with, they are defined up to sign, i.e., $∆(±φ) = λ(±φ)$.
- Thus, even isometric domains might have different Laplacian eigenfunctions.
- Even isometric domains can have different Laplacian eigenfunctions.
- Furthermore, if a Laplacian eigenvalue has multiplicity, then the associated eigenfunctions can be defined as orthonormal basis spanning the corresponding eigensubspace (or said differently, the eigenfunctions are defined up to an orthogonal transformation in the eigen-subspace).
- If a Laplacian eigenvalue has multiplicity, the associated eigenfunctions are defined only up to an orthogonal transformation within the corresponding eigensubspace.
- A small perturbation of the domain can lead to very large changes in the Laplacian eigenvectors, especially those associated with high frequencies.
- At the same time, the definition of heat kernels (36) and diffusion distances (38) does not suffer from these ambiguities – for example, the sign ambiguity disappears as the eigenfunctions are squared.
- Heat kernels also appear to be robust to domain perturbations.
[IN3] Physical interpretation of Laplacian eigenfunctions:
- Given a function $f$ on the domain $\cal{X}$ , the Dirichlet energy $$\cal{E}_{Dir}(f) = \int_{\cal{X}} \| ∇f(x) \|^{2}_{T_{x} \cal{X}} dx = \int_{\cal{X}} f(x)∆f(x)dx \text{ , (27)}$$
- measures how smooth it is (the last identity in (27) stems from (19)).
- We are looking for an orthonormal basis on $\cal{X}$, containing k smoothest possible functions (FIGS3), by solving
- the optimization problem $$\min_{φ_0} \cal{E}_{Dir}(φ_0) \text{ s.t. } \| φ_0 \| = 1 \text{ (28)}$$ $$\min_{φ_i} \cal{E}_{Dir}(φ_i) \text{ s.t. } \| φ_i \| = 1, \; i = 1, 2, \dots, k − 1, \; φ_i ⊥ \text{span}\{ φ_0, \dots, φ_{i−1} \}$$
- In the discrete setting, when the domain is sampled at n points, problem (28) can be rewritten as $$min_{Φ_k ∈ \mathbb{R}^{n×k}} trace(Φ_{k}^\top ∆Φ_k) \text{ s.t. } Φ_{k}^\top Φ_k = I \text{ , (29)}$$
- where $Φ_k = (φ_0, . . . φ_{k−1})$.
- The solution of (29) is given by the first k eigenvectors of $∆$ satisfying $$∆Φ_k = Φ_k Λ_k \text{ , (30)}$$
- where $Λ_k = diag(λ_0, . . . , λ_{k−1})$ is the diagonal matrix of corresponding eigenvalues.
- The eigenvalues $0 = λ_0 ≤ λ_1 ≤ . . . λ_{k−1}$ are non-negative due to the positive-semidefiniteness of the Laplacian and can be interpreted as ‘frequencies’, where $φ_0 =$ const with the corresponding eigenvalue $λ_0 = 0$ play the role of the DC.
- The Laplacian eigendecomposition can be carried out in two ways.
- First, equation (30) can be rewritten as a generalized eigenproblem $(D − W)Φ_k = AΦ_kΛ_k$, resulting in A-orthogonal eigenvectors, $Φ_k^\top AΦ_k = I$.
- Alternatively, introducing a change of variables $Ψ_k = A^{1/2}Φ_k$, we can obtain a standard eigendecomposition problem $A^{−1/2}(D − W)A^{−1/2}Ψ_k = Ψ_kΛ_k$ with orthogonal eigenvectors $Ψ_k^\top Ψ_k = I$.
- When $A = D$ is used, the matrix $∆ = A^{−1/2}(D − W)A^{−1/2}$ is referred to as the normalized symmetric Laplacian.
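- A minimal sketch of solving (29)-(30) numerically with `scipy.sparse.linalg.eigsh`; it assumes a sparse symmetric weight matrix `W` with no isolated vertices and uses the normalized symmetric Laplacian described above.

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def laplacian_eigenbasis(W, k):
    """First k eigenpairs of the normalized symmetric Laplacian (problem (29)-(30)).

    W : sparse symmetric (n, n) weight matrix with no isolated vertices.
    """
    d = np.asarray(W.sum(axis=1)).ravel()
    d_is = sp.diags(1.0 / np.sqrt(d))
    L_norm = d_is @ (sp.diags(d) - W) @ d_is          # A^{-1/2}(D - W)A^{-1/2} with A = D
    # 'SM' = smallest eigenvalues; for large graphs a shift-invert solve is usually faster.
    lam_k, Phi_k = eigsh(L_norm, k=k, which="SM")
    return lam_k, Phi_k                                # 0 = lam_0 <= lam_1 <= ...
```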
[IN4] Heat diffusion on non-Euclidean domains
- An important application of spectral analysis, and historically, the main motivation for its development by Joseph Fourier, is the solution of partial differential equations (PDEs).
- In particular, we are interested in heat propagation on non-Euclidean domains.
- This process is governed by the heat diffusion equation, which in the simplest setting of homogeneous and isotropic diffusion has the form $$\begin{cases} f_t(x, t) = −c∆f(x, t) \\ f(x, 0) = f_0(x) (Initial condition) \end{cases} \text{ (35)}$$
- with additional boundary conditions if the domain has a boundary.
- $f(x, t)$ represents the temperature at point $x$ at time $t$.
- Equation (35) encodes the Newton’s law of cooling, according to which the rate of temperature change of a body (lhs) is proportional to the difference between its own temperature and that of the surrounding (rhs).
- The proportion coefficient c is referred to as the thermal diffusivity constant; here, we assume it to be equal to one for the sake of simplicity.
- The proportionality coefficient c is the thermal diffusivity constant; for simplicity it is assumed to be equal to one.
- The solution of (35) is given by applying the heat operator $H^t = e^{−t∆}$ to the initial condition and can be expressed in the spectral domain as $$f(x, t) = e^{−t∆}f_0(x) = \sum_{i≥0}<f_0, φ_i>_{L^2(\cal{X} )}e^{−tλ_i}φ_i(x) \text{ (36)}$$ $$ = \int_{\cal{X}}f_0(x') \sum_{i≥0} e^{−tλ_i}φ_i(x)φ_i(x') dx'$$ $$h_t(x,x') := \sum_{i≥0} e^{−tλ_i}φ_i(x)φ_i(x')$$
- $h_t(x, x')$ is known as the heat kernel and represents the solution of the heat equation with an initial condition $f_0(x) = δ_{x'}(x)$, or, in signal processing terms, an ‘impulse response’.
- In physical terms, $h_t(x, x')$ describes how much heat flows from a point $x$ to point $x'$ in time $t$.
- In the Euclidean case, the heat kernel is shift-invariant, $h_t(x, x') = h_t(x − x')$, allowing to interpret the integral in (36) as a convolution $f(x, t) = (f_0 * h_t)(x)$.
- In the spectral domain, convolution with the heat kernel amounts to low-pass filtering with frequency response $e^{−tλ}$.
- Larger values of diffusion time t result in a lower effective cutoff frequency and thus smoother solutions in space (corresponding to the intuition that longer diffusion smooths the initial heat distribution more).
- The ‘cross-talk’ between two heat kernels positioned at points $x$ and $x'$ allows measuring an intrinsic distance $$d^{2}_{t}(x, x') = \int_{\cal{X}}(h_t(x, y) − h_t(x', y))^2 dy \text{ (37)}$$ $$ = \sum_{i≥0} e^{-2tλ_i}(φ_i(x) − φ_i(x'))^2 \text{ (38)}$$
- referred to as the diffusion distance [30].
- Note that interpreting (37) and (38) as spatial- and frequency-domain norms $\| · \|_{L^2(\cal{X} )}$ and $\| · \|_{l^2}$, respectively, their equivalence is the consequence of the Parseval identity.
- Unlike geodesic distance that measures the length of the shortest path on the manifold or graph, the diffusion distance has an effect of averaging over different paths.
- The diffusion distance has the effect of averaging over different paths.
- It is thus more robust to perturbations of the domain, for example, introduction or removal of edges in a graph, or ‘cuts’ on a manifold.
- When the heat kernel is moved to a different location its shape changes, which illustrates the lack of shift-invariance.
- Here, shift-invariance can be thought of as analogous to stationarity in time series.
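- A short sketch of the heat operator (36) and the diffusion distance (38) in the discrete setting, reusing the Laplacian eigenpairs `lam`, `Phi` and the signal `f` from the earlier snippets; the diffusion time `t` and the vertex indices are arbitrary.

```python
import numpy as np

def heat_kernel(lam, Phi, t):
    # h_t(x, x') = sum_i exp(-t lam_i) phi_i(x) phi_i(x'), cf. eq. (36)
    return Phi @ np.diag(np.exp(-t * lam)) @ Phi.T

def diffusion_distance(lam, Phi, t, x, y):
    # d_t(x, x') from eq. (38): a weighted l2 distance between rows of Phi
    diff = Phi[x, :] - Phi[y, :]
    return np.sqrt(np.sum(np.exp(-2.0 * t * lam) * diff**2))

H = heat_kernel(lam, Phi, t=1.0)
f_t = H @ f                                   # heat distribution at time t from initial f
print(diffusion_distance(lam, Phi, t=1.0, x=0, y=3))
```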
Spectral CNN (SCNN)
- Similarly to the convolutional layer (6) of a classical Euclidean CNN, Bruna et al. [52] define a spectral convolutional layer as $$g_l = ξ \big( \sum^{p}_{l'=1}Φ_kΓ_{l,l'}Φ_{k}^\top f_{l'} \big) \text{, (39)}$$
- where the $n × p$ and $n × q$ matrices $F = (f_1, . . . ,f_p)$ and $G = (g_1, . . . , g_q)$ represent the $p$- and $q$-dimensional input and output signals on the vertices of the graph, respectively (we use $n = |V|$ to denote the number of vertices in the graph), $Γ_{l,l'}$ is a $k \times k$ diagonal matrix of spectral multipliers representing a filter in the frequency domain, and $ξ$ is a nonlinearity applied on the vertex-wise function values.
- Using only the first $k$ eigenvectors in (39) sets a cutoff frequency which depends on the intrinsic regularity of the graph and also the sample size.
- Using only the first k eigenvectors sets a cutoff frequency that depends on the intrinsic regularity of the graph and on the sample size.
- Typically, $k << n$, since only the first Laplacian eigenvectors describing the smooth structure of the graph are useful in practice.
- Typically $k \ll n$, since only the first Laplacian eigenvectors, which describe the smooth structure of the graph, are useful in practice.
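- A hedged NumPy sketch of a forward pass through the spectral convolutional layer (39); only the diagonals of the $Γ_{l,l'}$ matrices are stored, and the eigenvectors `Phi_k` and multipliers `Gamma` are assumed to be given (e.g., learned elsewhere).

```python
import numpy as np

def scnn_layer(F, Phi_k, Gamma, xi=np.tanh):
    """One forward pass of the spectral convolutional layer (39).

    F      : (n, p) input signals on the vertices
    Phi_k  : (n, k) first k Laplacian eigenvectors
    Gamma  : (q, p, k) array; Gamma[l, l2] holds the diagonal of Gamma_{l,l'}
    xi     : vertex-wise nonlinearity
    """
    n, p = F.shape
    q = Gamma.shape[0]
    F_hat = Phi_k.T @ F                       # analysis of every input channel, shape (k, p)
    G = np.zeros((n, q))
    for l in range(q):                        # output channels
        for l2 in range(p):                   # input channels
            G[:, l] += Phi_k @ (Gamma[l, l2] * F_hat[:, l2])   # synthesis of filtered channel
    return xi(G)
```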
- If the graph has an underlying group invariance, such a construction can discover it.
- In particular, standard CNNs can be redefined from the spectral domain (see insert IN5).
- However, in many cases the graph does not have a group structure, or the group structure does not commute with the Laplacian, and so we cannot think of each filter as passing a template across $\cal{V}$ and recording the correlation of the template with that location.
- In many cases the graph has no group structure, or the group structure does not commute with the Laplacian, so we cannot think of each filter as passing a template across $\cal{V}$ and recording the correlation of the template with that location.
- We should stress that a fundamental limitation of the spectral construction is its limitation to a single domain.
- The reason is that spectral filter coefficients (39) are basis dependent.
- Because the spectral filter coefficients are basis dependent, the spectral construction is fundamentally limited to a single domain.
- It implies that if we learn a filter w.r.t. basis $Φ_k$ on one domain, and then try to apply it on another domain with another basis $Ψ_k$, the result could be very different (see Figure 2 and insert IN6).
- If we learn a filter with respect to the basis $Φ_k$ on one domain and then try to apply it on another domain with a different basis $Ψ_k$, the result can be very different!
- It is possible to construct compatible orthogonal bases across different domains resorting to a joint diagonalization procedure [74], [75].
- Compatible orthogonal bases across different domains can be constructed by a joint diagonalization procedure!
- However, such a construction requires the knowledge of some correspondence between the domains.
- In applications such as social network analysis, for example, where dealing with two time instances of a social graph in which new vertices and edges have been added, such a correspondence can be easily computed and is therefore a reasonable assumption.
- Conversely, in computer graphics applications, finding correspondence between shapes is in itself a very hard problem, so assuming known correspondence between the domains is a rather unreasonable assumption.
- Unlike social network analysis, where the correspondence between two time instances of a graph (even with new vertices and edges added) is easy to compute, in computer graphics finding correspondence between shapes is itself a very hard problem, so assuming known correspondence between the domains is unreasonable.
Fig.2. A toy example illustrating the difficulty of generalizing spectral filtering across non-Euclidean domains.
- Left: a function defined on a manifold (function values are represented by color);
- middle: result of the application of an edge-detection filter in the frequency domain;
- right: the same filter applied on the same function but on a different (nearly-isometric) domain produces a completely different result.
- The reason for this behavior is that the Fourier basis is domain-dependent, and the filter coefficients learnt on one domain cannot be applied to another one in a straightforward manner.
- Assuming that $k = O(n)$ eigenvectors of the Laplacian are kept, a convolutional layer (39) requires $pqk = O(n)$ parameters to train.
- We will see next how the global and local regularity of the graph can be combined to produce layers with constant number of parameters (i.e., such that the number of learnable parameters per layer does not depend upon the size of the input), which is the case in classical Euclidean CNNs.
- Next: how the global and local regularity of the graph can be combined to produce layers with a constant number of parameters.
- The non-Euclidean analogy of pooling is graph coarsening, in which only a fraction $α < 1$ of the graph vertices is retained.
- The eigenvectors of graph Laplacians at two different resolutions are related by the following multigrid property:
- Let $Φ, \tilde{Φ}$ denote the $n × n$ and $αn × αn$ matrices of Laplacian eigenvectors of the original and the coarsened graph, respectively. Then, $$\tilde{Φ} ≈ PΦ \begin{pmatrix} I_{αn} \\ 0 \end{pmatrix} \text{, (40)}$$
- where $P$ is a $αn × n$ binary matrix whose $i$th row encodes the position of the $i$th vertex of the coarse graph on the original graph.
- $P$ is an $αn × n$ binary matrix whose $i$th row encodes the position of the $i$th vertex of the coarse graph on the original graph.
- It follows that strided convolutions can be generalized using the spectral construction by keeping only the low-frequency components of the spectrum.
- This property also allows us to interpret (via interpolation) the local filters at deeper layers in the spatial construction to be low frequency.
- However, since in (39) the non-linearity is applied in the spatial domain, in practice one has to recompute the graph Laplacian eigenvectors at each resolution and apply them directly after each pooling step.
- Since the nonlinearity in (39) is applied in the spatial domain, in practice one has to recompute the graph Laplacian eigenvectors at each resolution and apply them directly after each pooling step.
- The spectral construction (39) assigns a degree of freedom for each eigenvector of the graph Laplacian.
- The spectral construction (39) assigns a degree of freedom to each eigenvector of the graph Laplacian.
- In most graphs, individual high-frequency eigenvectors become highly unstable.
- However, similarly as the wavelet construction in Euclidean domains, by appropriately grouping high frequency eigenvectors in each octave one can recover meaningful and stable information.
- As we shall see next, this principle also entails better learning complexity.
- In most graphs, individual high-frequency eigenvectors are highly unstable, but, similarly to wavelet constructions in Euclidean domains, grouping the high-frequency eigenvectors in each octave appropriately recovers meaningful and stable information and also leads to better learning complexity.
Spectral CNN with smooth spectral multipliers
- In order to reduce the risk of overfitting, it is important to adapt the learning complexity to reduce the number of free parameters of the model.
- On Euclidean domains, this is achieved by learning convolutional kernels with small spatial support, which enables the model to learn a number of parameters independent of the input size.
- In order to achieve a similar learning complexity in the spectral domain, it is thus necessary to restrict the class of spectral multipliers to those corresponding to localized filters.
- For that purpose, we have to express spatial localization of filters in the frequency domain.
- In the Euclidean case, smoothness in the frequency domain corresponds to spatial decay, since $$\int^{+∞}_{−∞}|x|^{2k}|f(x)|^2dx = \int^{+∞}_{−∞} \left| \frac{∂^k \hat{f}(ω)}{∂ω^k} \right|^2 dω \text{, (42)}$$
- by virtue of the Parseval Identity.
- This suggests that, in order to learn a layer in which features will be not only shared across locations but also well localized in the original domain, one can learn spectral multipliers which are smooth.
- Smoothness can be prescribed by learning only a subsampled set of frequency multipliers and using an interpolation kernel to obtain the rest, such as cubic splines.
- Smoothness can be prescribed by learning only a subsampled set of frequency multipliers and using an interpolation kernel, such as cubic splines, to obtain the rest.
- However, the notion of smoothness also requires some geometry in the spectral domain.
- In the Euclidean setting, such a geometry naturally arises from the notion of frequency; for example, in the plane, the similarity between two Fourier atoms $e^{iω^\top x}$ and $e^{iω'^\top x}$ can be quantified by the distance $\| ω − ω' \|$, where $x$ denotes the two-dimensional planar coordinates, and $ω$ is the two-dimensional frequency vector.
- On graphs, such a relation can be defined by means of a dual graph with weights $\tilde{w}_{ij}$ encoding the similarity between two eigenvectors $φ_i$ and $φ_j$.
- A particularly simple choice consists in choosing a one dimensional arrangement, obtained by ordering the eigenvectors according to their eigenvalues.
- In this setting, the spectral multipliers are parametrized as $$diag(Γ_{l,l'}) = Bα_{l,l'} \text{ , (43)}$$
- where $B = (b_{ij} ) = (β_j (λ_i))$ is a $k \times q$ fixed interpolation kernel (e.g., $β_j (λ)$ can be cubic splines) and $α$ is a vector of $q$ interpolation coefficients.
- In order to obtain filters with constant spatial support (i.e., independent of the input size $n$), one should choose a sampling step $γ ∼ n$ in the spectral domain, which results in a constant number $nγ^{−1} = O(1)$ of coefficients $α_{l,l'}$ per filter.
- Therefore, by combining spectral layers with graph coarsening, this model has $O(\log n)$ total trainable parameters for inputs of size $n$, thus recovering the same learning complexity as CNNs on Euclidean grids.
- By combining spectral layers with graph coarsening, this model has $O(\log n)$ total trainable parameters for inputs of size n, recovering the same learning complexity as CNNs on Euclidean grids.
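- A small sketch of the smooth-multiplier parametrization (43), using `scipy.interpolate.CubicSpline` as the interpolation kernel; the eigenvalues, control frequencies, and coefficients below are illustrative placeholders, not learned values.

```python
import numpy as np
from scipy.interpolate import CubicSpline

lam_k = np.linspace(0.0, 2.0, 50)                 # k retained Laplacian eigenvalues (placeholder)
control = np.linspace(0.0, 2.0, 6)                # q control frequencies
alpha = np.random.randn(6)                        # q coefficients (would be learned)
multipliers = CubicSpline(control, alpha)(lam_k)  # smooth diag(Gamma) of length k, eq. (43)
# The filter matrix would then be Phi_k @ np.diag(multipliers) @ Phi_k.T
```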
- Even with such a parametrization of the filters, the spectral CNN (39) entails a high computational complexity of performing forward and backward passes, since they require an expensive step of matrix multiplication by $Φ_k$ and $Φ_{k}^\top$.
- While on Euclidean domains such a multiplication can be efficiently carried in $O(n log n)$ operations using FFT-type algorithms, for general graphs such algorithms do not exist and the complexity is $O(n^2)$.
- We will see next how to alleviate this cost by avoiding explicit computation of the Laplacian eigenvectors.
- A polynomial of the Laplacian acts as a polynomial on the eigenvalues.
- Thus, instead of explicitly operating in the frequency domain with spectral multipliers as in equation (43), it is possible to represent the filters via a polynomial expansion: $$g_α(∆) = Φg_α(Λ)Φ^\top \text{, (44)}$$
- corresponding to $$g_α(λ) = \sum^{r−1}_{j=0} α_j λ^j \text{. (45)}$$
- Here $α$ is the $r$-dimensional vector of polynomial coefficients, and $g_α(Λ) = diag(g_α(λ_1), . . . , g_α(λ_n))$, resulting in filter matrices $Γ_{l,l'} = g_{α_{l,l'}} (Λ)$ whose entries have an explicit form in terms of the eigenvalues.
- An important property of this representation is that it automatically yields localized filters, for the following reason.
- Since the Laplacian is a local operator (working on 1-hop neighborhoods), the action of its $j$th power is constrained to $j$-hops.
- Since the filter is a linear combination of powers of the Laplacian, overall (45) behaves like a diffusion operator limited to $r$-hops around each vertex.
- Since the Laplacian is a local operator acting on 1-hop neighborhoods, its $j$th power is constrained to $j$-hops; because the filter is a linear combination of powers of the Laplacian, (45) overall behaves like a diffusion operator limited to $r$-hops around each vertex, which automatically yields localized filters.
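- A minimal sketch of applying the polynomial filter (44)-(45) by repeated matrix-vector products with the Laplacian `L`, so no eigendecomposition is needed; `alpha` holds the $r$ polynomial coefficients.

```python
import numpy as np

def polynomial_filter(L, f, alpha):
    """g_alpha(Delta) f = sum_j alpha_j Delta^j f, eqs. (44)-(45).

    Works by repeated (sparse) matrix-vector products, so no eigendecomposition
    is needed and the output at a vertex depends only on its (len(alpha)-1)-hop
    neighborhood.
    """
    Lj_f = np.asarray(f, dtype=float).copy()   # Delta^0 f
    out = np.zeros_like(Lj_f)
    for a in alpha:
        out += a * Lj_f
        Lj_f = L @ Lj_f                        # next power of the Laplacian applied to f
    return out
```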
Graph CNN (GCNN) a.k.a. ChebNet
- Defferrard et al. used Chebyshev polynomials generated by the recurrence relation $$T_j (λ) = 2λT_{j−1}(λ) − T_{j−2}(λ) \text{; (46)}$$ $$T_0(λ) = 1;$$ $$T_1(λ) = λ.$$
- A filter can thus be parameterized uniquely via an expansion of order $r − 1$ such that $$g_α(\tilde{∆} ) = \sum^{r−1}_{j=0} α_jΦT_j (\tilde{Λ} )Φ^\top \text{ (47)}$$ $$= \sum^{r−1}_{j=0} α_jT_j (\tilde{∆}),$$
- where $\tilde{∆} = 2λ^{−1}_{n} ∆ − I$ and $\tilde{Λ} = 2λ^{−1}_{n} Λ − I$ denotes a rescaling of the Laplacian mapping its eigenvalues from the interval $[0, λ_n]$ to $[−1, 1]$ (necessary since the Chebyshev polynomials form an orthonormal basis in $[−1, 1]$).
- Denoting $\bar{f}^{(j)} = T_j (\tilde{∆})f$, we can use the recurrence relation (46) to compute $\bar{f}^{(j)} = 2\tilde{∆} \bar{f}^{(j−1)} − \bar{f}^{(j−2)}$ with $\bar{f}^{(0)} = f$ and $\bar{f}^{(1)} = \tilde{∆}f$.
- The computational complexity of this procedure is therefore $\cal{O}(rn)$ operations and does not require an explicit computation of the Laplacian eigenvectors.
- The computational complexity of this procedure is therefore $\cal{O}(rn)$ operations, and it does not require an explicit computation of the Laplacian eigenvectors.
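- A hedged sketch of Chebyshev filtering (46)-(47) via the recurrence; `L_rescaled` is assumed to be the rescaled Laplacian $\tilde{∆}$ (e.g., a SciPy sparse matrix), and `alpha` the $r$ coefficients.

```python
import numpy as np

def chebnet_filter(L_rescaled, f, alpha):
    """sum_j alpha_j T_j(Delta~) f computed with the recurrence (46).

    L_rescaled : rescaled Laplacian Delta~ = 2 Delta / lambda_n - I (spectrum in [-1, 1])
    alpha      : r polynomial coefficients
    Cost: O(r n) sparse matrix-vector products, no eigendecomposition.
    """
    f0 = np.asarray(f, dtype=float)              # T_0(Delta~) f = f
    out = alpha[0] * f0
    if len(alpha) > 1:
        f1 = L_rescaled @ f0                     # T_1(Delta~) f = Delta~ f
        out = out + alpha[1] * f1
        for a in alpha[2:]:
            f2 = 2.0 * (L_rescaled @ f1) - f0    # T_j = 2 Delta~ T_{j-1} - T_{j-2}
            out = out + a * f2
            f0, f1 = f1, f2
    return out
```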
Graph Convolutional Network (GCN)
- Kipf and Welling simplified this construction by further assuming $r = 2$ and $λ_n ≈ 2$, resulting in filters of the form $$g_α(f) = α_0f + α_1(∆ − I)f \\ = α_0f − α_1D^{−1/2}WD^{−1/2}f \text{. (48)}$$
- Further constraining $α = α_0 = −α_1$, one obtains filters represented by a single parameter, $g_α(f) = α(I + D^{−1/2}WD^{−1/2} )f\text{ . (49)}$
- Since the eigenvalues of $I + D^{−1/2}WD^{−1/2}$ are now in the range $[0, 2]$, repeated application of such a filter can result in numerical instability.
- This can be remedied by a renormalization $$g_α(f) = α\tilde{D}^{−1/2}\tilde{W} \tilde{D}^{−1/2}f\text{ , (50)}$$
- where $\tilde{W} = W + I$ and $\tilde{D} = diag(\sum_{j \neq i} \tilde{w}_{ij} )$.
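- A minimal sketch of the renormalized GCN propagation (50), extended channel-wise with a weight matrix `Theta` in place of the scalar $α$ (the usual multi-channel form); degrees are taken as row sums of $\tilde{W}$, the common Kipf–Welling convention.

```python
import numpy as np

def gcn_propagation(W, F, Theta):
    """Renormalized GCN filter (50), extended channel-wise:
    G = D~^{-1/2} W~ D~^{-1/2} F Theta, with W~ = W + I.

    W : (n, n) weight matrix, F : (n, p) features, Theta : (p, q) trainable weights
    (the scalar alpha of (50) becomes a matrix when there are several channels).
    """
    W_tilde = W + np.eye(W.shape[0])
    d_tilde = W_tilde.sum(axis=1)                # degrees of W~ (row sums, Kipf-Welling style)
    d_is = np.diag(1.0 / np.sqrt(d_tilde))
    return d_is @ W_tilde @ d_is @ F @ Theta
```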
- Note that though we arrived at the constructions of ChebNet and GCN starting in the spectral domain, they boil down to applying simple filters acting on the $r-$ or $1-$hop neighborhood of the graph in the spatial domain.
- We consider these constructions to be examples of the more general Graph Neural Network (GNN) framework:
GCN
- Through graph convolution layers, a GCN turns graph-structured data into matrix-structured data, so that existing machine learning algorithms can be applied as they are.
Graph Neural Network (GNN)
- Graph Neural Networks generalize the notion of applying the filtering operations directly on the graph via the graph weights.
- A GNN generalizes the notion of applying the filtering operations directly on the graph via the graph weights!
- Similarly as Euclidean CNNs learn generic filters as linear combinations of localized, oriented bandpass and lowpass filters, a Graph Neural Network learns at each layer a generic linear combination of graph low-pass and high-pass operators.
- These are given respectively by $f → Wf$ and $f → ∆f$, and are thus generated by the degree matrix $D$ and the diffusion matrix $W$.
- Given a $p$-dimensional input signal on the vertices of the graph, represented by the $n \times p$ matrix $F$, the GNN considers a generic nonlinear function $η_θ : \mathbb{R}^p \times \mathbb{R}^p → \mathbb{R}^q$, parametrized by trainable parameters $θ$, that is applied to all nodes of the graph, $$g_i = η_θ ((Wf)_i,(Df)_i)\text{. (51)}$$
- In particular, choosing $η(a, b) = b−a$ one recovers the Laplacian operator $∆f$, but more general, nonlinear choices for $η$ yield trainable, task-specific diffusion operators.
- Similarly as with a CNN architecture, one can stack the resulting GNN layers $g = C_θ(f)$ and interleave them with graph pooling operators.
- Similarly to a CNN architecture, one can stack the resulting GNN layers and interleave them with graph pooling operators.
- Chebyshev polynomials $T_r(∆)$ can be obtained with $r$ layers of (51), making it possible, in principle, to consider ChebNet and GCN as particular instances of the GNN framework.
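- A toy sketch of the generic GNN layer (51); `eta` stands in for the trainable map $η_θ$ and is applied row-wise, so choosing `eta(a, b) = b - a` recovers the Laplacian, as noted above.

```python
import numpy as np

def gnn_layer(W, F, eta):
    """Generic GNN layer (51): g_i = eta((W F)_i, (D F)_i), applied row-wise.

    W : (n, n) weights, F : (n, p) vertex features, eta : R^p x R^p -> R^q.
    """
    D = np.diag(W.sum(axis=1))
    WF, DF = W @ F, D @ F
    return np.stack([eta(WF[i], DF[i]) for i in range(F.shape[0])])

# Choosing eta(a, b) = b - a recovers the Laplacian, (D - W) F:
# G = gnn_layer(W, F, lambda a, b: b - a)
```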
- As we have seen, defining convolution in the frequency domain has an inherent drawback of inability to adapt the model across different domains.
- Note that in the setting of multiple domains, there is no immediate way to define a meaningful spatial pooling operation, as the number of points on different domains can vary, and their order may be arbitrary.
- It is however possible to pool point-wise features produced by a network by aggregating all the local information into a single vector.
- One possibility for such a pooling is computing the statistics of the point-wise features, e.g. the mean or covariance [47].
- Note that after such a pooling all the spatial information is lost.
- In the multiple-domain setting there is no immediate way to define a meaningful spatial pooling operation, but it is possible to pool the point-wise features produced by the network by aggregating all the local information into a single vector.
- Such pooling can, for example, compute statistics of the point-wise features (e.g., the mean or covariance); however, after such pooling all the spatial information is lost.
- On a Euclidean domain, due to shift-invariance the convolution can be thought of as passing a template at each point of the domain and recording the correlation of the template with the function at that point.
- Thinking of image filtering, this amounts to extracting a (typically square) patch of pixels, multiplying it element-wise with a template and summing up the results, then moving to the next position in a sliding window manner.
- Shift-invariance implies that the very operation of extracting the patch at each position is always the same.
- Shift-invariance means that the output stays the same even when the position of the input changes!
- One of the major problems in applying the same paradigm to non-Euclidean domains is the lack of shift-invariance, implying that the ‘patch operator’ extracting a local ‘patch’ would be position-dependent.
- The lack of shift-invariance implies that the 'patch operator' extracting a local patch is position-dependent.
- Furthermore, the typical lack of meaningful global parametrization for a graph or manifold forces to represent the patch in some local intrinsic system of coordinates.
- Such a mapping can be obtained by defining a set of weighting functions $v_1(x, ·), . . . , v_J (x, ·)$ localized to positions near $x$ (see examples in Figure 3).
- Such a mapping can be obtained by defining a set of weighting functions localized to positions near x.
- Extracting a patch amounts to averaging the function f at each point by these weights, $$D_j (x)f =\int_{\cal{X}}f(x')v_j (x, x')dx', j = 1, . . . , J\text{, (52)}$$
- providing for a spatial definition of an intrinsic equivalent of convolution $$(f \star g)(x) = \sum_{j} g_jD_j (x)f\text{, (53)}$$
- where $g$ denotes the template coefficients applied on the patch extracted at each point.
- Overall, (52)–(53) act as a kind of nonlinear filtering of $f$, and the patch operator $D$ is specified by defining the weighting functions $v_1, . . . , v_J$ .
- Such filters are localized by construction, and the number of parameters is equal to the number of weighting functions $J = O(1)$.
- Several frameworks for non-Euclidean CNNs essentially amount to different choice of these weights.
- The spectrum-free methods (ChebNet and GCN) described in the previous section can also be thought of in terms of local weighting functions, as it is easy to see the analogy between formulae (53) and (47).
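- A hedged sketch of the patch operator (52) and the charting-based convolution (53) in the discrete setting; the weighting functions are assumed to be given as dense arrays, which is only meant to illustrate the formulas, not an efficient implementation.

```python
import numpy as np

def charting_convolution(f, V, g):
    """Discrete version of (52)-(53).

    f : (n,) function on the n points of the domain
    V : (J, n, n) weighting functions; V[j, x, :] is v_j(x, .) localized around x
    g : (J,) template coefficients
    Returns (f * g)(x) = sum_j g_j D_j(x) f, with D_j(x) f = sum_{x'} v_j(x, x') f(x').
    """
    patches = np.einsum("jxy,y->xj", V, f)   # patch operator: D_j(x) f for all x, j
    return patches @ g                       # (n,) filtered signal
```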
Geodesic CNN