Instance: in the GAN setting, loosely speaking, this seems to refer to what gets produced / handled (?)

  • In the counterfeit-money analogy, each banknote is an instance

Abstract

Several recently proposed techniques attempt to avoid spurious samples, either by rejecting them after generation, or by truncating the model’s latent space.

  • Recently proposed techniques try to avoid spurious samples either by rejecting them after generation or by truncating the model's latent space.
  • These are effective, but much of the model's capacity ends up allocated to samples that are never used.

altering the training dataset via instance selection before model training has taken place.

  • Instead, the paper proposes altering the training set via instance selection before model training takes place.

Instance Selection for GANs

  • to automatically remove the sparsest regions of the data manifold, specifically those parts that GANs struggle to capture.
  • define an image embedding function F and a scoring function H.

Embedding function

  • F projects images into an embedding space
    • Given an image dataset $X$, the embedded dataset $Z$ is produced by applying $z = F(x)$ to each data point $x \in X$.
    • For the task of image generation, the paper suggests an embedding function aligned with the target domain, such as the feature space of a pretrained image classifier (see the sketch below).
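
As a concrete illustration of such an embedding function, the sketch below pulls pooled features from torchvision's pretrained Inceptionv3. The function name and preprocessing choices are assumptions for illustration, not the authors' code.

```python
import torch
import torchvision.models as models

# A minimal sketch of an embedding function F: images -> pooled Inceptionv3 features.
# (Assumes torchvision >= 0.13 for the weights API; this is not the paper's code.)
weights = models.Inception_V3_Weights.IMAGENET1K_V1
model = models.inception_v3(weights=weights)
model.fc = torch.nn.Identity()        # drop the classifier head; keep the 2048-d pooled features
model.eval()
preprocess = weights.transforms()     # resize / crop / normalize expected by these weights

@torch.no_grad()
def embed(pil_images):
    batch = torch.stack([preprocess(im) for im in pil_images])
    return model(batch).cpu().numpy()  # z = F(x), shape (N, 2048)
```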

Scoring function

  • H is used to assess the manifold density in a neighbourhood around each embedded data point z.
    • Three candidate scoring functions are compared in the paper:
      • log likelihood under a standard Gaussian model,
      • log likelihood under a Probabilistic Principal Component Analysis (PPCA) model,
      • distance to the Kth nearest neighbour (KNN Distance).

The Gaussian model is fit to the embedded dataset by computing the empirical mean $µ$ and the sample covariance $Σ$ of $Z$.

  • $d$ is the dimension of $z$.

$$H_{Gaussian}(z) = -\frac{1}{2}\left[\ln(|\Sigma|) + (z - \mu)^{T} \Sigma^{-1}(z - \mu) + d\ln(2\pi)\right] \tag{1}$$
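
A minimal numpy sketch of Eq. (1), assuming the embedded dataset Z is an (N, d) array (the function name is mine):

```python
import numpy as np

def gaussian_scores(Z):
    # Fit a Gaussian to the embedded dataset (empirical mean and sample covariance),
    # then return the per-point log-likelihood of Eq. (1).
    # Assumes N is large enough for Sigma to be well-conditioned; otherwise PPCA (Eq. 2) is safer.
    mu = Z.mean(axis=0)
    sigma = np.cov(Z, rowvar=False)
    d = Z.shape[1]
    _, logdet = np.linalg.slogdet(sigma)                               # stable ln|Sigma|
    diff = Z - mu
    maha = np.einsum('ij,jk,ik->i', diff, np.linalg.inv(sigma), diff)  # (z-mu)^T Sigma^-1 (z-mu)
    return -0.5 * (logdet + maha + d * np.log(2.0 * np.pi))
```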

  • Paper setting: the number of principal components is chosen such that 95% of the variance in the data is preserved.

$$H_{PPCA}(z) = -\frac{1}{2}\left[\ln(|C|) + \mathrm{Tr}\left((z - \mu)^{T} C^{-1}(z - \mu)\right) + d\ln(2\pi)\right], \quad C = WW^{T} + \sigma^2 I \tag{2}$$

  • $W$ is the fit model weight matrix,
  • $µ$ is the empirical mean of $Z$,
  • $σ$ is the residual variance,
  • $I$ is the identity matrix,
  • $d$ is the dimension of $z$.
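
scikit-learn's PCA exposes this probabilistic PCA likelihood directly; a sketch under the assumption that sklearn is an acceptable PPCA implementation (the paper does not specify which one it used):

```python
from sklearn.decomposition import PCA

def ppca_scores(Z):
    # Keep enough principal components to preserve 95% of the variance, then score each
    # embedded point by its log-likelihood under the probabilistic PCA model (Eq. 2).
    ppca = PCA(n_components=0.95, svd_solver='full')   # float n_components = variance fraction
    ppca.fit(Z)
    return ppca.score_samples(Z)                       # per-sample log-likelihood, shape (N,)
```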

KNN

  • Compute the Euclidean distance from $z$ to every element of $Z \setminus \{z\}$, and use the distance to the Kth nearest element to score the data point.
  • To convert to a score, we make the resulting distance negative, such that smaller distances return larger values.

$$H_{KNN}(z, K, Z) = -\min_{K}\{\|z - z_i\|_2 : z_i \in Z \setminus \{z\}\} \tag{3}$$

  • $\min_{K}$ denotes the Kth smallest value in the set; the paper sets K = 5 (a code sketch follows this list).
  • To perform instance selection, we compute scores $H(F(x))$ for each data point and keep all data points with scores above some threshold $\psi$.
  • For convenience, we often set $\psi$ to be equal to some percentile of the scores, such that we preserve the top N% of the best scoring data points.
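
A sketch of the KNN distance score using scikit-learn (an implementation assumption on my part; the paper only gives Eq. 3):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_scores(Z, K=5):
    # Negative Euclidean distance from each embedded point to its Kth nearest
    # neighbour in Z \ {z}; the paper uses K = 5.
    nn = NearestNeighbors(n_neighbors=K + 1).fit(Z)  # +1: each point is its own 0-distance neighbour
    dists, _ = nn.kneighbors(Z)
    return -dists[:, K]                              # column K excludes the point itself
```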

Figure 1 showed that high-likelihood images share a similar visual structure, while low-likelihood samples are more varied.

$$X' = \{x \in X \;\; \text{s.t.} \;\; H(F(x)) > \psi\}$$

  • The reduced training set $X'$ is constructed from the initial training set of data points $x \in X$ by keeping only the points whose scores exceed $\psi$.
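
Putting the scoring and thresholding together, a minimal sketch (variable and function names are mine, not the paper's):

```python
import numpy as np

def select_instances(X, scores, retention_ratio=50.0):
    # Set psi to the (100 - N)th percentile of the scores so that the top N% of
    # best-scoring data points are retained: X' = {x in X : H(F(x)) > psi}.
    psi = np.percentile(scores, 100.0 - retention_ratio)
    keep = scores > psi
    return [x for x, kept in zip(X, keep) if kept]
```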

Figure 1, which shows the most and least likely images from the Red Fox class of ImageNet, illustrates why removing data points from the training set can be a good idea.

Likelihood is determined by a Gaussian model fit to feature embeddings from a pretrained Inceptionv3 classifier.

  • The most likely images (a) are similarly cropped around the fox’s face, while the least likely images (b) have many odd viewpoints and often suffer from occlusion. It is logical to imagine how a generative model trained on these unusual instances may try to generate samples that mimic such conditions, resulting in undesirable outputs.

Experiments

  • review evaluation metrics,
  • motivate selecting instances based on manifold density,
  • analyze the impact of applying instance selection to GAN training.

Evaluation Metrics

  • When calculating FID we follow Brock et al. [2] in using all images in the training set to estimate the reference distribution, and sampling 50 k images to make up the generated distribution.
  • For P&R and D&C we use an Inceptionv3 embedding.
  • N and M are set to 10k samples for both the reference and generated distributions, and K is set equal to 5, as recommended by Naeem et al. [19].
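
For reference, a sketch of the standard FID computation (the Fréchet distance between Gaussians fit to Inception features); this is the textbook formula, not necessarily the authors' exact implementation:

```python
import numpy as np
from scipy import linalg

def fid(feat_real, feat_fake):
    # Fréchet distance between Gaussians fit to real and generated Inception features.
    mu_r, mu_f = feat_real.mean(axis=0), feat_fake.mean(axis=0)
    cov_r = np.cov(feat_real, rowvar=False)
    cov_f = np.cov(feat_fake, rowvar=False)
    covmean, _ = linalg.sqrtm(cov_r @ cov_f, disp=False)
    covmean = covmean.real                      # discard tiny imaginary parts from numerical error
    return float(np.sum((mu_r - mu_f) ** 2) + np.trace(cov_r + cov_f - 2.0 * covmean))
```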

Relationship Between Dataset Manifold Density and GAN Performance

  • The image manifold is more accurately defined in regions where many data points lie close to one another.
  • Since a GAN tries to recreate the image manifold from the data points of a given dataset, the paper suspects that GANs should perform better on datasets with well-defined manifolds (no sparse manifold regions).
    • To test this, the ImageNet dataset [7] is used, treating each of the 1000 classes as a separate dataset.
    • use a single class-conditional BigGAN from [2] that has been pretrained on ImageNet at 128 × 128 resolution.
    • For each class, we sample 700 real images from the dataset, and generate 700 class-conditioned samples with the BigGAN.

To measure the density for each class manifold we compare three different methods:

  • Gaussian likelihood,
  • Probabilistic Principal Component Analysis (PPCA) likelihood,
  • and distance to the Kth neighbour (KNN Distance) (§3).

Figure 2. Correlation between the manifold density estimate of each ImageNet class and FID. Lower x-axis values indicate a denser dataset manifold; lower y-axis values indicate better sample quality.

Embedding and Scoring Function

Having confirmed that dataset manifold density correlates with GAN performance, the overall density of the training set is artificially increased by removing data points that lie in low-density regions of the data manifold.

  • Several Self-Attention GANs (SAGAN) are trained on ImageNet at 64 × 64 resolution.
  • Each model is trained on a different 50% subset of ImageNet, as chosen by instance selection using different embedding and scoring functions.
  • Instance selection is performed on a per-class basis (a sketch follows this list).
  • use the default settings for SAGAN
  • use a batch size of 128
  • apply the self-attention module at 32 × 32 resolution
  • All models are trained for 200k iterations.
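
A sketch of how a per-class 50% subset might be built from precomputed embeddings Z and integer class labels y; the names and the score_fn argument are hypothetical (e.g. the gaussian_scores sketch above):

```python
import numpy as np

def per_class_selection(Z, y, score_fn, retention_ratio=50.0):
    # Apply instance selection independently within each class: score the embeddings
    # belonging to one class, then keep that class's top-N% scoring points.
    keep = np.zeros(len(Z), dtype=bool)
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        scores = score_fn(Z[idx])          # e.g. gaussian_scores from the earlier sketch
        psi = np.percentile(scores, 100.0 - retention_ratio)
        keep[idx] = scores > psi
    return keep                            # boolean mask selecting the reduced training set
```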

Table 1. Comparison of embedding and scoring functions on 64 × 64 ImageNet image generation task.

  • Models trained with instance selection significantly outperform models trained without instance selection, despite training on a fraction of the available data.
    • i.e., instance-selected models clearly beat the no-selection models even though they train on only a fraction of the available data.
  • RR is the retention ratio (percentage of dataset trained on). Best results in bold.

All runs utilizing instance selection significantly outperform the baseline model trained on the full dataset, despite only having access to half as much training data.

  • We observe a large increase in image fidelity, as indicated by the improvements in Inception Score, Precision, and Density, and a slight drop in overall diversity, as measured by Recall.
  • Coverage, which measures realism-constrained diversity, benefits greatly from the more realistic samples and thus sees an increase, despite the reduction in overall diversity.
  • Since the increase in image quality is much greater than the decrease in diversity, FID also improves.
  • To verify that the gains are not simply caused by the reduction in dataset size we train a model on a 50% subset that was uniform-randomly sampled from the full dataset.
  • Here, we observe little change in performance compared to the baseline, indicating that performance improvements are indeed due to careful selection of training data, rather than the reduction of dataset size.

Judging from the results in Table 1, all three candidate scoring functions (Gaussian likelihood, PPCA likelihood, and KNN distance) significantly outperform the full-dataset baseline.

  • Gaussian likelihood slightly outperforms the alternatives, so it is used as the scoring function for the remainder of the experiments.

Several different embeddings are compared:

  • Inceptionv3 [28] trained on ImageNet, ResNet50 [9] trained on Places365 [40], ImageNet, and with SwAV unsupervised pretraining [4], and ResNeXt-101 32x8d [34] trained with weak supervision on Instagram 1B [15].
  • compare a randomly initialized Inceptionv3 with no pretraining as a random embedding.

For all architectures features are extracted after the global average pooling layer.

  • For every architecture, the features are taken right after the global average pooling layer (see the sketch below).
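
A sketch of extracting features after global average pooling for a ResNet-style encoder; the weights enum and the Identity trick are my assumptions, not the paper's code:

```python
import torch
import torchvision.models as models

# Replace the final fc layer with Identity so the forward pass stops at the
# global-average-pooled 2048-d feature vector.
resnet = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
resnet.fc = torch.nn.Identity()
resnet.eval()

with torch.no_grad():
    feats = resnet(torch.randn(8, 3, 224, 224))   # shape (8, 2048)
```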

We find that all feature embeddings improve performance over the full dataset baseline except for the randomly initialized network. These results suggest that an embedding function that is well aligned with the target domain is required in order for instance selection to be effective.

  • Takeaway: for instance selection to be effective, the embedding function must be well aligned with the target domain.

Retention Ratio

The most important consideration when performing instance selection is deciding what fraction of the original dataset to keep, a hyperparameter called the retention ratio.

Figure 3. SAGAN trained on 64 × 64 ImageNet, with instance selection used to reduce the dataset by varying amounts.

  • A retention ratio of 100 denotes a model trained on the full dataset, i.e., no instance selection.
  • The application of instance selection boosts overall performance significantly.

Figure 4. Samples of bird classes from SAGAN trained on 64×64 ImageNet.

  • Each row is conditioned on a different class.
  • Red borders indicate misclassification by a row-specific pretrained Inceptionv3 classifier.
  • Instance selection (b) significantly improves sample fidelity and class consistency compared to the baseline (a).

Our best performing SAGAN model in terms of FID was trained on only 40% of the ImageNet dataset, yet outperforms FQ-BigGAN [39], the current state-of-the-art model for the task of 64 × 64 ImageNet generation.

Despite using 2× fewer parameters and a 4× smaller batch size, our SAGAN achieves a better FID (9.07 vs. 9.76).

As Figure 4 shows, samples from the instance selection model are more recognizable than samples from the baseline model trained on the full dataset.

128 × 128 ImageNet

Purpose of this section: to examine the impact of instance selection on the training time of large-scale models, two BigGAN models are trained on 128 × 128 ImageNet.

  • uses the default hyperparameters from BigGAN with the exception that we reduce the channel multiplier from 96 to 64
  • use a single discriminator update instead of two for faster training
  • Although large batch sizes are critical for achieving good performance with the baseline BigGAN [2], they were found to degrade performance when combined with instance selection.
  • Therefore, we reduce the batch size from BigGAN’s default of 2048 to 256 for the instance selection model

Despite using a much smaller batch size, our model trained with instance selection outperforms the baseline in all metrics except for Recall (Table 2), as expected due to the diversity/fidelity trade-off.

  • i.e., despite the much smaller batch size, the instance selection model beats the baseline on every metric except Recall, which is expected given the diversity/fidelity trade-off.

The instance selection model trains significantly faster than the baseline, requiring less than four days while the baseline requires more than two weeks.

Figure 5: Samples from BigGAN trained on 256 × 256 ImageNet, with the truncation trick. Samples are selected to demonstrate the highest quality outputs for each model. The baseline model (a) struggles to produce convincing facial details, which the instance selection model (b) successfully achieves.

256 × 256 ImageNet

A setup we could never attempt ourselves..

  • To further demonstrate instance selection we train a BigGAN on ImageNet at 256 × 256 resolution using 4 V100s with 32GB of RAM each.

Instance Selection in Practice

As the experiments have shown, instance selection stands as a useful tool for trading away sample diversity in exchange for improvements in image fidelity, faster training, and lower model capacity requirements. We believe that this trade-off is a worthwhile hyperparameter to tune in consideration of the available compute budget, just as it is common practice to adjust model capacity or batch size to fit within the memory constraints of the available hardware.

  • Instance selection is a useful tool for trading away sample diversity in exchange for improved image fidelity, faster training, and lower capacity requirements.

The control over the diversity/fidelity trade-off afforded by instance selection also yields a tool that can be used to better understand the behaviour and limitations of existing evaluation metrics. For instance, in some cases when applying instance selection, we observed that certain diversity-sensitive metrics (such as FID and Coverage) improved, even though the diversity of the training set had been significantly reduced. We leave it for future work to determine whether this is a limitation of these metrics, or a behaviour that should be expected.

Finally, instance selection can be used to automatically curate new datasets for the task of image generation. Existing datasets that are designed for image synthesis often use manual filtering and hand-crafted cropping and alignment tools to increase the dataset manifold density [11]. As an alternative to these time-intensive procedures, instance selection provides a generic solution that can quickly be applied to any uncurated set of images.

  • i.e., instance selection offers a generic solution that can quickly be applied to any uncurated set of images.

Conclusion

Our motivation is to remove sparse regions of the data manifold before training, acknowledging that they will ultimately be poorly represented by the GAN, and therefore, that attempting to capture them is an inefficient use of model capacity.

There are multiple benefits of taking the instance selection approach (better fidelity, shorter training time):

  1. We improve sample fidelity across a variety of metrics compared to training on uncurated data;
  2. We demonstrate that reallocating model capacity to denser regions of the data manifold leads to efficiency gains, meaning that we can achieve SOTA quality with smaller-capacity models trained in far less time.
  • To our knowledge, instance selection has not yet been formally analyzed in the generative setting.

  • We have only considered the setting where curation is performed up-front, prior to training.


Training models with instance selection results in improvements to image quality, as well as significant reduction in training time and model capacity requirements. On 128 × 128 ImageNet, instance selection reduces training time by 4× compared to the baseline, and achieves better image quality than a model trained with twice the model capacity. On 256 × 256 ImageNet, a model trained with instance selection produces higher fidelity images than a model with twice the capacity, while also using approximately one order of magnitude fewer multiply-accumulate operations (MACs) throughout the duration of training.

Training a model with instance selection improves image quality while also reducing training time and model capacity (GPU) requirements.