# **[ANOMALOUS]** Graph

SEOYEON CHOI  
2023-09-07

# Reference

[Official documentation](https://docs.pygod.org/en/latest/generated/pygod.detector.ANOMALOUS.html)

[paper](https://www.ijcai.org/Proceedings/2018/0488.pdf)

[Site recommended by my
professor](https://pycaret.gitbook.io/docs/get-started/quickstart#anomaly-detection)

`Summary`

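-   Anomaly scores are computed from the attributes mapped to each node.
-   As a result, the nodes flagged as anomalous seem to differ depending
    on which attribute features are used.
-   The goal is to find a set of instances that are rare or differ
    substantially from the rest, based on node information and the
    network.
-   There may exist some outlying attributes that do not satisfy the
    homophily hypothesis.
    -   Attributes that violate the homophily hypothesis are the ones
        treated as anomalous.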
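To make the homophily idea concrete, the sketch below scores each node of a toy graph by the mean squared distance between its attributes and those of its neighbors. The graph, the attribute values, and the helper name `neighbor_disagreement` are made up for illustration; this is only an intuition aid and not how ANOMALOUS computes its scores.

```python
import numpy as np

def neighbor_disagreement(x, adj):
    """Mean squared attribute distance between each node and its neighbors.

    x:   (n, d) node attribute matrix
    adj: (n, n) symmetric 0/1 adjacency matrix without self-loops
    """
    n = x.shape[0]
    scores = np.zeros(n)
    for i in range(n):
        neighbors = np.flatnonzero(adj[i])
        if neighbors.size == 0:
            continue  # isolated node: nothing to compare against
        diffs = x[neighbors] - x[i]
        scores[i] = np.mean(np.sum(diffs ** 2, axis=1))
    return scores

# Toy path graph 0-1-2-3; node 2 carries attributes unlike its neighbors.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])
x = np.array([[1.0, 0.0],
              [1.0, 0.1],
              [0.0, 5.0],
              [1.0, 0.2]])

scores = neighbor_disagreement(x, adj)
print(int(scores.argmax()))  # 2
```

Node 2 gets the largest score because its attributes disagree with both of its neighbors, which is the kind of homophily violation described above.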

[Dataset paper](https://arxiv.org/pdf/1603.08861.pdf)

|    Abbr    | Year |  Backbone  | Sampling |           Class           |
|:----------:|:----:|:----------:|:--------:|:-------------------------:|
|    SCAN    | 2007 | Clustering |    No    |    pygod.detector.SCAN    |
|    GAE     | 2016 |   GNN+AE   |   Yes    |    pygod.detector.GAE     |
|   Radar    | 2017 |     MF     |    No    |   pygod.detector.Radar    |
| ANOMALOUS  | 2018 |     MF     |    No    | pygod.detector.ANOMALOUS  |
|    ONE     | 2019 |     MF     |    No    |    pygod.detector.ONE     |
|  DOMINANT  | 2019 |   GNN+AE   |   Yes    |  pygod.detector.DOMINANT  |
|    DONE    | 2020 |   MLP+AE   |   Yes    |    pygod.detector.DONE    |
|   AdONE    | 2020 |   MLP+AE   |   Yes    |   pygod.detector.AdONE    |
| AnomalyDAE | 2020 |   GNN+AE   |   Yes    | pygod.detector.AnomalyDAE |
|    GAAN    | 2020 |    GAN     |   Yes    |    pygod.detector.GAAN    |
|   OCGNN    | 2021 |    GNN     |   Yes    |   pygod.detector.OCGNN    |
|    CoLA    | 2021 | GNN+AE+SSL |   Yes    |    pygod.detector.CoLA    |
|   GUIDE    | 2021 |   GNN+AE   |   Yes    |   pygod.detector.GUIDE    |
|   CONAD    | 2022 | GNN+AE+SSL |   Yes    |   pygod.detector.CONAD    |

# Import

In [206]:
import pygod
import numpy as np
import torch_geometric.transforms as T
from torch_geometric.datasets import Planetoid

import torch
from pygod.generator import gen_contextual_outlier, gen_structural_outlier

from pygod.utils import load_data

from pygod.metric import eval_roc_auc

from pygod.detector import SCAN, GAE, Radar, ANOMALOUS, ONE, DOMINANT, DONE, AdONE, AnomalyDAE, GAAN, OCGNN, CoLA, GUIDE, CONAD

# Tutorial

In [105]:
data = Planetoid('./data/Cora', 'Cora', transform=T.NormalizeFeatures())[0]

In [106]:
data

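Role of `gen_contextual_outlier`: generating contextual outliers

-   Randomly selects `n` nodes; for each one, it samples `k` other nodes,
    measures how far apart their attributes are, and replaces the node's
    attributes with those of the most distant candidate.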
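The contextual injection can be sketched in plain NumPy as follows. This is an illustrative re-implementation under my own assumptions, not PyGOD's code: the function name `inject_contextual_outliers`, the `seed` argument, and the toy data are hypothetical, and distances are always measured on the original attributes.

```python
import numpy as np

def inject_contextual_outliers(x, n, k, seed=0):
    """Give n random nodes the attributes of their farthest sampled candidate."""
    rng = np.random.default_rng(seed)
    out = x.copy()
    num_nodes = x.shape[0]
    labels = np.zeros(num_nodes, dtype=int)
    # Pick n distinct nodes whose attributes will be perturbed.
    targets = rng.choice(num_nodes, size=n, replace=False)
    for i in targets:
        # Sample k candidate nodes and find the one farthest from node i.
        candidates = rng.choice(num_nodes, size=k, replace=False)
        dists = np.linalg.norm(x[candidates] - x[i], axis=1)
        out[i] = x[candidates[dists.argmax()]]
        labels[i] = 1
    return out, labels

rng = np.random.default_rng(1)
x = rng.normal(size=(20, 3))  # toy attribute matrix: 20 nodes, 3 attributes
x_new, ya = inject_contextual_outliers(x, n=4, k=5)
print(int(ya.sum()))  # 4
```

A larger `k` makes it more likely that a very distant candidate is found, so the injected attributes tend to be more extreme.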

In [107]:
data, ya = gen_contextual_outlier(data, n=100, k=50)

In [108]:
ya

In [109]:
(ya == 1).sum().item()  # number of contextual outliers

In [110]:
(ya == 0).sum().item()  # number of remaining nodes

In [111]:
len(ya)

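Role of `gen_structural_outlier`: generating structural outliers

-   Randomly selected nodes are made fully connected (a clique), so their
    connectivity differs sharply from the rest of the graph. With `m`
    nodes per clique and `n` cliques, `m × n` nodes become structural
    outliers.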
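A minimal NumPy sketch of clique injection on a dense adjacency matrix is shown below. The function name `inject_cliques`, the `seed` argument, and the empty toy graph are assumptions for illustration; PyGOD works on `edge_index` tensors and its details may differ.

```python
import numpy as np

def inject_cliques(adj, m, n, seed=0):
    """Turn n disjoint groups of m random nodes into fully connected cliques."""
    rng = np.random.default_rng(seed)
    adj = adj.copy()
    num_nodes = adj.shape[0]
    labels = np.zeros(num_nodes, dtype=int)
    # Draw m * n distinct nodes and split them into n groups of size m.
    chosen = rng.choice(num_nodes, size=m * n, replace=False).reshape(n, m)
    for clique in chosen:
        adj[np.ix_(clique, clique)] = 1  # connect every pair in the group
        labels[clique] = 1
    np.fill_diagonal(adj, 0)  # drop the self-loops created above
    return adj, labels

adj = np.zeros((30, 30), dtype=int)  # toy graph with no edges
adj_new, ys = inject_cliques(adj, m=3, n=4)
print(int(ys.sum()))  # 12
```

With `m=3` and `n=4`, twelve nodes are labeled as outliers, and each of them ends up with exactly two neighbors inside its own clique.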

In [112]:
data, ys = gen_structural_outlier(data, m=10, n=10)

In [113]:
ys

In [114]:
(ys == 1).sum().item()  # number of structural outliers

In [115]:
(ys == 0).sum().item()  # number of remaining nodes

In [116]:
len(ys)

Combine the two outlier label vectors generated above with a logical OR
using `torch.logical_or`

In [117]:
data.y = torch.logical_or(ys, ya).long()

In [118]:
data.y

In [119]:
(data.y == 1).sum().item()  # nodes that are outliers of either kind

In [120]:
(data.y == 0).sum().item()  # inliers

In [121]:
len(data.y)
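The effect of the logical OR is easiest to see on a tiny example. The sketch below uses `np.logical_or` as a NumPy stand-in for `torch.logical_or`, with made-up flag vectors.

```python
import numpy as np

ya = np.array([1, 0, 0, 1, 0, 0])  # contextual outlier flags (toy values)
ys = np.array([0, 0, 1, 1, 0, 0])  # structural outlier flags (toy values)

# A node is an outlier if it was flagged by either generator.
y = np.logical_or(ya, ys).astype(int)
print(y.tolist())    # [1, 0, 1, 1, 0, 0]
print(int(y.sum()))  # 3
```

Because the node at index 3 is flagged by both generators, the combined count (3) is smaller than the sum of the two individual counts (4). The same can happen with the Cora labels above if the two generators pick overlapping nodes.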

Datasets that can be loaded with `load_data`, as in `load_data('inj_cora')`:
[available datasets](https://github.com/pygod-team/data)

In [122]:
data = load_data('inj_cora')
data.y = data.y.bool()

For injected/generated datasets, the label meanings are as follows.

`-` 0: inlier

`-` 1: contextual outlier only

`-` 2: structural outlier only

`-` 3: both contextual outlier and structural outlier

Examples to convert the labels are as follows:

``` python
y = data.y.bool()    # binary labels (inlier/outlier)
yc = data.y >> 0 & 1 # contextual outliers
ys = data.y >> 1 & 1 # structural outliers
```
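To check how the bit operations decode the four label values, here is a small NumPy sketch. The array `y_raw` is a made-up example holding each possible label once; it stands in for the integer `data.y` before it is converted to boolean.

```python
import numpy as np

y_raw = np.array([0, 1, 2, 3])  # one example of each label value

y = y_raw.astype(bool)   # any nonzero label counts as an outlier
yc = (y_raw >> 0) & 1    # lowest bit: contextual outlier
ys = (y_raw >> 1) & 1    # second bit: structural outlier

print(y.tolist())   # [False, True, True, True]
print(yc.tolist())  # [0, 1, 0, 1]
print(ys.tolist())  # [0, 0, 1, 1]
```

Note that the decoding needs the original integer labels: once `data.y` has been replaced by `data.y.bool()`, the distinction between contextual and structural outliers is lost.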

In [123]:
data.y

Using the `ANOMALOUS` detector

In [124]:
detector = ANOMALOUS(gamma=1.,
                     weight_decay=0.,
                     lr=0.01,
                     epoch=50,
                     gpu=-1,
                     contamination=0.1,
                     verbose=0)

In [125]:
detector.fit(data)
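The `contamination` argument controls how many nodes end up labeled as outliers after fitting. As far as I understand, the scores are thresholded at a percentile determined by `contamination`; the sketch below shows that idea in NumPy. The helper name `labels_from_scores` and the toy scores are hypothetical, and the exact rule inside PyGOD should be checked against its source.

```python
import numpy as np

def labels_from_scores(scores, contamination):
    """Flag the top `contamination` fraction of scores as outliers."""
    threshold = np.percentile(scores, 100 * (1 - contamination))
    labels = (scores > threshold).astype(int)
    return labels, threshold

scores = np.arange(1, 21, dtype=float)  # 20 toy scores: 1.0, 2.0, ..., 20.0
labels, threshold = labels_from_scores(scores, contamination=0.1)

# The threshold falls between 18 and 19, so only the scores 19 and 20 are flagged.
print(int(labels.sum()))  # 2
```

With `contamination=0.1`, roughly 10% of the nodes are flagged regardless of how many true anomalies the data contains.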

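``` python
import torch.nn as nn


class ANOMALOUSBase(nn.Module):
    def __init__(self, w, r):
        super(ANOMALOUSBase, self).__init__()
        # Both matrices are trainable; they start from w_init and r_init.
        self.w = nn.Parameter(w)
        self.r = nn.Parameter(r)

    def forward(self, x):
        # Returns the reconstruction X W X together with the residual R.
        return x @ self.w @ x, self.r
```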

In [126]:
detector.decision_function(data)

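The decision score returned by `decision_function` above is computed from
the squared entries of `r` (one score per node). This `r` is an output of
the model, and the model is created as `ANOMALOUSBase(w_init, r_init)`.

`r_init` comes from the line `x, s, l, w_init, r_init =
self.process_graph(data)` inside the `ANOMALOUS` class.

`-` The return values are, in order: `x`, `s`, `laplacian`, `w_init`, `r_init`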
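The relationship between the reconstruction `x @ w @ x`, the residual `r`, and a per-node score can be sketched with NumPy. This is a simplification under stated assumptions: here `r` is set to the exact reconstruction residual, whereas in the model `r` is a trainable parameter, and reducing the squares with a row-wise sum is my assumption about how one score per node is obtained. The matrix `w` is random and stands in for a learned weight matrix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
x = rng.normal(size=(n, d))        # node attributes, one row per node
w = rng.normal(size=(d, n)) * 0.1  # stand-in for the learned weights

x_hat = x @ w @ x                  # reconstruction, same shape as x
r = x - x_hat                      # residual left unexplained

# One anomaly score per node: squared residual summed over attributes.
score = np.sum(r ** 2, axis=1)
print(score.shape)  # (6,)
```

Nodes whose attributes are poorly reconstructed have large residual rows and therefore large scores.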

`x`

In [127]:
detector.process_graph(data)[0]

In [128]:
detector.process_graph(data)[0].shape

$X \in \mathbb{R}^{d \times n}$

2708 = `n` = the number of nodes

1433 = `d` = the dimensionality of the node attributes

`s`

In [129]:
detector.process_graph(data)[1]

In [130]:
detector.process_graph(data)[1].shape

$A \in \mathbb{R}^{n \times n}$

`laplacian`

In [131]:
detector.process_graph(data)[2]

In [132]:
detector.process_graph(data)[2].shape

`generated Laplacian`

$\tilde{R} L \tilde{R}^T$
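To see what a Laplacian term of this form measures, the sketch below checks numerically that the trace of $R L R^T$ equals the sum of squared residual differences over edges. It assumes the standard unnormalized Laplacian $L = D - A$ and a residual matrix with one column per node; PyGOD may construct or normalize its Laplacian differently, and the toy graph and random residuals are made up.

```python
import numpy as np

adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 0],
                [1, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
lap = np.diag(adj.sum(axis=1)) - adj  # unnormalized Laplacian L = D - A

rng = np.random.default_rng(0)
r = rng.normal(size=(2, 4))           # residuals, one column per node (d x n)

# Trace form of the smoothness term.
trace_term = np.trace(r @ lap @ r.T)

# Equivalent edge form: each undirected edge is counted twice, hence the 0.5.
edge_sum = 0.5 * sum(adj[i, j] * np.sum((r[:, i] - r[:, j]) ** 2)
                     for i in range(4) for j in range(4))

print(bool(np.isclose(trace_term, edge_sum)))  # True
```

The term is small when connected nodes have similar residuals, which ties the residuals to the graph structure.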

`w_init`

In [133]:
detector.process_graph(data)[3]

In [134]:
detector.process_graph(data)[3].shape

`r_init`

In [135]:
detector.process_graph(data)[4]

In [136]:
detector.process_graph(data)[4].shape

# Disney

The Disney dataset is a network of movies with attributes such as
ratings, prices, and the number of reviews.

In [186]:
data = load_data('disney')
data.y = data.y.bool()

In [187]:
data.y

In [188]:
sum(data.y*1)

In [189]:
data

In [205]:
data.stores

-   Number of nodes: 124
-   Ratio of anomalous nodes: 4.8%
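A quick back-of-the-envelope calculation shows how these numbers relate to the `contamination=0.1` setting used when the detector was created. The flagged count is only approximate, since the exact number depends on the thresholding rule and on ties between scores.

```python
num_nodes = 124
anomaly_ratio = 0.048  # ratio of anomalous nodes in Disney
contamination = 0.1    # value passed to ANOMALOUS earlier

approx_true_anomalies = round(num_nodes * anomaly_ratio)  # 5.952 rounds to 6
approx_flagged = round(num_nodes * contamination)         # 12.4 rounds to 12

print(approx_true_anomalies, approx_flagged)  # 6 12
```

So with this setting the detector flags about twice as many nodes as there are labeled anomalies, which is worth keeping in mind when reading the `predict` output.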

In [177]:
detector.fit(data)

In [184]:
detector.label_

In [179]:
detector.decision_function(data)

In [166]:
detector.decision_function(data).shape



In [170]:
detector.predict(data)



In [169]:
sum(detector.predict())