Multi-Label Classification

Multi-label classification refers to the problem of identifying the categories of objects in images that may not contain exactly one type of object. There may be more than one kind of object, or there may be no objects at all in the classes that you are looking for.

The Data

from fastai.vision.all import *
path = untar_data(URLs.PASCAL_2007)

This dataset is different from the ones we have seen before, in that it is not structured by filename or folder but instead comes with a CSV (comma-separated values) file telling us what labels to use for each image.

df = pd.read_csv(path/'train.csv')
df.head()
        fname        labels  is_valid
0  000005.jpg         chair      True
1  000007.jpg           car      True
2  000009.jpg  horse person      True
3  000012.jpg           car     False
4  000016.jpg       bicycle      True

As you can see, the list of categories in each image is shown as a space-delimited string.

Sidebar: Pandas and DataFrames

df.iloc[:,0]
0       000005.jpg
1       000007.jpg
2       000009.jpg
3       000012.jpg
4       000016.jpg
           ...    
5006    009954.jpg
5007    009955.jpg
5008    009958.jpg
5009    009959.jpg
5010    009961.jpg
Name: fname, Length: 5011, dtype: object
df.iloc[0,:]
fname       000005.jpg
labels           chair
is_valid          True
Name: 0, dtype: object

Constructing a DataBlock

How do we convert from a DataFrame object to a DataLoaders object?

As we have seen, PyTorch and fastai have two main classes for representing and accessing a training set or validation set:

  • Dataset: A collection that returns a tuple of your independent and dependent variable for a single item
  • DataLoader: An iterator that provides a stream of mini-batches, where each mini-batch is a tuple of a batch of independent variables and a batch of dependent variables

On top of these, fastai provides two classes for bringing your training and validation sets together:

  • Datasets: An object that contains a training Dataset and a validation Dataset
  • DataLoaders: An object that contains a training DataLoader and a validation DataLoader

Since a DataLoader builds on top of a Dataset and adds additional functionality to it (collating multiple items into a mini-batch), it’s often easiest to start by creating and testing Datasets, and then look at DataLoaders after that’s working.

When we create a DataBlock, we build up gradually, step by step, and use the notebook to check our data along the way.

dblock = DataBlock()
dsets = dblock.datasets(df)
len(dsets.train),len(dsets.valid)
(4009, 1002)
x,y = dsets.train[0]
x,y
(fname       005620.jpg
 labels       aeroplane
 is_valid          True
 Name: 2821, dtype: object,
 fname       005620.jpg
 labels       aeroplane
 is_valid          True
 Name: 2821, dtype: object)
dsets.train[0]
(fname       005620.jpg
 labels       aeroplane
 is_valid          True
 Name: 2821, dtype: object,
 fname       005620.jpg
 labels       aeroplane
 is_valid          True
 Name: 2821, dtype: object)

As you can see, this simply returns a row of the DataFrame, twice.

This is because by default, the data block assumes we have two things: input and target.

We are going to need to grab the appropriate fields from the DataFrame, which we can do by passing get_x and get_y functions:

x['fname']
'005620.jpg'
dblock = DataBlock(get_x = lambda r: r['fname'], get_y = lambda r: r['labels'])
dsets = dblock.datasets(df)
dsets.train[0]
('002549.jpg', 'tvmonitor')

As you can see, rather than defining a function in the usual way, we are using Python’s lambda keyword. This is just a shortcut for defining and then referring to a function. The following more verbose approach is identical:

def get_x(r): return r['fname']
def get_y(r): return r['labels']
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
('002546.jpg', 'dog')

Lambda functions are great for quickly iterating, but they are not compatible with serialization, so we advise you to use the more verbose approach if you want to export your Learner after training (lambdas are fine if you are just experimenting).
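
The serialization limitation can be reproduced with Python's standard pickle module alone. The sketch below is only an illustration and makes no claim about how fastai exports a Learner internally; can_pickle is a hypothetical helper introduced here, and the row is a plain dict standing in for a DataFrame row.

```python
import pickle

def can_pickle(obj):
    "Return True if `obj` survives pickle.dumps, False otherwise (hypothetical helper)."
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False

def get_x(r): return r['fname']      # named function: pickle stores a reference to its name

get_x_lambda = lambda r: r['fname']  # lambda: its name is '<lambda>', which cannot be looked up

row = {'fname': '000005.jpg', 'labels': 'chair'}   # stand-in for one DataFrame row
print(get_x(row) == get_x_lambda(row))   # True: both compute the same thing
print(can_pickle(get_x_lambda))          # False: the lambda cannot be serialized
```

Both callables behave identically at run time; the difference only shows up when something tries to serialize them.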

We can see that the independent variable will need to be converted into a complete path, so that we can open it as an image, and the dependent variable will need to be split on the space character (which is the default for Python’s split function) so that it becomes a list:

df['labels'][200].split(' ')
['person', 'horse']
def get_x(r): return path/'train'/r['fname']
def get_y(r): return r['labels'].split(' ')
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
(Path('/home/csy/.fastai/data/pascal_2007/train/004347.jpg'), ['bird'])
dsets.train[200]
(Path('/home/csy/.fastai/data/pascal_2007/train/004687.jpg'),
 ['motorbike', 'person', 'car'])

To actually open the image and do the conversion to tensors, we will need to use a set of transforms; block types will provide us with those. We can use the same block types that we have used previously, with one exception: the ImageBlock will work fine again, because we have a path that points to a valid image, but the CategoryBlock is not going to work. The problem is that block returns a single integer, but we need to be able to have multiple labels for each item. To solve this, we use a MultiCategoryBlock. This type of block expects to receive a list of strings, as we have in this case, so let’s test it out:

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
(PILImage mode=RGB size=500x333,
 TensorMultiCategory([1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))

As you can see, our list of categories is not encoded in the same way that it was for the regular CategoryBlock. In that case, we had a single integer representing which category was present, based on its location in our vocab. In this case, however, we instead have a list of zeros, with a one in any position where that category is present. For example, if there is a one in the second and fourth positions, then that means that vocab items two and four are present in this image. This is known as one-hot encoding. The reason we can’t easily just use a list of category indices is that each list would be a different length, and PyTorch requires tensors, where everything has to be the same length.
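
To make the encoding concrete, here is a minimal plain-Python sketch of the idea. It is not how MultiCategoryBlock is implemented; the function name one_hot and the shortened five-entry vocab are assumptions made for illustration (the real vocab here has 20 entries).

```python
def one_hot(labels, vocab):
    "Encode a list of label strings as a list of 0./1. floats, one slot per vocab entry."
    present = set(labels)
    return [1. if v in present else 0. for v in vocab]

# A shortened, hypothetical vocab; the real PASCAL vocab has 20 entries.
vocab = ['bicycle', 'car', 'chair', 'horse', 'person']

print(one_hot(['horse', 'person'], vocab))  # [0.0, 0.0, 0.0, 1.0, 1.0]
print(one_hot(['car'], vocab))              # [0.0, 1.0, 0.0, 0.0, 0.0]
print(one_hot([], vocab))                   # [0.0, 0.0, 0.0, 0.0, 0.0]
```

Every encoded item has the same length as the vocab, whether the image has zero, one, or several labels, which is what allows the items to be stacked into a single tensor.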

Let’s check what the categories represent for this example (we are using the convenient torch.where function, which tells us all of the indices where our condition is true or false):

dsets.train[0][1]==1.
TensorMultiCategory([ True, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False, False])
torch.where(dsets.train[0][1]==1.)
(TensorMultiCategory([0]),)
torch.where(dsets.train[0][1]==1.)[0]
TensorMultiCategory([0])
idxs = torch.where(dsets.train[0][1]==1.)[0]
dsets.train.vocab[idxs]
(#1) ['aeroplane']

With NumPy arrays, PyTorch tensors, and fastai’s L class, we can index directly using a list or vector, which makes a lot of code (such as this example) much clearer and more concise.

We have ignored the column is_valid up until now, which means that DataBlock has been using a random split by default. To explicitly choose the elements of our validation set, we need to write a function and pass it to splitter (or use one of fastai's predefined functions or classes). It will take the items (here our whole DataFrame) and must return two (or more) lists of integers:

def splitter(df):
    train = df.index[~df['is_valid']].tolist()
    valid = df.index[df['is_valid']].tolist()
    return train,valid

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x=get_x, 
                   get_y=get_y)

dsets = dblock.datasets(df)
dsets.train[0]
(PILImage mode=RGB size=500x333,
 TensorMultiCategory([0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]))

As we have discussed, a DataLoader collates the items from a Dataset into a mini-batch. This is a tuple of tensors, where each tensor simply stacks the items from that location in the Dataset item.
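
The collation step can be sketched in plain Python. This is a simplified illustration using lists instead of tensors, with no shuffling or parallel loading; the names collate and batches, and the toy values, are assumptions made for this example.

```python
def collate(items):
    "Turn a list of (x, y) items into one (xs, ys) tuple by grouping each position."
    xs, ys = zip(*items)
    return list(xs), list(ys)

def batches(dataset, bs):
    "Yield collated mini-batches of at most `bs` items, in order."
    for i in range(0, len(dataset), bs):
        yield collate(dataset[i:i+bs])

# Toy dataset: each item is (independent, dependent); the values are made up.
dataset = [([0.1, 0.2], [1., 0.]),
           ([0.3, 0.4], [0., 1.]),
           ([0.5, 0.6], [1., 1.])]

for xb, yb in batches(dataset, bs=2):
    print(len(xb), xb, yb)
```

Grouping by position only works when every x (and every y) has the same shape, which is why the images need to be resized to a common size before batching.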

Now that we have confirmed that the individual items look okay, there's one more step we need to ensure we can create our DataLoaders, which is to ensure that every item is of the same size. To do this, we can use RandomResizedCrop:

dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
                   splitter=splitter,
                   get_x=get_x, 
                   get_y=get_y,
                   item_tfms = RandomResizedCrop(128, min_scale=0.35))
dls = dblock.dataloaders(df)
dls.show_batch(nrows=1, ncols=5)

Remember that if anything goes wrong when you create your DataLoaders from your DataBlock, or if you want to view exactly what happens with your DataBlock, you can use the summary method we presented in the last chapter.

Our data is now ready for training a model. As we will see, nothing is going to change when we create our Learner, but behind the scenes, the fastai library will pick a new loss function for us: binary cross-entropy.

Binary Cross-Entropy

Let's use cnn_learner to create a Learner, so we can look at its activations:

learn = cnn_learner(dls, resnet34)

We also saw that the model in a Learner is generally an object of a class inheriting from nn.Module, and that we can call it using parentheses and it will return the activations of a model. You should pass it your independent variable, as a mini-batch. We can try it out by grabbing a mini-batch from our DataLoader and then passing it to the model:

  • Grab a mini-batch from the DataLoader, pass it to the model, and look at the shape of the result
x,y = to_cpu(dls.train.one_batch())
activs = learn.model(x)
activs.shape
torch.Size([64, 20])

Think about why activs has this shape—we have a batch size of 64, and we need to calculate the probability of each of 20 categories.

activs[0]
TensorBase([ 2.4575,  2.0757,  1.6195,  0.4433, -1.5286,  3.3762,  1.4168,  2.0657, -0.5620, -1.7041, -0.4396, -4.4145, -1.4306,  0.5344,  1.9553,  2.2492,  0.6063, -3.4413,  3.0845,  1.6068],
       grad_fn=<AliasBackward0>)

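These activations are not yet scaled between 0 and 1. To turn them into a loss, we apply the sigmoid and then take the log: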

def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    # Use the predicted probability where the target is 1, and 1 minus it where the target is 0
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()

In the binary_cross_entropy function, each activation is compared to the target in its own column, so we don't have to do anything to make this function work for multiple columns.
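
A worked example may help to see what the loss computes for each cell. The following is a plain-Python sketch using the standard binary cross-entropy definition (the loss for a cell is -log(p) when the class is present and -log(1-p) when it is absent); the function name bce_with_logits and the numbers are made up for illustration.

```python
import math

def sigmoid(a): return 1 / (1 + math.exp(-a))

def bce_with_logits(activs, targs):
    "Mean binary cross-entropy over every (item, class) cell; inputs are nested lists."
    losses = []
    for act_row, targ_row in zip(activs, targs):
        for a, t in zip(act_row, targ_row):
            p = sigmoid(a)
            # Use p where the class is present, 1-p where it is absent
            losses.append(-math.log(p if t == 1 else 1 - p))
    return sum(losses) / len(losses)

activs = [[2.0, -1.0, 0.0]]   # made-up raw activations for one image, three classes
targs  = [[1.,   0.,  1.]]
print(round(bce_with_logits(activs, targs), 4))   # 0.3778

# A confident correct prediction is penalized far less than a confident wrong one
print(bce_with_logits([[4.0]], [[1.]]) < bce_with_logits([[4.0]], [[0.]]))   # True
```

The three cells contribute roughly 0.127, 0.313, and 0.693 respectively, so the uncertain prediction (activation 0, probability 0.5) dominates the mean.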

F.binary_cross_entropy and its module equivalent nn.BCELoss calculate cross-entropy on a one-hot-encoded target, but do not include the initial sigmoid.

Normally for one-hot-encoded targets you'll want F.binary_cross_entropy_with_logits (or nn.BCEWithLogitsLoss), which do both sigmoid and binary cross-entropy in a single function, as in the preceding example.

The equivalent for single-label datasets (like MNIST or the Pet dataset), where the target is encoded as a single integer, is F.nll_loss or nn.NLLLoss for the version without the initial softmax, and F.cross_entropy or nn.CrossEntropyLoss for the version with the initial softmax.
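
The relationship between the two single-label losses can be sketched in plain Python. One detail worth stating: PyTorch's nll_loss expects log-probabilities, so the step that cross_entropy adds in front is a log of the softmax. The function names below mirror the PyTorch ones for readability, but this is an illustrative reimplementation for a single item, not the library code.

```python
import math

def log_softmax(acts):
    "Log of softmax for one row of activations (shifted by the max for numerical stability)."
    m = max(acts)
    log_sum = m + math.log(sum(math.exp(a - m) for a in acts))
    return [a - log_sum for a in acts]

def nll_loss(log_probs, target):
    "Negative log likelihood: pick the log-probability at the integer target index."
    return -log_probs[target]

def cross_entropy(acts, target):
    "Cross-entropy on raw activations = log_softmax followed by nll_loss."
    return nll_loss(log_softmax(acts), target)

acts = [1.0, 2.0, 3.0]   # made-up activations for three classes
print(round(cross_entropy(acts, target=2), 4))   # 0.4076
```

Note that the target here is a single integer index, in contrast to the vector of zeros and ones used by the binary cross-entropy losses.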

Since we have a one-hot-encoded target, we will use BCEWithLogitsLoss:

loss_func = nn.BCEWithLogitsLoss()
loss = loss_func(activs, y)
loss
TensorMultiCategory(1.0863, grad_fn=<AliasBackward0>)

We don't actually need to tell fastai to use this loss function (although we can if we want) since it will be automatically chosen for us. fastai knows that the DataLoaders has multiple category labels, so it will use nn.BCEWithLogitsLoss by default.

Because this is a multi-label problem, we can't use the regular accuracy function. That function compares the outputs to the targets like so:

def accuracy(inp, targ, axis=-1):
    "Compute accuracy with `targ` when `pred` is bs * n_classes"
    pred = inp.argmax(dim=axis)
    return (pred == targ).float().mean()

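The argmax in that function picks the single class with the highest activation, which does not work when an image can have several labels. Instead, to decide which outputs are 0s and which are 1s, we pick a threshold: every value above the threshold counts as a 1, and every value below it counts as a 0: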

def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    "Compute accuracy when `inp` and `targ` are the same size."
    if sigmoid: inp = inp.sigmoid()
    return ((inp>thresh)==targ.bool()).float().mean()
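
To see the effect of the threshold on a small case, here is a plain-Python sketch of the same comparison. It assumes the inputs are already probabilities (that is, the sigmoid has been applied); the name accuracy_multi_py and all the numbers are made up for illustration.

```python
def accuracy_multi_py(probs, targs, thresh=0.5):
    "Fraction of (item, class) cells where (prob > thresh) matches the 0/1 target."
    hits = [(p > thresh) == bool(t)
            for prob_row, targ_row in zip(probs, targs)
            for p, t in zip(prob_row, targ_row)]
    return sum(hits) / len(hits)

# Made-up probabilities for 2 images and 3 classes
probs = [[0.9, 0.3,  0.1],
         [0.15, 0.6, 0.05]]
targs = [[1, 1, 0],
         [0, 1, 0]]

print(accuracy_multi_py(probs, targs, thresh=0.5))  # 5 of 6 cells correct
print(accuracy_multi_py(probs, targs, thresh=0.2))  # 1.0
```

With a threshold of 0.5 the second class of the first image (probability 0.3) is missed; lowering the threshold to 0.2 recovers it without introducing any false positives in this toy data.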

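The threshold defaults to 0.5, but we can change it. To do so we use Python's partial function, which binds some arguments of a function and returns a new version of it: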

def say_hello(name, say_what="Hello"): return f"{say_what} {name}."
say_hello('Jeremy'),say_hello('Jeremy', 'Ahoy!')
('Hello Jeremy.', 'Ahoy! Jeremy.')
f = partial(say_hello, say_what="Bonjour")
f("Jeremy"),f("Sylvain")
('Bonjour Jeremy.', 'Bonjour Sylvain.')
learn = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)
epoch  train_loss  valid_loss  accuracy_multi  time
0      0.934710    0.699861    0.234821        00:06
1      0.818233    0.563166    0.282351        00:05
2      0.600119    0.204835    0.811514        00:05
3      0.358874    0.125488    0.941653        00:05
epoch  train_loss  valid_loss  accuracy_multi  time
0      0.133624    0.114899    0.948885        00:06
1      0.116160    0.106754    0.951713        00:06
2      0.097663    0.103179    0.951554        00:06

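This is the result with a threshold of 0.2. Picking the threshold is important. If it is too low, labels may not be selected accurately, because many low-confidence predictions are accepted: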

learn.metrics = partial(accuracy_multi, thresh=0.1)
learn.validate()
(#2) [0.10317931324243546,0.9307569265365601]

If the threshold is too high, the model will select only the objects it is very confident about:

learn.metrics = partial(accuracy_multi, thresh=0.99)
learn.validate()
(#2) [0.10317931324243546,0.9428285956382751]

Let's find the best threshold. First we grab the predictions:

preds,targs = learn.get_preds()

get_preds applies the output activation function (sigmoid, in this case) by default, so we need to tell accuracy_multi not to apply it again:

accuracy_multi(preds, targs, thresh=0.9, sigmoid=False)
TensorBase(0.9562)
xs = torch.linspace(0.05,0.95,29)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs,accs);

The chosen threshold fits the validation set very well, which may look suspicious, since we tuned it on that same set.
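
Choosing the threshold from such a sweep can be sketched in plain Python. The data below are made up and flattened to one list for brevity; accuracy_at is a hypothetical helper, and the ten-point grid is an arbitrary choice for this example.

```python
def accuracy_at(probs, targs, thresh):
    "Fraction of predictions where (prob > thresh) matches the 0/1 target."
    hits = [(p > thresh) == bool(t) for p, t in zip(probs, targs)]
    return sum(hits) / len(hits)

# Made-up, flattened probabilities and 0/1 targets
probs = [0.95, 0.40, 0.30, 0.10, 0.70, 0.20]
targs = [1,    1,    0,    0,    1,    0]

threshes = [0.05 + 0.1*i for i in range(10)]          # 0.05, 0.15, ..., 0.95
accs = [accuracy_at(probs, targs, t) for t in threshes]

# max over (accuracy, threshold) pairs; on ties this prefers the larger threshold
best_acc, best_thresh = max(zip(accs, threshes))
print(best_acc, round(best_thresh, 2))                # 1.0 0.35
```

Accuracy rises as the threshold moves away from zero, peaks, and then falls again, so the best value sits on a smooth curve rather than at an isolated spike. That shape is the reassuring sign to look for when a hyperparameter is tuned on the validation set.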

Regression

A model is defined by its independent and dependent variables, along with its loss function.

Assemble the Data

path = untar_data(URLs.BIWI_HEAD_POSE)
Path.BASE_PATH = path
path.ls().sorted()
(#50) [Path('01'),Path('01.obj'),Path('02'),Path('02.obj'),Path('03'),Path('03.obj'),Path('04'),Path('04.obj'),Path('05'),Path('05.obj')...]

There are 24 directories numbered from 01 to 24 (they correspond to the different people photographed), and a corresponding .obj file for each (we won't need them here). Let's take a look inside one of these directories:

(path/'01').ls().sorted()
(#1000) [Path('01/depth.cal'),Path('01/frame_00003_pose.txt'),Path('01/frame_00003_rgb.jpg'),Path('01/frame_00004_pose.txt'),Path('01/frame_00004_rgb.jpg'),Path('01/frame_00005_pose.txt'),Path('01/frame_00005_rgb.jpg'),Path('01/frame_00006_pose.txt'),Path('01/frame_00006_rgb.jpg'),Path('01/frame_00007_pose.txt')...]
img_files = get_image_files(path)
def img2pose(x): return Path(f'{str(x)[:-7]}pose.txt')
img2pose(img_files[0])
Path('01/frame_00372_pose.txt')
im = PILImage.create(img_files[0])
im.shape
(480, 640)
im.to_thumb(150)

The Biwi dataset gives the location of the center of the head for each image. Here we define the function we will use to extract that head center point:

cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6)
def get_ctr(f):
    ctr = np.genfromtxt(img2pose(f), skip_header=3)
    c1 = ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    c2 = ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return tensor([c1,c2])
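
The arithmetic in get_ctr has the form of a pinhole-camera projection. The sketch below spells that out under an assumption that is not confirmed by the text here: that cal holds the camera intrinsics (focal lengths on the diagonal, principal point in the last column) and that the row read from the pose file is the 3D head position. The function name and all numeric values are hypothetical.

```python
def project_point(point_3d, fx, fy, cx, cy):
    "Project a 3D point (x, y, z) in camera coordinates onto the image plane, in pixels."
    x, y, z = point_3d
    u = x * fx / z + cx   # same form as: ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    v = y * fy / z + cy   # same form as: ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return u, v

# Hypothetical intrinsics and head positions; NOT the values stored in the Biwi files.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0
print(project_point((100.0, -50.0, 1000.0), fx, fy, cx, cy))  # (370.0, 215.0)
print(project_point((0.0, 0.0, 1000.0), fx, fy, cx, cy))      # (320.0, 240.0)
```

A point on the optical axis lands on the principal point, and dividing by the depth z makes the same sideways offset appear smaller the farther away the head is.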

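Each folder contains the images of one person, and the same person appears in many images. If we used a random splitter, the model would be validated on people it had already seen, so we would not learn whether it generalizes to new people. Instead, we split by person, holding out one person's images as the validation set.

The only other difference from the previous data block examples is that the second block is a PointBlock. This is necessary so that fastai knows that the labels represent coordinates: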

biwi = DataBlock(
    blocks=(ImageBlock, PointBlock),
    get_items=get_image_files,
    get_y=get_ctr,
    splitter=FuncSplitter(lambda o: o.parent.name=='13'),
    batch_tfms=[*aug_transforms(size=(240,320)), 
                Normalize.from_stats(*imagenet_stats)]
)
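
The per-person split can be sketched in plain Python. This illustrates the idea only and is not fastai's FuncSplitter code; split_by_person and the file list are made up for the example.

```python
from pathlib import Path

def split_by_person(items, valid_person):
    "Return (train_idxs, valid_idxs); an item is validation if its parent folder matches."
    train = [i for i, p in enumerate(items) if p.parent.name != valid_person]
    valid = [i for i, p in enumerate(items) if p.parent.name == valid_person]
    return train, valid

# Hypothetical file list; the folder name identifies the person.
items = [Path('01/frame_00003_rgb.jpg'), Path('13/frame_00010_rgb.jpg'),
         Path('02/frame_00004_rgb.jpg'), Path('13/frame_00011_rgb.jpg')]

print(split_by_person(items, '13'))  # ([0, 2], [1, 3])
```

Every image of person 13 ends up in the validation set and none in the training set, so the validation loss measures performance on a person the model has never seen.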

Important: Points and Data Augmentation: We’re not aware of other libraries (except for fastai) that automatically and correctly apply data augmentation to coordinates. So, if you’re working with another library, you may need to disable data augmentation for these kinds of problems.

dls = biwi.dataloaders(path)
dls.show_batch(max_n=9, figsize=(8,6))

Let's look at the underlying tensors:

xb,yb = dls.one_batch()
xb.shape,yb.shape
(torch.Size([64, 3, 240, 320]), torch.Size([64, 1, 2]))

Make sure that you understand why these are the shapes for our mini-batches.

yb[0]
TensorPoint([[ 0.1069, -0.0506]], device='cuda:0')
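
The point above is expressed in a coordinate system that runs from -1 to 1 rather than in pixels. One common way to perform such a rescaling is sketched below; the exact formula fastai uses is assumed here rather than taken from this text, and both function names are hypothetical.

```python
def scale_point(x, y, width, height):
    "Map pixel coordinates to [-1, 1] so that the image centre becomes (0, 0)."
    return 2 * x / width - 1, 2 * y / height - 1

def unscale_point(sx, sy, width, height):
    "Inverse of scale_point: map [-1, 1] coordinates back to pixels."
    return (sx + 1) * width / 2, (sy + 1) * height / 2

print(scale_point(160, 120, 320, 240))   # (0.0, 0.0): the centre of a 320x240 image
print(scale_point(0, 240, 320, 240))     # (-1.0, 1.0): the bottom-left corner
```

Under this convention, a target close to (0, 0) means the point is near the middle of the image, regardless of the image's size in pixels.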

As you can see, we haven't had to use a separate image regression application; all we've had to do is label the data, and tell fastai what kinds of data the independent and dependent variables represent.

It's the same for creating our Learner. We will use the same function as before, with one new parameter, and we will be ready to train our model.

Training a Model

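As usual, we can use cnn_learner to create our Learner. Remember how we used y_range to tell fastai the range of our targets? We do the same here, since fastai rescales point coordinates to lie between -1 and 1: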

learn = cnn_learner(dls, resnet18, y_range=(-1,1))

y_range is implemented in fastai using sigmoid_range, which is defined as:

def sigmoid_range(x, lo, hi): return torch.sigmoid(x) * (hi-lo) + lo
plot_function(partial(sigmoid_range,lo=-1,hi=1), min=-4, max=4)
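
For readers who want numbers instead of a plot, here is the same formula in plain Python, evaluated at a few points. The name sigmoid_range_py is used only to distinguish this sketch from the torch version.

```python
import math

def sigmoid_range_py(x, lo, hi):
    "Squash any real number x into the open interval (lo, hi)."
    return 1 / (1 + math.exp(-x)) * (hi - lo) + lo

for x in (-4, 0, 4):
    print(x, round(sigmoid_range_py(x, -1, 1), 4))
# -4 -0.964
# 0 0.0
# 4 0.964
```

The output approaches lo and hi but never reaches them, so a model using this as its final activation cannot predict a coordinate outside the stated range.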

We didn't specify a loss function, so fastai chose one by default. Let's see what it picked:

dls.loss_func
FlattenedLoss of MSELoss()

This makes sense, since when coordinates are used as the dependent variable, most of the time we're likely to be trying to predict something as close as possible; that's basically what MSELoss (mean squared error loss) does. If you want to use a different loss function, you can pass it to cnn_learner using the loss_func parameter.

learn.lr_find()
SuggestedLRs(valley=0.0020892962347716093)
lr = 1e-2
learn.fine_tune(3, lr)
epoch  train_loss  valid_loss  time
0      0.045868    0.011111    00:27
epoch  train_loss  valid_loss  time
0      0.008093    0.001202    00:32
1      0.002967    0.001237    00:32
2      0.001436    0.000068    00:32

Generally when we run this we get a loss of around 0.0001, which corresponds to an average coordinate prediction error of:

math.sqrt(0.0001)
0.01

But it's important to take a look at our results with Learner.show_results. The left side shows the actual (ground truth) coordinates and the right side shows our model's predictions:

learn.show_results(ds_idx=1, nrows=3, figsize=(6,8))

Conclusion

  • nn.CrossEntropyLoss for single-label classification
  • nn.BCEWithLogitsLoss for multi-label classification
  • nn.MSELoss for regression

Questionnaire

  1. How could multi-label classification improve the usability of the bear classifier?
  2. How do we encode the dependent variable in a multi-label classification problem?
  3. How do you access the rows and columns of a DataFrame as if it was a matrix?
  4. How do you get a column by name from a DataFrame?
  5. What is the difference between a Dataset and DataLoader?
  6. What does a Datasets object normally contain?
  7. What does a DataLoaders object normally contain?
  8. What does lambda do in Python?
  9. What are the methods to customize how the independent and dependent variables are created with the data block API?
  10. Why is softmax not an appropriate output activation function when using a one-hot-encoded target?
  11. Why is nll_loss not an appropriate loss function when using a one-hot-encoded target?
  12. What is the difference between nn.BCELoss and nn.BCEWithLogitsLoss?
  13. Why can't we use regular accuracy in a multi-label problem?
  14. When is it okay to tune a hyperparameter on the validation set?
  15. How is y_range implemented in fastai? (See if you can implement it yourself and test it without peeking!)
  16. What is a regression problem? What loss function should you use for such a problem?
  17. What do you need to do to make sure the fastai library applies the same data augmentation to your input images and your target point coordinates?