fastbook 06_multicat
done
- Multi-Label Classification
- The Data
- Sidebar: Pandas and DataFrames
- Constructing a DataBlock
- Binary Cross-Entropy
- Regression
- Assemble the Data
- Training a Model
- Conclusion
- Questionnaire
Multi-label classification refers to the problem of identifying the categories of objects in images that may not contain exactly one type of object. There may be more than one kind of object, or there may be no objects at all in the classes that you are looking for.
from fastai.vision.all import *
path = untar_data(URLs.PASCAL_2007)
This dataset is different from the ones we have seen before, in that it is not structured by filename or folder but instead comes with a CSV (comma-separated values) file telling us what labels to use for each image.
df = pd.read_csv(path/'train.csv')
df.head()
As you can see, the list of categories in each image is shown as a space-delimited string.
df.iloc[:,0]
df.iloc[0,:]
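As a quick reminder from the Pandas sidebar, you can also grab a column by name, which is what we will rely on below. A small sketch using the columns of this DataFrame:
# Column access by name returns a pandas Series
df['fname']
# ...and can then be indexed by row (here the index is just the row number)
df['labels'][0]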
How do we convert from a DataFrame object to a DataLoaders object?
As we have seen, PyTorch and fastai have two main classes for representing and accessing a training set or validation set:
- Dataset:: A collection that returns a tuple of your independent and dependent variable for a single item
- DataLoader:: An iterator that provides a stream of mini-batches, where each mini-batch is a tuple of a batch of independent variables and a batch of dependent variables
On top of these, fastai provides two classes for bringing your training and validation sets together:
- Datasets:: An object that contains a training Dataset and a validation Dataset
- DataLoaders:: An object that contains a training DataLoader and a validation DataLoader
Since a DataLoader builds on top of a Dataset and adds additional functionality to it (collating multiple items into a mini-batch), it's often easiest to start by creating and testing Datasets, and then look at DataLoaders after that's working.
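To make the distinction concrete, here is a minimal sketch using a toy list of (x, y) tuples and fastai's DataLoader (the toy values are made up for illustration):
# A toy "Dataset": any collection of (independent, dependent) tuples will do
toy_ds = list(zip(tensor([1,2,3,4]), tensor([10,20,30,40])))
toy_ds[0]                           # a single item: (tensor(1), tensor(10))
toy_dl = DataLoader(toy_ds, bs=2)   # collates items into mini-batches
first(toy_dl)                       # one mini-batch: (tensor([1, 2]), tensor([10, 20]))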
When we create a DataBlock, we build up gradually, step by step, and use the notebook to check our data along the way.
dblock = DataBlock()
dsets = dblock.datasets(df)
len(dsets.train),len(dsets.valid)
x,y = dsets.train[0]
x,y
dsets.train[0]
As you can see, this simply returns a row of the DataFrame, twice.
This is because by default, the data block assumes we have two things: input and target.
We are going to need to grab the appropriate fields from the DataFrame, which we can do by passing get_x and get_y functions:
x['fname']
dblock = DataBlock(get_x = lambda r: r['fname'], get_y = lambda r: r['labels'])
dsets = dblock.datasets(df)
dsets.train[0]
As you can see, rather than defining a function in the usual way, we are using Python’s lambda keyword. This is just a shortcut for defining and then referring to a function. The following more verbose approach is identical:
def get_x(r): return r['fname']
def get_y(r): return r['labels']
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
Lambda functions are great for quickly iterating, but they are not compatible with serialization, so we advise you to use the more verbose approach if you want to export your Learner after training (lambdas are fine if you are just experimenting).
We can see that the independent variable will need to be converted into a complete path, so that we can open it as an image, and the dependent variable will need to be split on the space character (which is the default for Python’s split function) so that it becomes a list:
df['labels'][200].split(' ')
def get_x(r): return path/'train'/r['fname']
def get_y(r): return r['labels'].split(' ')
dblock = DataBlock(get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
dsets.train[200]
To actually open the image and do the conversion to tensors, we will need to use a set of transforms; block types will provide us with those. We can use the same block types that we have used previously, with one exception: the ImageBlock will work fine again, because we have a path that points to a valid image, but the CategoryBlock is not going to work. The problem is that block returns a single integer, but we need to be able to have multiple labels for each item. To solve this, we use a MultiCategoryBlock. This type of block expects to receive a list of strings, as we have in this case, so let's test it out:
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
get_x = get_x, get_y = get_y)
dsets = dblock.datasets(df)
dsets.train[0]
As you can see, our list of categories is not encoded in the same way that it was for the regular CategoryBlock. In that case, we had a single integer representing which category was present, based on its location in our vocab. In this case, however, we instead have a list of zeros, with a one in any position where that category is present. For example, if there is a one in the second and fourth positions, then that means that vocab items two and four are present in this image. This is known as one-hot encoding. The reason we can't easily just use a list of category indices is that each list would be a different length, and PyTorch requires tensors, where everything has to be the same length.
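As a small illustration of how one-hot encoding works (the vocab and labels here are made up just for the sketch):
# Hypothetical five-item vocab and a two-label example
toy_vocab  = ['bicycle', 'car', 'cat', 'dog', 'person']
toy_labels = ['car', 'dog']
one_hot = torch.zeros(len(toy_vocab))
one_hot[[toy_vocab.index(l) for l in toy_labels]] = 1.
one_hot    # tensor([0., 1., 0., 1., 0.])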
Let's check what the categories represent for this example (we are using the convenient torch.where function, which tells us all of the indices where our condition is true or false):
dsets.train[0][1]==1.
torch.where(dsets.train[0][1]==1.)
torch.where(dsets.train[0][1]==1.)[0]
idxs = torch.where(dsets.train[0][1]==1.)[0]
dsets.train.vocab[idxs]
With NumPy arrays, PyTorch tensors, and fastai's L class, we can index directly using a list or vector, which makes a lot of code (such as this example) much clearer and more concise.
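For instance, here is a quick sketch of that kind of indexing on a plain tensor and on an L (toy values):
t = tensor([10, 20, 30, 40])
t[[0, 2]]                      # tensor([10, 30])
L('a', 'b', 'c', 'd')[[1, 3]]  # an L containing 'b' and 'd'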
We have ignored the column is_valid up until now, which means that DataBlock has been using a random split by default. To explicitly choose the elements of our validation set, we need to write a function and pass it to splitter (or use one of fastai's predefined functions or classes). It will take the items (here our whole DataFrame) and must return two (or more) lists of integers:
def splitter(df):
    train = df.index[~df['is_valid']].tolist()
    valid = df.index[df['is_valid']].tolist()
    return train,valid
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
splitter=splitter,
get_x=get_x,
get_y=get_y)
dsets = dblock.datasets(df)
dsets.train[0]
As we have discussed, a DataLoader collates the items from a Dataset into a mini-batch. This is a tuple of tensors, where each tensor simply stacks the items from that location in each Dataset item.
Now that we have confirmed that the individual items look okay, there's one more step we need before we can create our DataLoaders: we have to ensure that every item is of the same size. To do this, we can use RandomResizedCrop:
dblock = DataBlock(blocks=(ImageBlock, MultiCategoryBlock),
splitter=splitter,
get_x=get_x,
get_y=get_y,
item_tfms = RandomResizedCrop(128, min_scale=0.35))
dls = dblock.dataloaders(df)
dls.show_batch(nrows=1, ncols=5)
Remember that if anything goes wrong when you create your DataLoaders from your DataBlock, or if you want to view exactly what happens with your DataBlock, you can use the summary method we presented in the last chapter.
Our data is now ready for training a model. As we will see, nothing is going to change when we create our Learner, but behind the scenes, the fastai library will pick a new loss function for us: binary cross-entropy.
Let's use cnn_learner to create a Learner, so we can look at its activations:
learn = cnn_learner(dls, resnet34)
We also saw that the model in a Learner is generally an object of a class inheriting from nn.Module, and that we can call it using parentheses, and it will return the activations of a model. You should pass it your independent variable, as a mini-batch. We can try it out by grabbing a mini-batch from our DataLoader and then passing it to the model:
- Grab a mini-batch from the DataLoader, pass it through the model, and look at the shape of the activations.
x,y = to_cpu(dls.train.one_batch())
activs = learn.model(x)
activs.shape
Think about why activs has this shape—we have a batch size of 64, and we need to calculate the probability of each of 20 categories.
activs[0]
Apply the sigmoid to scale the activations between 0 and 1, then take the log:
def binary_cross_entropy(inputs, targets):
    inputs = inputs.sigmoid()
    return -torch.where(targets==1, inputs, 1-inputs).log().mean()
In the binary_cross_entropy function, torch.where, log, and mean all operate elementwise, so each activation will be compared to the corresponding target in each column; we don't have to do anything extra to make this function work for multiple columns.
F.binary_cross_entropy and its module equivalent nn.BCELoss calculate cross-entropy on a one-hot-encoded target, but do not include the initial sigmoid. Normally for one-hot-encoded targets you'll want F.binary_cross_entropy_with_logits (or nn.BCEWithLogitsLoss), which do both the sigmoid and binary cross-entropy in a single function, as in the preceding example.
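As a quick sanity check (a sketch with made-up activations and targets), our hand-rolled binary_cross_entropy should give the same value as the built-in version:
inp  = torch.randn(4, 20)                     # fake activations: batch of 4, 20 classes
targ = torch.randint(0, 2, (4, 20)).float()   # fake one-hot-style 0/1 targets
binary_cross_entropy(inp, targ), F.binary_cross_entropy_with_logits(inp, targ)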
The equivalent for single-label datasets (like MNIST or the Pet dataset), where the target is encoded as a single integer, is F.nll_loss or nn.NLLLoss for the version without the initial softmax, and F.cross_entropy or nn.CrossEntropyLoss for the version with the initial softmax.
Since we have a one-hot-encoded target, we will use BCEWithLogitsLoss:
loss_func = nn.BCEWithLogitsLoss()
loss = loss_func(activs, y)
loss
We don't actually need to tell fastai to use this loss function (although we can if we want) since it will be automatically chosen for us. fastai knows that the DataLoaders has multiple category labels, so it will use nn.BCEWithLogitsLoss by default.
In a multi-label problem we can't use the regular accuracy function, because it compares the model's output to the target like this:
def accuracy(inp, targ, axis=-1):
    "Compute accuracy with `targ` when `pred` is bs * n_classes"
    pred = inp.argmax(dim=axis)
    return (pred == targ).float().mean()
argmax picks the single class with the highest activation, which assumes there is exactly one label per image. For multi-label data we instead pick a threshold: every activation above the threshold is treated as a 1, and everything below it as a 0:
def accuracy_multi(inp, targ, thresh=0.5, sigmoid=True):
    "Compute accuracy when `inp` and `targ` are the same size."
    if sigmoid: inp = inp.sigmoid()
    return ((inp>thresh)==targ.bool()).float().mean()
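We can sanity-check it on the mini-batch we grabbed earlier (a quick sketch; activs and y come from the cells above):
accuracy_multi(activs, y)   # accuracy on this batch at the default threshold of 0.5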
The threshold defaults to 0.5, but we can change it by binding a different value with Python's partial function, which creates a new function with some arguments already filled in:
def say_hello(name, say_what="Hello"): return f"{say_what} {name}."
say_hello('Jeremy'),say_hello('Jeremy', 'Ahoy!')
f = partial(say_hello, say_what="Bonjour")
f("Jeremy"),f("Sylvain")
learn = cnn_learner(dls, resnet50, metrics=partial(accuracy_multi, thresh=0.2))
learn.fine_tune(3, base_lr=3e-3, freeze_epochs=4)
Here we trained with a threshold of 0.2. Picking the threshold is very important: if it is too low, labels will often not be selected accurately:
learn.metrics = partial(accuracy_multi, thresh=0.1)
learn.validate()
If the threshold is too high, the model will only select the objects it is very confident about:
learn.metrics = partial(accuracy_multi, thresh=0.99)
learn.validate()
Let's find the best threshold. First, grab the predictions on the validation set:
preds,targs = learn.get_preds()
get_preds applies the output activation function (sigmoid, in this case) by default, so we need to tell accuracy_multi not to apply it again:
accuracy_multi(preds, targs, thresh=0.9, sigmoid=False)
xs = torch.linspace(0.05,0.95,29)
accs = [accuracy_multi(preds, targs, thresh=i, sigmoid=False) for i in xs]
plt.plot(xs,accs);
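As a small follow-up sketch, we can read the best threshold straight off that sweep:
best_thresh = xs[torch.stack(accs).argmax()]   # threshold with the highest validation accuracy
best_thresh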
You might worry that tuning the threshold on the validation set like this is overfitting to it, but since the accuracy changes smoothly with the threshold, we are clearly not just picking an inappropriate outlier.
A model is defined by its independent and dependent variables, along with its loss function.
path = untar_data(URLs.BIWI_HEAD_POSE)
Path.BASE_PATH = path
path.ls().sorted()
There are 24 directories numbered from 01 to 24 (they correspond to the different people photographed), and a corresponding .obj file for each (we won't need them here). Let's take a look inside one of these directories:
(path/'01').ls().sorted()
img_files = get_image_files(path)
def img2pose(x): return Path(f'{str(x)[:-7]}pose.txt')
img2pose(img_files[0])
im = PILImage.create(img_files[0])
im.shape
im.to_thumb(150)
The pose file associated with each image in the Biwi dataset gives the location of the center of the head. Let's define the function we will use to extract the head center point:
cal = np.genfromtxt(path/'01'/'rgb.cal', skip_footer=6)
def get_ctr(f):
    ctr = np.genfromtxt(img2pose(f), skip_header=3)
    c1 = ctr[0] * cal[0][0]/ctr[2] + cal[0][2]
    c2 = ctr[1] * cal[1][1]/ctr[2] + cal[1][2]
    return tensor([c1,c2])
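This function returns the head center as a tensor of two coordinates; a quick check on the first image file:
get_ctr(img_files[0])   # the (x, y) pixel coordinates of the head center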
Each directory contains many pictures of the same person, so a random split would not test whether the model generalizes to people it has not seen. Instead we split by person, putting one person's images entirely in the validation set.
The only other difference from our previous data blocks is that the second block is a PointBlock. This is necessary so that fastai knows that the labels represent coordinates:
biwi = DataBlock(
blocks=(ImageBlock, PointBlock),
get_items=get_image_files,
get_y=get_ctr,
splitter=FuncSplitter(lambda o: o.parent.name=='13'),
batch_tfms=[*aug_transforms(size=(240,320)),
Normalize.from_stats(*imagenet_stats)]
)
dls = biwi.dataloaders(path)
dls.show_batch(max_n=9, figsize=(8,6))
Let's take a look at the underlying tensors:
xb,yb = dls.one_batch()
xb.shape,yb.shape
Make sure that you understand why these are the shapes for our mini-batches.
yb[0]
As you can see, we haven't had to use a separate image regression application; all we've had to do is label the data, and tell fastai what kinds of data the independent and dependent variables represent.
It's the same for creating our Learner: we will use the same function as before, with one new parameter, and we will be ready to train our model. We use cnn_learner to create our Learner, passing y_range to tell fastai the range of our targets:
learn = cnn_learner(dls, resnet18, y_range=(-1,1))
y_range is implemented in fastai using sigmoid_range, which is defined as:
def sigmoid_range(x, lo, hi): return torch.sigmoid(x) * (hi-lo) + lo
plot_function(partial(sigmoid_range,lo=-1,hi=1), min=-4, max=4)
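A quick numeric check of the behavior with toy inputs: large negative activations map close to lo, and large positive ones close to hi:
sigmoid_range(tensor([-10., 0., 10.]), -1, 1)   # approximately tensor([-1., 0., 1.])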
We didn't specify a loss function, which means fastai chose the default for us:
dls.loss_func
This makes sense, since when coordinates are used as the dependent variable, most of the time we're likely to be trying to predict something as close as possible; that's basically what MSELoss (mean squared error loss) does. If you want to use a different loss function, you can pass it to cnn_learner using the loss_func parameter.
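For instance, here is a minimal sketch of passing the loss function explicitly; MSELossFlat is fastai's flattened mean squared error, so this should simply reproduce the default choice here (shown only to illustrate the parameter):
# Hypothetical: explicitly passing the loss function we'd get by default anyway
learn = cnn_learner(dls, resnet18, y_range=(-1,1), loss_func=MSELossFlat())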
learn.lr_find()
lr = 1e-2
learn.fine_tune(3, lr)
Generally when we run this we get a loss of around 0.0001, which corresponds to an average coordinate prediction error of:
math.sqrt(0.0001)
But it's important to take a look at our results with Learner.show_results. The left side shows the actual (ground truth) coordinates and the right side shows our model's predictions:
learn.show_results(ds_idx=1, nrows=3, figsize=(6,8))
To recap, fastai picks these loss functions by default:
- nn.CrossEntropyLoss for single-label classification
- nn.BCEWithLogitsLoss for multi-label classification
- nn.MSELoss for regression
- How could multi-label classification improve the usability of the bear classifier?
- How do we encode the dependent variable in a multi-label classification problem?
- How do you access the rows and columns of a DataFrame as if it was a matrix?
- How do you get a column by name from a DataFrame?
- What is the difference between a Dataset and a DataLoader?
- What does a Datasets object normally contain?
- What does a DataLoaders object normally contain?
- What does lambda do in Python?
- What are the methods to customize how the independent and dependent variables are created with the data block API?
- Why is softmax not an appropriate output activation function when using a one-hot-encoded target?
- Why is nll_loss not an appropriate loss function when using a one-hot-encoded target?
- What is the difference between nn.BCELoss and nn.BCEWithLogitsLoss?
- Why can't we use regular accuracy in a multi-label problem?
- When is it okay to tune a hyperparameter on the validation set?
- How is y_range implemented in fastai? (See if you can implement it yourself and test it without peeking!)
- What is a regression problem? What loss function should you use for such a problem?
- What do you need to do to make sure the fastai library applies the same data augmentation to your input images and your target point coordinates?