Action recognition and analysis is an important research direction in computer vision. By analyzing behaviors and actions in video, we can provide intelligent solutions for many real-world applications, such as intelligent surveillance, security, medical rehabilitation, and sports analytics.
Hands-on project: action recognition with a 3D convolutional neural network (3D-CNN)
1. Data preparation: We will use the UCF101 dataset for training and evaluation. It contains 101 action classes and roughly 13,000 video clips. First, download and extract the dataset:
```python
import urllib.request
import rarfile  # pip install rarfile; the UCF101 archive is a .rar file, so zipfile cannot extract it

url = "https://www.crcv.ucf.edu/data/UCF101/UCF101.rar"
file_name = "UCF101.rar"
urllib.request.urlretrieve(url, file_name)

# rarfile needs an unrar backend installed on the system
with rarfile.RarFile(file_name) as rar_ref:
    rar_ref.extractall("UCF101")
```
2. Data preprocessing: Extract the video frames as image sequences and apply resizing, cropping, and normalization. Here we use the OpenCV library to process the video files:
```python
import cv2
import os

def video_to_frames(video_file, target_folder, frame_size=(224, 224)):
    video = cv2.VideoCapture(video_file)
    count = 0
    while video.isOpened():
        ret, frame = video.read()
        if not ret:
            break
        resized_frame = cv2.resize(frame, frame_size)
        frame_path = os.path.join(target_folder, f"frame_{count:04d}.png")
        cv2.imwrite(frame_path, resized_frame)
        count += 1
    video.release()
    return count
```
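To process the whole dataset, you can pair every clip with an output folder and feed each pair to `video_to_frames`. The sketch below assumes the extracted archive follows the layout `UCF101/<class_name>/<clip>.avi`; `plan_extraction` and the folder names are illustrative helpers, not part of the dataset tooling:

```python
import os
import glob

def plan_extraction(video_root, frames_root):
    """Pair every clip under video_root with its frame output folder.

    Assumes the layout video_root/<class_name>/<clip>.avi; adjust the
    glob pattern if your extracted archive differs.
    """
    pairs = []
    for video_file in sorted(glob.glob(os.path.join(video_root, "*", "*.avi"))):
        class_name = os.path.basename(os.path.dirname(video_file))
        clip_name = os.path.splitext(os.path.basename(video_file))[0]
        pairs.append((video_file, os.path.join(frames_root, class_name, clip_name)))
    return pairs

# Usage with the video_to_frames function defined above:
# for video_file, folder in plan_extraction("UCF101", "UCF101_frames"):
#     os.makedirs(folder, exist_ok=True)
#     video_to_frames(video_file, folder)
```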
3. Model construction: We will fine-tune a pre-trained 3D-CNN on the UCF101 dataset. Architectures such as I3D or C3D can serve as the base model; the code below uses torchvision's R3D-18. We build the model with the PyTorch framework:
```python
import torch
import torch.nn as nn
from torchvision.models import video

class ActionRecognitionModel(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        # R3D-18 backbone pre-trained on Kinetics-400
        self.base_model = video.r3d_18(pretrained=True)
        # Replace the final classifier with a head for num_classes outputs
        self.base_model.fc = nn.Linear(self.base_model.fc.in_features, num_classes)

    def forward(self, x):
        return self.base_model(x)
```
4. Training and evaluation: We will train and evaluate the action-recognition model with PyTorch. First we define the loss function, optimizer, and learning-rate schedule; then we train on UCF101 and evaluate the model's performance at the end of every epoch.
```python
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision.transforms import Compose, ToTensor, Normalize
from custom_dataset import UCF101Dataset

# Hyperparameters
num_classes = 101
num_epochs = 20
batch_size = 16
learning_rate = 0.001
momentum = 0.9
weight_decay = 0.0005

# Dataset and DataLoader
train_transforms = Compose([ToTensor(),
                            Normalize(mean=[0.485, 0.456, 0.406],
                                      std=[0.229, 0.224, 0.225])])
val_transforms = Compose([ToTensor(),
                          Normalize(mean=[0.485, 0.456, 0.406],
                                    std=[0.229, 0.224, 0.225])])
train_dataset = UCF101Dataset("UCF101/train", train_transforms)
val_dataset = UCF101Dataset("UCF101/val", val_transforms)
train_loader = DataLoader(train_dataset, batch_size=batch_size, shuffle=True, num_workers=4)
val_loader = DataLoader(val_dataset, batch_size=batch_size, shuffle=False, num_workers=4)

# Model, loss, optimizer, and scheduler
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = ActionRecognitionModel(num_classes).to(device)
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=learning_rate,
                      momentum=momentum, weight_decay=weight_decay)
scheduler = optim.lr_scheduler.StepLR(optimizer, step_size=5, gamma=0.1)

# Training and evaluation loop
for epoch in range(num_epochs):
    # Training
    model.train()
    running_loss = 0.0
    for inputs, labels in train_loader:
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
        running_loss += loss.item()

    # Evaluation
    model.eval()
    correct = 0
    total = 0
    with torch.no_grad():
        for inputs, labels in val_loader:
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    print(f"Epoch {epoch + 1}, "
          f"Loss: {running_loss / len(train_loader):.4f}, "
          f"Accuracy: {correct / total:.4f}")

    # Update learning rate
    scheduler.step()
```
This code first sets the training hyperparameters, then creates the training and validation datasets together with their DataLoaders. Next we define the model, loss function, optimizer, and learning-rate schedule. In the training and evaluation loop, we compute the loss and accuracy at the end of each epoch and step the scheduler. Note that the `UCF101Dataset` class must be implemented yourself; the following example, which maps class-folder names to integer labels and returns clips in the (C, T, H, W) layout the 3D-CNN expects, can serve as a starting point:

```python
import os
import glob

import torch
from torch.utils.data import Dataset
from PIL import Image

class UCF101Dataset(Dataset):
    def __init__(self, data_folder, transform=None):
        self.data_folder = data_folder
        self.transform = transform
        # Map class-folder names to integer labels so they work with CrossEntropyLoss
        class_names = sorted(os.path.basename(p)
                             for p in glob.glob(os.path.join(data_folder, "*")))
        self.class_to_idx = {name: idx for idx, name in enumerate(class_names)}
        self.samples = self._load_samples()

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, index):
        video_folder, label = self.samples[index]
        frames = self._load_frames(video_folder)
        if self.transform is not None:
            frames = [self.transform(frame) for frame in frames]
        # Stack to (T, C, H, W), then permute to (C, T, H, W) as the 3D-CNN expects
        video_tensor = torch.stack(frames).permute(1, 0, 2, 3)
        return video_tensor, label

    def _load_samples(self):
        samples = []
        for class_folder in glob.glob(os.path.join(self.data_folder, "*")):
            label = self.class_to_idx[os.path.basename(class_folder)]
            for video_folder in glob.glob(os.path.join(class_folder, "*")):
                samples.append((video_folder, label))
        return samples

    def _load_frames(self, video_folder):
        frame_paths = sorted(glob.glob(os.path.join(video_folder, "*.png")))
        return [Image.open(p).convert("RGB") for p in frame_paths]
```
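One practical caveat with this dataset class: clips differ in frame count, so the DataLoader's default collation will fail to stack them into a batch. A common fix is to sample a fixed number of frames per clip; the helper below is a sketch, with `clip_len` as an assumed hyperparameter:

```python
def sample_frame_indices(num_frames, clip_len=16):
    """Pick clip_len evenly spaced frame indices from a clip of num_frames.

    If the clip is shorter than clip_len, the last frame is repeated so
    every sample in a batch ends up with the same temporal length.
    """
    if num_frames >= clip_len:
        step = num_frames / clip_len
        return [min(int(i * step), num_frames - 1) for i in range(clip_len)]
    return list(range(num_frames)) + [num_frames - 1] * (clip_len - num_frames)
```

Inside `__getitem__`, you would apply these indices to the frame list before stacking, so that every returned tensor has shape (C, clip_len, H, W).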
After training and evaluation, you can use the trained model to recognize actions in new videos. To improve performance, you can try a more sophisticated 3D-CNN architecture, or combine the model with other techniques such as recurrent neural networks (RNNs) or long short-term memory networks (LSTMs).
In summary, action recognition and analysis is an important technique in computer vision. In this article we presented a 3D-CNN-based action-recognition project and walked through its steps and code in detail. The technique applies to a wide range of real-world scenarios, such as intelligent surveillance, security, and medical rehabilitation.