PyTorch/TensorFlow Custom Layers and Modules: Building Unique Neural Network Architectures

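Alright, today let's talk about the "made-to-order" neural network parts in PyTorch and TensorFlow: custom layers and modules. Don't be intimidated. It sounds fancy, but it is a lot like building with LEGO: once you grasp the basic principles, you can assemble a Transformers robot of your very own.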

Opening: Why Customize?

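That said, the layers and modules that ship with PyTorch and TensorFlow will carry you a long way. Convolutional layers, fully connected layers, RNNs, LSTMs and the like are the "evergreens" of the neural network world. So why go to the trouble of writing your own? The reasons are simple: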

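Different needs! Sometimes your problem is unusual and no off-the-shelf module fits. If you need an activation function with a particular constraint, or a special loss function, you have to roll up your sleeves and build it yourself.
Performance optimization! The modules a framework provides are general-purpose, and that generality can cost some performance. If you can optimize for specific hardware or a specific algorithm, you may get better results.
Research! To explore a new network architecture or algorithm, you may need to implement new layers or modules yourself to test your ideas.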

In short, custom layers and modules are a programmer's "Swiss Army knife": they let you solve problems more flexibly.

PyTorch: Everything Is an Object

In PyTorch, custom layers and modules are built on the torch.nn.Module base class. Let's start with the simplest kind of custom layer:

1. Custom Layer: A Twist on the Activation Function

import torch
import torch.nn as nn
import torch.nn.functional as F

class MyReLU(nn.Module):
    def __init__(self, threshold=0.0):
        super(MyReLU, self).__init__()
        self.threshold = threshold

    def forward(self, x):
        return torch.where(x > self.threshold, x, torch.zeros_like(x))

# Usage
my_relu = MyReLU(threshold=0.5)
input_tensor = torch.randn(10)
output_tensor = my_relu(input_tensor)
print("Input:", input_tensor)
print("Output:", output_tensor)

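__init__: The constructor, which initializes the layer's settings. Here we store a threshold that controls where the ReLU starts to activate. The call super(MyReLU, self).__init__() is required: it runs the parent class constructor, which sets up the bookkeeping that nn.Module relies on.
forward: The forward pass, which defines the layer's computation. Here we use torch.where to keep each input that is greater than the threshold and output zero otherwise.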
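
As a quick sanity check on the layer above, here is a minimal sketch. The class name ThresholdReLU and the sample values are illustrative assumptions, not part of the original example. It checks that with threshold=0.0 the layer matches torch.relu, and that gradients only flow through elements above the threshold.

```python
import torch
import torch.nn as nn


class ThresholdReLU(nn.Module):
    """Same logic as MyReLU above; renamed to keep this sketch self-contained."""

    def __init__(self, threshold=0.0):
        super().__init__()
        self.threshold = threshold

    def forward(self, x):
        # Pass values above the threshold through unchanged, zero out the rest
        return torch.where(x > self.threshold, x, torch.zeros_like(x))


x = torch.tensor([-1.0, 0.2, 0.5, 0.8, 2.0], requires_grad=True)

# With threshold=0.0 the layer behaves like the built-in ReLU
matches_relu = torch.equal(ThresholdReLU(0.0)(x), torch.relu(x))

# With threshold=0.5 only values strictly greater than 0.5 survive
y = ThresholdReLU(0.5)(x)
y.sum().backward()

print(matches_relu)   # True
print(y.detach())     # values: 0, 0, 0, 0.8, 2.0
print(x.grad)         # gradient is 1 where x > 0.5, else 0
```

Note that the comparison is strict, so an input exactly equal to the threshold is zeroed out.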

2. Custom Module: The Joy of Stacking Blocks

A custom module simply combines several layers into a more complex structure. For example, we can define a simple fully connected network:

class MyNetwork(nn.Module):
    def __init__(self, input_size, hidden_size, output_size):
        super(MyNetwork, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        return x

# Usage
input_size = 10
hidden_size = 5
output_size = 2
my_network = MyNetwork(input_size, hidden_size, output_size)
input_tensor = torch.randn(1, input_size)  # note the leading batch dimension
output_tensor = my_network(input_tensor)
print("Output shape:", output_tensor.shape)

In __init__, we define two fully connected layers (nn.Linear) and a ReLU activation.
In forward, we chain these layers together in order to complete the forward pass.

3. On "Managing" Parameters

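PyTorch automatically tracks the parameters defined in an nn.Module. You can access all of them through model.parameters(), typically to hand them to an optimizer. You can also use named_parameters() to get each parameter together with its name.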

for name, param in my_network.named_parameters():
    print(name, param.shape)

# Output:
# fc1.weight torch.Size([5, 10])
# fc1.bias torch.Size([5])
# fc2.weight torch.Size([2, 5])
# fc2.bias torch.Size([2])
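
To make the link between parameters() and optimization concrete, here is a minimal sketch of one update step. The stand-in model, the random data, and the learning rate are assumptions chosen for illustration only.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small stand-in model (hypothetical; any nn.Module works the same way)
model = nn.Sequential(nn.Linear(10, 5), nn.ReLU(), nn.Linear(5, 2))

# Count trainable parameters: (10*5 + 5) + (5*2 + 2) = 67
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)

# Hand every parameter to the optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(8, 10)
target = torch.randn(8, 2)

# Snapshot the parameters so we can see that the step changes them
before = [p.detach().clone() for p in model.parameters()]

optimizer.zero_grad()
loss = nn.functional.mse_loss(model(x), target)
loss.backward()
optimizer.step()

# At least one parameter tensor changed after the update
changed = any(not torch.equal(b, p) for b, p in zip(before, model.parameters()))
print(num_params, changed)
```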

4. Advanced Technique: Registering Buffers and Parameters

Beyond the parameters that come with built-in submodules, you can also register Buffers and Parameters by hand.

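Buffer: Stores state that should not be optimized, such as running_mean and running_var in a BatchNorm layer. Buffers are saved in state_dict() and follow the module across devices, but they are not returned by parameters().
Parameter: Stores values that should be optimized. Assigning an nn.Parameter to an attribute of an nn.Module registers it automatically, so you rarely need to do anything extra. In most cases parameters come from built-in layers such as nn.Linear, and you only create an nn.Parameter directly when a layer needs its own weights.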

class MyLayer(nn.Module):
    def __init__(self, num_features):
        super(MyLayer, self).__init__()
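        # Register a Parameter (trainable, returned by parameters())
        self.weight = nn.Parameter(torch.randn(num_features))
        # Register a Buffer (saved with the module, but not trained)
        self.register_buffer('running_mean', torch.zeros(num_features))

    def forward(self, x):
        # Use the Parameter
        output = x * self.weight
        # Update the Buffer (illustrative only; real layers are usually more involved).
        # Update only in training mode and keep the update out of the autograd graph.
        if self.training:
            with torch.no_grad():
                self.running_mean.mul_(0.9).add_(torch.mean(x, dim=0), alpha=0.1)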
        return output
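
The difference between the two is easiest to see by inspecting a module. The sketch below uses a hypothetical layer named ScaleWithStats (an assumption for illustration) with one Parameter and one Buffer, and assumes inputs of shape (batch_size, num_features).

```python
import torch
import torch.nn as nn


class ScaleWithStats(nn.Module):
    """Hypothetical layer: one trainable weight plus one non-trainable buffer."""

    def __init__(self, num_features):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))

    def forward(self, x):
        if self.training:
            with torch.no_grad():
                # Exponential moving average of the per-feature batch mean
                self.running_mean.mul_(0.9).add_(x.mean(dim=0), alpha=0.1)
        return x * self.weight


layer = ScaleWithStats(3)
param_names = [name for name, _ in layer.named_parameters()]
buffer_names = [name for name, _ in layer.named_buffers()]
state_keys = list(layer.state_dict().keys())

layer(torch.ones(4, 3))          # training mode: buffer moves toward the batch mean
after_train = layer.running_mean.clone()

layer.eval()
layer(torch.full((4, 3), 5.0))   # eval mode: buffer is left untouched
after_eval = layer.running_mean.clone()

print(param_names)    # ['weight']
print(buffer_names)   # ['running_mean']
print(state_keys)     # both names appear in the state dict
print(after_train)    # each entry is 0.9 * 0 + 0.1 * 1 = 0.1
```

Only weight would be handed to an optimizer, while both tensors are saved and restored with the module.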

TensorFlow/Keras: Combining Functional and Object-Oriented Programming

In TensorFlow/Keras there are two main ways to define custom layers and models: a functional style and an object-oriented style.

1. Functional Style: Building a Computation Graph

The functional style treats the neural network as a computation graph and builds the model by composing functions (layers).

import tensorflow as tf
from tensorflow.keras import layers

def my_dense_block(x, units, activation='relu'):
    """Custom Dense block: a Dense layer followed by BatchNormalization."""
    x = layers.Dense(units, activation=activation)(x)
    x = layers.BatchNormalization()(x)
    return x

# Build the model
input_tensor = tf.keras.Input(shape=(10,))
x = my_dense_block(input_tensor, 64)
x = my_dense_block(x, 32)
output_tensor = layers.Dense(1)(x)

model = tf.keras.Model(inputs=input_tensor, outputs=output_tensor)

# Compile the model
model.compile(optimizer='adam', loss='mse')

# Print the model structure
model.summary()

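We define a my_dense_block function that builds a block consisting of a Dense layer and a BatchNormalization layer.
By calling this function repeatedly, we can easily build more complex models.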
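
One detail worth seeing in code: every call to such a helper creates new layer objects, so the blocks do not share weights. A minimal sketch follows; the helper name dense_block and the batch size are assumptions for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers


def dense_block(x, units, activation="relu"):
    """Dense followed by BatchNormalization; new layer objects on every call."""
    x = layers.Dense(units, activation=activation)(x)
    return layers.BatchNormalization()(x)


inputs = tf.keras.Input(shape=(10,))
x = dense_block(inputs, 64)
x = dense_block(x, 32)
outputs = layers.Dense(1)(x)
model = tf.keras.Model(inputs=inputs, outputs=outputs)

# Two blocks plus the output layer: three Dense layers, two BatchNormalization layers
dense_layers = [l for l in model.layers if isinstance(l, layers.Dense)]
bn_layers = [l for l in model.layers if isinstance(l, layers.BatchNormalization)]

# Forward pass on a random batch of 4 samples
batch_out = model(tf.random.normal((4, 10)))

print(len(dense_layers), len(bn_layers))  # 3 2
print(batch_out.shape)                    # (4, 1)
```

If you want weight sharing instead, create the layer once and call the same object several times.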

2. Object-Oriented Style: Subclassing tf.keras.layers.Layer

The object-oriented approach is similar to PyTorch: you define a custom layer by subclassing tf.keras.layers.Layer.

class MyDense(layers.Layer):
    def __init__(self, units, activation=None):
        super(MyDense, self).__init__()
        self.units = units
        self.activation = tf.keras.activations.get(activation)  # look up the activation function by name

    def build(self, input_shape):
        # Create the weight matrix and the bias
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                  initializer='random_normal',
                                  trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                  initializer='zeros',
                                  trainable=True)

    def call(self, inputs):
        # Forward pass
        x = tf.matmul(inputs, self.w) + self.b
        if self.activation is not None:
            x = self.activation(x)
        return x

# Usage
my_dense = MyDense(units=32, activation='relu')
input_tensor = tf.random.normal((1, 10))
output_tensor = my_dense(input_tensor)
print("Output shape:", output_tensor.shape)  # prints: Output shape: (1, 32)

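__init__: The constructor, which stores the layer's configuration.
build: Called automatically the first time the layer is applied to an input, before call runs, to create weights, biases, and other variables. Here we use self.add_weight to create trainable variables. Because build receives input_shape, you can size the weights to match the input dimensions.
call: The forward pass, which defines the layer's computation.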
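
The deferred weight creation described above can be observed directly. Here is a minimal sketch; the class name LazyDense and the shapes are assumptions made for illustration.

```python
import tensorflow as tf
from tensorflow.keras import layers


class LazyDense(layers.Layer):
    """Hypothetical minimal dense layer, used only to illustrate build()."""

    def __init__(self, units):
        super().__init__()
        self.units = units

    def build(self, input_shape):
        # input_shape is known here, so the kernel can be sized to match it
        self.w = self.add_weight(shape=(input_shape[-1], self.units),
                                 initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs):
        return tf.matmul(inputs, self.w) + self.b


layer = LazyDense(4)
weights_before = len(layer.weights)   # no weights yet: build() has not run

out = layer(tf.ones((2, 7)))          # the first call triggers build with shape (2, 7)
weights_after = len(layer.weights)

print(weights_before, weights_after)  # 0 2
print(layer.w.shape, out.shape)       # (7, 4) (2, 4)
```

This is why a Keras layer does not need the input size in its constructor, unlike nn.Linear in PyTorch.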

3. Custom Model: Subclassing tf.keras.Model

As with custom layers, a custom model is implemented by subclassing tf.keras.Model.

class MyModel(tf.keras.Model):
    def __init__(self, num_classes):
        super(MyModel, self).__init__()
        self.dense1 = MyDense(64, activation='relu')
        self.dense2 = MyDense(num_classes, activation='softmax')

    def call(self, inputs):
        x = self.dense1(inputs)
        x = self.dense2(x)
        return x

# Usage
num_classes = 10
my_model = MyModel(num_classes=num_classes)
input_tensor = tf.random.normal((1, 784))  # e.g. a flattened MNIST image
output_tensor = my_model(input_tensor)
print("Output shape:", output_tensor.shape)  # prints: Output shape: (1, 10)

In __init__, we define two of our custom MyDense layers.
In call, we chain these layers together in order to complete the forward pass.
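
A subclassed model can be trained with a hand-written loop as well as with compile and fit. The following is a minimal sketch under stated assumptions: the TinyClassifier model, the random data, and the hyperparameters are hypothetical, and built-in Dense layers are used to keep it short.

```python
import tensorflow as tf

tf.random.set_seed(0)


class TinyClassifier(tf.keras.Model):
    """Hypothetical subclassed model; built-in Dense layers keep the sketch short."""

    def __init__(self, num_classes):
        super().__init__()
        self.hidden = tf.keras.layers.Dense(16, activation="relu")
        self.out = tf.keras.layers.Dense(num_classes)  # raw logits, no softmax

    def call(self, inputs):
        return self.out(self.hidden(inputs))


model = TinyClassifier(num_classes=3)
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)

x = tf.random.normal((32, 8))
y = tf.random.uniform((32,), maxval=3, dtype=tf.int32)

losses = []
for _ in range(20):
    with tf.GradientTape() as tape:
        loss = loss_fn(y, model(x, training=True))
    # Differentiate the loss with respect to every trainable variable
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    losses.append(float(loss))

print(losses[0], losses[-1])  # the loss should go down on this fixed batch
```

Here model.trainable_variables plays the same role as model.parameters() in PyTorch.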

4. @tf.function: A Performance Booster

TensorFlow's @tf.function decorator traces a Python function and compiles it into a TensorFlow graph, which can improve performance.

@tf.function
def my_function(x):
    return x * 2

input_tensor = tf.constant(1.0)
output_tensor = my_function(input_tensor)
print(output_tensor)

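You can also apply @tf.function to the call method of a custom layer. Keep in mind that Keras already runs models in graph mode inside fit, evaluate, and predict by default, so the decorator mostly pays off in hand-written training or inference loops.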
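
Because the function is traced rather than re-executed as Python each time, Python side effects only run during tracing. The sketch below illustrates this; the trace_log list and the function name double are assumptions introduced for the demonstration.

```python
import tensorflow as tf

trace_log = []  # Python-side list; only touched while TensorFlow traces the function


@tf.function
def double(x):
    trace_log.append("traced")           # Python side effect: runs at trace time only
    return x * 2


a = double(tf.constant(1.0))
b = double(tf.constant(3.0))             # same dtype and shape: the graph is reused
traces_after_floats = len(trace_log)

c = double(tf.constant([1, 2, 3]))       # new dtype and shape: triggers a retrace
traces_after_ints = len(trace_log)

print(float(a), float(b), c.numpy().tolist())  # 2.0 6.0 [2, 4, 6]
print(traces_after_floats, traces_after_ints)  # 1 2
```

Frequent retracing cancels out the speedup, so it helps to call a decorated function with tensors of consistent dtype and shape.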

PyTorch vs TensorFlow/Keras: Some Differences

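Feature	PyTorch	TensorFlow/Keras
Programming style	Leans toward imperative programming	Combines functional and object-oriented programming
Dynamic vs. static graph	Dynamic graph; flexible and easy to debug	Eager execution by default in TensorFlow 2.x; `@tf.function` compiles code into a static graph, which is often faster
Custom layer	Subclass `nn.Module` and implement `forward`	Subclass `layers.Layer` and implement `call` (plus `build` when weights depend on the input shape)
Custom model	Subclass `nn.Module` and implement `forward`	Subclass `tf.keras.Model` and implement `call`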

Some Practical Examples

A Layer with an Attention Mechanism (Attention Layer)

# PyTorch implementation
class AttentionLayer(nn.Module):
    def __init__(self, input_dim):
        super(AttentionLayer, self).__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        # x: (batch_size, seq_len, input_dim)
        attention_weights = torch.softmax(self.linear(x).squeeze(-1), dim=1) # (batch_size, seq_len)
        weighted_x = x * attention_weights.unsqueeze(-1) # (batch_size, seq_len, input_dim)
        return weighted_x.sum(dim=1) # (batch_size, input_dim)
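
To check the shapes annotated in the comments above, here is a small sketch. The variant name AttentionPooling is an assumption; it performs the same computation as the PyTorch layer above but also returns the attention weights so they can be inspected.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class AttentionPooling(nn.Module):
    """Same computation as the PyTorch AttentionLayer above, but also returns the weights."""

    def __init__(self, input_dim):
        super().__init__()
        self.linear = nn.Linear(input_dim, 1)

    def forward(self, x):
        scores = self.linear(x).squeeze(-1)            # (batch_size, seq_len)
        weights = torch.softmax(scores, dim=1)         # normalized over seq_len
        pooled = (x * weights.unsqueeze(-1)).sum(dim=1)
        return pooled, weights


layer = AttentionPooling(input_dim=8)
x = torch.randn(4, 6, 8)                               # (batch_size, seq_len, input_dim)
pooled, weights = layer(x)

print(pooled.shape)          # torch.Size([4, 8])
print(weights.sum(dim=1))    # each row sums to 1 (up to floating-point error)
```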

# TensorFlow implementation
class AttentionLayer(tf.keras.layers.Layer):
    def __init__(self):
        super(AttentionLayer, self).__init__()

    def build(self, input_shape):
        self.W = self.add_weight(name='attention_weight',
                                  shape=(input_shape[-1], 1),
                                  initializer='random_normal',
                                  trainable=True)

    def call(self, inputs):
        # inputs: (batch_size, seq_len, input_dim)
        attention_weights = tf.nn.softmax(tf.matmul(inputs, self.W), axis=1)
        weighted_inputs = inputs * attention_weights
        return tf.reduce_sum(weighted_inputs, axis=1)

Custom Loss Function

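# PyTorch implementation
def custom_loss(outputs, targets):
    # outputs are the model predictions, targets are the ground-truth labels
    loss = torch.mean((outputs - targets)**2)  # e.g. mean squared error
    return loss

# TensorFlow implementation
def custom_loss(targets, outputs):  # note the argument order: Keras passes (y_true, y_pred)
    # outputs are the model predictions, targets are the ground-truth labels
    loss = tf.reduce_mean((outputs - targets)**2)  # e.g. mean squared error
    return loss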
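
A hand-written loss is an ordinary differentiable function, so it can be checked against a built-in one. The sketch below does this for the PyTorch version; the function name custom_mse and the random tensors are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)


def custom_mse(outputs, targets):
    """Mean squared error written by hand, as in the PyTorch version above."""
    return torch.mean((outputs - targets) ** 2)


outputs = torch.randn(16, 3, requires_grad=True)
targets = torch.randn(16, 3)

loss = custom_mse(outputs, targets)
loss.backward()

# The hand-written loss agrees with the built-in one
same_value = torch.allclose(loss, F.mse_loss(outputs, targets))

# Analytic gradient of the mean squared error: 2 * (outputs - targets) / N
expected_grad = 2 * (outputs.detach() - targets) / outputs.numel()
same_grad = torch.allclose(outputs.grad, expected_grad)

print(same_value, same_grad)  # True True
```

In Keras, a function with the (y_true, y_pred) signature can be passed to model.compile through the loss argument.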

Custom Unit in a Recurrent Neural Network (Custom RNN Cell)

This example is more involved, but it shows how to plug custom computation into a recurrent neural network.

# PyTorch implementation
class MyRNNCell(nn.Module):
    def __init__(self, input_size, hidden_size):
        super(MyRNNCell, self).__init__()
        self.linear_ih = nn.Linear(input_size, hidden_size)
        self.linear_hh = nn.Linear(hidden_size, hidden_size)

    def forward(self, input, hidden):
        combined = self.linear_ih(input) + self.linear_hh(hidden)
        hidden = torch.tanh(combined)
        return hidden

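# Using the custom cell: nn.RNN takes sizes, not a cell instance, so unroll over time by hand
rnn_cell = MyRNNCell(input_size=10, hidden_size=20)
inputs = torch.randn(5, 3, 10)   # (seq_len, batch_size, input_size)
hidden = torch.zeros(3, 20)      # initial hidden state: (batch_size, hidden_size)
outputs = []
for x_t in inputs:               # iterate over time steps
    hidden = rnn_cell(x_t, hidden)
    outputs.append(hidden)
outputs = torch.stack(outputs)   # (seq_len, batch_size, hidden_size)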
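
The recurrence in the custom PyTorch cell is the same one the built-in nn.RNNCell computes with its default tanh nonlinearity. As a hedged check, the sketch below copies the weights of a custom cell (renamed SimpleRNNCell here, an assumption for self-containment) into nn.RNNCell and compares one step.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class SimpleRNNCell(nn.Module):
    """Same recurrence as MyRNNCell above: h' = tanh(W_ih x + b_ih + W_hh h + b_hh)."""

    def __init__(self, input_size, hidden_size):
        super().__init__()
        self.linear_ih = nn.Linear(input_size, hidden_size)
        self.linear_hh = nn.Linear(hidden_size, hidden_size)

    def forward(self, x, hidden):
        return torch.tanh(self.linear_ih(x) + self.linear_hh(hidden))


custom = SimpleRNNCell(10, 20)
builtin = nn.RNNCell(10, 20)   # tanh nonlinearity by default

# Copy the custom cell's weights into the built-in cell
with torch.no_grad():
    builtin.weight_ih.copy_(custom.linear_ih.weight)
    builtin.bias_ih.copy_(custom.linear_ih.bias)
    builtin.weight_hh.copy_(custom.linear_hh.weight)
    builtin.bias_hh.copy_(custom.linear_hh.bias)

x = torch.randn(3, 10)         # (batch_size, input_size)
h = torch.zeros(3, 20)         # (batch_size, hidden_size)
h_custom = custom(x, h)
h_builtin = builtin(x, h)

print(torch.allclose(h_custom, h_builtin, atol=1e-6))
```

A custom cell earns its keep once the update rule differs from this standard form, for example with extra gates or constraints.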

# TensorFlow implementation
class MyRNNCell(tf.keras.layers.Layer):
    def __init__(self, units):
        super(MyRNNCell, self).__init__()
        self.units = units
        self.state_size = units  # state_size must be defined

    def build(self, input_shape):
        self.W_ih = self.add_weight(shape=(input_shape[-1], self.units),
                                      initializer='random_normal',
                                      trainable=True)
        self.W_hh = self.add_weight(shape=(self.units, self.units),
                                      initializer='random_normal',
                                      trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                  initializer='zeros',
                                  trainable=True)

    def call(self, inputs, states):
        prev_output = states[0]  # previous hidden state
        h = tf.tanh(tf.matmul(inputs, self.W_ih) + tf.matmul(prev_output, self.W_hh) + self.b)
        return h, [h]  # return the new output and the new state

# Using the custom cell
my_cell = MyRNNCell(units=20)
rnn = tf.keras.layers.RNN(my_cell, return_sequences=True, return_state=True)
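
To see what the wrapped cell returns, here is a minimal sketch that runs a batch of sequences through tf.keras.layers.RNN. The cell name TanhCell and the input shape are assumptions; the cell mirrors the TensorFlow MyRNNCell above.

```python
import tensorflow as tf


class TanhCell(tf.keras.layers.Layer):
    """Hypothetical minimal cell, mirroring the TensorFlow MyRNNCell above."""

    def __init__(self, units):
        super().__init__()
        self.units = units
        self.state_size = units   # the RNN wrapper reads this to size the state

    def build(self, input_shape):
        self.w_ih = self.add_weight(shape=(input_shape[-1], self.units),
                                    initializer="random_normal", trainable=True)
        self.w_hh = self.add_weight(shape=(self.units, self.units),
                                    initializer="random_normal", trainable=True)
        self.b = self.add_weight(shape=(self.units,),
                                 initializer="zeros", trainable=True)

    def call(self, inputs, states):
        h_prev = states[0]
        h = tf.tanh(tf.matmul(inputs, self.w_ih) + tf.matmul(h_prev, self.w_hh) + self.b)
        return h, [h]


rnn = tf.keras.layers.RNN(TanhCell(20), return_sequences=True, return_state=True)
x = tf.random.normal((3, 5, 10))       # (batch_size, seq_len, input_dim)
sequence, final_state = rnn(x)

print(sequence.shape)      # (3, 5, 20)
print(final_state.shape)   # (3, 20)
```

With return_sequences=True the layer returns the output at every time step, and with return_state=True it also returns the final state, which for this cell equals the last output.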

Summary: Improvise Freely, Create Without Limits

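Custom layers and modules are powerful tools that deep learning frameworks give you for building network structures tailored to your own needs. Both PyTorch and TensorFlow/Keras offer flexible ways to implement custom functionality. Once you have these techniques down, you can stack the pieces like building blocks, assemble a Transformers robot of your own, and tackle all kinds of complex deep learning problems. Remember: experiment boldly and dare to innovate, and you too can become a "master of customization" in the neural network world!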
