Borrowing a diagram from 王树森 (Wang Shusen), here is a summary of FiBiNet's design:
1. In a RecSys model, the discrete features are first embedded: each discrete feature maps to a K-dimensional embedding vector, and together these vectors form the embedding matrix M;
2. On top of the original DNN structure, FiBiNet adds the sub-network shown in the red box:
a. concatenate all the feature embeddings in M directly, producing tensor A;
b. apply the bilinear interaction to all the features in M, producing tensor B;
c. first pass all the features in M through SENet, then apply the bilinear interaction, producing tensor C.
3. Concatenate A, B, and C with the continuous features to form the input of the upper network (a minimal sketch of this composition follows the list).
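To make the composition concrete, here is a minimal sketch (not the full model code) of how the three tensors could be assembled. It assumes the two module methods shown in the next section live in the same model class, and that `dense_input` is a hypothetical tensor holding the continuous features:

```python
# all_emb is the embedding matrix M, shape (-1, slot_nums, embed_dim);
# dense_input is a hypothetical (-1, dense_dim) tensor of continuous features.
A = layers.flatten(all_emb, axis=1)                          # (-1, slot_nums * embed_dim)
B = self._bilinear_interaction_layer(all_emb, "field_all")   # bilinear on the raw embeddings
se_emb = self._senet(all_emb)                                # SENet-re-weighted embeddings
C = self._bilinear_interaction_layer(se_emb, "field_all")    # bilinear on the SENet output
dnn_input = layers.concat([A, B, C, dense_input], axis=-1)   # input to the upper network
```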
3. FiBiNet in Practice
FiBiNet was implemented on the Paddle framework; this section mainly presents the implementation of the two modules, SENet and the Bilinear Interaction Layer. Both snippets are methods of the model class and assume `import itertools`, `import paddle.fluid as fluid`, and `from paddle.fluid import layers` at the module level.
(1) SENet
```python
def _senet(self, all_emb, reduction_ratio=3):
    """
    Func:
        implementation of SENet
    Args:
        all_emb: a tensor of shape (-1, slot_nums, embed_dim)
        reduction_ratio: integer, reduction ratio of the excitation FC layers
    Output:
        a tensor of shape (-1, slot_nums, embed_dim)
    """
    slot_nums = all_emb.shape[1]                    # number of feature slots
    fc_unit = max(1, slot_nums // reduction_ratio)  # hidden units of the bottleneck FC

    # ---------------- squeeze ----------------
    # average each slot's embedding into a single scalar statistic
    squeeze_emb = layers.reduce_mean(all_emb, dim=-1)  # (-1, slot_nums)
    falten_emb = layers.flatten(squeeze_emb)           # (-1, slot_nums)
    print('feature nums is ' + str(slot_nums))
    print('falten_emb shape is: ' + str(falten_emb.shape))

    # ---------------- excitation ----------------
    weight = layers.fc(
        input=falten_emb, size=fc_unit, act='relu',
        param_attr=fluid.ParamAttr(
            learning_rate=1.0,
            initializer=fluid.initializer.NormalInitializer(
                loc=0.0, scale=self._init_range / (slot_nums ** 0.5)),
            name="se_w_1"),
        bias_attr=fluid.ParamAttr(
            learning_rate=1.0,
            initializer=fluid.initializer.NormalInitializer(
                loc=0.0, scale=self._init_range / (slot_nums ** 0.5)),
            name="se_b_1"))  # (-1, fc_unit)
    weight = layers.fc(
        input=weight, size=slot_nums, act='relu',
        param_attr=fluid.ParamAttr(
            learning_rate=1.0,
            initializer=fluid.initializer.NormalInitializer(
                loc=0.0, scale=self._init_range / (fc_unit ** 0.5)),
            name="se_w_2"),
        bias_attr=fluid.ParamAttr(
            learning_rate=1.0,
            initializer=fluid.initializer.NormalInitializer(
                loc=0.0, scale=self._init_range / (fc_unit ** 0.5)),
            name="se_b_2"))  # (-1, slot_nums)

    # ---------------- re-weight ----------------
    # (-1, slot_nums, embed_dim) * (-1, slot_nums, 1), broadcast over embed_dim
    out = layers.elementwise_mul(all_emb, layers.unsqueeze(weight, axes=[2]))
    print('senet out shape is: ' + str(out.shape))  # (-1, slot_nums, embed_dim)
    return out
```
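To make the shapes concrete with illustrative numbers (not from the paper): with 26 slots and `reduction_ratio=3`, the bottleneck width is `fc_unit = max(1, 26 // 3) = 8`, so the excitation maps (-1, 26) → (-1, 8) → (-1, 26), and the 26 resulting scalar weights re-scale the 26 slot embeddings elementwise.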
(2) Bilinear Interaction Layer
```python
def _bilinear_interaction_layer(self, all_emb, mode):
    """
    Func:
        implementation of the bilinear interaction layer
    Args:
        all_emb: concatenation of all feature embeddings, shape (-1, slot_nums, embed_dim)
        mode: one of "field_all", "field_each", "field_interaction"
    Output:
        a tensor of shape (-1, num_pairs * embed_dim), where
        num_pairs = slot_nums * (slot_nums - 1) / 2
    """
    slot_nums = all_emb.shape[1]
    embed_dim = all_emb.shape[2]
    emb_list = layers.split(all_emb, num_or_sections=slot_nums, dim=1)
    emb_list = [layers.squeeze(emb, axes=[1]) for emb in emb_list]  # each: (-1, embed_dim)

    if mode == "field_all":
        # a single parameter matrix W shared by all field pairs
        W = layers.create_parameter(shape=[embed_dim, embed_dim], dtype='float32')
        # inner product v_i * W first
        vidots = [layers.matmul(emb, W) for emb in emb_list]  # each: (-1, embed_dim)
        # then the Hadamard product with v_j
        p_ij = [layers.elementwise_mul(vidots[i], emb_list[j])
                for i, j in itertools.combinations(range(slot_nums), 2)]  # each: (-1, embed_dim)
        output = layers.concat(p_ij, axis=-1)  # (-1, num_pairs * embed_dim)
        return output
    elif mode == "field_each":
        # one parameter matrix per field, slot_nums matrices in total
        W_list = [layers.create_parameter(shape=[embed_dim, embed_dim], dtype='float32')
                  for _ in range(slot_nums)]
        # inner product v_i * W_i
        vidots = [layers.matmul(emb_list[i], W_list[i]) for i in range(slot_nums)]
        # Hadamard product with v_j
        p_ij = [layers.elementwise_mul(vidots[i], emb_list[j])
                for i, j in itertools.combinations(range(slot_nums), 2)]  # each: (-1, embed_dim)
        output = layers.concat(p_ij, axis=-1)  # (-1, num_pairs * embed_dim)
        return output
    elif mode == "field_interaction":
        # one parameter matrix per field pair
        W_list = [layers.create_parameter(shape=[embed_dim, embed_dim], dtype='float32')
                  for _ in itertools.combinations(range(slot_nums), 2)]
        p_ij = [layers.elementwise_mul(layers.matmul(v[0], w), v[1])
                for v, w in zip(itertools.combinations(emb_list, 2), W_list)]
        output = layers.concat(p_ij, axis=-1)  # (-1, num_pairs * embed_dim)
        return output
    else:
        raise NotImplementedError
```
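A quick, illustrative parameter count for the three modes (the numbers are examples, not from the paper): `field_all` shares a single embed_dim × embed_dim matrix, i.e. k² parameters; `field_each` keeps one matrix per field, f·k²; `field_interaction` keeps one per field pair, f(f-1)/2 · k². With f = 39 fields and k = 16, `field_interaction` already needs 741 × 256 ≈ 190K bilinear parameters, and in every mode the output is a (-1, 741 × 16) = (-1, 11856) tensor fed to the upper MLP. This cost is exactly the issue discussed in the next section.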
4. Problems with FiBiNet
The original paper applies bilinear crossing to all feature embeddings. This part brings a huge number of parameters and also increases online inference latency and memory footprint. In practice, therefore, one can select only the features that genuinely need to be crossed, according to the specific business scenario, as sketched below.
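A minimal sketch of this idea, assuming a subset of slots has been picked offline (e.g. via feature-importance analysis); the indices below are purely illustrative:

```python
# Cross only the selected slots instead of all slot_nums fields.
selected_idx = [0, 3, 7, 12]  # hypothetical "important" slots
selected_emb = layers.concat(
    [layers.slice(all_emb, axes=[1], starts=[i], ends=[i + 1]) for i in selected_idx],
    axis=1)  # (-1, len(selected_idx), embed_dim)
bilinear_out = self._bilinear_interaction_layer(selected_emb, mode="field_all")
# num_pairs drops from slot_nums * (slot_nums - 1) / 2 to 4 * 3 / 2 = 6
```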