Siamese networks

Siamese networks are a class of neural networks used for learning representations of data,[1] a task often referred to as feature learning. Networks of this type are trained in a supervised manner to identify whether a given pair of data points comes from the same class or from different classes. The objective of the Siamese architecture is not to classify input data, but to differentiate between inputs. A Siamese network can nonetheless be viewed as a binary classifier that operates on a pair of inputs: the classifier generates a binary output depending on whether the two inputs share the same class or not. In the simplest variant of the Siamese network, a function $f_\theta$ is modeled by a neural network using the parameter set $\theta$. The function takes raw data, such as images, at its input and outputs a vector $f_\theta(x)$ of a smaller dimension. The parameter set $\theta$ is initialized randomly and optimized over the samples in a given data set by minimizing an appropriate loss function $L(\theta)$ over $\theta$ using gradient descent.
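To make this concrete, the embedding function $f_\theta$ can be modeled by any differentiable network; the following is a minimal PyTorch sketch (the class name, layer sizes, and input dimension are illustrative assumptions, not taken from the cited papers):

    import torch
    import torch.nn as nn

    class EmbeddingNet(nn.Module):
        """Models f_theta: maps raw inputs to lower-dimensional embeddings."""
        def __init__(self, in_dim=784, embed_dim=32):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(in_dim, 256),
                nn.ReLU(),
                nn.Linear(256, embed_dim),
            )

        def forward(self, x):
            return self.net(x)

    f = EmbeddingNet()
    x1, x2 = torch.randn(8, 784), torch.randn(8, 784)  # a batch of input pairs
    z1, z2 = f(x1), f(x2)  # the *same* parameters theta embed both inputs

The defining design choice of the Siamese architecture is this weight sharing: both members of a pair pass through identical parameters $\theta$, so distances between the resulting embeddings are directly comparable.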

Loss functions

The most common loss functions are variants of the contrastive loss[2] and the triplet loss.[3]

Contrastive loss

The contrastive loss[2] function is formulated by considering two training examples $x_1$ and $x_2$ along with their labels $y_1$ and $y_2$. Given a positive scalar margin $m$, the contrastive loss is

$$L(\theta) = \mathbb{1}[y_1 = y_2]\,\|f_\theta(x_1) - f_\theta(x_2)\|_2^2 + \mathbb{1}[y_1 \neq y_2]\,\max\big(0,\ m - \|f_\theta(x_1) - f_\theta(x_2)\|_2\big)^2,$$

where $\mathbb{1}[\cdot]$ denotes the indicator function and $\|\cdot\|_2$ denotes the Euclidean distance. The $\max(0,\cdot)$ operator is used to lower bound the loss at zero when $y_1 \neq y_2$. Intuitively, minimizing $L(\theta)$ forces the vectors $f_\theta(x_1)$ and $f_\theta(x_2)$ to move closer to each other when $y_1 = y_2$, and to separate by a distance of at least $m$ when $y_1 \neq y_2$, thus reflecting the semantic similarity of the raw data in the transformed domain.
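Expressed in code, the contrastive loss might be computed as in the following PyTorch sketch (the function name and batch handling are illustrative choices; the formula mirrors the indicator-function equation above):

    import torch

    def contrastive_loss(z1, z2, y1, y2, m=1.0):
        """z1, z2: batches of embeddings f_theta(x1), f_theta(x2);
        y1, y2: integer class labels; m: the margin."""
        d = torch.norm(z1 - z2, p=2, dim=1)  # Euclidean distance per pair
        same = (y1 == y2).float()            # indicator 1[y1 == y2]
        # 1[y1 == y2] * d^2 + 1[y1 != y2] * max(0, m - d)^2, averaged over the batch
        return (same * d.pow(2) + (1 - same) * torch.clamp(m - d, min=0).pow(2)).mean()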

Triplet loss

The triplet loss[3] shares a similar concept, but uses three examples $x$, $x^+$ and $x^-$ along with their labels $y$, $y^+$ and $y^-$ that satisfy the property $y = y^+ \neq y^-$. Given a positive margin $m$, the triplet loss is defined as

$$L(\theta) = \max\big(0,\ \|f_\theta(x) - f_\theta(x^+)\|_2^2 - \|f_\theta(x) - f_\theta(x^-)\|_2^2 + m\big).$$

Minimizing the triplet loss enforces that the inequality $\|f_\theta(x) - f_\theta(x^-)\|_2^2 \geq \|f_\theta(x) - f_\theta(x^+)\|_2^2 + m$ is satisfied, where the $\max(0,\cdot)$ operator ensures the loss is lower bounded at zero. Intuitively, this means that the distance between points sharing a class label is smaller, by at least the margin $m$, than the distance between points of different classes.
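A corresponding PyTorch sketch (again an illustrative function, written to mirror the squared-distance equation above):

    import torch

    def triplet_loss(z, z_pos, z_neg, m=1.0):
        """z, z_pos: embeddings of examples sharing a class label;
        z_neg: embedding of an example from a different class; m: the margin."""
        d_pos = (z - z_pos).pow(2).sum(dim=1)  # squared distance to the positive
        d_neg = (z - z_neg).pow(2).sum(dim=1)  # squared distance to the negative
        # max(0, d_pos - d_neg + m), averaged over the batch
        return torch.clamp(d_pos - d_neg + m, min=0).mean()

For comparison, PyTorch's built-in nn.TripletMarginLoss implements a closely related formulation based on non-squared distances.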

References

  1. Koch, Gregory, Richard Zemel, and Ruslan Salakhutdinov. "Siamese neural networks for one-shot image recognition." ICML Deep Learning Workshop. Vol. 2. 2015.
  2. Hadsell, Raia, Sumit Chopra, and Yann LeCun. "Dimensionality reduction by learning an invariant mapping." 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06). Vol. 2. IEEE, 2006.
  3. Hoffer, Elad, and Nir Ailon. "Deep metric learning using triplet network." International Workshop on Similarity-Based Pattern Recognition. Springer, Cham, 2015.
