Seq2SeqSharp
| Developer(s) | Zhongkai Fu |
|---|---|
| Repository | https://github.com/zhongkaifu/Seq2SeqSharp |
| Engine | |
| Type | Library for deep learning |
| License | BSD 3[1] |
| Website | github |
Seq2SeqSharp[2] is a fast and flexible tensor-based encoder-decoder deep neural network framework written in C# for .NET. Its features include automatic differentiation, several types of encoders and decoders (Transformer, LSTM, BiLSTM and others) and multi-GPU support.
History
Features
- Pure C# framework
- Deep bi-directional LSTM encoder
- Deep attention based LSTM decoder
- Transformer encoder
- Graph based neural network
- Automatic differentiation
- Tensor based operations
- Running on both CPU and GPU (CUDA)
- Multi-GPU support
- Mini-batch
- Dropout
- RMSProp optimization
- Embedding & Pre-trained model
- Auto data shuffling
- Auto vocabulary building
- Beam search decoder
- Neural network visualization
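The automatic differentiation feature listed above can be illustrated with a minimal reverse-mode sketch on scalars. This is not Seq2SeqSharp's implementation: the `Node` and `Tape` types and all of their members are hypothetical names introduced for illustration only, and the real framework operates on tensors rather than single numbers. Each forward operation records a closure that later pushes the gradient back to its inputs, and `Backward` replays those closures in reverse order.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical scalar node; not part of Seq2SeqSharp's API.
public sealed class Node
{
    public double Value;
    public double Grad;
    public Node(double value) { Value = value; }
}

// Hypothetical tape: every forward op records how to propagate gradients backward.
public sealed class Tape
{
    private readonly List<Action> m_backward = new List<Action>();

    public Node Mul(Node a, Node b)
    {
        var r = new Node(a.Value * b.Value);
        // d(a*b)/da = b, d(a*b)/db = a
        m_backward.Add(() => { a.Grad += b.Value * r.Grad; b.Grad += a.Value * r.Grad; });
        return r;
    }

    public Node Add(Node a, Node b)
    {
        var r = new Node(a.Value + b.Value);
        // Addition passes the gradient through unchanged to both inputs.
        m_backward.Add(() => { a.Grad += r.Grad; b.Grad += r.Grad; });
        return r;
    }

    public Node Tanh(Node a)
    {
        double t = Math.Tanh(a.Value);
        var r = new Node(t);
        // d tanh(x)/dx = 1 - tanh(x)^2
        m_backward.Add(() => { a.Grad += (1.0 - t * t) * r.Grad; });
        return r;
    }

    // Seed the output gradient with 1 and replay the recorded steps in reverse order.
    public void Backward(Node output)
    {
        output.Grad = 1.0;
        for (int i = m_backward.Count - 1; i >= 0; i--)
        {
            m_backward[i]();
        }
    }
}
```

For example, building `y = tape.Tanh(tape.Add(tape.Mul(w, x), b))` with x = 2, w = 0.5 and b = -1 and then calling `tape.Backward(y)` leaves `w.Grad` at 2, `b.Grad` at 1 and `x.Grad` at 0.5, because the pre-activation is 0 and the derivative of tanh at 0 is 1.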
How it works
Thanks to automatic differentiation, a tensor-based compute graph and built-in operations, a neural network can be built with a few lines of code. The framework automatically builds the corresponding backward pass and can run the network on multiple GPUs or on CPUs. The following C# code is an example of an attention-based LSTM cell.
    /// <summary>
    /// Update LSTM-Attention cells according to given weights
    /// </summary>
    /// <param name="context">The context weights for attention</param>
    /// <param name="input">The input weights</param>
    /// <param name="g">The compute graph to build workflow</param>
    /// <returns>Updated hidden weights</returns>
    public IWeightTensor Step(IWeightTensor context, IWeightTensor input, IComputeGraph g)
    {
        var computeGraph = g.CreateSubGraph(m_name);
        var cell_prev = Cell;
        var hidden_prev = Hidden;

        // Concatenate input, previous hidden state and attention context, then apply one affine layer
        var hxhc = computeGraph.ConcatColumns(input, hidden_prev, context);
        var hhSum = computeGraph.Affine(hxhc, m_Wxhc, m_b);
        var hhSum2 = layerNorm1.Process(hhSum, computeGraph);

        // The first 3 * m_hdim columns are the gates, the last m_hdim columns are the cell write
        (var gates_raw, var cell_write_raw) = computeGraph.SplitColumns(hhSum2, m_hdim * 3, m_hdim);
        var gates = computeGraph.Sigmoid(gates_raw);
        var cell_write = computeGraph.Tanh(cell_write_raw);
        (var input_gate, var forget_gate, var output_gate) = computeGraph.SplitColumns(gates, m_hdim, m_hdim, m_hdim);

        // compute new cell activation: ct = forget_gate * cell_prev + input_gate * cell_write
        Cell = computeGraph.EltMulMulAdd(forget_gate, cell_prev, input_gate, cell_write);
        var ct2 = layerNorm2.Process(Cell, computeGraph);

        // compute new hidden state: ht = output_gate * tanh(layerNorm(ct))
        Hidden = computeGraph.EltMul(output_gate, computeGraph.Tanh(ct2));
        return Hidden;
    }
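The gate arithmetic in the listing above can be sketched on plain arrays, without the framework. This is an illustrative simplification under stated assumptions: `LstmCellSketch` and its `Step` method are hypothetical names, the affine layer is assumed to have been applied already (its output is passed in as `preAct`), and both layer normalization steps are omitted. The column layout follows the two `SplitColumns` calls: input, forget and output gates first, then the cell write.

```csharp
using System;

// Hypothetical helper for illustration; not part of Seq2SeqSharp.
public static class LstmCellSketch
{
    private static double Sigmoid(double x) => 1.0 / (1.0 + Math.Exp(-x));

    // preAct has length 4 * hdim, laid out as [input | forget | output | cellWrite].
    // Returns the new (cell, hidden) pair; cellPrev is not modified.
    // Layer normalization from the original listing is intentionally left out.
    public static (double[] Cell, double[] Hidden) Step(double[] preAct, double[] cellPrev)
    {
        int hdim = cellPrev.Length;
        if (preAct.Length != 4 * hdim)
        {
            throw new ArgumentException("preAct must have length 4 * hdim.");
        }

        var cell = new double[hdim];
        var hidden = new double[hdim];
        for (int i = 0; i < hdim; i++)
        {
            double inputGate = Sigmoid(preAct[i]);
            double forgetGate = Sigmoid(preAct[hdim + i]);
            double outputGate = Sigmoid(preAct[2 * hdim + i]);
            double cellWrite = Math.Tanh(preAct[3 * hdim + i]);

            // ct = forget_gate * cell_prev + input_gate * cell_write
            cell[i] = forgetGate * cellPrev[i] + inputGate * cellWrite;
            // ht = output_gate * tanh(ct)
            hidden[i] = outputGate * Math.Tanh(cell[i]);
        }
        return (cell, hidden);
    }
}
```

With an all-zero `preAct` every gate equals 0.5 and the cell write equals 0, so a previous cell value of 2 is halved to 1 and the hidden value becomes 0.5 * tanh(1).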
Another example is the scaled multi-head attention component, the core part of the Transformer model, also written in C#.
    /// <summary>
    /// Scaled multi-head attention component with skip-connected feed-forward layers
    /// </summary>
    /// <param name="input">The input tensor</param>
    /// <param name="graph">The instance of computing graph</param>
    /// <returns>The output tensor after the attention and feed-forward layers</returns>
    public IWeightTensor Perform(IWeightTensor input, IComputeGraph graph)
    {
        IComputeGraph g = graph.CreateSubGraph(m_name);
        var seqLen = input.Rows / m_batchSize;

        // Input projections
        var allQ = g.View(Q.Process(input, g), m_batchSize, seqLen, m_multiHeadNum, m_d);
        var allK = g.View(K.Process(input, g), m_batchSize, seqLen, m_multiHeadNum, m_d);
        var allV = g.View(V.Process(input, g), m_batchSize, seqLen, m_multiHeadNum, m_d);

        // Multi-head attention (K is permuted so that it is already transposed for the batched product)
        var Qs = g.View(g.Permute(allQ, 2, 0, 1, 3), m_multiHeadNum * m_batchSize, seqLen, m_d);
        var Ks = g.View(g.Permute(allK, 2, 0, 3, 1), m_multiHeadNum * m_batchSize, m_d, seqLen);
        var Vs = g.View(g.Permute(allV, 2, 0, 1, 3), m_multiHeadNum * m_batchSize, seqLen, m_d);

        // Scaled softmax
        float scale = 1.0f / (float)Math.Sqrt(m_d);
        var attn = g.MulBatch(Qs, Ks, m_multiHeadNum * m_batchSize, scale);
        var attn2 = g.View(attn, m_multiHeadNum * m_batchSize * seqLen, seqLen);
        var softmax = g.Softmax(attn2);
        var softmax2 = g.View(softmax, m_multiHeadNum * m_batchSize, seqLen, seqLen);
        var o = g.View(g.MulBatch(softmax2, Vs, m_multiHeadNum * m_batchSize), m_multiHeadNum, m_batchSize, seqLen, m_d);
        var W = g.View(g.Permute(o, 1, 2, 0, 3), m_batchSize * seqLen, m_multiHeadNum * m_d);

        // Output projection
        var finalAttResults = g.Affine(W, W0, b0);

        // Skip connection and layer normalization
        var addedAttResult = g.Add(finalAttResults, input);
        var normAddedAttResult = layerNorm1.Process(addedAttResult, g);

        // Feed forward
        var ffnResult = feedForwardLayer1.Process(normAddedAttResult, g);
        var reluFFNResult = g.Relu(ffnResult);
        var ffn2Result = feedForwardLayer2.Process(reluFFNResult, g);

        // Skip connection and layer normalization
        var addFFNResult = g.Add(ffn2Result, normAddedAttResult);
        var normAddFFNResult = layerNorm2.Process(addFFNResult, g);
        return normAddFFNResult;
    }
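The "scaled softmax" step in the listing above computes, for each head, softmax(Q K^T / sqrt(d)) V. A minimal single-head sketch on plain arrays may make that step easier to follow. It is only an illustration under stated assumptions: `AttentionSketch` and `Attend` are hypothetical names, the input projections, multi-head reshaping, output projection, skip connections and layer normalization are all left out, and `q`, `k` and `v` are assumed to share the same shape [seqLen, d].

```csharp
using System;

// Hypothetical helper for illustration; not part of Seq2SeqSharp.
public static class AttentionSketch
{
    // q, k, v: [seqLen, d]. Returns softmax(q * k^T / sqrt(d)) * v with shape [seqLen, d].
    public static double[,] Attend(double[,] q, double[,] k, double[,] v)
    {
        int seqLen = q.GetLength(0);
        int d = q.GetLength(1);
        double scale = 1.0 / Math.Sqrt(d);
        var output = new double[seqLen, d];
        var scores = new double[seqLen];

        for (int i = 0; i < seqLen; i++)
        {
            // Scaled dot products between query i and every key
            double max = double.NegativeInfinity;
            for (int j = 0; j < seqLen; j++)
            {
                double dot = 0.0;
                for (int c = 0; c < d; c++)
                {
                    dot += q[i, c] * k[j, c];
                }
                scores[j] = dot * scale;
                if (scores[j] > max)
                {
                    max = scores[j];
                }
            }

            // Softmax over the row, subtracting the maximum for numerical stability
            double sum = 0.0;
            for (int j = 0; j < seqLen; j++)
            {
                scores[j] = Math.Exp(scores[j] - max);
                sum += scores[j];
            }

            // Weighted sum of the value rows
            for (int j = 0; j < seqLen; j++)
            {
                double weight = scores[j] / sum;
                for (int c = 0; c < d; c++)
                {
                    output[i, c] += weight * v[j, c];
                }
            }
        }
        return output;
    }
}
```

When all queries are zero, every score is equal, the softmax weights are uniform, and each output row is the average of the rows of `v`. That gives a quick sanity check for the sketch.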
See also
References
- "Seq2SeqSharp LICENSE". GitHub.
- "Seq2SeqSharp Project". https://github.com/zhongkaifu/Seq2SeqSharp. Retrieved 2019-10-10.
- "Seq2SeqSharp: a tensor based fast and flexible encoder-decoder deep neural network framework written by .NET (C#)". GitHub.
External links
This article "Seq2SeqSharp" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:Seq2SeqSharp. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.
