Seq2SeqSharp
Developer(s) | Zhongkai Fu
---|---
Repository | https://github.com/zhongkaifu/Seq2SeqSharp
Type | Library for deep learning
License | BSD 3[1]
Website | github
Seq2SeqSharp[2] is a tensor-based, fast and flexible encoder-decoder deep neural network framework written in C# (.NET). Its highlighted features include automatic differentiation, many types of encoders and decoders (Transformer, LSTM, BiLSTM, and others), and multi-GPU support.
History[edit]
Features[edit]
- Pure C# framework
- Deep bi-directional LSTM encoder
- Deep attention based LSTM decoder
- Transformer encoder
- Graph based neural network
- Automatic differentiation
- Tensor based operations
- Running on both CPU and GPU (CUDA)
- Support multi-GPUs
- Mini-batch
- Dropout
- RMSProp optimization
- Embedding & Pre-trained model
- Auto data shuffling
- Auto vocabulary building
- Beam search decoder
- Visualize neural network
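Two of the features above, automatic differentiation and the graph-based neural network, work together: every operation recorded in the compute graph knows how to push gradients back to its inputs, so the backward pass comes for free. The following is a minimal, illustrative reverse-mode autodiff sketch in Python (hypothetical names, not the Seq2SeqSharp API):

```python
# Minimal reverse-mode automatic differentiation sketch (illustrative only;
# hypothetical names, not the Seq2SeqSharp API). Each operation records how
# to push gradients back to its inputs, which is how a graph-based framework
# can build the backward pass automatically.
class Node:
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # list of (parent_node, backward_fn) pairs
        self.grad = 0.0

def mul(a, b):
    return Node(a.value * b.value, [(a, lambda g: g * b.value),
                                    (b, lambda g: g * a.value)])

def add(a, b):
    return Node(a.value + b.value, [(a, lambda g: g), (b, lambda g: g)])

def backward(node, grad=1.0):
    node.grad += grad
    for parent, fn in node.parents:
        backward(parent, fn(grad))

x = Node(3.0)
y = Node(4.0)
z = add(mul(x, y), x)   # z = x*y + x
backward(z)
print(x.grad)  # dz/dx = y + 1 = 5.0
print(y.grad)  # dz/dy = x = 3.0
```

A real framework operates on tensors rather than scalars and topologically sorts the graph instead of recursing, but the bookkeeping is the same.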
How it works[edit]
Benefiting from automatic differentiation, a tensor-based compute graph, and built-in operations, a neural network can be built with only a few lines of code; the framework automatically builds the corresponding backward pass and lets the network run on multiple GPUs or CPUs. Here is an example of attention-based LSTM cells in C#:
```
/// <summary>
/// Update LSTM-Attention cells according to given weights
/// </summary>
/// <param name="context">The context weights for attention</param>
/// <param name="input">The input weights</param>
/// <param name="g">The compute graph to build the workflow</param>
/// <returns>Updated hidden weights</returns>
public IWeightTensor Step(IWeightTensor context, IWeightTensor input, IComputeGraph g)
{
    var computeGraph = g.CreateSubGraph(m_name);

    var cell_prev = Cell;
    var hidden_prev = Hidden;

    var hxhc = computeGraph.ConcatColumns(input, hidden_prev, context);
    var hhSum = computeGraph.Affine(hxhc, m_Wxhc, m_b);
    var hhSum2 = layerNorm1.Process(hhSum, computeGraph);

    (var gates_raw, var cell_write_raw) = computeGraph.SplitColumns(hhSum2, m_hdim * 3, m_hdim);
    var gates = computeGraph.Sigmoid(gates_raw);
    var cell_write = computeGraph.Tanh(cell_write_raw);

    (var input_gate, var forget_gate, var output_gate) = computeGraph.SplitColumns(gates, m_hdim, m_hdim, m_hdim);

    // Compute new cell activation: ct = forget_gate * cell_prev + input_gate * cell_write
    Cell = computeGraph.EltMulMulAdd(forget_gate, cell_prev, input_gate, cell_write);
    var ct2 = layerNorm2.Process(Cell, computeGraph);

    Hidden = computeGraph.EltMul(output_gate, computeGraph.Tanh(ct2));

    return Hidden;
}
```
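The gate arithmetic in that cell can be checked in isolation. Below is a NumPy sketch of the same math (illustrative only, not Seq2SeqSharp code; layer normalization is omitted for brevity):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_attention_step(x, h_prev, c_prev, ctx, W, b, hdim):
    """One LSTM step with an attention context vector, mirroring the
    concat -> affine -> split -> gate arithmetic of the C# cell above.
    Layer normalization is omitted for brevity."""
    hxhc = np.concatenate([x, h_prev, ctx])              # ConcatColumns
    hh = hxhc @ W + b                                    # Affine
    gates_raw, cell_write_raw = hh[:3 * hdim], hh[3 * hdim:]
    gates = sigmoid(gates_raw)
    cell_write = np.tanh(cell_write_raw)
    # SplitColumns(gates, hdim, hdim, hdim) -> input, forget, output gates
    i, f, o = gates[:hdim], gates[hdim:2 * hdim], gates[2 * hdim:]
    c = f * c_prev + i * cell_write                      # EltMulMulAdd
    h = o * np.tanh(c)                                   # EltMul(Tanh(...))
    return h, c

# Toy dimensions (hypothetical, for illustration only)
hdim, xdim, ctxdim = 4, 3, 5
rng = np.random.default_rng(0)
W = rng.standard_normal((xdim + hdim + ctxdim, 4 * hdim)) * 0.1
b = np.zeros(4 * hdim)
h, c = lstm_attention_step(rng.standard_normal(xdim), np.zeros(hdim),
                           np.zeros(hdim), rng.standard_normal(ctxdim),
                           W, b, hdim)
print(h.shape, c.shape)  # (4,) (4,)
```

The single affine transform over the concatenated `[input, hidden, context]` vector computes all four gate pre-activations at once, which is why the weight matrix has `4 * hdim` output columns.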
Another example is the scaled multi-head attention component, the core building block of the Transformer model, written in C#:
```
/// <summary>
/// Scaled multi-head attention component with skip-connected feed-forward layers
/// </summary>
/// <param name="input">The input tensor</param>
/// <param name="graph">The instance of the computing graph</param>
/// <returns></returns>
public IWeightTensor Perform(IWeightTensor input, IComputeGraph graph)
{
    IComputeGraph g = graph.CreateSubGraph(m_name);

    var seqLen = input.Rows / m_batchSize;

    // Input projections
    var allQ = g.View(Q.Process(input, g), m_batchSize, seqLen, m_multiHeadNum, m_d);
    var allK = g.View(K.Process(input, g), m_batchSize, seqLen, m_multiHeadNum, m_d);
    var allV = g.View(V.Process(input, g), m_batchSize, seqLen, m_multiHeadNum, m_d);

    // Multi-head attentions
    var Qs = g.View(g.Permute(allQ, 2, 0, 1, 3), m_multiHeadNum * m_batchSize, seqLen, m_d);
    var Ks = g.View(g.Permute(allK, 2, 0, 3, 1), m_multiHeadNum * m_batchSize, m_d, seqLen);
    var Vs = g.View(g.Permute(allV, 2, 0, 1, 3), m_multiHeadNum * m_batchSize, seqLen, m_d);

    // Scaled softmax
    float scale = 1.0f / (float)Math.Sqrt(m_d);
    var attn = g.MulBatch(Qs, Ks, m_multiHeadNum * m_batchSize, scale);
    var attn2 = g.View(attn, m_multiHeadNum * m_batchSize * seqLen, seqLen);

    var softmax = g.Softmax(attn2);
    var softmax2 = g.View(softmax, m_multiHeadNum * m_batchSize, seqLen, seqLen);
    var o = g.View(g.MulBatch(softmax2, Vs, m_multiHeadNum * m_batchSize), m_multiHeadNum, m_batchSize, seqLen, m_d);
    var W = g.View(g.Permute(o, 1, 2, 0, 3), m_batchSize * seqLen, m_multiHeadNum * m_d);

    // Output projection
    var finalAttResults = g.Affine(W, W0, b0);

    // Skip connection and layer normalization
    var addedAttResult = g.Add(finalAttResults, input);
    var normAddedAttResult = layerNorm1.Process(addedAttResult, g);

    // Feed forward
    var ffnResult = feedForwardLayer1.Process(normAddedAttResult, g);
    var reluFFNResult = g.Relu(ffnResult);
    var ffn2Result = feedForwardLayer2.Process(reluFFNResult, g);

    // Skip connection and layer normalization
    var addFFNResult = g.Add(ffn2Result, normAddedAttResult);
    var normAddFFNResult = layerNorm2.Process(addFFNResult, g);

    return normAddFFNResult;
}
```
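The reshape-and-permute bookkeeping in the attention code can be checked against plain array math. Here is a NumPy sketch of scaled multi-head self-attention (illustrative only, not Seq2SeqSharp code), for a batch size of 1 and with the skip connections, layer norms, and feed-forward sublayer omitted:

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, Wo, n_heads):
    """Scaled dot-product multi-head self-attention over a (seq_len, d_model)
    input -- the same core math as the C# Perform method above, with batch
    size 1 and the skip connections / layer norms omitted."""
    seq_len, d_model = x.shape
    d = d_model // n_heads
    # Project, then split into heads: (n_heads, seq_len, d)
    q = (x @ Wq).reshape(seq_len, n_heads, d).transpose(1, 0, 2)
    k = (x @ Wk).reshape(seq_len, n_heads, d).transpose(1, 0, 2)
    v = (x @ Wv).reshape(seq_len, n_heads, d).transpose(1, 0, 2)
    # Scaled attention scores per head: (n_heads, seq_len, seq_len)
    attn = q @ k.transpose(0, 2, 1) / np.sqrt(d)
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)          # softmax over keys
    # Weighted values, heads merged back: (seq_len, d_model)
    o = (attn @ v).transpose(1, 0, 2).reshape(seq_len, d_model)
    return o @ Wo                                     # output projection

# Toy dimensions (hypothetical, for illustration only)
seq_len, d_model, n_heads = 6, 8, 2
rng = np.random.default_rng(1)
Wq, Wk, Wv, Wo = (rng.standard_normal((d_model, d_model)) * 0.1
                  for _ in range(4))
out = multi_head_attention(rng.standard_normal((seq_len, d_model)),
                           Wq, Wk, Wv, Wo, n_heads)
print(out.shape)  # (6, 8)
```

The `View`/`Permute` calls in the C# version play the role of the `reshape`/`transpose` calls here: they fold the head dimension into the batch dimension so all heads can be computed with one batched matrix multiply.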
See also[edit]
References[edit]
- ↑ "Seq2SeqSharp LICENSE". GitHub.
- ↑ "Seq2SeqSharp Project". GitHub. Retrieved 2019-10-10.
- ↑ "Seq2SeqSharp: a tensor based fast and flexible encoder-decoder deep neural network framework written by .NET (C#)". GitHub.
External links[edit]
This article "Seq2SeqSharp" is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:Seq2SeqSharp. Articles copied from the Draft namespace on Wikipedia can be found in the Draft namespace of Wikipedia, not the main one.