
Seq2SeqSharp




Seq2SeqSharp
    Developer(s): Zhongkai Fu
    Repository: https://github.com/zhongkaifu/Seq2SeqSharp
    Type: Library for deep learning
    License: BSD 3-Clause[1]
    Website: github.com/zhongkaifu/Seq2SeqSharp


    Seq2SeqSharp[2] is a tensor-based, fast and flexible encoder-decoder deep neural network framework written in C# (.NET). Its notable features include automatic differentiation, several types of encoders and decoders (Transformer, LSTM, BiLSTM, and so on), and multi-GPU support.

    History[edit]

    It is hosted on GitHub.[3]

    Features[edit]

    • Pure C# framework
    • Deep bi-directional LSTM encoder
    • Deep attention-based LSTM decoder
    • Transformer encoder
    • Graph-based neural network
    • Automatic differentiation
    • Tensor-based operations
    • Runs on both CPU and GPU (CUDA)
    • Multi-GPU support
    • Mini-batch training
    • Dropout
    • RMSProp optimization
    • Embeddings & pre-trained models
    • Automatic data shuffling
    • Automatic vocabulary building
    • Beam search decoder
    • Neural network visualization

    How it works[edit]

    Thanks to automatic differentiation, the tensor-based compute graph, and the built-in operations, a neural network can be built with only a few lines of code; the framework automatically constructs the corresponding backward pass and lets the network run on multiple GPUs or CPUs. Here is an example of attention-based LSTM cells in C#.

           /// <summary>
           /// Updates the LSTM-attention cell according to the given weights
           /// </summary>
           /// <param name="context">The context weights for attention</param>
           /// <param name="input">The input weights</param>
           /// <param name="g">The compute graph used to build the workflow</param>
           /// <returns>The updated hidden weights</returns>
           public IWeightTensor Step(IWeightTensor context, IWeightTensor input, IComputeGraph g)
           {
               var computeGraph = g.CreateSubGraph(m_name);
    
               var cell_prev = Cell;
               var hidden_prev = Hidden;
    
               var hxhc = computeGraph.ConcatColumns(input, hidden_prev, context);
               var hhSum = computeGraph.Affine(hxhc, m_Wxhc, m_b);
               var hhSum2 = layerNorm1.Process(hhSum, computeGraph);
    
               (var gates_raw, var cell_write_raw) = computeGraph.SplitColumns(hhSum2, m_hdim * 3, m_hdim);
               var gates = computeGraph.Sigmoid(gates_raw);
               var cell_write = computeGraph.Tanh(cell_write_raw);
    
               (var input_gate, var forget_gate, var output_gate) = computeGraph.SplitColumns(gates, m_hdim, m_hdim, m_hdim);
    
               // compute new cell activation: ct = forget_gate * cell_prev + input_gate * cell_write
               Cell = computeGraph.EltMulMulAdd(forget_gate, cell_prev, input_gate, cell_write);
               var ct2 = layerNorm2.Process(Cell, computeGraph);
    
               Hidden = computeGraph.EltMul(output_gate, computeGraph.Tanh(ct2));
    
               return Hidden;
           }
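
    For reference, the cell update above computes the standard LSTM equations over a concatenated [input, hidden, context] vector. Here is the same arithmetic sketched in Python with NumPy (a standalone illustration with hypothetical names, not the Seq2SeqSharp API; layer normalization is omitted):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_attention_step(inp, hidden_prev, cell_prev, context, Wxhc, b, hdim):
    # Concatenate input, previous hidden state and attention context,
    # then apply a single affine map (mirrors ConcatColumns + Affine).
    hxhc = np.concatenate([inp, hidden_prev, context], axis=1)
    hh_sum = hxhc @ Wxhc + b
    # Split into three gates and the candidate cell value (SplitColumns).
    gates = sigmoid(hh_sum[:, :3 * hdim])
    cell_write = np.tanh(hh_sum[:, 3 * hdim:])
    i_gate, f_gate, o_gate = np.split(gates, 3, axis=1)
    # ct = forget_gate * cell_prev + input_gate * cell_write (EltMulMulAdd)
    cell = f_gate * cell_prev + i_gate * cell_write
    hidden = o_gate * np.tanh(cell)
    return hidden, cell
```

    Packing all four projections into one affine map, as the C# code does, lets the framework issue a single matrix multiplication per step instead of four.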
    

    Here is another example: the scaled multi-head attention component, the core building block of the Transformer model, written in C#.

           /// <summary>
           /// Scaled multi-head attention component with skip-connected feed-forward layers
           /// </summary>
           /// <param name="input">The input tensor</param>
           /// <param name="graph">The instance of the compute graph</param>
           /// <returns>The output tensor</returns>
           public IWeightTensor Perform(IWeightTensor input, IComputeGraph graph)
           {
               IComputeGraph g = graph.CreateSubGraph(m_name);
    
               var seqLen = input.Rows / m_batchSize;
    
               //Input projections
               var allQ = g.View(Q.Process(input, g), m_batchSize, seqLen, m_multiHeadNum, m_d);
               var allK = g.View(K.Process(input, g), m_batchSize, seqLen, m_multiHeadNum, m_d);
               var allV = g.View(V.Process(input, g), m_batchSize, seqLen, m_multiHeadNum, m_d);
    
               //Multi-head attentions
               var Qs = g.View(g.Permute(allQ, 2, 0, 1, 3), m_multiHeadNum * m_batchSize, seqLen, m_d);
               var Ks = g.View(g.Permute(allK, 2, 0, 3, 1), m_multiHeadNum * m_batchSize, m_d, seqLen);
               var Vs = g.View(g.Permute(allV, 2, 0, 1, 3), m_multiHeadNum * m_batchSize, seqLen, m_d);
    
               // Scaled softmax
               float scale = 1.0f / (float)Math.Sqrt(m_d);
               var attn = g.MulBatch(Qs, Ks, m_multiHeadNum * m_batchSize, scale);
               var attn2 = g.View(attn, m_multiHeadNum * m_batchSize * seqLen, seqLen);
    
               var softmax = g.Softmax(attn2);
               var softmax2 = g.View(softmax, m_multiHeadNum * m_batchSize, seqLen, seqLen);
               var o = g.View(g.MulBatch(softmax2, Vs, m_multiHeadNum * m_batchSize), m_multiHeadNum, m_batchSize, seqLen, m_d);
               var W = g.View(g.Permute(o, 1, 2, 0, 3), m_batchSize * seqLen, m_multiHeadNum * m_d);
    
               // Output projection
               var finalAttResults = g.Affine(W, W0, b0);
    
               //Skip connection and layer normalization
               var addedAttResult = g.Add(finalAttResults, input);
               var normAddedAttResult = layerNorm1.Process(addedAttResult, g);
    
               //Feed forward
               var ffnResult = feedForwardLayer1.Process(normAddedAttResult, g);
               var reluFFNResult = g.Relu(ffnResult);
               var ffn2Result = feedForwardLayer2.Process(reluFFNResult, g);
    
               //Skip connection and layer normalization
               var addFFNResult = g.Add(ffn2Result, normAddedAttResult);
               var normAddFFNResult = layerNorm2.Process(addFFNResult, g);
    
               return normAddFFNResult;
           }
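
    The shape manipulations in Perform can be summarized in a short Python/NumPy sketch (a standalone illustration of the same projections, permutes and scaled softmax, not the Seq2SeqSharp API; the skip connections and feed-forward layers are left out):

```python
import numpy as np

def multi_head_attention(x, Wq, Wk, Wv, Wo, bo, batch, n_heads, d):
    # x is flattened to (batch * seq_len, n_heads * d), as in the C# code.
    seq_len = x.shape[0] // batch
    # Input projections, viewed as (batch, seq_len, n_heads, d)
    q = (x @ Wq).reshape(batch, seq_len, n_heads, d)
    k = (x @ Wk).reshape(batch, seq_len, n_heads, d)
    v = (x @ Wv).reshape(batch, seq_len, n_heads, d)
    # Move the head axis forward (mirrors Permute + View)
    qs = q.transpose(2, 0, 1, 3).reshape(n_heads * batch, seq_len, d)
    ks = k.transpose(2, 0, 3, 1).reshape(n_heads * batch, d, seq_len)
    vs = v.transpose(2, 0, 1, 3).reshape(n_heads * batch, seq_len, d)
    # Scaled dot products with a softmax over the key axis
    attn = (qs @ ks) / np.sqrt(d)
    attn = np.exp(attn - attn.max(axis=-1, keepdims=True))
    attn /= attn.sum(axis=-1, keepdims=True)
    # Weighted values, merged back to (batch * seq_len, n_heads * d)
    o = (attn @ vs).reshape(n_heads, batch, seq_len, d)
    merged = o.transpose(1, 2, 0, 3).reshape(batch * seq_len, n_heads * d)
    return merged @ Wo + bo  # output projection
```

    Folding the head and batch dimensions together, as the C# code does with View, lets all heads be computed in one batched matrix multiplication (MulBatch) instead of a loop over heads.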
    


    References[edit]

    1. "Seq2SeqSharp LICENSE". GitHub.
    2. "Seq2SeqSharp Project". https://github.com/zhongkaifu/Seq2SeqSharp. Retrieved 2019-10-10.
    3. "Seq2SeqSharp: a tensor based fast and flexible encoder-decoder deep neural network framework written by .NET (C#)". GitHub.
