# Linear Regression with CNTK and C#

CNTK is Microsoft’s deep learning tool for training very large and complex neural network models. However, you can use CNTK for various other purposes as well. In some of the previous posts we have seen how to use CNTK to perform matrix multiplication in order to calculate descriptive statistics parameters of a data set.
In this blog post we are going to implement a simple linear regression model, LR. The model contains only one neuron. The model also contains a bias parameter, so in total the linear regression has only two parameters: w and b.
The image below shows the LR model:

The reason why we use CNTK to solve such a simple task is very straightforward. By learning on simple models like this one, we can see how the CNTK library works, and see some of the not-so-trivial actions in CNTK.
The model shown above can easily be extended to a logistic regression model by adding an activation function. Besides linear regression, which represents the neural network configuration without an activation function, logistic regression is the simplest neural network configuration that includes an activation function.

The following image shows the logistic regression model:
In case you want to see more about how to create logistic regression with CNTK, you can see this official demo example.
Now that we have introduced the neural network models, we can start by defining the data set. Assume we have a simple data set which represents the simple linear function $y=2x+1$. The generated data set is shown in the following table:

We already know that the linear regression parameters for the presented data set are $b_0=1$ and $b_1=2$, so we want to engage the CNTK library in order to get those values, or at least parameter values which are very close to them.

The whole task of developing the LR model with CNTK can be described in several steps:

Step 1: Create a C# Console application in Visual Studio, change the current architecture to $x64$, and add the latest “CNTK.GPU” NuGet package to the solution. The following image shows those actions performed in Visual Studio.

Step 2: Start writing code by adding two variables: the feature $X$ and the label $Y$. Once the variables are defined, continue by defining the training data set by creating a batch. The following code snippets show how to create the variables and the batch, as well as how to start writing CNTK based C# code.

First we need to add some using statements, and define the device where the computation will happen. Usually, we can define the CPU, or the GPU in case the machine contains an NVIDIA compatible graphics card. So the demo starts with the following code snippet:

using System;
using System.Linq;
using System.Collections.Generic;
using CNTK;
namespace LR_CNTK_Demo
{
class Program
{
static void Main(string[] args)
{
//Step 1: Create some Demo helpers
Console.Title = "Linear Regression with CNTK!";
Console.WriteLine("#### Linear Regression with CNTK! ####");
Console.WriteLine("");
//define device
var device = DeviceDescriptor.UseDefaultDevice();


Now define the two variables, and the data set presented in the previous table:

//Step 2: define values, and variables
Variable x = Variable.InputVariable(new int[] { 1 }, DataType.Float, "input");
Variable y = Variable.InputVariable(new int[] { 1 }, DataType.Float, "output");

//Step 2: define training data set from table above
var xValues = Value.CreateBatch(new NDShape(1, 1), new float[] { 1f, 2f, 3f, 4f, 5f }, device);
var yValues = Value.CreateBatch(new NDShape(1, 1), new float[] { 3f, 5f, 7f, 9f, 11f }, device);


Step 3: Create the linear regression network model by passing the input variable and the device for computation. As we already discussed, the model consists of one neuron and one bias parameter. The following method implements the LR network model:

private static Function createLRModel(Variable x, DeviceDescriptor device)
{
    //initializer for parameters
    var initV = CNTKLib.GlorotUniformInitializer(1.0, 1, 0, 1);

    //bias
    var b = new Parameter(new NDShape(1, 1), DataType.Float, initV, device, "b");

    //weights
    var W = new Parameter(new NDShape(2, 1), DataType.Float, initV, device, "w");

    //matrix product
    var Wx = CNTKLib.Times(W, x, "wx");

    //layer
    var l = CNTKLib.Plus(b, Wx, "wx_b");

    return l;
}


First, we create an initializer, which sets the startup values of the network parameters. Then we define the bias and weight parameters, join them in the form of the linear model “$wx+b$”, and return the result as a Function type. The createLRModel function is called in the main method. Once the model is created, we can examine it and prove there are only two parameters in the model. The following code creates the linear regression model and prints the model parameters:

//Step 3: create linear regression model
var lr = createLRModel(x, device);
//Network model contains only two parameters b and w, so we query
//the model in order to get parameter values
var paramValues = lr.Inputs.Where(z => z.IsParameter).ToList();
var totalParameters = paramValues.Sum(c => c.Shape.TotalSize);
Console.WriteLine($"LRM has {totalParameters} params, {paramValues[0].Name} and {paramValues[1].Name}.");

In the previous code, we have seen how to extract parameters from the model. Once we have the parameters, we can change their values, or just print those values for further analysis.

Step 4: Create the Trainer, which will be used to train the network parameters w and b. The following code snippet shows the implementation of the createTrainer method.

private static Trainer createTrainer(Function network, Variable target)
{
    //learning rate
    var lrate = 0.082;
    var lr = new TrainingParameterScheduleDouble(lrate);
    //network parameters
    var zParams = new ParameterVector(network.Parameters().ToList());

    //create loss and eval
    Function loss = CNTKLib.SquaredError(network, target);
    Function eval = CNTKLib.SquaredError(network, target);

    //learners
    var llr = new List<Learner>();
    var msgd = Learner.SGDLearner(network.Parameters(), lr);
    llr.Add(msgd);

    //trainer
    var trainer = Trainer.CreateTrainer(network, loss, eval, llr);
    return trainer;
}

First we define the learning rate, the main neural network parameter. Then we create the loss and evaluation functions. With those parameters we can create the SGD learner. Once the SGD learner object is instantiated, the trainer is created by calling the static CNTK method CreateTrainer, and returned as the result of the function. The method createTrainer is called in the main method:

//Step 4: create trainer
var trainer = createTrainer(lr, y);

Step 5: Training process. Once the variables, data set, network model and trainer are defined, the training process can be started.

//Step 5: training
for (int i = 1; i <= 200; i++)
{
    var d = new Dictionary<Variable, Value>();
    d.Add(x, xValues);
    d.Add(y, yValues);
    //
    trainer.TrainMinibatch(d, true, device);
    //
    var loss = trainer.PreviousMinibatchLossAverage();
    var eval = trainer.PreviousMinibatchEvaluationAverage();
    //
    if (i % 20 == 0)
        Console.WriteLine($"It={i}, Loss={loss}, Eval={eval}");

    if (i == 200)
    {
        //print weights
        var b0_name = paramValues[0].Name;
        var b0 = new Value(paramValues[0].GetValue()).GetDenseData<float>(paramValues[0]);
        var b1_name = paramValues[1].Name;
        var b1 = new Value(paramValues[1].GetValue()).GetDenseData<float>(paramValues[1]);
        Console.WriteLine($" ");
        Console.WriteLine($"Training process finished with the following regression parameters:");
        Console.WriteLine($"b={b0[0][0]}, w={b1[0][0]}");
        Console.WriteLine($" ");
    }
}
}


As can be seen, in just 200 iterations the regression parameters reached almost the values we expected: $b_0=0.995$ and $w=2.005$. Since the training process is different from the classic determination of regression parameters, we cannot get the exact values. In order to estimate the regression parameters, the neural network uses an iterative method called Stochastic Gradient Descent, SGD. On the other hand, classic regression uses regression analysis procedures which minimize the least squares error, and solves a system of equations where the unknowns are b and w.
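To make the contrast concrete, here is a minimal plain C# sketch (no CNTK involved) that fits the same five points in both ways: with the closed-form least squares formulas, and with a simple full-batch gradient descent loop. The class and method names (`LinearFitSketch`, `ClosedForm`, `GradientDescent`) are made up for this illustration, and the loop minimizes the mean squared error directly, which is only an approximation of what the CNTK trainer does internally.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; it is not part of CNTK.
public static class LinearFitSketch
{
    // Classic least squares: w = Sxy / Sxx, b = mean(y) - w * mean(x).
    public static (double w, double b) ClosedForm(double[] x, double[] y)
    {
        double meanX = x.Average();
        double meanY = y.Average();
        double sxy = 0.0, sxx = 0.0;
        for (int i = 0; i < x.Length; i++)
        {
            sxy += (x[i] - meanX) * (y[i] - meanY);
            sxx += (x[i] - meanX) * (x[i] - meanX);
        }
        double w = sxy / sxx;
        return (w, meanY - w * meanX);
    }

    // Full-batch gradient descent on the mean squared error, starting from w = b = 0.
    public static (double w, double b) GradientDescent(
        double[] x, double[] y, double learningRate, int iterations)
    {
        double w = 0.0, b = 0.0;
        int n = x.Length;
        for (int it = 0; it < iterations; it++)
        {
            double gradW = 0.0, gradB = 0.0;
            for (int i = 0; i < n; i++)
            {
                double error = w * x[i] + b - y[i];
                gradW += 2.0 * error * x[i] / n;
                gradB += 2.0 * error / n;
            }
            w -= learningRate * gradW;
            b -= learningRate * gradB;
        }
        return (w, b);
    }

    public static void Main()
    {
        // Same data as in the demo: y = 2x + 1.
        double[] x = { 1, 2, 3, 4, 5 };
        double[] y = { 3, 5, 7, 9, 11 };

        var exact = ClosedForm(x, y);
        var approx = GradientDescent(x, y, 0.082, 200);

        Console.WriteLine($"Closed form:      w={exact.w}, b={exact.b}");
        Console.WriteLine($"Gradient descent: w={approx.w:F4}, b={approx.b:F4}");
    }
}
```

The closed-form fit recovers $w=2$ and $b=1$ for this data, while the iterative loop only approaches those values, which mirrors the behaviour seen in the CNTK demo.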
Once we implement all the code above, we can start the LR demo by pressing F5. A similar output window should be shown:

I hope this blog post provides enough information to get started with CNTK, C# and machine learning. The source code for this blog post can be downloaded here.

# Input normalization as separate layer in CNTK with C#

In the previous post, we have seen how to calculate some of the basic parameters of descriptive statistics, as well as how to normalize data by calculating the mean and standard deviation. In this blog post we are going to implement data normalization as a regular neural network layer, which can simplify the training process and data preparation.

## What is Data normalization?

Simply said, data normalization is a set of tasks which transform the values of any feature in a data set into a predefined number range. Usually this range is [-1,1], [0,1] or some other specific range. Data normalization plays a very important role in ML, since it can dramatically improve the training process and simplify the setting of network parameters.

There are two main types of data normalization:
– MinMax normalization, which transforms all values into the range [0,1],
– Gauss normalization or Z-score normalization, which transforms the values in such a way that the average value is zero and the standard deviation is 1.
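
Written out for a single feature $x$, the two transformations take the following commonly used form. Here $x_{min}$, $x_{max}$, $\mu$ and $\sigma$ denote the minimum, maximum, mean and standard deviation of that feature in the training set; the symbols are introduced here only for illustration.

```latex
x'_{\text{minmax}} = \frac{x - x_{min}}{x_{max} - x_{min}},
\qquad
x'_{\text{zscore}} = \frac{x - \mu}{\sigma}
```

For example, for a feature with the values 1, 2, 3, 4, 5 we have $x_{min}=1$ and $x_{max}=5$, so the value 4 maps to $(4-1)/(5-1)=0.75$ under MinMax normalization.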

Besides those types, there are plenty of other methods which can be used. Usually those two are used when the size of the data set is known; otherwise we should use some of the other methods, like log scaling, dividing every value by some constant, etc. But why does data need to be normalized? This is an essential question in ML, and the simplest answer is that normalization gives all features an equal influence on the output label. More about data normalization and scaling can be found on this link.

In this blog post we are going to implement a CNTK neural network which contains a “Normalization layer” between the input and the first hidden layer. The schematic picture of the network looks like the following image:

As can be observed, the Normalization layer is placed between the input and the first hidden layer. The Normalization layer also contains the same number of neurons as the input layer, and produces output with the same dimension as the input layer.

In order to implement the Normalization layer, the following requirements must be met:

• calculate the average $\mu$ and standard deviation $\sigma$ of the training data set, as well as the maximum and minimum value of each feature.
• this must be done prior to the creation of the neural network model, since we need those values in the normalization layer.
• during network model creation, the normalization layer should be defined after the input layer is defined.

# Calculation of mean and standard deviation for training data set

Before network creation, we should prepare the mean and standard deviation parameters which will be used in the Normalization layer as constants. Fortunately, CNTK has a static method in the MinibatchSource class for this purpose: “MinibatchSource.ComputeInputPerDimMeansAndInvStdDevs”. The method takes the whole training data set defined in the minibatch source and calculates the parameters. As the method name suggests, the second value it returns for each stream is the inverse of the standard deviation.


//calculate mean and std for the minibatchsource
// prepare the training data
var d = new Dictionary<StreamInformation, Tuple<NDArrayView, NDArrayView>>();
using (var mbs = MinibatchSource.TextFormatMinibatchSource(
    trainingDataPath, streamConfig, MinibatchSource.FullDataSweep, false))
{
    d.Add(mbs.StreamInfo("feature"), new Tuple<NDArrayView, NDArrayView>(null, null));
    //compute mean and standard deviation of the population for inputs variables
    MinibatchSource.ComputeInputPerDimMeansAndInvStdDevs(mbs, d, device);
}



Now that we have the average and std values for each feature, we can create the network with the normalization layer. In this example we define a simple feed forward NN with 1 input, 1 normalization, 1 hidden and 1 output layer.


private static Function createFFModelWithNormalizationLayer(Variable feature, int hiddenDim, int outputDim, Tuple<NDArrayView, NDArrayView> avgStdConstants, DeviceDescriptor device)
{
    //First the parameters initialization must be performed
    var glorotInit = CNTKLib.GlorotUniformInitializer(
        CNTKLib.DefaultParamInitScale,
        CNTKLib.SentinelValueForInferParamInitRank,
        CNTKLib.SentinelValueForInferParamInitRank, 1);

    //*******Input layer is indicated as feature
    var inputLayer = feature;

    //*******Normalization layer
    //Item1 holds the per-dimension means, Item2 the values computed
    //by ComputeInputPerDimMeansAndInvStdDevs for the std
    var mean = new Constant(avgStdConstants.Item1, "mean");
    var std = new Constant(avgStdConstants.Item2, "std");
    var normalizedLayer = CNTKLib.PerDimMeanVarianceNormalize(inputLayer, mean, std);

    //*****hidden layer creation
    //shape of one hidden layer should be inputDim x neuronCount
    var shape = new int[] { hiddenDim, 4 };
    var weightParam = new Parameter(shape, DataType.Float, glorotInit, device, "wh");
    var biasParam = new Parameter(new NDShape(1, hiddenDim), 0, device, "bh");
    var hidLay = CNTKLib.Times(weightParam, normalizedLayer) + biasParam;
    var hidLayerAct = CNTKLib.ReLU(hidLay);

    //******Output layer creation
    //the last action is creation of the output layer
    var shapeOut = new int[] { 3, hiddenDim };
    var wParamOut = new Parameter(shapeOut, DataType.Float, glorotInit, device, "wo");
    var bParamOut = new Parameter(new NDShape(1, 3), 0, device, "bo");
    var outLay = CNTKLib.Times(wParamOut, hidLayerAct) + bParamOut;
    return outLay;
}
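
As a rough illustration of what a normalization layer does with those two constants, the following plain C# sketch (no CNTK calls) computes a per-feature mean and inverse standard deviation and applies $(x-\mu)\cdot\frac{1}{\sigma}$ to each value. The class `NormalizationSketch` and its methods are hypothetical, and the use of the population standard deviation is an assumption based on the code comment above; CNTK's exact numerics may differ.

```csharp
using System;
using System.Linq;

// Hypothetical helper for illustration only; it is not part of CNTK.
public static class NormalizationSketch
{
    // Returns per-feature means and inverse (population) standard deviations.
    // A constant feature (zero variance) would need special handling.
    public static (double[] mean, double[] invStd) ComputeMeanAndInvStd(double[][] rows)
    {
        int dim = rows[0].Length;
        var mean = new double[dim];
        var invStd = new double[dim];
        for (int j = 0; j < dim; j++)
        {
            double m = rows.Average(r => r[j]);
            double variance = rows.Average(r => (r[j] - m) * (r[j] - m));
            mean[j] = m;
            invStd[j] = 1.0 / Math.Sqrt(variance);
        }
        return (mean, invStd);
    }

    // Applies (x - mean) * invStd to every feature of one sample.
    public static double[] Normalize(double[] sample, double[] mean, double[] invStd)
    {
        return sample.Select((v, j) => (v - mean[j]) * invStd[j]).ToArray();
    }

    public static void Main()
    {
        // Two made-up features on very different scales.
        double[][] rows =
        {
            new[] { 1.0, 100.0 },
            new[] { 2.0, 200.0 },
            new[] { 3.0, 300.0 },
        };
        var stats = ComputeMeanAndInvStd(rows);
        foreach (var row in rows)
        {
            var z = Normalize(row, stats.mean, stats.invStd);
            Console.WriteLine(string.Join(", ", z.Select(v => v.ToString("F3"))));
        }
    }
}
```

After normalization both columns end up on the same scale, even though the raw values differ by a factor of 100, which is the purpose of placing such a layer in front of the first hidden layer.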


# Complete Source Code Example

The whole source code for this example is listed below. The example shows how to normalize the input features for the famous Iris data set. Notice that when data is normalized this way, we don’t need to handle normalization for the validation or testing data sets, because data normalization is part of the network model.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using CNTK;
namespace NormalizationLayerDemo
{
class Program
{
static string trainingDataPath = "./data/iris_training.txt";
static string validationDataPath = "./data/iris_validation.txt";
static void Main(string[] args)
{
DeviceDescriptor device = DeviceDescriptor.UseDefaultDevice();

//stream configuration to distinguish features and labels in the file
var streamConfig = new StreamConfiguration[]
{
new StreamConfiguration("feature", 4),
new StreamConfiguration("flower", 3)
};

// build a NN model
//define input and output variable and connecting to the stream configuration
var feature = Variable.InputVariable(new NDShape(1, 4), DataType.Float, "feature");
var label = Variable.InputVariable(new NDShape(1, 3), DataType.Float, "flower");

//calculate mean and std for the minibatchsource
// prepare the training data
var d = new Dictionary<StreamInformation, Tuple<NDArrayView, NDArrayView>>();
using (var mbs = MinibatchSource.TextFormatMinibatchSource(
trainingDataPath , streamConfig, MinibatchSource.FullDataSweep,false))
{
d.Add(mbs.StreamInfo("feature"), new Tuple<NDArrayView, NDArrayView>(null, null));
//compute mean and standard deviation of the population for inputs variables
MinibatchSource.ComputeInputPerDimMeansAndInvStdDevs(mbs, d, device);

}

//Build simple Feed Forward Neural Network with normalization layer
var ffnn_model = createFFModelWithNormalizationLayer(feature,5,3,d.ElementAt(0).Value, device);

//Loss and error functions definition
var trainingLoss = CNTKLib.CrossEntropyWithSoftmax(new Variable(ffnn_model), label, "lossFunction");
var classError = CNTKLib.ClassificationError(new Variable(ffnn_model), label, "classificationError");

// set learning rate for the network
var learningRatePerSample = new TrainingParameterScheduleDouble(0.01, 1);

//define learners for the NN model
var ll = Learner.SGDLearner(ffnn_model.Parameters(), learningRatePerSample);

//define trainer based on model, loss and error functions , and SGD learner
var trainer = Trainer.CreateTrainer(ffnn_model, trainingLoss, classError, new Learner[] { ll });

//Preparation for the iterative learning process

// create minibatch for training
var mbsTraining = MinibatchSource.TextFormatMinibatchSource(trainingDataPath, streamConfig, MinibatchSource.InfinitelyRepeat, true);

int epoch = 1;
while (epoch  a.sweepEnd))
{
reportTrainingProgress(feature, label, streamConfig, trainer, epoch, device);
epoch++;
}
}
}

private static void reportTrainingProgress(Variable feature, Variable label, StreamConfiguration[] streamConfig,  Trainer trainer, int epoch, DeviceDescriptor device)
{
// create minibatch for training
var mbsTrain = MinibatchSource.TextFormatMinibatchSource(trainingDataPath, streamConfig, MinibatchSource.FullDataSweep, false);
var trainD = mbsTrain.GetNextMinibatch(int.MaxValue, device);
//
var a1 = new UnorderedMapVariableMinibatchData();
var trainEvaluation = trainer.TestMinibatch(a1);

// create minibatch for validation
var mbsVal = MinibatchSource.TextFormatMinibatchSource(validationDataPath, streamConfig, MinibatchSource.FullDataSweep, false);
var valD = mbsVal.GetNextMinibatch(int.MaxValue, device);

//
var a2 = new UnorderedMapVariableMinibatchData();
var valEvaluation = trainer.TestMinibatch(a2);

Console.WriteLine($"Model Expectation: Input({xVal[0]},{xVal[1]},{xVal[2]},{xVal[3]}), Iris Flower= setosa");

## Training previously saved model

Training a previously saved model is very simple, since it requires no special coding. Right after the trainer is created with everything it needs (network, learning rate, momentum and others), you just need to call

trainer.RestoreFromCheckpoint(strIrisFilePath);

No additional code should be added. The above method is called after you have successfully saved the model state by calling

trainer.SaveCheckpoint(strIrisFilePath);

The method is usually called at the end of the training process. The complete source code for this blog post can be found here.

# How to setup learning rate per iteration in CNTK using C#

So far we have seen how to train and validate models in CNTK using C#. There are also many more details which should be revealed in order to better understand the CNTK library. One of the important features, not only in CNTK but in every DNN (deep neural network), is the learning rate. In an ANN, the learning rate is the number by which the derivative is multiplied before it is subtracted from the weight. If the weight is decreased too much, the loss function will increase and the network will diverge. On the other hand, if the weight is decreased too little, the loss function will change very little and the convergence will be too slow. So selecting the right value of the parameter is important. During the training process, the learning rate is usually defined as a constant value. In CNTK the learning rate is defined as follows:

// set learning rate for the network
var learningRate = new TrainingParameterScheduleDouble(0.2, 1);

From the code above, the learning rate is set to 0.2 per sample. This means the whole training process will be done with a learning rate of 0.2. CNTK also supports changing the learning rate dynamically.
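
The sentence about multiplying the derivative by the learning rate corresponds to the standard gradient descent update, written here for a single weight $w$ with learning rate $\eta$ and loss $L$ (the symbols are chosen for illustration and do not come from CNTK):

```latex
w_{t+1} = w_t - \eta \, \frac{\partial L}{\partial w}\bigg|_{w = w_t}
```

With $\eta = 0.2$ and a derivative of $0.5$, for instance, the weight changes by $0.2 \cdot 0.5 = 0.1$ in the direction that lowers the loss.
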
Assume we want to set up different learning rates so that from the first to the 100th iteration the learning rate is 0.2. From the 100th to the 500th iteration we want the learning rate to be 0.1. Moreover, after 500 iterations are completed and until the end of the iteration process, we want to set the learning rate to 0.05. The above can be expressed as:

lr1 = 0.2, from 1 to 100 iterations
lr2 = 0.1, from 100 to 500 iterations
lr3 = 0.05, from 500 to the end of the searching process.

In case we want to set up the learning rate dynamically, we need to use the PairSizeTDouble class in order to define the learning rate. So for the above requirements the following code should be implemented:

PairSizeTDouble p1 = new PairSizeTDouble(2, 0.2);
PairSizeTDouble p2 = new PairSizeTDouble(10, 0.1);
PairSizeTDouble p3 = new PairSizeTDouble(1, 0.05);

var vp = new VectorPairSizeTDouble() { p1, p2, p3 };
var learningRatePerSample = new CNTK.TrainingParameterScheduleDouble(vp, 50);

First we need to define a PairSizeTDouble object for every learning rate value, together with the integer number which will be multiplied. Once we have defined the rates, we make an array of rate values by creating the VectorPairSizeTDouble object. Then the array is passed as the first argument of the TrainingParameterScheduleDouble constructor. The second argument is the multiplication number. So for the first rate value, 2 is multiplied by 50, which gives 100 and denotes the iteration number. Similar multiplications are done for the other rate values.

# Testing and Validation CNTK models using C#

…continue from the previous post.

Once the model is built and the loss and validation functions satisfy our expectations, we need to validate and test the model using data which was not part of the training data set (unseen data). Model validation is very important because we want to see if our model is trained well, so that it evaluates unseen data approximately the same as the training data.
Otherwise, a model which cannot predict the output well is called an overfitted model. Overfitting can happen when the model was trained long enough that it shows very high performance for the training data set, but gives bad results for the testing data.

We will continue with the implementation from the previous two posts, and implement model validation. After the model is trained, the model and the trainer are passed to the evaluation method. The evaluation method loads the testing data and calculates the output using the passed model. Then it compares the calculated (predicted) values with the output from the testing data set and calculates the accuracy. The following source code shows the evaluation implementation.

private static void EvaluateIrisModel(Function ffnn_model, Trainer trainer, DeviceDescriptor device)
{
    var dataFolder = "Data";//files must be on the same folder as program
    var trainPath = Path.Combine(dataFolder, "testIris_cntk.txt");
    var featureStreamName = "features";
    var labelsStreamName = "label";

    //extract features and label from the model
    var feature = ffnn_model.Arguments[0];
    var label = ffnn_model.Output;

    //stream configuration to distinguish features and labels in the file
    var streamConfig = new StreamConfiguration[]
    {
        new StreamConfiguration(featureStreamName, feature.Shape[0]),
        new StreamConfiguration(labelsStreamName, label.Shape[0])
    };

    // prepare testing data
    var testMinibatchSource = MinibatchSource.TextFormatMinibatchSource(
        trainPath, streamConfig, MinibatchSource.InfinitelyRepeat, true);
    var featureStreamInfo = testMinibatchSource.StreamInfo(featureStreamName);
    var labelStreamInfo = testMinibatchSource.StreamInfo(labelsStreamName);

    int batchSize = 20;
    //start counting from zero so the accuracy is based on the evaluated samples only
    int miscountTotal = 0, totalCount = 0;
    while (true)
    {
        var minibatchData = testMinibatchSource.GetNextMinibatch((uint)batchSize, device);
        if (minibatchData == null || minibatchData.Count == 0)
            break;
        totalCount += (int)minibatchData[featureStreamInfo].numberOfSamples;

        // expected labels are in the minibatch data.
        var labelData = minibatchData[labelStreamInfo].data.GetDenseData<float>(label);
        var expectedLabels = labelData.Select(l => l.IndexOf(l.Max())).ToList();

        var inputDataMap = new Dictionary<Variable, Value>() { { feature, minibatchData[featureStreamInfo].data } };
        var outputDataMap = new Dictionary<Variable, Value>() { { label, null } };

        ffnn_model.Evaluate(inputDataMap, outputDataMap, device);
        var outputData = outputDataMap[label].GetDenseData<float>(label);
        var actualLabels = outputData.Select(l => l.IndexOf(l.Max())).ToList();

        int misMatches = actualLabels.Zip(expectedLabels, (a, b) => a.Equals(b) ? 0 : 1).Sum();
        miscountTotal += misMatches;
        Console.WriteLine($"Validating Model: Total Samples = {totalCount}, Mis-classify Count = {miscountTotal}");

        if (totalCount >= 20)
            break;
    }
    Console.WriteLine($"---------------");
    Console.WriteLine($"------TESTING SUMMARY--------");
    //cast to float, otherwise the integer division truncates the ratio
    float accuracy = (1.0F - (float)miscountTotal / totalCount);
    Console.WriteLine($"Model Accuracy = {accuracy}");
    return;
}
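
The label comparison at the heart of this method can be tried out in isolation. The following plain C# sketch (no CNTK, made-up network outputs) picks the index of the largest value in each row as the class, counts the mismatches, and computes the accuracy with a floating-point division. The class `AccuracySketch` and its data are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only; it is not part of CNTK.
public static class AccuracySketch
{
    // Index of the largest value in a row, i.e. the predicted (or expected) class.
    public static int ArgMax(IList<float> row)
    {
        return row.IndexOf(row.Max());
    }

    // Fraction of rows where the predicted and expected classes agree.
    public static float Accuracy(IList<IList<float>> predicted, IList<IList<float>> expected)
    {
        int misMatches = predicted
            .Zip(expected, (p, e) => ArgMax(p) == ArgMax(e) ? 0 : 1)
            .Sum();
        // Cast before dividing; int / int would truncate toward zero.
        return 1.0f - (float)misMatches / predicted.Count;
    }

    public static void Main()
    {
        // Made-up network outputs for four samples and three classes.
        var predicted = new List<IList<float>>
        {
            new List<float> { 0.9f, 0.05f, 0.05f },
            new List<float> { 0.1f, 0.7f, 0.2f },
            new List<float> { 0.2f, 0.5f, 0.3f },   // wrong: expected class is 2
            new List<float> { 0.1f, 0.1f, 0.8f },
        };
        // One-hot encoded expected labels.
        var expected = new List<IList<float>>
        {
            new List<float> { 1f, 0f, 0f },
            new List<float> { 0f, 1f, 0f },
            new List<float> { 0f, 0f, 1f },
            new List<float> { 0f, 0f, 1f },
        };
        Console.WriteLine($"Accuracy = {Accuracy(predicted, expected)}");
    }
}
```

With one mismatch out of four samples the accuracy is three quarters. Without the cast to float, the integer division would give zero for any count of mismatches below the total, and the reported accuracy would always be 1.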


The implemented method is called from the previous training method.

 EvaluateIrisModel(ffnn_model, trainer, device);



As can be seen, the model validation has shown that the model predicts the data with high accuracy, which is shown in the following picture.

This was the last post in the series of blog posts about using feed forward neural networks to train on the Iris data using CNTK and C#.

The full source code for all three samples can be found here.