Advanced Technology Days 14 (ATD14) is a two-day conference organized by Microsoft and the MS Community in Zagreb, the capital of Croatia. My session about the Microsoft Cognitive Toolkit (CNTK) on the .NET platform was held on the second day, and I was very happy to talk about it, since .NET Core support was finally implemented in the library only two months ago.
There were more demos than I had time to present, so at the end of this blog post you can find a link to all demos and the presentation file. Information about the data sets that need to be downloaded before running the examples is placed in the code. The last demo, about ANNdotNET, can be found at https://bhrnjica.net/anndotnet. The demos and presentation file can be found at this location: https://1drv.ms/f/s!AgPZDj-_uxGLhY1pCCODeT03qK_T3A
When building deep learning models, it is often necessary to check the model for consistency and proper parameter definitions. In ANNdotNET, ML network models are designed using the Visual Network Designer (VND), so it is easy to see the network configuration. Besides VND, ANNdotNET offers several visualization features at different levels: network preparation, model training phase, post-training evaluation, performance analysis, and export of results. In this blog post we will learn how to use those features when working with deep learning models.
Visualization during network preparation and model training
When preparing the network and training parameters, we need information about the data sets, input format and output type. This information is relevant for selecting what type of network model to configure, what types of layers to use, and what learner to select. For example, the following image shows a network configuration consisting of 2 embedding layers, 3 dense layers and 2 dropout layers. This network configuration is used to train a CNTK model for the mushroom data set. As can be seen, the network layers are arranged as listbox items, and the user can see, at the highest level, what the neural network looks like, which layers are included in the network, and how many dimensions each layer has. This is very helpful, since it provides a way of building a network quickly and accurately, and it requires much less time than the traditional way of coding the network in Python or another programming language.
The ANNdotNET Network Settings page provides a good deal of information about the network: the input and output layers, which data sets are defined, and the whole network configuration arranged in layers. Besides network-related information, the Network Settings tab page also provides the learning parameters for network training. The reader can find more about the Visual Network Designer in one of the previous blog posts.
Since ANNdotNET implements MLEngine, which is based on CNTK, all CNTK-related visualization features could in principle be used. The CNTK library provides a rich set of visualizations. For example, you can use TensorBoard in CNTK to visualize not just the computational graph, but also training history, model evaluation, etc. Besides TensorBoard, CNTK provides a logger module which uses the Graphviz tool for visualizing the network graph. The bad news is that none of the above features can be run from C#, since those implementations are available only in Python.
This is one of the main reasons why ANNdotNET provides a rich set of visualizations for the .NET platform. These include training history, model evaluation for the training and validation data sets, and model performance analysis. The following image shows some of the visualization features: the training history (loss and evaluation) of minibatches during training of the mushroom model:
Moreover, the following image shows evaluation of training and validation set for each iteration during training:
Those graphs are generated during the training phase, so the user can see what is happening with the model. This is of tremendous help when deciding when to stop the training process, whether the training parameters produce a good model at all, or whether to stop and change parameter values. In case we need to stop the training process immediately, ANNdotNET provides a Stop command which stops the training process at any time.
Model performance visualization
Once the model is trained, ANNdotNET provides a performance analysis tool for all three types of ML problems: regression, binary classification and multi-class classification.
Since the mushroom project is a binary ML problem, the following image shows the performance of the trained model:
Using Graphviz to visualize CNTK network graph in C#
We have seen that ANNdotNET provides all types of visualizations of CNTK models, and that those features are available with a mouse click through the GUI. One more feature is coming in ANNdotNET v1.1, which uses Graphviz to visualize the CNTK network graph. The feature is based on the original CNTK Python implementation, with some modifications and styling.
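To give a feel for what a Graphviz-based visualization involves, here is a minimal Python sketch that turns a list of layers into Graphviz DOT text. It is not the ANNdotNET or CNTK implementation: the function name `layers_to_dot`, the layer names and the dimensions are hypothetical and chosen only for illustration.

```python
def layers_to_dot(layers, graph_name="network"):
    """Return Graphviz DOT text for a simple chain of layers.

    `layers` is a list of (name, output_dim) tuples. Both the tuple layout
    and the example values below are illustrative assumptions.
    """
    lines = [
        f"digraph {graph_name} {{",
        "    rankdir=TB;",
        "    node [shape=box];",
    ]
    # One node per layer, labeled with its name and output dimension.
    for i, (name, dim) in enumerate(layers):
        lines.append(f'    n{i} [label="{name}\\n(dim: {dim})"];')
    # Connect consecutive layers with directed edges.
    for i in range(len(layers) - 1):
        lines.append(f"    n{i} -> n{i + 1};")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical network used only to exercise the function.
example_layers = [("Input", 4), ("Dense", 8), ("Dropout", 8), ("Output", 3)]
dot_text = layers_to_dot(example_layers)
print(dot_text)
```

The resulting text can be saved to a file such as network.dot and rendered with the Graphviz command line tool, for example `dot -Tpng network.dot -o network.png`.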
In order to use Graphviz to visualize network computation graph the following requirements must be met:
Alongside the book, I was developing the GPdotNET application, which is explained in Chapter 5. Chapter 5 describes all aspects of the application in depth, with real-world examples.
As can be seen, GPdotNET v5 is a completely rewritten application, with a new logo and GUI. As an introduction to the application, I have prepared several videos on YouTube with a quick explanation of how to use some of the main modules in GPdotNET.
2. Step: Open the ANNdotNET application. Press the New command, select the Project 1 tree item and rename the project to Iris Data Set.
3. Step: Select the Data command from the Model Preparation ribbon group, click the File button in the Import experimental data dialog and select the recently downloaded file. Check the Comma check box and press the Import Data button.
4. Step: Once the data is prepared, click the Create Model command and the Model Settings panel is shown. Set up the parameters as shown in the image below and click the Run command.
5. Step: Once the model is trained, you can evaluate it by selecting the Evaluate command. Depending on the model type (regression, binary or multi-class classification), the appropriate Evaluation dialog appears. Since this is a multi-class classification model, the confusion matrix is shown, with micro and macro performance parameters.
6. Step: For further analysis you can export the model to Excel or to ONNX. You can also save the project, which can later be opened and retrained.
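For readers unfamiliar with micro and macro averaging, the following Python sketch shows how such parameters can be derived from a confusion matrix. It is only an illustration: the exact set of parameters ANNdotNET reports is not listed here, the function name is hypothetical, and the matrix values are made up.

```python
def micro_macro_precision_recall(cm):
    """Compute micro and macro averaged precision and recall.

    cm[i][j] counts samples whose actual class is i and predicted class is j.
    """
    n = len(cm)
    tp = [cm[i][i] for i in range(n)]
    # False positives: predicted as class i (column i) but actually another class.
    fp = [sum(cm[r][i] for r in range(n)) - cm[i][i] for i in range(n)]
    # False negatives: actually class i (row i) but predicted as another class.
    fn = [sum(cm[i]) - cm[i][i] for i in range(n)]

    def safe_div(a, b):
        return a / b if b else 0.0

    # Macro: compute the score per class, then average (each class weighs the same).
    macro_p = sum(safe_div(tp[i], tp[i] + fp[i]) for i in range(n)) / n
    macro_r = sum(safe_div(tp[i], tp[i] + fn[i]) for i in range(n)) / n
    # Micro: pool the counts over all classes first, then compute the score.
    micro_p = safe_div(sum(tp), sum(tp) + sum(fp))
    micro_r = safe_div(sum(tp), sum(tp) + sum(fn))
    return {
        "micro_precision": micro_p,
        "micro_recall": micro_r,
        "macro_precision": macro_p,
        "macro_recall": macro_r,
    }


# Made-up 3-class confusion matrix (rows: actual, columns: predicted).
example_cm = [
    [10, 0, 0],
    [0, 8, 2],
    [0, 1, 9],
]
metrics = micro_macro_precision_recall(example_cm)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

Micro averaging favors the frequent classes because every sample counts equally, while macro averaging treats every class equally regardless of its size.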
Note: Currently ANNdotNET is in an alpha version, and more features will come in the near future.
ANNdotNET is a Windows desktop application written in C# for creating and training ANN models. The application relies on the Microsoft Cognitive Toolkit (CNTK), and it is intended to be a GUI tool for the CNTK library, with extensions for data preprocessing, model evaluation and exporting. It is hosted on GitHub and can be cloned from http://github.com/bhrnjica/anndotnet
Currently, ANNdotNET supports the following types of ANN:
Simple Feed Forward NN
Deep Feed Forward NN
Recurrent NN with LSTM
The process of creating, training, evaluating and exporting models is carried out from the GUI application and does not require knowledge of any programming language. ANNdotNET is ideal for engineers who are not familiar with programming languages.
ANNdotNET is an x64 Windows desktop application which runs on .NET Framework 4.7.1. In order to run the application, the following requirements must be met:
– Windows 7, 8 or 10 with x64 architecture
– .NET Framework 4.7.1
– CPU/GPU support.
Note: The application automatically detects GPU capability on your machine and uses it in training and evaluation; otherwise it uses the CPU.
How to run application
There are two ways to run the application:
Clone the GitHub repository of the application, open it in Visual Studio 2017, change the build architecture to x64, then build and run the application.
Download the released version, unzip it and run ANNdotNET.exe.
The following three short videos quickly show how to create, train and evaluate regression, binary and multi class classification models.
1. Training a regression model. The Concrete Slump Test data set is downloaded from the UCI ML Repository and loaded into ANNdotNET without any modification, since the data preparation module can prepare it.
2. Training and evaluating a binary classifier model. The data is the Titanic data set downloaded from the public repository.
3. Training and evaluating multi-class classification models. The data is the Iris data set downloaded from the same page as above.
As you already know, the GPdotNET v4 tool consists of several modules, which include:
GP module for creating and training models based on genetic programming,
ANN module for creating and training models based on Feed Forward Neural Networks,
GA module for model and function optimization using Genetic Algorithm,
LGA module for linear programming with GA, which includes solving Traveling Salesman based problems, and Assignment and Transportation problems.
With the latest release, GPdotNET has changed a lot. The initial idea of GPdotNET was to provide the GP method in an application. As the project grew, a lot of new implementations were included in the main project. This year I decided to make two different projects, which can be seen as the natural evolution of GPdotNET v4.
The first project stays the same, follows the previous version, and is called GPdotNET v5. The project includes only the GP-related algorithm implementation, which is developed for creating and training models for supervised ML problems (regression, binary and multi-class classification).
The second project uses several ANN algorithms for creating and training models for supervised machine learning problems. The project is called ANNdotNET. It is a Windows Forms desktop application, very similar to GPdotNET, for creating and training ANN models.
I am very proud to announce that the new version of GPdotNET will be released as two different open source projects.
Regardless of the machine learning library you use, data preparation is the first and one of the most important steps in developing predictive models. It is very often the case that the data intended for training is dirty, with a lot of unnecessary columns, missing values, unformatted numbers, etc. Before training, the data must be cleaned and properly defined in order to get a good model. This is known as data preparation. Data preparation consists of cleaning the data, defining features and labels, deriving new features from the existing data, handling missing values, scaling the data, etc. It can be concluded that most of the total time we spend on ML modelling is related to data preparation.
In this blog post I am going to present a simple tool which can significantly reduce the preparation time for ML. The tool simply loads the data into a GUI, and then the user can define all necessary information. Once the data is prepared, the user can store it to files which can then be directly imported into an ML library such as CNTK.
The following image shows the ML Data Preparation Tool main window.
From the image above, the data preparation can be achieved in several steps.
1. Load dirty data into ML Prep Tool by pressing the Import Data button.
2. Transform the data by providing the following:
   – Type: each column can be one of:
      – Numeric, which holds continuous numeric values,
      – Binary, which indicates two-class categorical data,
      – Category, which indicates categorical data with more than two classes,
      – String, which indicates that the column will not be part of the training and testing data sets.
   – Encoding: for Binary and Category column types, the encoding must be defined. The following encodings are supported:
      – Binary Encoding with (0,1): the first binary value will be 0, and the second binary value will be 1.
      – Binary Encoding with (-1,1): the first binary value will be -1, and the second binary value will be 1.
      – Category Level: treats each class as a numeric value. In case of 3 categories (R, G, B), the encoding will be (0,1,2).
      – Category 1:N: implements a One-Hot vector with N columns. In case of 3 categories (R, G, B), the encoding will be R = (1,0,0), G = (0,1,0), B = (0,0,1).
      – Category 1:N-1(0): implements dummy coding with N-1 columns. In case of 3 categories (R, G, B), the encoding will be R = (1,0), G = (0,1), B = (0,0).
      – Category 1:N-1(-1): implements dummy coding with N-1 columns in which the last category is encoded as -1 in every column (often called effect coding). In case of 3 categories (R, G, B), the encoding will be R = (1,0), G = (0,1), B = (-1,-1).
   – Variable: defines features and label. Only one label and at least one feature can be defined. A column can also be defined as an Ignore variable, which skips that column. The following options are supported:
      – Input, which identifies the column as a feature or predictor,
      – Output, which identifies the column as the label or model output.
   – Scaling: defines column scaling. Two scaling options are supported.
   – Missing Values: defines the replacement for missing values within the column. There are several options for numeric columns and two options (Random and Mode) for categorical columns.
3. Define the testing data set size by specifying a number of rows or a percentage.
4. Define the export options.
5. Press the Export button.
As can be seen, this is a straightforward data preparation workflow.
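The category encodings listed in the steps above can be sketched in a few lines of Python. This is an illustration of the encodings themselves, not the tool's code: the function name and the scheme labels (`level`, `one_hot`, `dummy_0`, `dummy_-1`) are hypothetical names introduced for this sketch.

```python
def encode_category(value, classes, scheme):
    """Encode one categorical value with one of the encodings described above.

    `classes` is the ordered list of categories, e.g. ["R", "G", "B"].
    The scheme names are hypothetical labels used only in this sketch.
    """
    idx = classes.index(value)
    n = len(classes)
    if scheme == "level":
        # Category Level: each class becomes its ordinal index.
        return [idx]
    if scheme == "one_hot":
        # Category 1:N: one column per class, a single 1 marks the class.
        return [1 if i == idx else 0 for i in range(n)]
    if scheme in ("dummy_0", "dummy_-1"):
        # Category 1:N-1: N-1 columns; the last class is the reference class,
        # filled with 0 or with -1 depending on the variant.
        if idx == n - 1:
            fill = 0 if scheme == "dummy_0" else -1
            return [fill] * (n - 1)
        return [1 if i == idx else 0 for i in range(n - 1)]
    raise ValueError(f"unknown scheme: {scheme}")


classes = ["R", "G", "B"]
for scheme in ("level", "one_hot", "dummy_0", "dummy_-1"):
    encoded = {c: encode_category(c, classes, scheme) for c in classes}
    print(scheme, encoded)
```

The 1:N-1 variants use one column less than One-Hot encoding, which avoids a redundant column when the model already has a bias term.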
Besides the general export options, which can be set by selecting different delimiter options, you can export the data set into CNTK format, which is very handy if you play with CNTK.
After the data transformations, the user needs to check CNTK format in the export options and press Export in order to get CNTK training and testing files, which can be used directly in code without any modifications.
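For orientation, the CNTK text format stores one sample per line as a sequence of `|name value value ...` fields. The Python sketch below writes lines in that shape. It is an assumption-laden illustration: the stream names `features` and `label`, the function name and the sample rows are made up, and the tool's actual export may use different names.

```python
def to_ctf_line(features, label, feature_name="features", label_name="label"):
    """Format one sample as a dense CNTK text format line.

    The stream names are assumptions; any names can be used as long as the
    reader configuration in the training code uses the same ones.
    """
    feat = " ".join(str(v) for v in features)
    lab = " ".join(str(v) for v in label)
    return f"|{feature_name} {feat} |{label_name} {lab}"


# Made-up rows: four numeric features and a one-hot encoded 3-class label.
rows = [
    ([5.1, 3.5, 1.4, 0.2], [1, 0, 0]),
    ([6.2, 2.9, 4.3, 1.3], [0, 1, 0]),
]
ctf_lines = [to_ctf_line(f, l) for f, l in rows]
print("\n".join(ctf_lines))
```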
Some examples will be provided in the next blog post.