With the new year, a new version of ANNdotNET has arrived. This is a major update which brings very exciting features.
Here is what comes with ANNdotNET v1.2:
Image classification module: this is the main update in this version. Previously you could run image processing only by manually creating an mlconfig file with a text-based reader. Now you can perform full image preparation before creating the mlconfig file. For example, a Cat&Dog image processing deep learning example is provided and can be found on the ANNdotNET Start page. Notice: before running this example you should download the image dataset from the Kaggle web site and save it to a specific location on your disk. More information can be found on the ProjectInfo tab (see image below).
ANNdotNET Feed: brings the ability to share interesting annprojects with the whole community. Once an interesting project is added to the ANNdotNET Feed, it can be viewed by all users who have installed ANNdotNET v1.2 or later. Currently three examples are provided through the ANNdotNET Feed.
Time Series Generator: previously a time series could be loaded in the Data Import dialog only as single-column data without a header. Now more than one column with a header can be imported; only the last column is generated as a time series, while the remaining columns stay as they are.
Split Raw Data Set into train, validation and test sets. Up to now the user could split the raw data set into train and validation sets only.
Export to Excel with all three data sets. In case a test data set is defined, Export to Excel also exports the data set for testing.
Optimized data loading and handling of huge data sets. There are some improvements in loading huge data sets.
Visual Network Designer improvements. Visual Network Designer now provides more options.
Added new Layer types: Convolution, Pooling, etc.
Insert button: inserts a layer into the network at a specific position.
Some WinForms dialogs have been converted into WPF-based windows in order to fix the WinForms DPI issue on scaled monitors. #42
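The three-way split of the raw data set listed among the new features can be pictured with a short sketch. This is only an illustration and not ANNdotNET's actual implementation: the function name split_rows, the default fractions, and the choice to take the validation and test rows from the end of the data set are all assumptions.

```python
def split_rows(rows, validation_fraction=0.2, test_fraction=0.1):
    """Split a list of rows into train, validation and test parts.

    Hypothetical helper: rows keep their original order, and the
    validation and test parts are taken from the end of the list.
    """
    if validation_fraction < 0 or test_fraction < 0:
        raise ValueError("fractions must be non-negative")
    if validation_fraction + test_fraction >= 1:
        raise ValueError("fractions must leave room for training rows")

    n = len(rows)
    n_test = int(n * test_fraction)
    n_valid = int(n * validation_fraction)
    n_train = n - n_valid - n_test

    train = rows[:n_train]
    valid = rows[n_train:n_train + n_valid]
    test = rows[n_train + n_valid:]
    return train, valid, test


if __name__ == "__main__":
    train, valid, test = split_rows(list(range(10)))
    print(len(train), len(valid), len(test))
```

Setting test_fraction to 0 reproduces the older two-way split into train and validation sets.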
In this blog post, step-by-step instructions are given for preparing a clean Windows-based (virtual) machine with a GPU for deep learning with CNTK, TensorFlow and Keras. Installation of the OS is not covered in the post; it is assumed that this task is already completed.
Preparing the machine
Once you have a clean Windows machine up and running, there are several things you should consider:
1. Physical machine with an NVIDIA-compatible graphics card. This requirement allows deep learning frameworks to train models on the GPU, which speeds up the training process considerably.
2. Virtual machine with GPU. In case you plan to prepare a virtual machine, or an Azure virtual machine, be aware that (to my knowledge) only a Windows Server 2016 based virtual machine recognizes the GPU card. So if you install Windows 10 or a lower version on a virtual machine, you will not be able to use the GPU for training deep learning models.
3. Azure N-Series VM. In case you plan to select one of the Azure virtual machines, only the N-series supports GPUs.
Installation of NVIDIA driver and related stuff
In this blog post only the NVIDIA driver is described; no other driver installation is considered. For other drivers, please refer to the related vendor site.
For this blog post, drivers and related stuff for the NVIDIA Tesla K80 graphics card are explained. For other NVIDIA cards the installation process is almost the same.
1. First you have to know which NVIDIA graphics card is installed in your machine.
2. Then go to the official NVIDIA site and select the appropriate information before downloading the driver. In my case the following information is selected:
3. Press search and download the driver.
Once you download the driver, install it on your machine.
Once you have the driver installed, you have to download and install two more NVIDIA software components:
1. CUDA Toolkit 9.0
2. cuDNN 7.4
Those two software components are used by deep learning frameworks (CNTK and TensorFlow) for GPU-based computation. CUDA 9.0 is compatible with the latest versions, CNTK 2.6 and TensorFlow 1.12, so it is easier to use one CUDA version for both frameworks, which was not the case in the past.
Installation of CUDA 9.0
In order to install the CUDA Toolkit, go to the CUDA download page and select the appropriate information for your machine. I selected the following information in order to download it:
Once you have selected the right information, press the download button. Once CUDA 9.0 is downloaded to your machine, install it using the Express installation option.
Installation of cuDNN 7.4
Go to the official cuDNN site and press the Download cuDNN button.
Once you press it, the following page should appear. Notice that a login page might appear before the download page.
Once cuDNN is downloaded, unzip it. The archive contains only three files, and those should be copied to the right places. In order to successfully install cuDNN, copy the files as follows:
1. cudnn64_7.dll to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\bin
2. cudnn.lib to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\lib\x64
3. cudnn.h to C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0\include
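The three copy operations above can also be scripted. The following is a minimal sketch, assuming the cuDNN archive unpacks into bin, lib\x64 and include subfolders; that layout, the function name copy_cudnn and the example path in the usage note are assumptions, while the destination folders are the ones listed above.

```python
import shutil
from pathlib import Path

# Destination root taken from the paths listed above.
CUDA_ROOT = Path(r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v9.0")

# (file inside the unzipped archive, target folder below CUDA_ROOT).
# The archive layout is an assumption; adjust it if your download differs.
CUDNN_FILES = [
    (Path("bin") / "cudnn64_7.dll", Path("bin")),
    (Path("lib") / "x64" / "cudnn.lib", Path("lib") / "x64"),
    (Path("include") / "cudnn.h", Path("include")),
]


def copy_cudnn(unzipped_root, cuda_root=CUDA_ROOT):
    """Copy the three cuDNN files into the CUDA Toolkit folders.

    Returns the list of destination paths that were written.
    """
    unzipped_root = Path(unzipped_root)
    cuda_root = Path(cuda_root)
    copied = []
    for relative_source, relative_target_dir in CUDNN_FILES:
        source = unzipped_root / relative_source
        if not source.is_file():
            raise FileNotFoundError("missing cuDNN file: {}".format(source))
        target_dir = cuda_root / relative_target_dir
        target_dir.mkdir(parents=True, exist_ok=True)
        copied.append(Path(shutil.copy2(str(source), str(target_dir))))
    return copied
```

A call such as copy_cudnn(r"C:\Temp\cuda") (a hypothetical download location) would then perform all three copies. Writing below Program Files normally requires a prompt started with administrator rights.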
Once you’ve done that, the installation of NVIDIA-related stuff is complete, and you can switch to the installation of Python-related stuff.
Installation of Python development environment
CNTK and TensorFlow support various Python environments, but you should always check the official sites for compatibility. In order to use CNTK and TensorFlow in the same Python environment, it is recommended to use the Anaconda3 version 4.1.1 environment.
First download Anaconda3 v4.1.1 from the official site:
Once Anaconda is downloaded, install it in the standard way using the installer.
Prepare python environment for the installation
Once Anaconda3 4.1.1 has been installed, several commands need to be run in order to install all necessary software. Before we start, we need to upgrade pip, since Anaconda3 4.1.1 is a little bit old. So run the Anaconda Prompt from Start->Anaconda->Anaconda Prompt.
Once the Anaconda Prompt is running, type the following command:
python -m pip install --upgrade pip
Now we are ready to install CNTK, TensorFlow and Keras. But before that we should create a separate Python environment and then install those frameworks into it. The new environment must rely on Python 3.5. So type the following command into the Anaconda Prompt:
conda create --name mlenv1218 python=3.5
We have created an environment named “mlenv1218”. Now don’t forget to activate the environment before installing software. Type the following command in order to activate the environment.
Once we’ve done that, the Anaconda Prompt should look like this (the active environment is shown on the left side):
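Besides looking at the prompt, you can ask Python itself which interpreter is active. The following is a minimal sketch; the function name describe_interpreter is made up for illustration, and the expected version 3.5 comes from the environment created above.

```python
import sys


def describe_interpreter(expected_major=3, expected_minor=5):
    """Return the interpreter location and whether its version matches."""
    version = sys.version_info
    return {
        # Folder of the running interpreter; for an activated conda
        # environment this normally ends with the environment name.
        "prefix": sys.prefix,
        "version": "{}.{}.{}".format(version.major, version.minor, version.micro),
        "matches": (version.major, version.minor) == (expected_major, expected_minor),
    }


if __name__ == "__main__":
    info = describe_interpreter()
    print("interpreter folder:", info["prefix"])
    print("python version:", info["version"])
    print("version as expected:", info["matches"])
```

If the printed folder does not contain mlenv1218, the environment is most likely not activated in that prompt.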
Installation of CNTK, Tensorflow and Keras
It is very important to properly install the NVIDIA-related stuff before installing the deep learning libraries, because most installation problems are related to it. Once the NVIDIA and Python environments are installed properly, the installation process for the deep learning frameworks is very easy. In the Anaconda Prompt, with the “mlenv1218” environment activated, type the following command in order to install CNTK:
pip install cntk-gpu
Then type the following Python code to test the CNTK installation:
python -c "import cntk; print(cntk.__version__)"
Once you’ve done that, type the following command in order to install Tensorflow:
pip install tensorflow-gpu
Type the following command in order to test installation:
python -c "import tensorflow as tf; tf.enable_eager_execution(); print(tf.reduce_sum(tf.random_normal([1000, 1000])))"
At the end type the following command to install Keras:
pip install keras
In addition it is useful to install the following packages:
That is all you need to install in order to run CNTK, TensorFlow and Keras.
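To check all three installations in one go, a small script can try to import each package and report its version. This is a sketch under the assumption that each package exposes a __version__ attribute (cntk and tensorflow are queried that way in the commands above); the function name installed_versions is hypothetical.

```python
import importlib


def installed_versions(packages=("cntk", "tensorflow", "keras")):
    """Map each package name to its version string, or None if the import fails."""
    versions = {}
    for name in packages:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # Package is missing, or one of its native libraries failed to load.
            versions[name] = None
        else:
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(name, "->", version if version is not None else "NOT INSTALLED")
```

A package reported as not installed right after a successful pip install usually points back to the NVIDIA part of the setup, since a missing CUDA or cuDNN library surfaces as an import error.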
Install Visual Studio Code to write python code
In order to write Python code for deep learning you have two options, among many others:
Install Visual Studio 2017
Install Visual Studio Code
Visual Studio Code can be downloaded from the official site. Download it and install it. Once you have installed VS Code, run it. Press the Extensions button on the left side and type python in the search box. Select the Python extension and press Install.
Restart VS Code, and
Select File->New File
Save the file as python_test.py
Change the current Python environment to “mlenv1218” (by double-clicking)
Run the Python code by right-clicking on the code and selecting “Run Python File in Terminal”
import tensorflow as tf
Advanced Technology Days 14 (ATD14) is a two-day conference organized by Microsoft and the MS Community in Zagreb, the capital of Croatia. My session about the Microsoft Cognitive Toolkit (CNTK) on the .NET platform was held on the second day, and I was very happy to talk about it, since .NET Core support was finally implemented in the library only two months ago.
There were more demos than I had time to present, so at the end of this blog post you can find a link to all demos and the presentation file. Information about the data sets that need to be downloaded before running the examples is placed in the code. The last demo, about ANNdotNET, can be found at https://bhrnjica.net/anndotnet The demos and presentation file can be found at this location: https://1drv.ms/f/s!AgPZDj-_uxGLhY1pCCODeT03qK_T3A
When building deep learning models, it is often necessary to check the model for consistency and proper parameter definitions. In ANNdotNET, ML network models are designed using the Visual Network Designer (VND), so it is easy to see the network configuration. Besides VND, ANNdotNET has several visualization features at different levels: network preparation, model training phase, post-training evaluation, performance analysis, and export of results. In this blog post we will learn how to use those features when working with deep learning models.
Visualization during network preparation and model training
When preparing the network and training parameters, we need information about the data sets, input format and output type. This information is relevant for selecting what type of network model to configure, what types of layers to use, and what learner to select. For example, the following image shows a network configuration consisting of 2 embedding layers, 3 dense layers and 2 dropout layers. This network configuration is used to train a CNTK model for the mushroom data set. As can be seen, network layers are arranged as listbox items, and the user can see, at the highest level, what the neural network looks like, which layers are included in the network, and how many dimensions each layer has. This is very helpful, since it provides a way of building the network quickly and accurately, and it requires much less time than the traditional way of coding the network in Python or another programming language.
The ANNdotNET Network Settings page provides a lot of information about the network: input and output layers, which data sets are defined, as well as the whole network configuration arranged in layers. Besides network-related information, the Network Settings tab page also provides the learning parameters for network training. The reader can find more about the Visual Network Designer in one of the previous blog posts.
Since ANNdotNET implements MLEngine, which is based on CNTK, all CNTK-related visualization features could be used. The CNTK library provides a rich set of visualizations. For example, you can use TensorBoard in CNTK to visualize not just the computational graph, but also training history, model evaluation, etc. Besides TensorBoard, CNTK provides a logger module which uses the Graphviz tool for visualizing the network graph. The bad news is that none of the above features can be run from C#, since those implementations are available only in Python.
This is one of the main reasons why ANNdotNET provides a rich set of visualizations for the .NET platform. This includes training history, model evaluation for training and validation data sets, as well as model performance analysis. The following image shows some of the visualization features: the training history (loss and evaluation) of minibatches during training of the mushroom model:
Moreover, the following image shows evaluation of training and validation set for each iteration during training:
Those graphs are generated during the training phase, so the user can see what is happening with the model. This is of tremendous help when deciding when to stop the training process, whether the training parameters produce a good model at all, or when to stop and change parameter values. In case we need to stop the training process immediately, ANNdotNET provides the Stop command, which stops the training process at any time.
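The decision about when to stop, which the graphs support visually, can also be written down as a simple rule. The sketch below uses a patience rule on the validation loss; it is an illustration only and not how ANNdotNET decides, and the function name should_stop and both parameters are assumptions.

```python
def should_stop(validation_losses, patience=5, min_delta=0.0):
    """Return True when the validation loss stopped improving.

    The rule: compare the best loss of the last `patience` iterations
    with the best loss seen before them. If the recent best is not
    lower by at least `min_delta`, training is considered stalled.
    """
    if patience < 1:
        raise ValueError("patience must be at least 1")
    if len(validation_losses) <= patience:
        return False  # not enough history to judge yet

    best_before = min(validation_losses[:-patience])
    recent_best = min(validation_losses[-patience:])
    return recent_best > best_before - min_delta


if __name__ == "__main__":
    history = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]
    print(should_stop(history, patience=3))
```

A larger patience tolerates longer plateaus in the curve, at the cost of more training iterations before the rule fires.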
Model performance visualization
Once the model is trained, ANNdotNET provides a performance analysis tool for all three types of ML problems: regression, binary classification and multi-class classification.
Since the mushrooms project is a binary ML problem, the following image shows the performance of the trained model:
Using Graphviz to visualize CNTK network graph in C#
We have seen that ANNdotNET provides all types of visualizations for CNTK models, and those features are available with a mouse click through the GUI. One more feature is coming in ANNdotNET v1.1, which uses Graphviz to visualize the CNTK network graph. The feature is based on the original CNTK Python implementation, with some modifications and styling.
In order to use Graphviz to visualize network computation graph the following requirements must be met:
ANNdotNET is an open source project for deep learning, written in C#, for developing and training deep learning models. The project is based on Microsoft CNTK (Cognitive Toolkit), the Microsoft open source library for deep learning. It is intended to be a higher-level API for deep learning in .NET, but it also provides data preparation and transformation from rawDataSet into mlready dataset, monitoring of the training process with additional evaluation functions, early stopping during training, model evaluation and validation, and exporting and deployment options.
The process of creating, training, evaluating and exporting models is provided by the GUI application and does not require knowledge of the supported programming languages.
ANNdotNET is ideal in several scenarios, when the user wants:
to focus more on neural network development and the training process using a classic desktop approach, instead of focusing on coding,
to spend less time on debugging source code and on peripheral tasks like installing and updating packages, and to focus more on different configurations and parameter variants,
to build models without being familiar with the supported programming languages.
In case the problem requires more advanced custom models or training processes, ANNdotNET CMD provides a high-level API for such implementations. All ML configurations developed with the GUI tool can be handled with the CMD tool and vice versa.
To give a quick introduction to the tool, dozens of pre-calculated projects are included in the installer; they can be opened from the Start page as well as from the CMD tool. The projects are based on famous, freely distributed data sets from several categories: regression, binary and multi-class classification problems, image classification, time series, etc.
This version brings an upgrade of the Machine Learning Engine and a set of minor fixes for bugs identified in the application.
The following enhancements have been made in this release:
The ANNdotNET MLEngine now relies on CNTK 2.6.
Information about data sets has been added to the Network page.
Chart controls on the Training and Evaluation pages have been simplified, with improved visibility.
The Refresh button has been removed, and automatic model evaluation has been added.
The Test tab page had a bug which added new rows whenever the user pressed the Evaluate button.
ANNdotNET v1.0 was released a few weeks ago, and the feedback is very positive. Also, up to now there is no blocking or serious bug in the release, which makes me very happy. In this blog post we are going through the Export options in ANNdotNET.
ANNdotNET is supposed to be an application which covers the whole life cycle of a machine learning project: from defining the raw data set, cleaning and feature engineering, to training and evaluation of the model. Also, with different mlconfig files within the same project, the user can create as many ML configurations as wanted. Once the user selects the best ML configuration, and the training and evaluation process completes, the next step in the ML project life cycle is model deployment/export.
Currently, ANNdotNET defines three export options:
Export model result to CSV file,
Export model and model result to Excel, and
Export model in CNTK file format.
With those three export options, we can cover many ML scenarios.
Export to CSV
Export to CSV exports the actual and predicted values of the testing data set to a comma-separated text file. In case the testing data set is not provided, the result for the validation data set is exported. If neither a testing nor a validation data set is provided, the export process is terminated.
The export process starts by selecting the appropriate mlconfig file. The network model must be trained before it can be exported.
Once the export process completes, the CSV file is created on disk. We can import the exported result into Excel, and content similar to the image below will be shown:
The exported result is shown in two columns: the actual and the predicted values. In case a classification result is exported, information about the class values is exported in the header.
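Because the exported file is plain text, it can be processed outside Excel as well. The sketch below reads such a file and computes the share of correct predictions. The exact file layout is an assumption: one header line followed by rows with the actual value in the first column and the predicted value in the second, both numeric. The function names are hypothetical.

```python
import csv
import io


def read_actual_predicted(text, delimiter=","):
    """Parse exported text into a list of (actual, predicted) float pairs.

    Assumes one header line followed by two numeric columns.
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    next(reader, None)  # skip the header line
    pairs = []
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or incomplete lines
        pairs.append((float(row[0]), float(row[1])))
    return pairs


def accuracy(pairs):
    """Fraction of rows where the predicted value equals the actual value."""
    if not pairs:
        raise ValueError("no rows to evaluate")
    hits = sum(1 for actual, predicted in pairs if actual == predicted)
    return hits / len(pairs)


if __name__ == "__main__":
    sample = "actual,predicted\n1,1\n0,1\n1,1\n0,0\n"
    print(accuracy(read_actual_predicted(sample)))
```

For a real export, pass the file content instead of the sample string, for example the result of open(path).read(). For regression results an error measure such as RMSE would replace the exact-match comparison.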
Export to Excel
The Export to Excel option is more than just exporting the result. In fact, it is deployment of the model into the Excel environment. Besides exporting all defined data sets (training, validation, and test), the model is also exported. Predicted values are calculated using the ANNdotNET Excel Add-in, so model evaluation looks like calling an ordinary Excel formula. More information about how it works can be found here.
The exported xlsx file can be opened, and further analysis of the model and related data sets can be continued. The following image shows the exported model for the Concrete Slump Test example. Since only two data sets are defined (training and validation), those data sets are exported. As can be seen, the predicted column is not filled; only the first row contains the formula, which must be evaluated by inserting an equals sign “=” in front of it.
Once the formula is evaluated for the first row, we can use the usual Excel trick to copy it to the other rows.
The same applies to the other data sets, which are placed in separate Excel worksheets.
Export to CNTK
The last option exports the trained CNTK model in CNTK format. The ONNX format will also be supported as soon as it becomes available in the CNTK library for C#. This option is handy in situations where the trained CNTK model is evaluated in other solutions.
For this blog post, there is a short video in which the reader can see all three options in action.
The October 2018 issue of MSDN Magazine brings the article “Sentiment Analysis Using CNTK” written by James McCaffrey. I was wondering if I could implement this solution in ANNdotNET as Dr. McCaffrey described it in the magazine. Indeed, I implemented the complete solution in less than 5 minutes.
In this blog post I am going to walk you through this very good and well-written MSDN article example. I am not going to repeat the text of the MSDN article, so I recommend reading the article first, and then coming back here to implement the example in ANNdotNET. Since ANNdotNET is a GUI tool, it is interesting to see all the great visualizations during model training and evaluation. ANNdotNET also provides complete binary model evaluation, including the confusion matrix, ROC curve, and other binary performance parameters, which makes this example more interesting and valuable to read.
The whole example is implemented in five steps.
Step 1: Prepare files and folder structure
First we need to create several folders and files in order to create an empty annproject. This manual creation of folders is necessary because ANNdotNET v1.0 has no option to create an empty project. This will be added in the next version.
So first, create the following set of hierarchically ordered folders:
The following figure shows this set of folders.
Step 2: Download data sets used in the example.
The only thing we need from the MSDN article is the train and test data sets. The data can be downloaded from the MSDN sample: Code_McCaffreyTestRun1018.zip. Once the zip file is downloaded, unzip the sample and copy the files imdb_sparse_train_50w.txt and imdb_sparse_test_50w.txt to the data folder, as the image above shows.
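The folder creation and the file copy can also be done with a few lines of Python. This is only a sketch: the nesting used here (a project folder with a data folder inside it, below a root folder) is an assumption, so follow the figure above if your layout differs. The function name prepare_project and the paths in the usage note are hypothetical.

```python
import shutil
from pathlib import Path


def prepare_project(root, downloaded_dir, data_files, project="MovieReview"):
    """Create <root>/<project>/data and copy the data files into it.

    The folder nesting is an assumption made for this sketch.
    Returns the list of copied file paths.
    """
    data_dir = Path(root) / project / "data"
    data_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for name in data_files:
        source = Path(downloaded_dir) / name
        if not source.is_file():
            raise FileNotFoundError("data file not found: {}".format(source))
        copied.append(Path(shutil.copy2(str(source), str(data_dir))))
    return copied
```

For this example the call could look like prepare_project(r"C:\SentimenAnalysis", r"C:\Downloads\unzipped_sample", ["imdb_sparse_train_50w.txt", "imdb_sparse_test_50w.txt"]), where both folder paths are placeholders for your own locations.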
Step 3: Create MovieReview.ann and LSTM-Net.mlconfig files
Open Notepad and create a file with the following content:
Save the file in the SentimenAnalysis folder as MovieReview.ann. The following picture shows the saved annproject file on disk.
Now open Notepad again and create a new empty file. This file is going to be the mlconfig file, with the content shown below. Don’t worry about the content of the file, since all those details will be visible once we open it with ANNdotNET. If you want to know more about the structure of the mlconfig file, please refer to this wiki page of the ANNdotNET project.
The file should be saved in the MovieReview folder under the name LSTM-Net.mlconfig. The next image shows where the mlconfig file is stored.
Step 4. Open annproject file with ANNdotNET GUI tool
Now we have everything set up to open and train the sentiment analysis example with ANNdotNET. Since ANNdotNET implements MLEngine, which is based on CNTK, the data sets are compatible and can be read by the trainer. In order to get a better result we have changed the learning parameters a little bit: instead of SGD we used AdamLearner.
In case you don’t have the ANNdotNET tool installed on your machine, just go to the release section and download the latest version, or clone the GitHub repository and run it within Visual Studio. All information about how to run ANNdotNET as a standalone application or as a Visual Studio solution can be found on the GitHub page https://github.com/bhrnjica/anndotnet.
After unzipping the ANNdotNET binaries on your machine, run the tool by launching the anndotnet.wnd.exe file. Once ANNdotNET is running, click the Open application command and select the MovieReview.ann file. In a second the application loads the project with the corresponding mlconfig file. In the project explorer, click the LSTM-NET tree item, and content similar to the image below should appear.
Everything we have written into the mlconfig file is now shown in the Network settings tab page.
Now that we have reviewed the network settings, we can switch to the train tab page and review the training parameters. Since we already set up the training parameters in the mlconfig file, we don’t need to change anything.
Start the training process by clicking the Run application command. After some time we should see the following result:
If we switch to the Evaluation page, we can perform some statistical analysis in order to evaluate whether the model is good or not. Once the Evaluation tab page is shown, click the Refresh button to evaluate the model against the training and validation data sets.
The statistics on the left are for the training data set, and those on the right are for the validation data set. As can be seen, the model perfectly predicted all data from the training data set, and reached about 70% accuracy on the validation data set. Of course, the model is not as good as we would expect for production, but for this demonstration it is good enough. There are also two buttons to show the ROC curve and other binary performance parameters for both data sets, which the reader may try.
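The binary performance parameters shown on the Evaluation page are derived from the confusion matrix. As a reminder of how these numbers relate to each other, here is a minimal sketch that computes them from lists of actual and predicted labels. The function names and the encoding of the positive class as 1 are assumptions made for illustration; this is not ANNdotNET code.

```python
def confusion_matrix(actual, predicted, positive=1):
    """Count true/false positives and negatives for binary labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["tp" if a == positive else "fp"] += 1
        else:
            counts["fn" if a == positive else "tn"] += 1
    return counts


def binary_metrics(counts):
    """Derive accuracy, precision and recall from confusion matrix counts."""
    tp, fp, tn, fn = counts["tp"], counts["fp"], counts["tn"], counts["fn"]
    total = tp + fp + tn + fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    return {
        "accuracy": (tp + tn) / total,
        # Guard against division by zero when a class is never predicted
        # or never present.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    actual = [1, 1, 0, 0, 1]
    predicted = [1, 0, 0, 1, 1]
    print(binary_metrics(confusion_matrix(actual, predicted)))
```

A perfect score on the training data combined with a much lower score on the validation data, as in the result above, is the typical signature of an overfitted model.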
That’s all that is needed to have the complete Sentiment Analysis example set up and running. In case you want the complete ANNdotNET project, it can be downloaded from here.