ANNdotNET – the first GUI-based CNTK tool


ANNdotNET is a Windows desktop application written in C# for creating and training ANN models. The application relies on the Microsoft Cognitive Toolkit (CNTK) and is intended as a GUI tool for the CNTK library, with extensions for data preprocessing, model evaluation and model export. It is hosted on GitHub and can be cloned from http://github.com/bhrnjica/anndotnet

Currently, ANNdotNET supports the following types of ANN:

  • Simple Feed Forward NN
  • Deep Feed Forward NN
  • Recurrent NN with LSTM

The whole process of creating, training, evaluating and exporting models is done from the GUI application and does not require knowledge of the supported programming languages. ANNdotNET is ideal for engineers who are not familiar with programming languages.

Software Requirements

ANNdotNET is an x64 Windows desktop application that runs on .NET Framework 4.7.1. To run the application, the following requirements must be met:

– Windows 7, 8 or 10 with x64 architecture
– .NET Framework 4.7.1
– CPU/GPU support.

Note: The application automatically detects GPU capability on your machine and uses it for training and evaluation; otherwise it uses the CPU.

How to run the application

There are two ways to run the application:

  1. Clone the GitHub repository of the application, open it in Visual Studio 2017, change the build architecture to x64, then build and run the application.
  2. Download the released version, unzip it and run ANNdotNET.exe.

The following three short videos quickly show how to create, train and evaluate regression, binary and multi-class classification models.

  1. Training a regression model. The data set is the Concrete Slump Test, downloaded from the UCI ML Repository and loaded into ANNdotNET without any modification, since the data preparation module can prepare it.
  2. Training and evaluating a binary classifier model. The data is the Titanic data set, downloaded from the public repository.
  3. Training and evaluating a multi-class classification model. The data is the Iris data set, downloaded from the same page as above.


Announcement of GPdotNET v5 and ANNdotNET v1.0


As you already know, the GPdotNET v4 tool consists of several modules, which include:

  • GP module for creating and training models based on genetic programming,
  • ANN module for creating and training models based on Feed Forward Neural Networks,
  • GA module for model and function optimization using Genetic Algorithm,
  • LGA module for linear programming with GA, which includes solving Traveling Salesman based problems and Assignment and Transportation problems.

With the latest release, GPdotNET has changed a lot. The initial idea behind GPdotNET was to provide the GP method in an application. As the project grew, a lot of new implementations were included in the main project. This year I decided to split it into two different projects, which can be seen as the natural evolution of GPdotNET v4.

The first project follows on from the previous version and is called GPdotNET v5. It includes only GP-related algorithm implementations, developed for creating and training models for supervised ML problems (regression, binary and multi-class classification).

The second project uses several ANN algorithms for creating and training models for supervised machine learning problems. The project is called ANNdotNET. It is a Windows Forms desktop application, very similar to GPdotNET, for creating and training ANN models.

I am very proud to announce that the new version of GPdotNET will be released as two different open source projects.


  1. GPdotNET v5, which is hosted at the same address as before. The older version, GPdotNET v4, has moved to http://github.com/bhrnjica/gpdotnetv4 and will remain the last version that includes the non-GP and ANN modules of GPdotNET.
  2. ANNdotNET v1, which is hosted in a separate repository at http://github.com/bhrnjica/anndotnet.

 

Highlighted text is not replaced after typing new one in MS Word


A few days ago, I realized that selected text was no longer replaced by newly typed text in MS Word, and probably in other MS Office products. This is very annoying. Then I found that the option controlling this was unchecked. I am pretty sure I didn’t uncheck it, so the likely cause is an update installation.

However, I wanted the previous behavior back since I need it, especially when replacing text in a Word equation. To make typing replace the highlighted text again, go to:

File->Options->Advanced, and under Editing options check "Typing replaces selected text".

Data Preparation Tool for Machine Learning


Regardless of the machine learning library you use, data preparation is the first and one of the most important steps in developing predictive models. Very often the data intended for training is dirty: it has a lot of unnecessary columns, missing values, unformatted numbers and so on. Before training, the data must be cleaned and properly defined in order to get a good model. This is known as data preparation. Data preparation consists of cleaning the data, defining features and labels, deriving new features from the existing data, handling missing values, scaling the data, etc. Of the total time we spend on ML modelling, most is related to data preparation.

In this blog post I am going to present a simple tool which can significantly reduce the preparation time for ML. The tool loads the data into a GUI, where the user can define all the necessary information. Once the data is prepared, the user can export it to files which can then be directly imported into an ML library such as CNTK.

The following image shows the ML Data Preparation Tool main window.

As the image above shows, data preparation can be done in several steps.

  1. Load dirty data into ML Prep Tool by pressing the Import Data button
  2. Transform the data by providing the following:
    1. Type – each column can be:
      1. Numeric – which holds continuous numeric values,
      2. Binary – which indicates two class categorical data,
      3. Category – which indicates categorical data with more than two classes,
      4. String – which indicates that the column will not be part of the training and testing data sets,
    2. Encoding – for Binary and Category column types, the encoding must be defined. The following encodings are supported:
      1. Binary Encoding with (0,1) – the first binary value will be 0, and the second binary value will be 1.
      2. Binary Encoding with (-1,1) – the first binary value will be -1, and the second binary value will be 1.
      3. Category Level – treats each class as a numeric value. In case of 3 categories (R, G, B), the encoding will be (0,1,2).
      4. Category 1:N – implements a One-Hot vector with N columns. In case of 3 categories (R, G, B), the encoding will be R = (1,0,0), G = (0,1,0), B = (0,0,1).
      5. Category 1:N-1(0) – implements dummy coding with N-1 columns, where the last class is encoded as all zeros. In case of 3 categories (R, G, B), the encoding will be R = (1,0), G = (0,1), B = (0,0).
      6. Category 1:N-1(-1) – implements dummy coding with N-1 columns, where the last class is encoded as all -1 values (often called effect coding). In case of 3 categories (R, G, B), the encoding will be R = (1,0), G = (0,1), B = (-1,-1).
    3. Variable – defines features and label. Only one label and at least one feature can be defined. A column can also be defined as an Ignore variable, which skips that column. The following options are supported:
      1. Input – which identifies the column as feature or predictor,
      2. Output – which identifies the column as label or model output.
    4. Scaling – defines column scaling. Two scaling options are supported:
      1. MinMax,
      2. Gauss Standardization,
    5. Missing Values – defines the replacement for missing values within the column. There are several options for numeric types and two options (Random and Mode) for categorical types.
  3. Define the testing data set size by providing a number of rows or a percentage.
  4. Define export options
  5. Press Export Button.
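
To make the encoding options from step 2 more concrete, here is a minimal Python sketch of what each option produces. It is only an illustration and not the tool's source code: the function names `encode_category` and `encode_binary` are made up for this example, and the assumption that the last class is the reference class follows the (R, G, B) examples above.

```python
def encode_category(value, classes, scheme):
    """Encode one categorical value; the order of `classes` fixes the encoding.

    Hypothetical helper for illustration, not part of ML Prep Tool.
    """
    idx = classes.index(value)
    n = len(classes)
    if scheme == "level":
        # Category Level: the class index is used as a numeric value
        return [idx]
    if scheme == "1:N":
        # One-hot vector with N columns
        return [1 if i == idx else 0 for i in range(n)]
    if scheme in ("1:N-1(0)", "1:N-1(-1)"):
        # N-1 columns; the last class is the reference class (assumption)
        if idx == n - 1:
            fill = 0 if scheme == "1:N-1(0)" else -1
            return [fill] * (n - 1)
        return [1 if i == idx else 0 for i in range(n - 1)]
    raise ValueError(f"unknown scheme: {scheme}")


def encode_binary(value, classes, low=0):
    """Binary encoding: first class -> `low` (0 or -1), second class -> 1."""
    if len(classes) != 2:
        raise ValueError("binary encoding needs exactly two classes")
    return low if classes.index(value) == 0 else 1


classes = ["R", "G", "B"]
for scheme in ("level", "1:N", "1:N-1(0)", "1:N-1(-1)"):
    print(scheme, {c: encode_category(c, classes, scheme) for c in classes})
```

The same class order has to be used for the training and testing sets, otherwise the encoded columns will not line up.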

As can be seen, this is a straightforward data preparation workflow.
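
The Scaling and Missing Values options can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the tool's implementation: the function names are hypothetical, the standardization uses the population standard deviation (the tool may use the sample one), and only the Mode replacement is shown.

```python
from collections import Counter
from statistics import mean, pstdev


def min_max(values):
    """MinMax scaling: map the column linearly onto [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A constant column has no spread, so it is mapped to 0.0
    return [(v - lo) / span if span else 0.0 for v in values]


def standardize(values):
    """Gauss standardization: subtract the mean, divide by the std. deviation."""
    mu, sigma = mean(values), pstdev(values)  # population std is an assumption
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def fill_missing_mode(values, missing=None):
    """Replace missing entries with the most frequent present value (Mode)."""
    present = [v for v in values if v != missing]
    mode = Counter(present).most_common(1)[0][0]
    return [mode if v == missing else v for v in values]


print(min_max([2, 4, 6]))
print(standardize([1.0, 2.0, 3.0]))
print(fill_missing_mode(["a", None, "b", "a"]))
```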

Besides the general export options, which are set by selecting different delimiter options, you can export the data set into CNTK format, which is very handy if you work with CNTK.
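
As an illustration of the test set size and delimiter options, the sketch below splits prepared rows by percentage and writes them as delimited text using Python's standard `csv` module. The helper names are hypothetical, and holding out the last rows as the test set is an assumption; the tool may select test rows differently.

```python
import csv
import io


def split_train_test(rows, test_percent):
    """Hold out the last `test_percent` percent of rows as the test set."""
    n_test = round(len(rows) * test_percent / 100)
    cut = len(rows) - n_test
    return rows[:cut], rows[cut:]


def to_delimited(rows, header, delimiter=";"):
    """Return the rows as delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buf.getvalue()


rows = [[i, i * 2] for i in range(10)]
train, test = split_train_test(rows, 20)
print(to_delimited(test, ["x", "y"], delimiter="\t"))
```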

After the data transformations, the user needs to check CNTK format in the export options and press Export in order to get CNTK training and testing files, which can be used directly in code without any modifications.
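
For readers who have not seen it, the CNTK text format stores one sample per line, with each input stream introduced by a `|name` tag followed by its values. The sketch below shows how a prepared row could be written in that shape. The stream names `features` and `label` and the function name `to_ctf_line` are assumptions made for this example; the files exported by the tool may use different names.

```python
def to_ctf_line(features, label, feature_name="features", label_name="label"):
    """Format one sample as a CNTK text format line.

    Stream names are illustrative; they must match the names used
    when the file is read back for training.
    """
    feature_part = " ".join(str(x) for x in features)
    label_part = " ".join(str(x) for x in label)
    return f"|{feature_name} {feature_part} |{label_name} {label_part}"


# Two Iris-like samples with one-hot encoded labels (made-up values)
samples = [
    ([5.1, 3.5, 1.4, 0.2], [1, 0, 0]),
    ([6.3, 3.3, 6.0, 2.5], [0, 0, 1]),
]
for features, label in samples:
    print(to_ctf_line(features, label))
```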

Some examples will be provided in the next blog post.

The project is hosted on GitHub, where the source code can be freely downloaded and used: https://github.com/bhrnjica/MLDataPreparationTool .

In case you want only binaries, the release of version v1.0 is published here: https://github.com/bhrnjica/MLDataPreparationTool/releases/tag/v1.0