Announcing GPdotNET v5 and related Book


https://www.igi-global.com/Images/Covers/9781522560050.png

After one year of writing and coding, finally I  can announce my two big achievements which are related to each other:

1. The fifth version of my open source project GPdotNET – genetic programming tool, and

2. The book:Optimized Genetic Programming Applications: Emerging Research and Opportunities, published by IGI-GLobal.

Along the book, I was developing GPdotNET application which is explained in Chapter 5. Actually the Chapter 5 described in depth all aspects of the application, with real world examples.

As can be seen GPdotNET v5 is completely rewritten application, with new logo and GUI. As Introduction of the application I have prepared several videos on youtube with quick explanation how to use some of the main modules in GPdotNET.

Advertisements

Using ANNdotNET – GUI tool to create CNTK based model for Iris data set


In this tutorial we are going to create and train Iris model using ANNdotNET.  ANNdotNET is windows application for creating and training CNTK based models without leaving GUI.

All procedures from downloading the data set, to exporting model, can be achieved in 6 steps.

1. Step: Download the data set file from https://archive.ics.uci.edu/ml/machine-learning-databases/iris/iris.data.

2. Step: Open ANNdotNET application. Press New command, select Project 1 tree item and rename the project  into Iris Data Set.

2. Step: Select Data Command from Model Preparation ribbon group, Click File button from Import experimenal data dialog and select the recently downloaded file. Check Comma check box and press Import Data button.

3. Steps: Double click on Scaling for each column, and select MinMax normalization option from the popup ComboBox list. Double click on Type for the output column, and select Category, and 1:N for encoding. More information how to prepare data for ML you can find at https://bhrnjica.net/2018/03/01/data-preparation-tool-for-machine-learning/

4. Steps: Once the data is prepared Click Create Model Command and Model Settings panel is shown. Setup parameters as shown on the image below and click Run command.

5. Steps: Once the model is trained you can evaluate model by selecting Evaluate Command. Depending on the model type (regression, Binary or Multi class classification) The appropriate Evaluation dialog appears. Since this is multi class classification model, the Confusion matrix is shows, with micro and macron performance parameters.

6. Steps: For further analysis you can export model to Excel, or into ONNX. Also you can save the project which can later be opened and retrained again.

Note: Currently ANNdotNET is in alpha version, and more feature will come in near future.

ANNdotNET – the first GUI based CNTK tool


anndotnet-logo
ANNdotNET is windows desktop application written in C# for creating and training ANN models. The application relies on Microsoft Cognitive Toolkit, CNTK, and it is supposed to be GUI tool for CNTK library with extensions in data preprocessing, model evaluation and exporting capabilities. It is hosted at GitHub and can be clone from http://github.com/bhrnjica/anndotnet

Currently, ANNdotNET supports the folowing type of ANN:

  • Simple Feed Forward NN
  • Deep Feed Forward NN
  • Recurrent NN with LSTM

The process of creating, training, evaluating and exporting models is provided from the GUI Application and does not require knowledge for supported programming languages. The ANNdotNET is ideal for engineers which are not familiar with programming languages.

Software Requirements

ANNdotNET is x64 Windows desktop application which is running on .NET Framework 4.7.1. In order to run the application, the following requirements must be met:

– Windows 7, 8 or 10 with x64 architecture
– NET Framework 4.7.1
– CPU/GPU support.

Note: The application automatically detect GPU capability on your machine and use it in training and evaluation, otherwise it will use CPU.

How to run application

In order to run the application there are two possibilities:

Clone the GitHub repository of the application and open it in Visual Studio 2017.

  1. Change build architecture into x64, build and run the application.
  2. Download released version unzip and run ANNdotNET.exe.

The following three short videos quickly show how to create, train and evaluate regression, binary and multi class classification models.

  • Training regression model. Data set is Concrete Slump Test is downloaded from the UCI ML Repository and loaded into ANNdotNET without any modification, since the data preparation module can prepare it.

2. Training and evaluation binary classifier model. Data represent Titanic data set downloaded from the public repository.

3. Training and evaluation multi class classification models. Data represents Iris data set downloaded from the same page as above.

Announcement of GPdotNET v5 and ANNdotNET v1.0


As you already know GPdotNET v4 tool consists of several modules which include:

  • GP module for creating and training models based on genetic programming,
  • ANN module for creating and training models based on Feed Forward Neural Networks,
  • GA module for model and function optimization using Genetic Algorithm
  • LGA module is for  linear programming with GA which includes solving Traveling Salesman based problems, Assignment and Transportation problems.

With the latest release the GPdotNET has changed a lot. First of all, the initial idea about GPdotNET was to provide GP method in the application. And as the project grew lot of new implementations were included in the main project. This year I decided to make two different projects which can be seen as the natural evolution of GPdotNET v4.

The first project remain the same which follows the previous version and it is called GPdotNET v5. The project includes only GP related algorithm implementation which is developed for creating and training supervised ML problems (regression, binary and multi-class classification).

The second project uses several ANN algorithms for creating and training supervised machine learning problems.  The project is called ANNdotNET. It is Windows Forms desktop application very similar with GPdotNET, for creating and training ANN models.

I am very prod to announce that the new version of GPdotNET will be released as two  different open source projects.

gpdotnet-evolution

  1. GPdotNET v5 – which is hosted at the same address as previous. The older version GPdotNET v4 has moved at http://github.com/bhrnjica/gpdotnetv4  – and will be the latest version for non GP and ANN modules in GPdotNET.
  2. ANNdotNET v1 – is hosted at separate repository http://github.com/bhrnjica/anndotnet.

 

Highlighted text is not replaced after typing new one in MS Word


Few days ago, I realized that once the text is selected cannot be replaced with new ones  typed in MS Word, and probably in other MS Office products. This is very annoying options. Then I realized that the option for that is unchecked. I am pretty sure I didn’t uncheck the option, so the other reason would be after update installation.

However, I wanted to back previous feature since I need that option, specially when you try to replace some text in Word Equation. So in order to return the previous option to replace the highlighted text with new one, you should go to:

File->Options->Advanced

Data Preparation Tool for Machine Learning


Regardless of machine learning library you use, the data preparation is the first and one of the most important step in developing predictive models. It is very often case that the data supposed to be used for the training is dirty with lot of unnecessary columns, full of missing values, un-formatted numbers etc. Before training the data must be cleaned and properly defined in order to get good model. This is known as data preparation. The data preparation consist of cleaning the data, defining features and labels, deriving the new features from the existing data, handling missing values, scaling the data etc.  It can be concluded that the total time we spend in ML modelling,the most of it is related to data preparation.

In this blog post I am going to present the simple tool which can significantly reduce the preparation time for ML. The tool simply loads the data in to GUI, and then the user can define all necessary information. Once the data is prepared user can store the data it to files which can be then directly imported into ML algorithm such as CNTK.

The following image shows the ML Data Preparation Tool main window.

From the image above, the data preparation can be achieved in several steps.

  1. Load dirty data into ML Prep Tool, by pressing Import Data button
  2. Transform the data by providing the flowing:
    1. Type – each column can be:
      1. Numeric – which holds continuous numeric values,
      2. Binary – which indicates two class categorical data,
      3. Category – which indicates categorical data with more than two classes,
      4. String – which indicate the column will not be part of training and testing data set,
    2. Encoding – in case of Binary and Category column type, the encoding must be defined. The flowing encoding is supported:
      1. Binary Encoding with (0,1) – first binary values will be 0, and second binary values will be 1.
      2. Binary encoding with (-1,1) – first binary values will be -1, and second binary values will be 1.
      3. Category Level- which each class treats as numeric value. In case of 3 categories(R,G, B), encoding will be (0,1,2)
      4. Category 1:N- implements One-Hot vector with N columns. In case of 3 categories(R,G, B), encoding will be R =  (1,0,0),G =  (0,1,0), B =  (0,0,1).
      5. Category 1:N-1(0) – implements dummy coding with N-1 columns. In case of 3 categories(R, G, B), encoding will be R =  (1,0),G =  (0,1), B =  (0,0).
      6. Category 1:N-1(-1) – implements dummy coding with N-1 columns. In case of 3 categories(R, G, B), encoding will be R =  (1,0),G =  (0,1), B =  (-1,-1).
    3. Variable – defines features and label. Only one label, and at least one features can be defined. Also the column can be defined as Ignore variable, which will skip that column.  The following options are sported:
      1. Input – which identifies the column as feature or predictor,
      2. Output – which identifies the column as label or model output.
    4. Scaling – defines column scaling. Two scaling options are supported:
      1. MinMax,
      2. Gauss Standardization,
    5. Missing Values – defines the replacement for the missing value withing the column. There are several options related to numeric and two options (Random and Mode ) for categorical type.
  3. Define the testing data set size by providing information of row numbers or percent.
  4. Define export options
  5. Press Export Button.

As can be seen this is straightforward workflow of data preparation.

Besides the general export options which can be achieved by selecting different delimiter options, you can export data set in to CNTK format, which is very handy if you play with CNTK.

After data transformations, the user need to check CNTK format int the export options and press Export in order to get CNTK training and testing files, which can be directly used in the code without any modifications.

Some of examples will be provided int he next blog post.

The project is hosted at GitHub, where the source code can be freely downloaded and used at this location: https://github.com/bhrnjica/MLDataPreparationTool .

In case you want only binaries, the release of version v1.0 is published here: https://github.com/bhrnjica/MLDataPreparationTool/releases/tag/v1.0

Using CNTK and C# to train Mario to drive Kart


Introduction

In this blog post I am going to explain one of possible way how to implement Deep Learning ML to play video game. For this purpose I used the following:

  1.  N64 Nintendo emulator which can be found here,
  2. Mario Kart 64 ROM, which can be found on internet as well,
  3. CNTK – Microsoft Cognitive Toolkit
  4. .NET Framework and C#

The idea behind this machine learning project is to capture images together with action, while you play Mario Kart game. Then captured images are transformed into features of training data set, and action keys into label hot vectors respectively.  Since we need to capture images, the emulator should be positioned at fixed location and size during playing the game, as well as during testing algorithm to play game. The flowing image shows N64 emulator graphics configuration settings.

2018-02-15_16-34-03

Also the N64 emulator is positioned to Top-Left corned of screen, so it is easier to capture the images.

Data collection for training data set

During image captures game is played as you would play normally. Also no special agent, not platform is required.

In .NET and C# it is implemented image capture from the specific position of screen, as well as it is recorded which keys are pressed during game play. In order to record keys press, the code found here is modified and used.

The flowing image shows the position of N64 emulator with playing Mario Kart game (1), the windows which is capture and transform the image (2), and the application which collect images, and key press action and generated training data set into file(3).

2018-02-15_16-31-42

The data is generated on the following way:

  • each image is captured, resized to 100×74 pixels and gray scaled prior to be transformed and persisted to data set training file.
  • before image is persisted the hotkey of action key press is recorded and connected to image.

So the training data is persisted into CNTK format which consist of:

  1. |label – which represent 5 component hot vector, indicate: Forward, Break, Forward-Left, Forward-Right and None (1 0 0 0 0)
  2. |features consist of 100×74 numbers which represent pixels of the images.

The following data sample shows how training data set are persisted in the txt file:

|label 1 0 0 0 0 |features 202 202 202 202 202 202 204 189 234 209 199...
|label 0 1 0 0 0 |features 201 201 201 201 201 201 201 201 203 18...
|label 0 0 1 0 0 |features 199 199 199 199 199 199 199 199 199 19...
|label 0 0 0 1 1 |features 199 199 199 199 199 199 199 199 199 19...

Since my training data is more than 300 000 MB of size, I provided just few BM sized file, but you can generate file as big as you wish with just playing the game, and running the flowing code from Program.cs file:

await GenerateData.Start();

Training Model to play the game

Once we generate the data, we can move to the next step: training RCNN model to play the game. For training model the CNTK is used. Also since we play a game and previous sequence will determined the next sequence in the game, LSTM RNN is used. More information about CNTK and LSTM can be found in previous posts. In my case I have collected nearly 15000 images during several round of playing the same level and route. Also for more accurate model much more images should be collected, nearly 100 000. The model is trained in one hour, with 500000 iterations. The source code about whole project can be found on GitHub page. (http://github.com/bhrnjica/LSTMBotGame )

By running the following code, the training process is started with provided training data:

CNTKDeepNN.Train(DeviceDescriptor.GPUDevice(0));

Playing the game with CNTK model

Once we trained the model, we move to the next step: playing a game. The emulator should be positioned on the same position and with the same size in order to play the game.ONce the model is trained and created in th training folder, the playing game can be achive by running:

var dev = DeviceDescriptor.CPUDevice;
MarioKartPlay.LoadModel(“../../../../training/mario_kart_modelv1”, dev);
MarioKartPlay.PlayGame(dev);

How it looks like on my case, you can see on this youtube video: