Using CNTK 2.2 and Python to learn from Iris data


Now that we have set up CNTK 2.2 and Python, we can start with the first example. For a first attempt, the Iris data set is a good choice. The data set has a categorical output with three classes (Setosa, Virginica and Versicolor), and its features are 4 real-valued inputs. The Iris data set is easy to find on the internet; one of the places is http://kaggle.com

Usually, the Iris data is given in the following format:

Since we are going to use CNTK, we have to prepare the data in the CNTK text format (CTF), which is quite different from the format shown in the previous image. It has a different structure and looks like the following image:

The difference is obvious. Transforming the previous file format into the CNTK format took me several minutes, and now we can continue with the implementation.
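The conversion takes only a few lines of Python. The sketch below assumes a header-less CSV with four numeric columns followed by the species name, as in the common UCI `iris.data` layout (e.g. `5.1,3.5,1.4,0.2,Iris-setosa`); the input layout, the class order in `CLASSES` and the function names are assumptions, not the author's actual script. It produces the `|labels ... |features ...` lines that `create_reader` below expects.

```python
import csv
import io

# Assumed class order: position i of the one-hot vector encodes CLASSES[i]
CLASSES = ["Iris-setosa", "Iris-versicolor", "Iris-virginica"]

def row_to_ctf(row):
    """Convert one CSV row [f1, f2, f3, f4, species] into a CTF text line."""
    features = [float(v) for v in row[:4]]
    one_hot = [1 if row[4].strip() == c else 0 for c in CLASSES]
    if sum(one_hot) != 1:
        raise ValueError("unknown class: " + row[4])
    return "|labels {} |features {}".format(
        " ".join(str(v) for v in one_hot),
        " ".join(str(v) for v in features))

def csv_to_ctf(csv_text):
    """Convert the whole CSV text into CTF text, skipping empty lines."""
    reader = csv.reader(io.StringIO(csv_text))
    return "\n".join(row_to_ctf(row) for row in reader if row)

# Two sample rows in the assumed CSV layout
sample = "5.1,3.5,1.4,0.2,Iris-setosa\n6.3,3.3,6.0,2.5,Iris-virginica\n"
print(csv_to_ctf(sample))
```

Writing the returned text to a file (for example with `open(..., "w")`) gives a data file in the format the reader below parses.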

First, let's implement a simple Python function that reads the CNTK format. For the implementation we are going to use the CNTK MinibatchSource, which is designed specifically to handle file data. The following Python code reads the file and returns the MinibatchSource.

import cntk

# The data in the file must satisfy the following format:
# |labels 0 0 1 |features x1 x2 x3 x4
# i.e. a 3-component one-hot vector representing the iris class, followed by 4 features
def create_reader(path, is_training, input_dim, num_label_classes):

    # create the streams separately for the labels and for the features;
    # 'field' must match the tag used in the data file (|labels, |features)
    labelStream = cntk.io.StreamDef(field='labels', shape=num_label_classes, is_sparse=False)
    featureStream = cntk.io.StreamDef(field='features', shape=input_dim, is_sparse=False)

    # create the deserializer by providing the file path and the related streams
    deserializer = cntk.io.CTFDeserializer(path, cntk.io.StreamDefs(labels=labelStream, features=featureStream))

    # create the minibatch source: randomized and repeated indefinitely for training,
    # read once (single sweep) for testing
    mb = cntk.io.MinibatchSource(deserializer, randomize=is_training,
                                 max_sweeps=cntk.io.INFINITELY_REPEAT if is_training else 1)
    return mb

The function above takes several arguments:

- path: the path of the file where the data is stored,

- is_training: a Boolean variable that indicates whether the data is used for training or testing. For training, the data is randomized.

- input_dim, num_label_classes: the number of input features and the size of the one-hot output vector. These two arguments are needed to parse the file properly.

The function first creates the two streams, which are passed as arguments to create the deserializer, which in turn is used to create the MinibatchSource. The function returns the MinibatchSource object that the trainer uses for data handling.
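To make the stream mapping concrete, here is a deliberately simplified pure-Python parser for the dense `|labels ... |features ...` lines used in this example. It is only an illustration: the real `CTFDeserializer` also handles sequence IDs, sparse `index:value` entries and comments, none of which this sketch supports, and `parse_ctf_line` is a hypothetical helper, not a CNTK function.

```python
def parse_ctf_line(line, dims):
    """Split one dense CTF line into named streams.

    dims maps each stream name to its expected dimension, e.g.
    {"labels": 3, "features": 4}. In this sketch a dimension mismatch
    raises ValueError, which shows why the reader needs input_dim and
    num_label_classes to interpret the file.
    """
    streams = {}
    # Every field starts with '|'; the first token after it is the stream name
    for field in line.strip().split("|")[1:]:
        tokens = field.split()
        if not tokens:
            continue
        name, values = tokens[0], [float(t) for t in tokens[1:]]
        if name in dims and len(values) != dims[name]:
            raise ValueError("stream '{}' has {} values, expected {}".format(
                name, len(values), dims[name]))
        streams[name] = values
    return streams

print(parse_ctf_line("|labels 0 0 1 |features 6.3 3.3 6.0 2.5",
                     {"labels": 3, "features": 4}))
```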

Once the data reader is implemented, we need a Python function for model creation. For the Iris data set we are going to create a 4-50-3 feed-forward neural network, which consists of an input layer with 4 neurons, one hidden layer with 50 neurons and an output layer with 3 neurons. The hidden layer uses the tanh activation function.

The function that creates the NN model looks like the following code snippet:

# model creation
# FFNN with one input, one hidden and one output layer
def create_model(features, hid_dim, out_dim):
    # use Glorot-uniform initialization for the weights of all layers below
    with cntk.layers.default_options(init = cntk.glorot_uniform()):
        # hidden layer with hid_dim neurons and tanh activation function
        h1 = cntk.layers.Dense(hid_dim, activation = cntk.ops.tanh, name='hidLayer')(features)
        # output layer with out_dim neurons and no activation
        # (the softmax is applied later by the loss function)
        o = cntk.layers.Dense(out_dim, activation = None)(h1)
        return o

As can be seen, the Dense function creates a layer for which the user specifies the dimension and the activation function; the layer is then applied to its input variable. The hidden layer takes the input data as its input, and the output layer takes the hidden layer as its input.
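Mathematically, the two `Dense` layers compute `o = W2 · tanh(W1 · x + b1) + b2`. The NumPy sketch below reproduces that forward pass for the 4-50-3 network with Glorot-uniform weights; it illustrates the math only and is not CNTK code. The zero biases and the helper names are assumptions of this sketch.

```python
import numpy as np

def glorot_uniform(fan_in, fan_out, rng):
    # Glorot/Xavier uniform: U(-limit, limit) with limit = sqrt(6 / (fan_in + fan_out))
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def init_params(in_dim=4, hid_dim=50, out_dim=3, seed=0):
    rng = np.random.default_rng(seed)
    return {
        "W1": glorot_uniform(in_dim, hid_dim, rng), "b1": np.zeros(hid_dim),
        "W2": glorot_uniform(hid_dim, out_dim, rng), "b2": np.zeros(out_dim),
    }

def forward(params, X):
    """X has shape (batch, 4); returns unnormalized class scores of shape (batch, 3)."""
    h1 = np.tanh(X @ params["W1"] + params["b1"])   # hidden layer, tanh activation
    return h1 @ params["W2"] + params["b2"]         # output layer, no activation

params = init_params()
num_params = sum(p.size for p in params.values())
print(num_params)                               # 4*50 + 50 + 50*3 + 3 = 403
print(forward(params, np.ones((2, 4))).shape)   # (2, 3)
```

So the whole model has only 403 trainable parameters, which is one reason it trains so quickly on this small data set.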

One more helper function shows the progress of the learner. The following function takes three arguments (the trainer, the current minibatch number and the print frequency) and prints the current status of the trainer.

# Function that prints the training progress
def print_training_progress(trainer, mb, frequency):
    training_loss = "NA"
    eval_error = "NA"

    if mb%frequency == 0:
        training_loss = trainer.previous_minibatch_loss_average
        eval_error = trainer.previous_minibatch_evaluation_average
        print ("Minibatch: {0}, Loss: {1:.4f}, Error: {2:.2f}%".format(mb, training_loss, eval_error*100))   
    return mb, training_loss, eval_error

Once all three functions are implemented, we can start training with CNTK on the Iris data.

First, we have to specify some helper variables that we will use later.

#setting up the NN type
input_dim=4
hidden_dim = 50
num_output_classes=3
input = cntk.input_variable(input_dim)
label = cntk.input_variable(num_output_classes)

Create the reader for data batching.

# Create the reader to training data set
reader_train= create_reader("C:/sc/Offline/trainData_cntk.txt",True,input_dim, num_output_classes)

Then create the NN model, with Loss and Error functions:

#Create model and Loss and Error function
z= create_model(input, hidden_dim,num_output_classes);
loss = cntk.cross_entropy_with_softmax(z, label)
label_error = cntk.classification_error(z, label)
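The two criteria can be written out explicitly. The NumPy sketch below computes, per sample, the softmax cross-entropy and the 0/1 classification error; the training loop later reports their averages over each minibatch. It mirrors what `cross_entropy_with_softmax` and `classification_error` compute, but it is an independent illustration, not CNTK's implementation.

```python
import numpy as np

def cross_entropy_with_softmax(z, y):
    """Per-sample loss -sum(y * log softmax(z)), computed stably via log-sum-exp."""
    z_shift = z - z.max(axis=1, keepdims=True)   # subtract the row max for numerical stability
    log_softmax = z_shift - np.log(np.exp(z_shift).sum(axis=1, keepdims=True))
    return -(y * log_softmax).sum(axis=1)

def classification_error(z, y):
    """Per-sample 0/1 error: 1 when the arg-max of z differs from the one-hot label."""
    return (z.argmax(axis=1) != y.argmax(axis=1)).astype(float)

z = np.array([[2.0, 1.0, 0.0],
              [0.5, 0.2, 3.0]])    # model scores for two samples
y = np.array([[1, 0, 0],
              [0, 1, 0]])          # one-hot labels
print(cross_entropy_with_softmax(z, y))
print(classification_error(z, y))   # [0. 1.]
```

This is also why the output layer has no activation: the softmax is applied inside the loss.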

Next, we define the trainer. It uses the Stochastic Gradient Descent (SGD) learner with a learning rate of 0.2.

# Instantiate the trainer object to drive the model training
learning_rate = 0.2
lr_schedule = cntk.learning_parameter_schedule(learning_rate)
learner = cntk.sgd(z.parameters, lr_schedule)
trainer = cntk.Trainer(z, (loss, label_error), [learner])
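For reference, plain SGD moves every model parameter θ against the gradient of the loss, scaled by the learning rate η = 0.2 chosen above. The formula is the textbook rule; how CNTK scales the gradient with the minibatch size depends on the learning-rate schedule settings, so averaging over the minibatch B is an assumption here.

```latex
\theta \leftarrow \theta - \eta \, \frac{1}{|B|} \sum_{i \in B} \nabla_\theta \, \ell\big(z(x_i;\theta),\, y_i\big), \qquad \eta = 0.2
```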

Now we need to define the parameters for learning and for showing the results.

# Initialize the parameters for the trainer
minibatch_size = 120 #mini batch size will be full data set
num_iterations = 20 #number of iterations 

# Map the data streams to the input and labels.
input_map = {
    label  : reader_train.streams.labels,
    input  : reader_train.streams.features
}
# Print the training progress after every minibatch
training_progress_output_freq = 1

plotdata = {"batchsize":[], "loss":[], "error":[]}

As can be seen, the minibatch size is set to the size of the data set, which is common for small data sets. Since each minibatch covers the whole data set, a small number of iterations is enough: the Iris data is very simple and the learner finds a good result very quickly.

Running the trainer is very simple. In each iteration, the reader loads a minibatch of minibatch_size samples and passes it to the trainer. The trainer performs the learning step using the SGD learner and records the loss and the error for the current iteration. Then we call the print function to show the progress of the trainer.

for i in range(0, int(num_iterations)):
        # Read a mini batch from the training data file
        data=reader_train.next_minibatch(minibatch_size, input_map=input_map) 
        trainer.train_minibatch(data)
        batchsize, loss, error = print_training_progress(trainer, i, training_progress_output_freq)
        if not (loss == "NA" or error =="NA"):
            plotdata["batchsize"].append(batchsize)
            plotdata["loss"].append(loss)
            plotdata["error"].append(error)
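Under the hood, each `trainer.train_minibatch(data)` call performs a forward pass, evaluates the loss, back-propagates the gradients and lets the SGD learner update the parameters. The NumPy sketch below spells those steps out for the same 4-50-3 tanh network. It illustrates the technique and is not CNTK's implementation: the synthetic data, the seed and the helper names are hypothetical, and averaging the loss over the minibatch is an assumption.

```python
import numpy as np

def init_params(rng, in_dim=4, hid_dim=50, out_dim=3):
    """Glorot-uniform weights and zero biases for a 4-50-3 network."""
    def glorot(fan_in, fan_out):
        limit = np.sqrt(6.0 / (fan_in + fan_out))
        return rng.uniform(-limit, limit, size=(fan_in, fan_out))
    return {"W1": glorot(in_dim, hid_dim), "b1": np.zeros(hid_dim),
            "W2": glorot(hid_dim, out_dim), "b2": np.zeros(out_dim)}

def loss_and_grads(params, X, Y):
    """Mean softmax cross-entropy over the minibatch and its gradients."""
    n = X.shape[0]
    h = np.tanh(X @ params["W1"] + params["b1"])            # hidden layer (tanh)
    z = h @ params["W2"] + params["b2"]                     # output scores
    z = z - z.max(axis=1, keepdims=True)                    # stabilize the softmax
    log_p = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    loss = -np.sum(Y * log_p) / n
    dz = (np.exp(log_p) - Y) / n                            # dLoss/dz = (softmax - labels) / n
    dh = (dz @ params["W2"].T) * (1.0 - h ** 2)             # back-propagate through tanh
    grads = {"W2": h.T @ dz, "b2": dz.sum(axis=0),
             "W1": X.T @ dh, "b1": dh.sum(axis=0)}
    return loss, grads

def sgd_step(params, grads, lr):
    """Plain SGD: move every parameter against its gradient."""
    for name in params:
        params[name] -= lr * grads[name]

# Hypothetical toy data: three Gaussian blobs in 4-D, 40 samples each (120 in total)
rng = np.random.default_rng(42)
centers = rng.normal(0.0, 1.0, size=(3, 4))
labels = np.repeat(np.arange(3), 40)
X = centers[labels] + rng.normal(0.0, 0.3, size=(120, 4))
Y = np.eye(3)[labels]                                       # one-hot labels

params = init_params(rng)
losses = []
for i in range(20):   # 20 full-batch iterations with learning rate 0.2, as above
    loss, grads = loss_and_grads(params, X, Y)
    sgd_step(params, grads, lr=0.2)
    losses.append(loss)
    print("Minibatch: {0}, Loss: {1:.4f}".format(i, loss))
```

Because every iteration uses all 120 samples, this is effectively full-batch gradient descent, matching the minibatch_size = 120 setting above.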

Once the learning process completes, we can present the results.

# Plot the training loss and the training error
import matplotlib.pyplot as plt

plt.figure(1)
plt.subplot(211)
plt.plot(plotdata["batchsize"], plotdata["loss"], 'b--')
plt.xlabel('Minibatch number')
plt.ylabel('Loss')
plt.title('Minibatch run vs. Training loss')

# Draw the second plot in the same figure; calling plt.show() between the
# two subplots would display the first subplot on its own
plt.subplot(212)
plt.plot(plotdata["batchsize"], plotdata["error"], 'r--')
plt.xlabel('Minibatch number')
plt.ylabel('Label Prediction Error')
plt.title('Minibatch run vs. Label Prediction Error')
plt.tight_layout()
plt.show()

We plot the loss and the classification error of the classifier (the accuracy is 1 minus the error). The following pictures show those graphs.

The last part of the ML procedure is testing, or validating, the model. For the Iris data set we prepared 20 samples that are used for testing. The code is similar to the previous code, except that we call create_reader with a different file name and with is_training set to False. Then we evaluate the model on the test data and print the average classification error.

# Read the test data
reader_test = create_reader("C:/sc/Offline/testData_cntk.txt",False, input_dim, num_output_classes)

test_input_map = {
    label  : reader_test.streams.labels,
    input  : reader_test.streams.features,
}

# Test data for trained model
test_minibatch_size = 20
num_samples = 20
num_minibatches_to_test = num_samples // test_minibatch_size
test_result = 0.0

for i in range(num_minibatches_to_test):
    
    data = reader_test.next_minibatch(test_minibatch_size,input_map = test_input_map)
    eval_error = trainer.test_minibatch(data)
    test_result = test_result + eval_error

# Average of evaluation errors of all test minibatches
print("Average test error: {0:.2f}%".format(test_result*100 / num_minibatches_to_test))
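The loop above divides the summed error by the number of minibatches. That equals the overall error rate only when every test minibatch has the same size, which holds here (one minibatch of 20 samples). If a test set were split unevenly, weighting each minibatch error by its sample count gives the true rate. `weighted_error` is a hypothetical helper and the numbers below are made up purely to show the arithmetic.

```python
def weighted_error(errors, sizes):
    """Overall error rate from per-minibatch average errors and minibatch sizes."""
    if len(errors) != len(sizes) or not sizes:
        raise ValueError("need one size per minibatch error")
    total_wrong = sum(e * n for e, n in zip(errors, sizes))   # misclassified samples
    return total_wrong / sum(sizes)

# Made-up example: 20 samples split into minibatches of 8, 8 and 4
errors = [0.125, 0.0, 0.25]   # 1 of 8, 0 of 8 and 1 of 4 misclassified
sizes = [8, 8, 4]
print("Weighted: {:.2f}%".format(weighted_error(errors, sizes) * 100))      # 10.00%
print("Unweighted: {:.2f}%".format(sum(errors) / len(errors) * 100))       # 12.50%
```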

The full sample, with the Python code and the data set, can be found here.


Using CNTK with Visual Studio 2017 and Python


The next few steps show how to install CNTK and the Python environment in Visual Studio 2017.

  1. First, download the latest CNTK version from the official GitHub page, or just click the following link: https://github.com/Microsoft/CNTK/releases

The release page shows the latest bits. Click the CPU-only package, accept the license and download the zip file.

  2. Once you have the zip file on your PC, create the folder C:/local and unzip the package into it.
  3. The next step installs the library as well as the Python distribution Anaconda 4.1.1.
  4. Open the C:\local\cntk\Scripts\install\windows folder and run the install.bat file. You need administrative rights in order to install all required components successfully.
  5. The following image shows the installation process:

  6. As can be seen, you first run the batch file (step 2), then press 1 and ENTER to continue with the installation, and press ‘y‘ to download the required components.
  7. The installation takes several minutes to complete. The first component to be installed is Anaconda 4.1.1, which is needed in order to set up CNTK.

  8. Once Anaconda is installed, the CNTK installation starts and finishes very quickly, since all CNTK bits have already been downloaded.

  9. Now that CNTK is installed, the last step is to install the Python tools for Visual Studio.
  10. Run the Visual Studio 2017 Installer and, once the installer is shown, select the Python components as shown in the picture below:

  11. Once the installation is completed, run Visual Studio 2017.
  12. From the Visual Studio 2017 Tools menu, select Python and then Python Environments:

  13. In the Python Environments window, select Anaconda 4.1.1 and update the symbols DB by pressing the button marked in the image below:

  14. Once the environment is updated, select the “Make this the default environment for new projects” option to apply the environment to future Python CNTK-based projects.
  15. The Python and Python Scripts paths should also be registered in the operating system's global environment (the PATH variable).

  16. Once the previous steps are completed successfully, we can start writing CNTK-aware Python code in Visual Studio 2017.
  17. Open VS 2017 with the Anaconda 4.1.1 environment and type:

   import cntk
   print("CNTK version:", cntk.__version__)

  18. The script should print the installed CNTK version.
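A slightly more defensive version of this check also prints the active Python version and reports a missing CNTK instead of failing with a traceback, which helps confirm that Visual Studio is really using the Anaconda environment selected above. `cntk_version` is a hypothetical helper, not part of CNTK.

```python
import sys

def cntk_version():
    """Return the installed CNTK version string, or None if CNTK cannot be imported."""
    try:
        import cntk
    except (ImportError, OSError):   # missing package or missing native DLLs
        return None
    return cntk.__version__

print("Python:", sys.version.split()[0])
print("CNTK version:", cntk_version() or "not installed in this environment")
```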

 

Introduction to CNTK – Microsoft Cognitive Toolkit


This summer Microsoft released CNTK 2.0, an open-source, cross-platform C++ library for deep learning based on deep neural networks (NN). According to Microsoft, it is the fastest NN library available today, several times faster than competing libraries such as TensorFlow, Caffe, Theano and Torch. Judging by the demos included in the library, it is a very powerful library that can be of great help to those doing Data Science and Machine Learning.

CNTK was originally created by Microsoft speech researchers in 2012, and a few years later, in early 2015, it became an open-source library on the CodePlex site. A year later it moved to GitHub with the announcement of CNTK 1.0: the first version was released in January 2016 as an open-source project on GitHub, http://github.com/microsoft/CNTK .

CNTK 2.0 was released in June this year with many improvements and benchmarks. In September, CNTK 2.2 was released with full support for the .NET platform, which allows .NET developers to include the library in .NET-based applications.

Besides C#, CNTK natively supports C++, as well as Python, which has proven to be a first-class citizen of this library.

Later, the library will be ported to Java and R.

Bihać Developers MeetUp has started


The MeetUp social network

Fifteen years ago, the founders of the http://meetup.com social network had a few interesting ideas in mind that attracted great interest around the world. They founded the MeetUp social network, which connects people with similar interests in a local community. The core mission of MeetUp is organizing local groups and communicating through the Meetup website in order to arrange meetings and share experience and knowledge in a particular field. The mission of the MeetUp social network represents a fundamental postulate of every democratic society:

The MeetUp mission is about revitalizing local communities and helping and supporting people around the world to self-organize. It holds that people can change their personal world, or the whole world, through self-organization that becomes strong enough to change things around us. Guided by this idea, millions of people from various fields and from social and economic sectors have organized themselves on MeetUp to promote knowledge, education, particular technology solutions and so on.

The numbers show how sound and powerful the idea actually is:

  • approximately 10 million people visit this social network every month,
  • it has 9.5 million users,
  • 280,000 meetings are organized every month,
  • 92,000 active local groups,
  • 90,000 different meetup topics,
  • local groups from 45,000 cities.

MeetUp connects people with specific interests, which can be business, educational, social or any other interests. MeetUp can bring together Facebook friends, but also friends who are not connected on other social networks. MeetUp is also special because it is a unique community that connects people through the topics and problems they want to raise and discuss face to face with other members. It is the only social community that brings people together physically.

What is the Bihać Developers MeetUp group

Following this idea, a group of enthusiasts, computer freaks and software development lovers from different platforms decided to start the Bihać Developers Meetup group, whose main goal is sharing experience and promoting software development. These people try to pass their love of programming on to others, because they believe it is a blessing that can, above all, make a person happy, provide a professional career, or be a nice and interesting hobby. The "software developer" profession, or "programmer" as some still call it, is one of the rare professions that starts out of curiosity and love for computers and programming, continues with solving mathematical and similar problems, and only later becomes a career. Most importantly, as countless examples confirm, you don't have to be a good student or study computer science to be a good programmer.

Some of the topics that will be discussed at the Meetup sessions are:

  • software development on different platforms (desktop/web/cloud/mobile) and different operating systems,
  • databases and ORM frameworks,
  • cloud computing,
  • artificial intelligence and machine learning algorithms,
  • and other related topics.

Attendance is open and completely free, and everyone who wants to learn something about the topics we want to talk about is welcome. The MeetUps will be held at the Technical Faculty in Bihać, with the support of the University and other organizations from Bihać and the Una-Sana Canton.

How to become a member of the Bihać Developers MeetUp group

Bihać Developers primarily targets young people who are just starting to learn about software development and programming, but also everyone aged 7 to 77. We primarily count on:

  • students of technical programs, as well as other students (Note: some of my friends who make a living from software development today graduated in philosophy or law :))
  • high school students who are interested in programming and have a knack for it,
  • young people who want to learn about this industry,
  • enthusiasts, freaks, geeks and other minorities,
  • and everyone else.

To become a member of the Bihać Developers MeetUp group, you need to:

  1. Open an account under your own name at http://meetup.com
  2. Become a member of the group by joining the Bihac Developers Meetup group on the web page: http://meetup.com/bihacdev
  3. Now that you are a member, just follow the announcements of talks and meetings and, if you have time, come to the Technical Faculty and attend a meeting.

Who organizes the Bihać Developers MeetUp group meetings

Bihać Developers Meetup is a continuation of the activities previously run by the Bihać .NET user group, with the intention of extending this practice to other topics and platforms. The idea is supported by the University of Bihać and the Technical Faculty, as well as by the private sector, the software company IDK studio and other individuals, who are listed on the meetup page.

Who sponsors the Bihać Developers MeetUp meetings

Since the meetings and attendance are completely free, and the Technical Faculty and the University of Bihać provide logistical support and sponsorship, this is currently enough to hold the talks and meetings. Of course, sponsors and interested companies that want to support this kind of knowledge sharing and the promotion of high technology are welcome and can contact us about sponsorship.

When are the talks held

The first Bihać Developers meeting is scheduled for Thursday, October 19, 2017, at 17:00 at the Technical Faculty. For this first meeting, we decided to talk about the topic in general and to tell the attendees how some of our people in Bihać built their careers and became software developers. The next meeting, planned for November, will be thematic, and its talk will cover a very interesting web technology. But more on that later.