Machine Learning – the robots are coming, and they arrive in neat, downloadable, often open-source packages.

People that Learn.

The fundamental principle of Learning is so inherently easy to understand that it becomes a challenge to express (much like the problem of explaining the concepts of Left and Right). We Learn every day, every hour, every second that we complete any form of activity.

We Learn all the time

Even on the daily commute, each of us continuously updates, corrects, analyzes and deciphers patterns and situations. To understand and adapt to our surroundings we analyze and dissect situations daily; from crossing the road at a particular intersection to switching checkout lines in the supermarket, we employ learning techniques continuously.

Data In, Learning Out

Daily, we call on our own personal experience: past information and data that we have, in some organic or unconscious way, labeled and categorized. As with any learning exercise, the more data we collect and the better its quality, the easier it is to guarantee or predict the accuracy of the results.

In our own human experience we take part in a complex methodological process in which new and old data play a key part. The essence of this learning process can roughly be divided into three steps:

1. Observe and take note of the data

2. Learn a lesson

3. Apply the lesson to new data

Machines that Learn

And the general human learning experience maps elegantly onto Machine Learning basics. Machine Learning is nothing more than these steps expressed as a function or computational method: using software packages, user data, experience, models and algorithms to improve insight or to make predictions.
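
To make the mapping concrete, here is a minimal sketch of those three steps in Python with scikit-learn; the dataset and model choice are purely illustrative assumptions, not a prescription.

```python
# A minimal sketch of the three learning steps, using scikit-learn as an example.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Observe and take note of the data
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 2. Learn a lesson (train the model on the observed data)
model = DecisionTreeClassifier().fit(X_train, y_train)

# 3. Apply the lesson to new data
print(model.predict(X_test[:5]))
print("accuracy:", model.score(X_test, y_test))
```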

Arthur Samuel, one of the first thought leaders in the field of machine learning, once defined the field as the “…study that gives computers the ability to learn without being explicitly programmed…”.

The only shortfall in his assessment is that no one could have predicted just how much better computers, robots and algorithms would become at recognizing patterns than humans. Add to this the explosion of data: the larger the data set and the better trained the algorithm, the more accurate the results. All of this means that, once trained, an algorithm can take new and old data and past experience and predict an outcome much faster than any human, a trend that will only continue.

Where will you see the machines?

As a result of Machine Learning’s capacity and openness to Learning, it has gained a foothold in a multitude of sectors, none more significantly impacted than the Finance industry.

Today machine learning sees enormous use in Finance, not least in taking the drudgery out of many repetitive back-office functions. In other areas of Finance, Machine Learning can be trained to solve more complex problems, such as credit assessment, or to detect fraud. Departments that once employed armies of personnel are now being whittled down to a bare-bones staff thanks to Machine Learning algorithms. Issues like compliance, risk exposure, fraud prevention and insider trading can now be handed off to Machine Learning.

Where do we go from here if the machines learn everything?

Machine Learning stands out in its ability to sift through tons of documents daily in search of unusual patterns, or to find blind spots, work that would take humans days or weeks. Tasks that are among the most laborious and time-consuming for humans find a happy home with a computer algorithm.

That said, the conversation about automation and the future world of work always sparks one question: will machines do everything? If the recent trend of hiring rock-star mathematicians and data scientists is anything to go by, the answer is no. The future of Machine Learning is simply an evolution of job tasks that already exist.

The nuts and bolts of Machine Learning

And therein lies the key to understanding what Machine Learning is about: new computational methods. Beyond the glamorous and somewhat glitzy surface of Machine Learning there are huge, complex sets of languages, packages, toolkits, libraries and methods that are making the Machine Learning revolution possible.

People that Learn that the machine can learn… but someone needs to compile them.

To help guide anyone who is new to, or interested in pursuing, the world of Machine Learning, I have organized the most popular and most widely used machine learning tools by language, along with just what they do.

C++-based machine learning packages and what they do.

CRF++ – a simple, customizable, open-source implementation of Conditional Random Fields (CRFs) for segmenting and labeling sequential data. CRF++ is designed as a general-purpose tool and can be applied to a variety of NLP tasks, such as Named Entity Recognition, Information Extraction and Text Chunking.

OpenCV (the Open Source Computer Vision Library) – an open-source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products.
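
As a quick taste of how OpenCV is typically driven from its Python bindings, here is a minimal sketch; the file name "input.jpg" is a placeholder, not something from the library or this article.

```python
# Minimal OpenCV sketch: load an image, convert it to grayscale, run edge detection.
# "input.jpg" is a placeholder path; swap in any image on disk.
import cv2

img = cv2.imread("input.jpg")                  # read the image (BGR order)
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)   # convert to grayscale
edges = cv2.Canny(gray, 100, 200)              # Canny edge detector
cv2.imwrite("edges.jpg", edges)                # write the result back out
```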

Caffe – a fast, open framework for deep learning.

DSSTNE – Amazon has made a significant contribution to the world of open-source software for deep learning by releasing a library called DSSTNE on GitHub under an open-source Apache license.

CNTK Library API – CNTK’s model evaluation facilities can be accessed from C++ and Java, while its core computational, neural-network composition and training, and model evaluation facilities are exposed through Python and C++. There is also a CNTK Python API consisting of abstractions for model definition and compute, learning algorithms, data reading and distributed training.

Not to be left out, there are plenty of Machine Learning packages for Java.

Apache Mahout™ – the project’s goal is to build an environment for quickly creating scalable, performant machine learning applications.

MALLET – a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.

Deeplearning4j – Deeplearning4j aims to be cutting-edge and plug-and-play, more convention than configuration, which allows for fast prototyping by non-researchers. Deeplearning4j is the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs. Skymind is its commercial support arm.

H2O – H2O makes it possible for anyone to easily apply machine learning and predictive analytics to solve today’s most challenging business problems.

MLlib – Apache Spark’s scalable machine learning library, usable from Java, Scala, Python, and R.
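
Although MLlib lives in the Java/Scala ecosystem, it is just as reachable from Python via PySpark. Here is a minimal sketch, with a tiny hand-made DataFrame and default settings assumed purely for illustration, of training a logistic regression with the DataFrame-based API.

```python
# Minimal PySpark MLlib sketch: train a logistic regression on a toy DataFrame.
# The three rows of data are made up purely to show the API shape.
from pyspark.sql import SparkSession
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.linalg import Vectors

spark = SparkSession.builder.appName("mllib-sketch").getOrCreate()
train = spark.createDataFrame(
    [(1.0, Vectors.dense([0.0, 1.1])),
     (0.0, Vectors.dense([2.0, 1.0])),
     (1.0, Vectors.dense([0.1, 1.2]))],
    ["label", "features"])

lr = LogisticRegression(maxIter=10)   # fit a simple classifier
model = lr.fit(train)
print(model.coefficients)             # learned weights
spark.stop()
```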

Weka3 Data Mining Software in Java – Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.

And the list would not be complete without Python

NLTK – the undisputed Python Natural Language Toolkit, NLTK is a platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text-processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum to get you started.
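
A minimal sketch of NLTK in action, assuming the small "punkt" tokenizer and tagger data packages have been (or can be) downloaded:

```python
# Minimal NLTK sketch: tokenize a sentence and tag parts of speech.
import nltk

# One-time downloads of the tokenizer and tagger models (small data packages).
nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

tokens = nltk.word_tokenize("Machine learning takes the drudgery out of back-office work.")
print(nltk.pos_tag(tokens))   # list of (word, part-of-speech) pairs
```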

OpenAI Gym (beta) – Gym for Python is a toolkit for developing and comparing reinforcement learning algorithms: an open-source interface to a growing collection of reinforcement learning tasks. You can use it from Python, and soon from other languages.
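
Here is a minimal sketch using the classic Gym interface; the CartPole environment and random policy are just convenient examples, not a recommendation.

```python
# Minimal OpenAI Gym sketch: run one CartPole episode with random actions.
import gym

env = gym.make("CartPole-v0")
observation = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()                   # pick a random action
    observation, reward, done, info = env.step(action)   # classic 4-tuple step API
    total_reward += reward
env.close()
print("episode reward:", total_reward)
```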

scikit-learn – Machine Learning in Python: a simple and efficient set of tools for data mining and data analysis. It is open and accessible to everybody, reusable in various contexts, built on NumPy, SciPy, and matplotlib, and open source and commercially usable under the BSD license.
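
Since the supervised fit/predict pattern already appeared earlier, here is a minimal sketch of scikit-learn on an unsupervised task instead: clustering a handful of made-up points with k-means.

```python
# Minimal scikit-learn sketch: cluster six toy points into two groups with k-means.
import numpy as np
from sklearn.cluster import KMeans

X = np.array([[1, 2], [1, 4], [1, 0],
              [10, 2], [10, 4], [10, 0]])     # toy data with two obvious clusters
kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
print(kmeans.labels_)                          # cluster id for each point
print(kmeans.predict([[0, 0], [12, 3]]))       # assign new points to clusters
```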

Keras – the Python deep learning library. Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK or Theano. It was developed with a focus on enabling fast experimentation.
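
A minimal sketch of the Keras Sequential API; the random data and layer sizes are arbitrary assumptions, there only to show the model/compile/fit rhythm (newer releases expose essentially the same API as tensorflow.keras).

```python
# Minimal Keras sketch: a tiny feed-forward classifier trained on random noise.
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(32, activation="relu", input_dim=20))   # hidden layer
model.add(Dense(1, activation="sigmoid"))               # binary output
model.compile(optimizer="rmsprop", loss="binary_crossentropy", metrics=["accuracy"])

X = np.random.random((100, 20))             # fake features
y = np.random.randint(2, size=(100, 1))     # fake labels
model.fit(X, y, epochs=2, batch_size=16)
```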

Theano – Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently.
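
Theano's define/compile/evaluate cycle is easiest to see in a tiny sketch:

```python
# Minimal Theano sketch: build a symbolic expression, compile it, evaluate it.
import theano
import theano.tensor as T

x = T.dscalar("x")
y = T.dscalar("y")
z = x ** 2 + y                    # symbolic expression, nothing computed yet
f = theano.function([x, y], z)    # compile the expression into a callable
print(f(3, 4))                    # -> 13.0
```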

Lasagne for Python – Lasagne is a lightweight library to build and train neural networks in Theano. Be advised that Lasagne is a work in progress, so if you are in the mood and of the open-source mindset, your input is welcome. The available documentation is limited for now, and the project is on GitHub. The best place to begin is the Lasagne user guide, which explains how to install Lasagne, how to build and train neural networks using it, and how to contribute to the library as a developer.
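
A minimal sketch of how Lasagne layers stack on top of Theano; the layer sizes here are arbitrary assumptions for illustration only.

```python
# Minimal Lasagne sketch: define a small two-layer network on a Theano input.
import theano.tensor as T
import lasagne

input_var = T.matrix("inputs")
network = lasagne.layers.InputLayer(shape=(None, 20), input_var=input_var)
network = lasagne.layers.DenseLayer(network, num_units=32,
                                    nonlinearity=lasagne.nonlinearities.rectify)
network = lasagne.layers.DenseLayer(network, num_units=2,
                                    nonlinearity=lasagne.nonlinearities.softmax)
prediction = lasagne.layers.get_output(network)   # symbolic forward pass
```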

MXNet – MXNet for Python is a flexible and efficient library for deep learning. It supports both imperative and symbolic programming, and runs on CPUs or GPUs, on servers, desktops, or mobile phones. MXNet supports multiple languages – C++, Python, R, Scala, Julia, Perl, Matlab and JavaScript – all with the same performance, and it supports distributed training on multiple CPU/GPU machines in the cloud, including AWS, GCE, Azure, and YARN clusters.
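
The imperative side of MXNet is the quickest to demonstrate; a minimal sketch:

```python
# Minimal MXNet sketch: imperative NDArray math (the symbolic API is separate).
import mxnet as mx

a = mx.nd.array([[1, 2], [3, 4]])
b = mx.nd.ones((2, 2))
c = mx.nd.dot(a, b)        # matrix multiply, on CPU by default
print(c.asnumpy())         # copy the result back as a NumPy array
```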

NetworkX for Python – NetworkX is high-productivity software for complex networks: a Python software package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.
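
A minimal sketch of NetworkX at work on a four-node graph:

```python
# Minimal NetworkX sketch: build a small graph and query its structure.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")])
print(nx.shortest_path(G, "A", "C"))   # e.g. ['A', 'B', 'C']
print(G.degree("A"))                   # node "A" has degree 2
```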

PyMC3 – PyMC3 is a Markov chain Monte Carlo sampling toolkit. It is oriented towards probabilistic programming in Python: Bayesian modeling and probabilistic machine learning with Theano. http://pymc-devs.github.io/pymc3/
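
A minimal sketch of the PyMC3 modeling pattern, inferring the mean of some synthetic data; the priors and data are assumptions made up for the example, and the syntax targets PyMC3 rather than later PyMC releases.

```python
# Minimal PyMC3 sketch: Bayesian inference of a mean via MCMC sampling.
import numpy as np
import pymc3 as pm

data = np.random.normal(loc=2.0, scale=1.0, size=100)   # synthetic observations

with pm.Model():
    mu = pm.Normal("mu", mu=0, sd=10)                    # prior on the unknown mean
    pm.Normal("obs", mu=mu, sd=1, observed=data)         # likelihood of the data
    trace = pm.sample(1000, tune=500)                    # draw posterior samples

print(trace["mu"].mean())   # posterior mean, should land near 2.0
```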
