For no other reason than personal curiosity, I decided to retouch and update the list of machine learning and artificial intelligence libraries and frameworks doing the rounds. It’s not going to win me a Pulitzer Prize any time soon, but everyone loves lists…so here goes.
C++ based machine learning packages and what they do.
Encog – an advanced machine learning framework that supports a variety of advanced algorithms, as well as support classes to normalize and process data. Machine learning algorithms such as Support Vector Machines, Neural Networks, Bayesian Networks, Hidden Markov Models, Genetic Programming and Genetic Algorithms are supported. Most Encog training algorithms are multi-threaded and scale well to multicore hardware. A GUI based workbench is also provided to help model and train machine learning algorithms. Encog has been in active development since 2008.
Nvidia DIGITS – another complete framework, Nvidia’s own, for common deep learning tasks such as managing data and designing and training neural networks. It is built to work well on multi-GPU systems. It is completely interactive, so data scientists can focus on designing and training networks rather than programming and debugging.
Intel – A complete framework from Intel that includes an open-source library for developing deep learning frameworks on a variety of compute platforms.
DeepLearning4j – Eclipse Deeplearning4j is written in Java and is compatible with any JVM language, such as Scala, Clojure or Kotlin. The underlying computations are written in C, C++ and CUDA. Keras will serve as the Python API. It is described as the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Apache Spark, DL4J brings AI to business environments for use on distributed GPUs and CPUs. It is backed by a commercial model.
Eblearn – An open-source C++ machine learning library from New York University’s machine learning lab, led by Yann LeCun. In particular, it provides implementations of convolutional neural networks with energy-based models, along with a GUI, demos and tutorials.
MEKA – is an open source implementation of methods for multi-label learning and evaluation. In multi-label classification, we want to predict multiple output variables for each input instance. This is different from the ‘standard’ case (binary or multi-class classification), which involves only a single target variable. MEKA is based on the WEKA Machine Learning Toolkit; it includes dozens of multi-label methods from the scientific literature, as well as a wrapper to a related framework.
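To make the multi-label idea concrete, here is a minimal plain-Python sketch. It does not use MEKA or WEKA; the label names, the `to_indicator` helper and the toy predictions are all made up for illustration. Each instance's label set is encoded as a 0/1 vector, and predictions are scored with Hamming loss, one common multi-label evaluation measure.

```python
# Hypothetical label vocabulary: a document may carry several topics at once.
LABELS = ["sports", "politics", "tech"]


def to_indicator(label_set, labels=LABELS):
    """Encode a set of labels as a 0/1 vector, one slot per possible label."""
    return [1 if name in label_set else 0 for name in labels]


def hamming_loss(true_rows, predicted_rows):
    """Fraction of label slots predicted incorrectly, over all instances."""
    total = sum(len(row) for row in true_rows)
    wrong = sum(
        t != p
        for t_row, p_row in zip(true_rows, predicted_rows)
        for t, p in zip(t_row, p_row)
    )
    return wrong / total


# Two instances: the first has one label, the second has two.
y_true = [to_indicator({"sports"}), to_indicator({"politics", "tech"})]
# Made-up predictions: the first instance gets a spurious "tech" label.
y_pred = [to_indicator({"sports", "tech"}), to_indicator({"politics", "tech"})]

print(hamming_loss(y_true, y_pred))  # one wrong slot out of six
```

A multi-class classifier would output exactly one label per instance; here every slot of the vector is a separate yes/no prediction.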
Weka – Waikato Environment for Knowledge Analysis (Weka) is a suite of machine learning software written in Java, developed at the University of Waikato, New Zealand. It is free software licensed under the GNU General Public License. Weka contains a collection of visualization tools and algorithms for data analysis and predictive modeling, together with graphical user interfaces for easy access to these functions.
CRF++ – Designed for general-purpose NLP tasks, such as Named Entity Recognition, Information Extraction and Text Chunking, CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data.
OpenCV – The Open Source Computer Vision Library is an open source computer vision and machine learning software library. OpenCV was built to provide a common infrastructure for computer vision applications and to accelerate the use of machine perception in commercial products.
Caffe – is a fast, open framework for deep learning. Caffe can be used to construct CNNs for image classification and works best on a GPU.
CNTK Library API – From C++ and Java you can access the CNTK model evaluation facilities. CNTK’s core computation, neural network composition and training, and model evaluation facilities are exposed through Python and C++. The CNTK Python API consists of abstractions for model definition and compute, learning algorithms, data reading and distributed training.
TensorFlow – If you’re developing AI apps for desktop, mobile or server side, it is likely to be one of the first AI frameworks you’ll encounter. It’s from Google and it’s open source. It works with C++ and Python to run numerical computations on data flow graphs. TensorFlow will run on a CPU (if you like crazy) or a GPU.
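For readers who haven’t met the term, a data flow graph just means the computation is first described as nodes (operations) wired together, and only then executed. The sketch below is plain Python, not TensorFlow code; the `Node` class and `constant` helper are hypothetical names invented for this illustration.

```python
import operator


class Node:
    """One vertex of a data flow graph: an operation plus the nodes feeding it."""

    def __init__(self, op=None, inputs=(), value=None):
        self.op = op          # callable applied to the input values (None for leaves)
        self.inputs = inputs  # upstream nodes whose outputs flow into this one
        self.value = value    # constant held by a leaf node

    def evaluate(self):
        # Leaf nodes return their constant; inner nodes evaluate their inputs
        # first and then apply the operation to the results.
        if self.op is None:
            return self.value
        return self.op(*(node.evaluate() for node in self.inputs))


def constant(value):
    """Create a leaf node holding a fixed value."""
    return Node(value=value)


# Step 1: describe the graph for (a + b) * c. Nothing is computed yet.
a, b, c = constant(2.0), constant(3.0), constant(4.0)
total = Node(operator.add, (a, b))
result = Node(operator.mul, (total, c))

# Step 2: run the graph.
print(result.evaluate())  # 20.0
```

Separating the description of the computation from its execution is what lets a framework decide where each operation should run.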
Microsoft Cognitive Toolkit – Previously known as CNTK, it can be used from C++ as well as C#/.NET, Python and Java. An open source deep-learning toolkit from Microsoft.
mlpack – is a fast, flexible machine learning library, written in C++, that provides fast, extensible implementations of machine learning algorithms. mlpack provides these algorithms as simple command-line programs, Python bindings, and C++ classes which can then be integrated into larger-scale machine learning solutions.
Java based machine learning packages and what they do.
Apache Mahout™ – the project’s goal is to build an environment for quickly creating scalable, performant machine learning applications.
MALLET – MALLET is a Java-based package for statistical natural language processing, document classification, clustering, topic modeling, information extraction, and other machine learning applications to text.
Deeplearning4j – Deeplearning4j aims to be cutting-edge plug and play, more convention than configuration, which allows for fast prototyping for non-researchers. Deeplearning4j is described as the first commercial-grade, open-source, distributed deep-learning library written for Java and Scala. Integrated with Hadoop and Spark, DL4J is designed to be used in business environments on distributed GPUs and CPUs. Skymind is its commercial support arm.
H2O – H2O makes it possible for anyone to easily apply machine learning and predictive analytics to solve today’s most challenging business problems.
MLlib – MLlib is Apache Spark’s scalable machine learning library. It is easy to use from Java, Scala, Python, and R.
Weka3 Data Mining Software in Java – Weka is a collection of machine learning algorithms for data mining tasks. The algorithms can either be applied directly to a dataset or called from your own Java code. Weka contains tools for data pre-processing, classification, regression, clustering, association rules, and visualization. It is also well-suited for developing new machine learning schemes.
Datumbox API – offers a large number of off-the-shelf classifiers and natural language processing services which can be used in a broad spectrum of applications including: Sentiment Analysis, Topic Classification, Language Detection, Subjectivity Analysis, Spam Detection, Reading Assessment, Keyword and Text Extraction and more. All services are accessible via its REST API, which allows you to develop your own smart applications in no time.
Python based machine learning packages and what they do.
spaCy – Python package for natural language processing (NLP). It comes with fundamental features such as tokenization, tagging, dependency parsing, entity recognition and word vectors, and it’s incredibly fast and trainable. I’m not affiliated with them, just a happy user.
PyBrain – is a modular machine learning library for Python. Its goal is to offer flexible, easy-to-use yet still powerful algorithms for machine learning tasks and a variety of predefined environments to test and compare your algorithms. PyBrain is short for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. In fact, its authors say they came up with the name first and later reverse-engineered this quite descriptive “Backronym”.
Cython – is an optimising static compiler for both the Python programming language and the extended Cython programming language (based on Pyrex). Cython gives you the combined power of Python and C, letting you write Python code that calls back and forth from and to C or C++ code natively at any point. It makes writing C extensions for Python as easy as Python itself.
NLTK – NLTK, the Natural Language Toolkit for Python, is a platform for building Python programs to work with human language data. It provides easy-to-use interfaces to over 50 corpora and lexical resources such as WordNet, along with a suite of text processing libraries for classification, tokenization, stemming, tagging, parsing, and semantic reasoning, wrappers for industrial-strength NLP libraries, and an active discussion forum to get started.
OpenAI Gym BETA – OpenAI Gym for Python is an open-source toolkit for developing and comparing reinforcement learning algorithms. The project provides a simple interface to a growing collection of reinforcement learning tasks. You can use it from Python, and soon from other languages.
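The “simple interface” boils down to two calls: reset the task, then step through it with actions. The sketch below imitates that reset/step convention in plain Python so it runs without Gym installed. `CoinGuessEnv` and `run_episode` are hypothetical names invented here, not part of Gym, and the real library’s return values may differ between versions.

```python
import random


class CoinGuessEnv:
    """Hypothetical toy task: guess each coin flip; an episode lasts 10 flips."""

    def __init__(self, seed=0, episode_length=10):
        self.rng = random.Random(seed)
        self.episode_length = episode_length
        self.steps = 0

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.steps = 0
        return 0  # this toy task has a single dummy observation

    def step(self, action):
        """Apply an action; return (observation, reward, done, info)."""
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.steps += 1
        done = self.steps >= self.episode_length
        return 0, reward, done, {}


def run_episode(env, policy):
    """Run one episode with the given policy and return the total reward."""
    observation = env.reset()
    total_reward, done = 0.0, False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
    return total_reward


# A trivial policy that always guesses "1".
episode_return = run_episode(CoinGuessEnv(), policy=lambda obs: 1)
print(episode_return)
```

Because every task exposes the same two methods, the same `run_episode` loop can be pointed at a different environment without changes, which is what makes comparing algorithms across tasks practical.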
Gensim – Gensim (gensim = generate similar) is a powerful and fully scalable package for topic modeling and statistical text analysis. Gensim is a robust, efficient and hassle-free piece of software to realize unsupervised semantic modelling from plain text. Gensim stands in contrast to brittle homework-assignment implementations that do not scale on one hand, and robust Java-esque projects that take forever just to run “hello world” on the other.
scikit-learn – is for machine learning in Python. It’s a simple and efficient set of tools for data mining and data analysis. It’s open and accessible to everybody, and reusable in various contexts. It is built on NumPy, SciPy, and matplotlib, and is open source and commercially usable under a BSD license.
Keras – The Python deep learning library. Keras is a high-level neural networks API, written in Python and capable of running on top of TensorFlow, CNTK or Theano. It was developed with a focus on enabling fast experimentation. It can be considered less of a pure machine learning framework and more of a high-level abstraction layer for easy configuration of the frameworks underneath it.
Theano – Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. It goes head to head with TensorFlow, and is a powerful Python library for numerical operations on a GPU.
Lasagne for Python – Lasagne is a lightweight library to build and train neural networks in Theano. Be advised that Lasagne is a work in progress, so if you are in the mood and of the open source mindset, your input is welcome. The available documentation is limited for now.
NetworkX – NetworkX is high-productivity software for complex networks. It’s a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks. It’s best put to use through its graph features, which include data structures for graphs, digraphs, and multigraphs, standard graph algorithms, and network structure and analysis measures.
PyMC3 – PyMC3 is a Markov chain Monte Carlo sampling toolkit. It is oriented towards the world of probabilistic programming in Python: Bayesian modeling and probabilistic machine learning with Theano. http://pymc-devs.github.io/pymc3/
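If Markov chain Monte Carlo is new to you, the core idea fits in a few lines. The sketch below is a plain-Python random-walk Metropolis sampler, one of the simplest MCMC methods. It is not PyMC3’s API, and the target density, step size and sample count are arbitrary choices made for illustration.

```python
import math
import random


def log_target(x):
    """Unnormalised log-density of a standard normal distribution."""
    return -0.5 * x * x


def metropolis(log_density, n_samples, step=2.0, start=0.0, seed=0):
    """Draw correlated samples from a density using random-walk Metropolis."""
    rng = random.Random(seed)
    current = start
    current_logp = log_density(current)
    samples = []
    for _ in range(n_samples):
        # Propose a nearby point with a symmetric uniform jump.
        proposal = current + rng.uniform(-step, step)
        proposal_logp = log_density(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        accept_prob = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.random() < accept_prob:
            current, current_logp = proposal, proposal_logp
        # On rejection the chain stays put and the current point is repeated.
        samples.append(current)
    return samples


samples = metropolis(log_target, n_samples=20000)
mean = sum(samples) / len(samples)
variance = sum((s - mean) ** 2 for s in samples) / len(samples)
print(round(mean, 2), round(variance, 2))  # should land near 0 and 1
```

The sampler only ever needs the density up to a constant, which is why MCMC is so handy for Bayesian posteriors whose normalising constant is hard to compute.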