Analytics is a multidimensional field that draws on mathematics, statistics, and machine-learning techniques to find meaningful patterns and knowledge in recorded data.
Now that data of all types is widely available, it is imperative that we use it to make better, fact-based decisions. The analytics modules within Sigma Magic
are built on top of the R platform. While R is a very powerful open-source platform, it is also relatively hard to understand and use, since it requires programming
expertise and learning to write R scripts. Sigma Magic brings some of these powerful tools to you in an easy-to-use manner.
This template can be used to pre-process the data prior to using the analytics functionality. For example, you can use this template to handle missing values,
convert text columns to numeric columns, and center/scale numeric columns.
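The three pre-processing steps named above can be sketched in a few lines. This is a minimal illustration in plain Python, not the template's actual implementation (which runs on R): the function names and sample columns are hypothetical, and mean imputation and integer label codes are just one common choice for each step.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def encode_labels(labels):
    """Map each distinct text label to an integer code (codes follow sorted order)."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels]

def standardize(values):
    """Center to mean 0 and scale to unit population standard deviation.

    Assumes the column is not constant (a zero standard deviation would
    cause a division by zero).
    """
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

# Hypothetical columns, for illustration only.
ages = impute_mean([20, None, 40])               # [20, 30, 40]
grades = encode_labels(["low", "high", "low"])   # [1, 0, 1]
scaled_ages = standardize(ages)                  # mean 0, standard deviation 1
```

Note that integer codes impose an artificial ordering on the categories; whether that is acceptable depends on the analysis that follows.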
Clusters - Hierarchical
This template builds a hierarchical clustering for a given data set. It is typically used for continuous data, where similar observations are grouped into clusters.
Clusters - K-Means
K-means aims to partition observations into K clusters, where each observation is assigned to the cluster whose center (mean) is nearest to it. It is typically used
for continuous data to group similar observations into one bucket.
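The assign-then-update loop behind K-means can be sketched as follows. This is a minimal illustration, not the routine the template calls: the function names and sample points are hypothetical, and seeding the centroids with the first K points is a simplifying assumption (real implementations usually use random or smarter initialization).

```python
from math import dist

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda c: dist(p, centroids[c]))
            for p in points]

def kmeans(points, k, iterations=10):
    # Naive initialization (an assumption): use the first k points as centroids.
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centroids)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(axis) / len(members) for axis in zip(*members))
    # Final assignment against the final centroids.
    return assign(points, centroids), centroids

# Hypothetical 2-D observations forming two visible groups.
points = [(1.0, 1.0), (1.5, 2.0), (8.0, 8.0), (9.0, 9.5), (1.0, 0.5), (8.5, 9.0)]
labels, centroids = kmeans(points, k=2)
print(labels)  # [0, 0, 1, 1, 0, 1]
```

Because distances drive the assignment, columns on very different scales are usually centered and scaled first.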
Correspondence analysis is a graphical way of displaying data to reveal possible relationships between the row and column variables of a contingency table.
Decision Tree - Conditional
Uses significance-testing procedures to determine recursive splits of the data with respect to a dependent variable and develop a decision tree. The decision tree can be used to
predict future values of the dependent variable based on the developed tree.
Decision Tree - Recursive
Uses a recursive partitioning technique to arrive at the decision tree. This employs information measures to determine the best split at each node of the tree. The
decision tree can be used to predict future values of the dependent variable.
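To make "information measures" concrete, the sketch below scores candidate splits of one numeric predictor by information gain (the reduction in entropy). It is an illustration only: entropy is one such measure among several (Gini impurity is another), the source does not say which one the template uses, and the function names and data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, gain) for the split 'x <= threshold' with the highest gain."""
    base = entropy(ys)
    best = (None, 0.0)
    # The largest value is skipped: it would leave the right-hand side empty.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(ys)
        if base - weighted > best[1]:
            best = (t, base - weighted)
    return best

# Hypothetical data: the classes separate cleanly between x = 3 and x = 10.
xs = [1, 2, 3, 10, 11, 12]
ys = ["a", "a", "a", "b", "b", "b"]
threshold, gain = best_split(xs, ys)
print(threshold)  # 3
```

A recursive partitioning routine would apply the same search again to each side of the chosen split until a stopping rule is met.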
This template can be used to build a model for different groups (where the groups are known up front). The objective of the model is to predict which
group a new observation belongs to, based on certain parameters.
Factor Analysis - Exploratory
This template can be used to determine which variables can be combined in order to express a large number of variables as a parsimonious
set of factors.
Generalized Linear Model
Generalized linear models are a generalization of the ordinary least squares approach for building more complex real-world models where a linear relationship between
the independent and dependent variables is no longer valid. They also allow the response variable to have an error distribution that is not normal.
K-nearest neighbors (KNN) analysis is a classification technique that classifies an observation based on its distance from other data points in the set. It can
recognize patterns in data without requiring an exact match with stored data sets.
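The distance-based vote can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and a simple majority among the k nearest training points; the function name and the data are hypothetical.

```python
from collections import Counter
from math import dist

def knn_predict(train_points, train_labels, query, k=3):
    """Classify 'query' by majority vote among its k nearest training points."""
    nearest = sorted(zip(train_points, train_labels),
                     key=lambda pair: dist(pair[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set with two well-separated classes.
train_points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(train_points, train_labels, (1.5, 1.5)))  # a
print(knn_predict(train_points, train_labels, (8.5, 8.5)))  # b
```

The query points do not appear in the training set, which is the sense in which no exact match with stored data is needed.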
Naive Bayes is a classification technique that uses probability theory to perform the classification. It is a simple and fast algorithm that can be trained to classify
patterns involving thousands of attributes and is hence typically used for text mining applications.
A neural network is a linear or non-linear classification algorithm made up of a number of simple, highly interconnected processing elements, which
process information through their dynamic state response to external inputs.
Random Forest is an extension of the single classification tree. A Random Forest grows many classification trees. To classify a new object, each tree gives a
classification, and the forest chooses the classification that has the most votes over all trees in the forest.
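The voting step can be sketched in isolation. In the illustration below the "trees" are hypothetical hand-written rules standing in for trained classification trees, and the feature names are invented; only the majority vote itself reflects the description above.

```python
from collections import Counter

def forest_predict(trees, x):
    """Return the class that receives the most votes across all trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for trained trees: each maps an observation to a class.
trees = [
    lambda x: "spam" if x["links"] > 3 else "ham",
    lambda x: "spam" if x["caps_ratio"] > 0.5 else "ham",
    lambda x: "spam" if x["length"] < 20 else "ham",
]

message = {"links": 5, "caps_ratio": 0.2, "length": 10}
print(forest_predict(trees, message))  # spam (two votes to one)
```

In an actual random forest, each tree is grown on a random resample of the data with a random subset of predictors, which is what makes the individual votes differ.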
XGBoost (Extreme Gradient Boosting) is an optimized, distributed gradient boosting library that combines many weak learners into a strong learner. It has many powerful
features for performing fast analysis with improved accuracy.