Help Manual

Sigma Magic Help Version 15

Analytics Templates

Analytics is a multidimensional field that draws on mathematics, statistics, and machine-learning techniques to find meaningful patterns and knowledge in recorded data. Now that data of all types is widely available, it is important to use this data to make better, fact-based decisions. The analytics modules within Sigma Magic are built on top of the R platform. While R is a very powerful open-source platform, it is also relatively hard to learn and use because it requires programming expertise and knowledge of R scripting. Sigma Magic brings some of these powerful tools to you in an easy-to-use manner.

Pre-Process Data

This template can be used to pre-process the data prior to using the analytics functionality. For example, you can use this template to handle missing values, convert text columns to numeric columns, and center/scale numeric columns.
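The three operations named above can be illustrated with a minimal Python sketch. It is not Sigma Magic's own implementation (which runs on R); the function names, the choice of mean imputation, and the use of the population standard deviation for scaling are all assumptions made for illustration.

```python
from statistics import mean, pstdev

def impute_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

def encode_text(labels):
    """Map each distinct text label to an integer code (alphabetical order)."""
    codes = {label: i for i, label in enumerate(sorted(set(labels)))}
    return [codes[label] for label in labels]

def center_scale(values):
    """Subtract the mean, then divide by the population standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]

ages = impute_mean([20, None, 40])             # missing value filled with 30
colors = encode_text(["red", "blue", "red"])   # blue -> 0, red -> 1
scaled = center_scale(ages)                    # mean 0, standard deviation 1
```

After scaling, the column has mean 0 and standard deviation 1, which keeps variables measured in large units from dominating distance-based methods such as clustering.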

Clusters > Hierarchical

This template builds a hierarchical clustering for a given data set. It is typically used for continuous data, where similar observations are grouped into the same cluster.
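One common way to build such a hierarchy is agglomerative clustering: start with every observation in its own cluster and repeatedly merge the two closest clusters. The sketch below assumes one-dimensional data and single linkage (distance between the closest members); the template may use a different linkage or distance, and the function name is hypothetical.

```python
def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]  # start with one cluster per observation
    while len(clusters) > n_clusters:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # single linkage: distance between the closest pair of members
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

groups = single_linkage([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
```

Stopping at a different value of `n_clusters` corresponds to cutting the hierarchy at a different height.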

Clusters > K-Means

K-means clusters observations into K clusters, assigning each observation to the cluster whose mean (centroid) is closest. It is typically used for continuous data to group similar observations into the same cluster.
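The assign-then-update loop behind K-means can be sketched as follows. This is a minimal illustration on one-dimensional data with caller-supplied starting centroids, both of which are simplifying assumptions; it is not the routine the template calls.

```python
def k_means(points, centroids, iterations=10):
    """Lloyd's algorithm on 1-D data with caller-supplied starting centroids."""
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda k: abs(p - centroids[k]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its points.
        centroids = [sum(c) / len(c) if c else centroids[k]
                     for k, c in enumerate(clusters)]
    return centroids, clusters

centers, members = k_means([1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
                           centroids=[0.0, 5.0])
```

With these inputs the centroids settle at 2.0 and 11.0 after two passes, and further iterations leave them unchanged.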

Correspondence Analysis

Correspondence analysis is a graphical way of displaying data to reveal possible relationships between the row and column variables of a contingency table.

Decision Tree > Conditional

This template uses significance testing procedures to determine recursive splits of the data and develop a decision tree for a dependent variable. The decision tree can then be used to predict future values of the dependent variable.

Decision Tree > Recursive

This template uses a recursive partitioning technique to arrive at the decision tree, employing information measures to determine the best split at each node. The decision tree can be used to predict future values of the dependent variable.
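To show what "best split" can mean, the sketch below scores every candidate threshold on a single numeric predictor by information gain (the drop in Shannon entropy). Entropy is an assumed choice of information measure; the template may use another, such as the Gini index. A full tree would apply the same search recursively to each resulting subset.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_split(xs, ys):
    """Return (threshold, information gain) for the best split x <= threshold."""
    base, n = entropy(ys), len(ys)
    best = (None, 0.0)
    # Candidate thresholds: every observed value except the largest.
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        gain = (base
                - (len(left) / n) * entropy(left)
                - (len(right) / n) * entropy(right))
        if gain > best[1]:
            best = (t, gain)
    return best

threshold, gain = best_split([1, 2, 3, 4], ["no", "no", "yes", "yes"])
```

Here the split at 2 separates the two classes perfectly, so it recovers the full one bit of entropy in the labels.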

Discriminant Analysis

This template can be used to build a model for different groups (where the groups are known upfront). The objective of the model is to predict which group a new observation belongs to, based on certain parameters.

Factor Analysis - Exploratory

This template can be used to determine which variables can be combined in order to express a large number of variables as a parsimonious set of factors.

Generalized Linear Model

Generalized linear models generalize the ordinary least squares approach to build more complex real-world models where a linear relationship between the independent and dependent variables is no longer valid. They also allow response variables to have error distributions that are not normal.
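In standard notation (the symbols below are not taken from this manual), a generalized linear model relates the mean of the response to a linear combination of the predictors through a link function g:

```latex
g\bigl(\mathbb{E}[y \mid x_1, \dots, x_p]\bigr) = \beta_0 + \beta_1 x_1 + \dots + \beta_p x_p
% Identity link, normal errors:   g(\mu) = \mu                  (ordinary least squares)
% Logit link, binomial response:  g(\mu) = \ln\frac{\mu}{1-\mu} (logistic regression)
% Log link, Poisson response:     g(\mu) = \ln\mu               (count data)
```

Ordinary least squares is the special case with the identity link and normally distributed errors.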

K-Nearest Neighbors

KNN analysis is a classification technique that classifies a data point based on its distance from other data points in the set. It can recognize patterns in data without requiring an exact match with stored data.
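A minimal sketch of the idea: find the k stored points nearest to a new point and take a majority vote of their labels. Euclidean distance, k = 3, and the sample data are assumptions for illustration only.

```python
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    neighbors = sorted(train, key=lambda item: dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

# Hypothetical training data: (coordinates, class label)
train = [((1, 1), "A"), ((1, 2), "A"), ((2, 1), "A"),
         ((8, 8), "B"), ((8, 9), "B"), ((9, 8), "B")]
label_near_a = knn_predict(train, (2, 2))
label_near_b = knn_predict(train, (7, 9))
```

Neither query point appears in the training data, yet each is classified by the group it sits closest to.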

Naive Bayes

Naive Bayes is a classification technique that uses probability theory to perform the classification. It is a simple and fast algorithm that can be trained to classify patterns involving thousands of attributes, and hence it is typically used for text mining applications.
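For the text mining case, a small sketch might look like the following. It assumes a multinomial model over word counts with Laplace (add-one) smoothing; the function names and sample documents are hypothetical, and the template's own model may differ.

```python
from collections import Counter, defaultdict
from math import log

def train_nb(docs):
    """docs: list of (text, label). Returns class counts and per-class word counts."""
    class_counts, word_counts = Counter(), defaultdict(Counter)
    for text, label in docs:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    return class_counts, word_counts

def predict_nb(model, text):
    """Pick the class with the highest log posterior, using Laplace smoothing."""
    class_counts, word_counts = model
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        total_words = sum(word_counts[label].values())
        score = log(class_counts[label] / total_docs)  # log prior
        for w in text.split():
            # "Naive" step: words are treated as independent given the class.
            score += log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)

model = train_nb([("win cash now", "spam"), ("cash prize win", "spam"),
                  ("meeting at noon", "ham"), ("lunch meeting today", "ham")])
prediction = predict_nb(model, "win cash prize")
```

Training only counts words per class, which is why the method stays fast even with thousands of attributes.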

Neural Networks

Neural networks provide linear and non-linear classification algorithms made up of a number of simple, highly interconnected processing elements, which process information through their dynamic state response to external inputs.
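To make "simple, interconnected processing elements" concrete, the sketch below wires three threshold units into a network that computes XOR, a classification no single linear unit can perform. The weights are hand-picked for illustration; a real network learns its weights from data, and the template's architecture is not described here.

```python
def step(z):
    """Threshold activation: the unit fires (1) when its input is positive."""
    return 1 if z > 0 else 0

def xor_network(x1, x2):
    """Two hidden units feed one output unit; weights are hand-picked, not trained."""
    h_or = step(x1 + x2 - 0.5)       # fires if at least one input is on
    h_and = step(x1 + x2 - 1.5)      # fires only if both inputs are on
    return step(h_or - h_and - 0.5)  # on for "or, but not and"

outputs = [xor_network(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```

Each unit on its own draws only a straight-line boundary; the non-linear result comes from combining them.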

Random Forest

Random Forest is an extension of the single classification tree. To classify a new object, a Random Forest grows many classification trees; each tree gives a classification, and the forest chooses the classification with the most votes over all trees in the forest.
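The voting step can be sketched on its own. The three "trees" below are hypothetical hand-written rules standing in for fitted trees; a real forest would grow each tree from a random sample of the data.

```python
from collections import Counter

# Hypothetical stand-ins for fitted trees: each maps an observation to a class.
def tree_a(obs):
    return "large" if obs["height"] > 170 else "small"

def tree_b(obs):
    return "large" if obs["weight"] > 70 else "small"

def tree_c(obs):
    return "large" if obs["height"] + obs["weight"] > 250 else "small"

def forest_predict(trees, obs):
    """Each tree votes once; the forest returns the most common class."""
    votes = Counter(tree(obs) for tree in trees)
    return votes.most_common(1)[0][0]

forest = [tree_a, tree_b, tree_c]
result = forest_predict(forest, {"height": 180, "weight": 65})
```

In this example one tree votes "large" and two vote "small", so the forest returns "small" even though a single tree disagrees.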

XGBoost Analysis

XGBoost (Extreme Gradient Boosting) is an optimized distributed gradient boosting library that combines many weak learners into a strong learner. It has many powerful features to perform fast analysis with improved accuracy.
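The idea of building a strong learner from weak ones can be sketched with plain gradient boosting for squared error. This is not the XGBoost library or its API: it omits XGBoost's regularization and optimizations, and the one-split "stump" learner, learning rate, and data are assumptions for illustration.

```python
def fit_stump(xs, residuals):
    """Weak learner: the one-split regression stump with the lowest squared error."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lmean, rmean = sum(left) / len(left), sum(right) / len(right)
        err = (sum((r - lmean) ** 2 for r in left)
               + sum((r - rmean) ** 2 for r in right))
        if best is None or err < best[0]:
            best = (err, t, lmean, rmean)
    _, t, lmean, rmean = best
    return lambda x: lmean if x <= t else rmean

def boost(xs, ys, rounds=20, learning_rate=0.5):
    """Each round fits a stump to the current residuals and adds a damped copy."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(rounds):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals))
    return predict

def mse(model, xs, ys):
    """Mean squared error of a model on the given data."""
    return sum((y - model(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

xs, ys = [1, 2, 3, 4, 5, 6], [1.0, 1.5, 3.0, 3.5, 6.0, 6.5]
baseline = mse(lambda x: sum(ys) / len(ys), xs, ys)  # predict the mean everywhere
boosted = mse(boost(xs, ys), xs, ys)
```

Each stump alone is a crude predictor, but because every round corrects the errors left by the previous ones, the combined model fits the training data better than the constant baseline.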