PAMA – Progressive Analysis Methodology & Algorithm

The following article is an idea for my research.

Abstract

Pattern recognition is a branch of machine learning that focuses on the recognition of patterns and regularities in data, although in some cases it is considered to be nearly synonymous with machine learning.

Source: Wikipedia, http://j.mp/PatRecog

This article is an early vision of how to recognize the right algorithm for identifying the available patterns, using the parameters that are exposed by the data points of any given situation.

1. Introduction

The challenge in this theory is identifying the patterns that evolve from the enormous amount of data available at any given time. Beyond that, the right pattern has to be identified from the possible combinations, and the analysis has to be executed with a common methodology.

The success of this methodology depends on the algorithm that is used to analyze the available pattern. The crucial step is to match the current data points against the previous data points.

To perform analysis on any given number of data points, there has to be a baseline point for the given data. The baseline plays a key role in analyzing and recognizing the pattern in this methodology. Very often, the data points will show some similarity among themselves. Understanding these similarities and building a pattern from them amounts to defining the behavior of the data points.

A huge number of documents and research papers are available on pattern recognition, but far fewer on pattern analysis and the need for an algorithm.

2. Problem Statement

Humans have a special capability: they can quickly identify patterns in their surroundings with the intellect of their nervous system. Machines, in contrast, are purely algorithm based and function with the logic that was fed to them at some point in time. Machine learning happens only when there is enough data and the right baseline for that data.

2.1. Dimensional Model

At times, three data points are sufficient for defining the baseline, as when plotting a straight line.
clip_image001
In this example, the three points give us a pattern when they are analysed on an X-Y graph.

clip_image002
This graph gives us a lot of data. But to understand the pattern, the angle of the slope between these data points is enough. Once the pattern is identified, the distance between any two data points helps in predicting the next data point along the same straight line.
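The slope-angle idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `slope_angle` and `predict_next` are hypothetical, the points are assumed to be equally spaced, and the next point is extrapolated by repeating the last step.

```python
import math

def slope_angle(p, q):
    """Angle in degrees of the segment from p to q, relative to the X axis."""
    return math.degrees(math.atan2(q[1] - p[1], q[0] - p[0]))

def predict_next(points, tolerance=1e-9):
    """Return the next point if all consecutive slope angles match, else None.

    Assumption: the points are equally spaced, so the next point is found
    by repeating the step between the last two points.
    """
    angles = [slope_angle(a, b) for a, b in zip(points, points[1:])]
    if any(abs(angle - angles[0]) > tolerance for angle in angles):
        return None  # the points do not share one straight-line pattern
    (x0, y0), (x1, y1) = points[-2], points[-1]
    return (x1 + (x1 - x0), y1 + (y1 - y0))

# Three points on the line y = 2x: both segments have the same angle.
print(predict_next([(1, 2), (2, 4), (3, 6)]))  # (4, 8)
```

When the angles differ, as in `[(0, 0), (1, 1), (2, 5)]`, the sketch returns `None` instead of guessing a pattern.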

Thus, a common methodology like this, together with an algorithm to predict the pattern, will help the machine learn as data accumulates over time. In the current digital world, an enormous amount of data is available for all kinds of analysis.

What is missing is the right algorithm to analyze this data, as well as a proven methodology for making machines learn from past data and continue to learn from the ongoing stream of present data.

2.2. Machine learning through Pattern analysis

Any machine learning is materialized through theories and research on the available data. These theories are put into practice with past data as well as with data currently streaming from all available sources. These sources can be medical equipment, personal gadgets, web servers, social networks, or networks of connected systems.

Hence, in this methodology, machine learning is treated as a rule-based algorithm. The rules are the methodology identified for the data points, and the baseline shifts as the data points vary over time.

If we divide a region of a space into regular cells, then the number of such cells grows exponentially with the dimensionality of the space. The problem with an exponentially large number of cells is that we need an exponentially large quantity of training data in order to ensure that the cells are not empty [[1]]. The key idea with large quantities of data is the training that is embedded in the machines that process the data. The success of this analysis lies in identifying the description and behavior expressed by these large quantities of data.

These large quantities of data will not yield any benefit unless they are processed with the right algorithm and defined with reference to their boundaries and measurement units. The level or scale of measurement depends on the properties of the data. It is very important to establish the scale of measurement of the data in order to determine the appropriate statistical test to use when analyzing the data [[2]].
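The exponential growth of cells can be made concrete with a small calculation. The sketch below assumes each axis is split into the same number of bins; the names `cell_count` and `max_occupied_fraction` are illustrative only.

```python
def cell_count(bins_per_axis, dimensions):
    """Number of regular cells when every axis is split into the same number of bins."""
    return bins_per_axis ** dimensions

def max_occupied_fraction(samples, bins_per_axis, dimensions):
    """Upper bound on the fraction of cells that can hold at least one sample."""
    return min(1.0, samples / cell_count(bins_per_axis, dimensions))

# With 10 bins per axis, the cell count is 10 raised to the number of dimensions.
for dimensions in (1, 2, 3, 10):
    print(dimensions, cell_count(10, dimensions),
          max_occupied_fraction(1_000_000, 10, dimensions))
```

Even a million samples can touch at most one cell in ten thousand once the space has ten dimensions, which is why the training data requirement grows so quickly.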

3. Problem interpretation

Any data that is analyzed has to be based on the baseline of its current state. In most cases, the baseline can lie between the central tendency and the sum of the central tendency and the quarter length of the current series. The relation among these data points can also be analyzed by regression analysis.

clip_image004

Where

  • bl is the baseline pointer
  • n is the natural set of the data
  • l is the length of the data
This definition of the baseline pointer is an initial proposal and interpretation, made before the research on this concept.
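One possible reading of this range is sketched below. Every choice here is an assumption rather than the definition behind the formula image: the arithmetic mean stands in for the central tendency, the quarter length is taken as l / 4, and the function name `baseline_range` is hypothetical.

```python
from statistics import mean

def baseline_range(series):
    """Return (lower, upper) bounds for the baseline pointer bl.

    Assumptions: central tendency = arithmetic mean of the series,
    quarter length = len(series) / 4.
    """
    centre = mean(series)
    quarter_length = len(series) / 4
    return centre, centre + quarter_length

lower, upper = baseline_range([2, 4, 6, 8])
print(lower, upper)  # mean is 5, quarter length is 1, so the range is 5 to 6
```

A median or mode could be substituted for the mean if a different central tendency suits the data better.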

This baseline is identified by the repetition of values over the same intervals. Once the first repetition of these values is found, the baseline can be prepared from that first occurrence of the repeated data points.
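Finding the first repetition can be sketched as a search for the smallest interval at which the series repeats itself. This sketch assumes exact equality between repeated values (real measurements would need a tolerance), and the names `first_repeat_interval` and `baseline_from_repetition` are illustrative.

```python
def first_repeat_interval(values):
    """Smallest interval p with values[i] == values[i + p] for every valid i, or None.

    The interval is limited to half the series so that at least one full
    repetition has actually been observed.
    """
    n = len(values)
    for interval in range(1, n // 2 + 1):
        if all(values[i] == values[i + interval] for i in range(n - interval)):
            return interval
    return None

def baseline_from_repetition(values):
    """First occurrence of the repeated data points, or None if nothing repeats."""
    interval = first_repeat_interval(values)
    return None if interval is None else values[:interval]

print(baseline_from_repetition([3, 5, 7, 3, 5, 7, 3]))  # [3, 5, 7]
```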

3.1. Baseline Justification

When a sine curve is plotted with negative as well as positive values, the negative half is a clear mirror image of the positive half.

clip_image006

When the same is plotted with the repeated data points, the graph looks as below.

clip_image008

Another example for understanding the common behavior of data points and identifying the baseline point is the human electrocardiogram (ECG).
clip_image010
Source: ECGPedia.org

All such data points produce data that is meaningful and helps in understanding the behavior. These are small amounts of data that can be calculated over a short span of measurement. The challenge, however, is not with small sets of data and short spans of measurement. To understand or analyze the data points, it is important to start from the beginning and work through the understanding mechanism regressively.

Thus, as the data progresses, the learning becomes an asset for the algorithm and the analysis becomes easier. Hence the need for a progressive methodology and an algorithm.

3.2. Regressive Study

While understanding data points, it is necessary to understand their properties. Each data property should be identified with a finite set of variables that are common to all these data points. Understanding these variables and selecting them is nothing but a particular form of model selection [[3]]. Once the data points are identified with the common data properties, the processing methodology will evolve. A single data property with its related data points is a clear case for linear regression. When a nonlinear combination of these parameters depends on one or more independent data points, a nonlinear regression model can be applied.
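As a minimal sketch of this choice, an ordinary least-squares line can be fitted first and its goodness of fit checked before deciding whether a nonlinear model is needed. The R-squared threshold of 0.95 and the names `fit_line`, `r_squared` and `looks_linear` are assumptions made for illustration, not part of the methodology itself.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def r_squared(xs, ys, slope, intercept):
    """Share of the variance in ys explained by the fitted line.

    Assumes ys is not constant (otherwise the total variance is zero).
    """
    mean_y = sum(ys) / len(ys)
    ss_total = sum((y - mean_y) ** 2 for y in ys)
    ss_residual = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    return 1 - ss_residual / ss_total

def looks_linear(xs, ys, threshold=0.95):
    """True when a straight line explains the data well enough (assumed threshold)."""
    slope, intercept = fit_line(xs, ys)
    return r_squared(xs, ys, slope, intercept) >= threshold

print(looks_linear([1, 2, 3, 4], [3, 5, 7, 9]))          # True: y = 2x + 1
print(looks_linear([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4]))  # False: y = x squared
```

When `looks_linear` returns `False`, the data property is a candidate for a nonlinear regression model.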

But the choice among these models can be made by the common algorithm that learns the data points progressively. The input for this algorithm is predictive parameter identification along with the data behavior. Hence, a stepwise regression has to be performed on all the data points.

4. Conclusion

While working with machine learning, it is very important to have a methodology and a set of algorithms for analyzing the pattern, which together form a basis for pattern recognition. Many known pattern recognition solutions are available, but they are all prepared on a paired set of variables with prescribed data rules. Common guidelines and a reusable set of variables are very much a necessity for machine learning and predictive analysis.


[1] Chapter 1.4, The Curse of Dimensionality, “Pattern Recognition and Machine Learning” by Christopher M. Bishop

[2] Chapter 3, Defining, Measuring and Manipulating Variables, Scales of Measurement, “Research Methods and Statistics: A Critical Thinking Approach” by Sherri L. Jackson

[3] “Overfitting in Making Comparisons between Variable Selection Methods” by Juha Reunanen, Journal of Machine Learning Research 3 (2003) 1371-1382
