Research

Exploratory and Big Data Analysis

Exploratory Data Analysis (EDA) based on multivariate projection techniques has been extensively employed in many research fields, including the social sciences, education, medicine, chemistry and related fields. EDA based on projection models, also known as Multivariate EDA (MEDA), yields a set of visualizations that simplify the understanding of complex data. By interacting actively with these visualizations, the analyst can uncover patterns in the data and gain knowledge from them.
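
As an illustration of what a projection model provides, the sketch below fits a Principal Component Analysis (PCA) model, one common choice of projection model, and returns the scores (one point per observation, the basis of a score plot) and the loadings (one point per variable). It is a minimal sketch, not code from the MEDA Toolbox: the function name `pca_projection` and the synthetic data are hypothetical, and NumPy is assumed to be available.

```python
import numpy as np

def pca_projection(X, n_components=2):
    """Hypothetical helper: PCA scores and loadings of X (rows = observations)."""
    Xc = X - X.mean(axis=0)                  # mean-centre each variable
    # Thin SVD of the centred data: Xc = U @ diag(S) @ Vt
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    P = Vt[:n_components].T                  # loadings (variables x components)
    T = Xc @ P                               # scores (observations x components)
    return T, P

# Synthetic data: variables 0 and 1 share one latent factor, variable 2 is independent.
rng = np.random.default_rng(0)
latent = rng.normal(size=(100, 1))
X = np.hstack([latent, 2 * latent, rng.normal(size=(100, 1))])
X = X + 0.05 * rng.normal(size=(100, 3))
T, P = pca_projection(X)
```

Plotting `T[:, 0]` against `T[:, 1]` gives a score plot, and inspecting `P` shows which variables drive each component; in this synthetic case the first component is dominated by variables 0 and 1.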

The main difference between MEDA and data mining is its interactive nature: the analyst does not rely solely on extensive automatic computations but drives the analysis from what is seen in the visualizations. In this regard, MEDA follows an approach similar to that of Visual Analytics, but while the former focuses on data processing, the latter focuses on the theory of human visualization and interaction.

The MEDA tools are extremely powerful when applied to normal-size data, as illustrated by hundreds of applications in a wide range of areas. However, they are harder to extend to the Big Data paradigm than data mining techniques, which require little or no user interaction. The MEDA Toolbox in Matlab, a software initiative I lead, has been one of the first attempts at such an extension. The MEDA Toolbox is open software available at its GitHub repository (https://github.com/josecamachop/MEDA-Toolbox). It combines clustering and kernel computations to extend common and new MEDA visualization tools to an unlimited number of observations. This toolbox has been employed successfully in several research and development projects, showing its potential to handle very complex data of disparate nature: medical, chemical, biological, computer traffic and security data, etc.
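
The sketch below illustrates, under stated assumptions, why clustering and kernel computations can make a projection model independent of the number of observations. It assumes that "kernel" refers to a cross-product matrix such as X^T X, whose size depends only on the number of variables and which can be accumulated batch by batch; it then compresses the scores into a fixed number of weighted centroids with sequential (MacQueen-style) k-means, so a score plot never has to draw more than `n_clusters` points. This is a hypothetical illustration and not the MEDA Toolbox implementation: `streaming_loadings`, `compressed_scores`, `n_clusters` and the batch generator are all names introduced here, and NumPy is assumed.

```python
import numpy as np

def streaming_loadings(batches, n_components=2):
    """Hypothetical first pass: accumulate count, sum and the kernel X^T X batch by batch."""
    n, s, XtX = 0, None, None
    for B in batches:
        if s is None:
            s = np.zeros(B.shape[1])
            XtX = np.zeros((B.shape[1], B.shape[1]))
        n += B.shape[0]
        s += B.sum(axis=0)
        XtX += B.T @ B                       # size is (variables x variables), independent of n
    mean = s / n
    cov = (XtX - n * np.outer(mean, mean)) / (n - 1)   # covariance recovered from the kernel
    eigval, eigvec = np.linalg.eigh(cov)                # eigenvalues in ascending order
    order = np.argsort(eigval)[::-1][:n_components]     # keep the largest ones
    return mean, eigvec[:, order]

def compressed_scores(batches, mean, P, n_clusters=20):
    """Hypothetical second pass: merge scores into at most n_clusters weighted centroids."""
    centroids, weights = [], []
    for B in batches:
        for t in (B - mean) @ P:
            if len(centroids) < n_clusters:  # the first observations seed the clusters
                centroids.append(t.copy())
                weights.append(1)
                continue
            k = int(np.argmin([np.sum((c - t) ** 2) for c in centroids]))
            weights[k] += 1
            centroids[k] += (t - centroids[k]) / weights[k]   # running-mean update
    return np.array(centroids), np.array(weights)

# Usage: the data are read in batches of 500 rows, twice.
rng = np.random.default_rng(1)
data = rng.normal(size=(5000, 4)) @ rng.normal(size=(4, 4))
batches = lambda: (data[i:i + 500] for i in range(0, len(data), 500))
mean, P = streaming_loadings(batches())
C, w = compressed_scores(batches(), mean, P)
```

Memory use grows with the number of variables and clusters rather than with the number of observations, which is the property that matters for Big Data; the weights `w` can be used, for instance, to size the markers of the compressed score plot.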