
Topics

 

The course in slides format

The slides on this page are organised in four chapters that cover all the topics in the course "Basic Skills in Visualising Data and Exploratory Data Analysis Using R".

  • Univariate datasets.

  • Multivariate datasets.

  • Clustering.

  • BiClustering.

The course focuses on both practical skills and the ideas behind them. In other words, we address the questions "How to do it?" and "Why do it?" All methods, tools and techniques are illustrated in R, drawing on the R packages available for graphics.

This is NOT the final version of the course.

Univariate datasets

This part of the course presents different visualisation/exploratory methods and tools for univariate data and covers the following topics:

  • Stem-and-leaf plots and the five-number summary.

  • Location, spread and shape of a distribution.

  • Graphical and numerical summaries for location, spread and shape.

  • Dotplots, stripplots, histograms and density plots, boxplots and violin plots, Q-Q plots and normal Q-Q plots.
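
As a rough illustration of the univariate topics listed above, the following minimal sketch uses base R only. The simulated sample `x` is an assumption made for this example and is not taken from the course slides.

```r
# Simulated sample; the data are an assumption for illustration only
set.seed(1)
x <- rnorm(100, mean = 50, sd = 10)

stem(x)                            # stem-and-leaf display
five <- fivenum(x)                 # min, lower hinge, median, upper hinge, max
print(five)

hist(x, freq = FALSE, main = "Histogram with density estimate")
lines(density(x))                  # kernel density estimate overlaid
boxplot(x, horizontal = TRUE)      # boxplot
stripchart(x, method = "jitter")   # stripplot with jittered points
qqnorm(x); qqline(x)               # normal Q-Q plot with reference line
```

Violin plots are not part of base R graphics; they need an add-on package.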

Clustering

In this part of the course we focus on clustering of a high-dimensional data matrix. We cover the following methods:

  • Hierarchical clustering methods.

  • Clustering trees & dendrograms.

  • Distance & dissimilarity measures and distance matrices.

  • Partitioning methods and K-means clustering.

  • The GAP statistic.

  • Single and multi-data clustering.
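
The clustering workflow outlined above might be sketched in base R roughly as follows. The simulated two-group matrix, the Euclidean distance, the average linkage and the choice of two clusters are all assumptions made for this example.

```r
set.seed(1)
# Two simulated groups of 10 observations in 5 dimensions (illustrative data only)
x <- rbind(matrix(rnorm(50, mean = 0), ncol = 5),
           matrix(rnorm(50, mean = 4), ncol = 5))

d  <- dist(x, method = "euclidean")    # distance matrix
hc <- hclust(d, method = "average")    # hierarchical clustering
plot(hc)                               # dendrogram
hc_groups <- cutree(hc, k = 2)         # cut the tree into 2 clusters

km <- kmeans(x, centers = 2, nstart = 10)   # partitioning with K-means
print(table(hc_groups, km$cluster))         # compare the two solutions
```

For choosing the number of clusters, the gap statistic is available through `clusGap()` in the `cluster` package.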

Bivariate and multivariate datasets

 

In this part of the course we discuss different visualisation/exploratory methods and tools for bivariate and multivariate data. We focus on the association and correlation structure between variables in the data and on visualising the trend over time (for both location and spread) in longitudinal data. In addition, we discuss tools for visualising a distribution across the levels of a factor. Topics covered in this part include:

  • Conditional plots.

  • Scatterplots and scatterplot matrix.

  • Scatterplot smoothers and estimation of trend.

  • Correlation matrix plots.

  • Subject profile plots.
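
Several of the plots listed above can be produced with base R graphics. The sketch below uses the built-in `iris` dataset, which is an assumption made for this example and not necessarily the data used in the slides.

```r
data(iris)            # built-in dataset, used here only as an example
num <- iris[, 1:4]    # the four numeric variables

# Scatterplot matrix, coloured by species
pairs(num, col = as.integer(iris$Species))

# Scatterplot with a lowess smoother as a trend estimate
plot(iris$Petal.Length, iris$Petal.Width)
lines(lowess(iris$Petal.Length, iris$Petal.Width))

# Conditional plot: one panel per level of the factor
coplot(Sepal.Length ~ Sepal.Width | Species, data = iris)

# Correlation matrix shown as a simple image plot
r <- cor(num)
image(1:4, 1:4, r, axes = FALSE, xlab = "", ylab = "")
axis(1, at = 1:4, labels = colnames(r), cex.axis = 0.6)
axis(2, at = 1:4, labels = colnames(r), cex.axis = 0.6)
```

Subject profile plots need longitudinal data, so they are not shown here.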

BiClustering

In this part of the course we focus on biclustering of a high-dimensional data matrix. We discuss methods that can be used to detect local patterns in a large data matrix for binary, categorical and continuous data. All examples are illustrated using the biclustGUI package. The following methods are discussed:

  • Local and global patterns in a data matrix.

  • Delta biclustering.

  • Bimax & xMotif.

  • The Plaid model.

  • Fabia and MFA.

  • BiBitR.

The slides for multivariate datasets will be available in Q2 2023.

Slides (part 2): EDA/VD in multivariate datasets