This module introduces students to the use of data analytics tools on large data sets, including the analysis of text data. It begins with the principles of text mining and big data, then covers techniques for exploring large data sets (including pre-processing and cleaning) and the use of multivariate statistical techniques for supervised and unsupervised learning. It concludes by considering several data mining techniques.
Syllabus: What is "big data"? What is text mining? Exploratory data analysis for large datasets, including pre-processing and cleaning; Multivariate statistical analysis (both unsupervised, e.g. factor analysis or principal component analysis, and supervised, e.g. linear discriminant analysis); Data security; Data mining, including techniques such as classification trees, neural networks, clustering, text analysis or network analysis.
30 contact hours of computer-based workshops
120 hours of private study
Total number of study hours: 150
100% coursework
Aggarwal. Data Mining: The Textbook (2015). Springer.
Han, Kamber and Pei. Data Mining: Concepts and Techniques. 3rd Edition (2013). Morgan Kaufmann.
Hastie, Tibshirani and Friedman. The Elements of Statistical Learning. 2nd Edition (2009). Springer.
Hand, Mannila and Smyth. Principles of Data Mining (2001). MIT Press.
Silge and Robinson. Text Mining with R: A Tidy Approach (2017). O'Reilly.
See the library reading list for this module (Canterbury)
The intended subject-specific learning outcomes.
On successfully completing the module students will be able to:
1 demonstrate knowledge and critical understanding of the underlying concepts and principles related to the exploration and analysis of different types of large datasets;
2 use a range of established techniques with a reasonable level of skill to access, explore and pre-process large datasets and to analyse them using multivariate statistical and data mining techniques;
3 make appropriate use of IT tools for accessing and analysing large datasets, and for presentation of the results of these analyses, both in written and other forms.
The intended generic learning outcomes.
On successfully completing the module students will be able to:
1 make effective use of IT facilities for solving problems;
2 demonstrate the skills needed to work and communicate in a group, including an understanding of the roles of different individuals within a team;
3 communicate straightforward arguments and conclusions reasonably accurately and clearly;
4 manage their own learning and development;
5 communicate technical and non-technical material competently;
6 demonstrate critical thinking skills.
The University of Kent makes every effort to ensure that module information is accurate for the relevant academic session and to provide educational services as described. However, courses, services and other matters may be subject to change.