Data Management
Course materials (password protected).
Course Organiser: Dr Adrian Shepherd
As the quantity of biological data grows rapidly, bioinformaticians increasingly need to manage data efficiently and reliably. The first half of this module explains how to design, create and query relational databases using the Open Source software MySQL.
In the second half of the module we cover other important data-handling topics: the use of XML for data exchange and for handling poorly structured data (using the Open Source native XML database management system eXist); techniques for handling large data sets, including data warehousing and data mining; and the emerging scientific workflow paradigm for building database (and other) applications.
Relational databases and SQL
Lecture 1: Introduction to Biological Databases (Adrian Shepherd)
Lecture 2: Data Modelling (Adrian Shepherd)
Lecture 3: Database Design using UML (Adrian Shepherd)
Lecture 4: Creating & Updating a Database Using SQL (Adrian Shepherd)
Lecture 5: Database Queries in SQL (Adrian Shepherd)
Lecture 6: Database Applications Programming using the Perl DBI (Andrew Martin)
Data handling with XML
Lecture 7: Parsing XML in Perl (XML::DOM) (Andrew Martin)
Lecture 8: Native XML Databases (Adrian Shepherd)
Advanced data handling techniques
Lecture 9: Data Warehousing (Adrian Shepherd)
Lecture 10: Mining Large Data Sets (Adrian Shepherd)
Lecture 11: Scientific Workflows (Adrian Shepherd)