Information about Data Management

Course materials (password protected).

Course Organiser: Dr Adrian Shepherd

As the volume of biological data grows rapidly, bioinformaticians increasingly need to manage that data efficiently and reliably. The first half of this module explains how to design, create and query relational databases using the open-source software MySQL.

The second half of the module covers other important data-handling topics: the use of XML for data exchange and for handling poorly structured data (using eXist, an open-source native XML database management system); techniques for handling large data sets, including data warehousing and data mining; and the emerging scientific workflow paradigm for building database (and other) applications.

Relational databases and SQL
Lecture 1: Introduction to Biological Databases (Adrian Shepherd)
Lecture 2: Data Modelling (Adrian Shepherd)
Lecture 3: Database Design using UML (Adrian Shepherd)
Lecture 4: Creating & Updating a Database Using SQL (Adrian Shepherd)
Lecture 5: Database Queries in SQL (Adrian Shepherd)
Lecture 6: Database Applications Programming using the Perl DBI (Andrew Martin)

Data handling with XML
Lecture 7: Parsing XML in Perl (XML::DOM) (Andrew Martin)
Lecture 8: Native XML Databases (Adrian Shepherd)

Advanced data handling techniques
Lecture 9: Data Warehousing (Adrian Shepherd)
Lecture 10: Mining Large Data Sets (Adrian Shepherd)
Lecture 11: Scientific Workflows (Adrian Shepherd)