Biological Databases Repository

Bioinformaticians need access, from any compute node, to international reference databases that record biological resources such as protein or gene sequences and associated data, protein structures, or complete genomes. These databases are catalogued each year in the “Database” issue of the scientific journal Nucleic Acids Research. The 2011 edition lists 1330 carefully selected molecular biology databases.

We have built a virtual appliance that acts as a proxy between the internet, where all the reference databases are published, and the cloud instances that run the bioinformatics analyses. To import and maintain the required biological databases, we use the BioMaJ system developed in France by the RENABI network. Once the property files are installed for the selected databases, BioMaJ regularly checks whether any database needs to be updated and stores the data in files organized under a root directory '/biodb'. We have also configured a read-only NFS export of this root '/biodb' to all the bioinformatics computing machines of the StratusLab reference cloud. Because every computing machine depends on this export, it is very important that this virtual appliance be highly available and keep running even if the StratusLab physical node crashes.

Although NFS sharing may not be efficient at a large scale, it is needed by some bioinformatics applications, such as BLAST or FastA, that require standard POSIX local access to the flat-file databases used as reference for the computational analysis. A promising prospect would be an EBS-like volume on the StratusLab cloud, which the “biological databases repository” instance would mount in read-write mode to install and update the databases, and which the “bioinformatics compute” node instances would mount in read-only mode to connect the bioinformatics tools to the reference data. An EBS-like system would also help meet the demand on such a central repository by efficiently providing terabytes of shared storage.

Usage

To import and maintain the required biological databases, we use the BioMaJ system developed in France by several RENABI platforms (http://biomaj.genouest.org). BioMaJ has software dependencies, mainly on perl, ant, java, httpd, tomcat6 and mysql-server. Within the BioMaJ system, we filled in one properties file, with the related parameters, for each database we want to install and keep updated. Once the property files are installed for the chosen databases, BioMaJ regularly checks whether any database needs to be updated and stores the data in files organized under the root directory '/biodb', which is exported read-only over NFS to all the bioinformatics computing machines of the cloud.
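As an illustration of the read-only NFS export mentioned above, the following minimal Python sketch builds one line for the server's /etc/exports file. The client network 192.0.2.0/24 and the option set (ro, sync, no_subtree_check) are assumptions chosen for the example; the actual configuration of the appliance is not documented here.

```python
def exports_line(path, clients, options=("ro", "sync", "no_subtree_check")):
    """Return one /etc/exports entry sharing `path` with each client spec.

    `clients` holds host names, IP addresses or networks (assumed values);
    the default options request a read-only export.
    """
    if not clients:
        raise ValueError("at least one client specification is required")
    opts = ",".join(options)
    return path + " " + " ".join(f"{client}({opts})" for client in clients)


# Hypothetical client network; replace it with the range of your compute nodes.
print(exports_line("/biodb", ["192.0.2.0/24"]))
```

The printed line would be appended to /etc/exports on the appliance, and the export table reloaded with the usual NFS server tools.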

To access biological databases on the StratusLab cloud, you can use the running instance that we have deployed and keep up to date, or you can deploy your own instance of the appliance.

StratusLab reference biodata instance

The reference Biological Databases appliance is available at http://62.217.122.229:8080/BmajWatcher. You can go to this URL to see which biological databases are kept up to date on the reference instance.

These databases are made available to your own virtual machines through an NFS mount of 62.217.122.229:/biodb.
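A minimal Python sketch of how a virtual machine might mount this export is shown below. The server address and export path come from the text above; the local mount point '/biodb' and the read-only option are assumptions for the example. The sketch only builds and prints the command, since running mount requires root privileges.

```python
def nfs_mount_command(server="62.217.122.229", export="/biodb",
                      mountpoint="/biodb"):
    """Build the argument list for a read-only NFS mount.

    `mountpoint` is an assumed local directory; it must exist on the
    virtual machine before the command is run.
    """
    return ["mount", "-t", "nfs", "-o", "ro", f"{server}:{export}", mountpoint]


command = nfs_mount_command()
print(" ".join(command))
```

To perform the mount, the list can be passed to subprocess.run(command, check=True) from a root session.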

Your own biodata instance

You can also deploy your own instance of the Biological Databases appliance. It is available from the StratusLab appliances repository under the 'bio/data' sub-directory. Once it is running, you can access the BioMaJ Web interface at http://<your-vm-ip>:8080/BmajWatcher. On the main page you can connect with the user 'biomaj' and password 'biomaj2011', and then manage your databases as recommended by the BioMaJ user guide (available on the BioMaJ Web site at http://biomaj.genouest.org/?page_id=31 ).
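Once your instance has started, a quick way to check that the Web interface answers is sketched below in Python. The helper names watcher_url and is_reachable are hypothetical, and 192.0.2.10 in the usage note stands in for your virtual machine's IP address; only the port and the /BmajWatcher path come from the text above.

```python
import urllib.error
import urllib.request


def watcher_url(host, port=8080):
    """Return the URL of the BmajWatcher interface on `host`."""
    return f"http://{host}:{port}/BmajWatcher"


def is_reachable(url, timeout=5.0):
    """Return True if an HTTP GET on `url` succeeds within `timeout` seconds."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts and HTTP error statuses.
        return False
```

For example, is_reachable(watcher_url("192.0.2.10")) returns False until the appliance has finished booting and tomcat is serving the page.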