ExperiBase  
   

A Unique Opportunity in Biological Information Object Standards

C.F. Dewey, Jr.(1,2) and Shixin Zhang(1)

(1) Department of Mechanical Engineering
(2) Division of Biological Engineering

Massachusetts Institute of Technology, Cambridge MA USA

Introduction

Over the past several years, the explosive growth of biological data generated by new high-throughput instruments has begun to overwhelm the biological community.  There is no established infrastructure for handling these data consistently.  This paper discusses the opportunity to develop a new informatics platform that handles a large subset of the experimental protocols currently in use.  We outline a consistent data definition strategy that handles gel electrophoresis, microarrays, fluorescence-activated cell sorting, mass spectrometry, and microscopy within a single coherent set of information object definitions.

Methods

Information object definitions for several important experimental techniques in contemporary biology have been combined into a single composite schema.  The result closely resembles the DICOM standard of 1993, which provides information object definitions for all of the major medical imaging modalities (MR, CT, US, XA, NM, VL, CR, and Waveforms).  The de novo information object definition we developed for gel electrophoresis turned out to be very similar to the existing MAGE-OM information model for microarrays.  Further investigation revealed that similar object definitions characterize other experimental biology methods as well.
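The idea of one object definition serving several experimental methods can be illustrated with a minimal sketch.  The class name, field names, and sample values below are hypothetical and are not taken from the ExperiBase schema; the sketch only assumes that each method shares a common core and keeps its method-specific settings in a separate mapping.

```python
from dataclasses import dataclass, field, fields
from typing import Dict, List


@dataclass
class InformationObject:
    """Hypothetical common core shared by every experimental method."""
    experiment_id: str
    method: str                      # e.g. "gel_electrophoresis", "microarray"
    sample: str
    protocol: str
    raw_data_files: List[str] = field(default_factory=list)
    # Method-specific settings that do not belong in the common core.
    method_parameters: Dict[str, str] = field(default_factory=dict)


# Two different experimental methods described by the same definition.
gel = InformationObject(
    experiment_id="EXP-001",
    method="gel_electrophoresis",
    sample="yeast lysate",
    protocol="2D-PAGE",
    raw_data_files=["gel_scan.tif"],
    method_parameters={"stain": "Coomassie"},
)

array = InformationObject(
    experiment_id="EXP-002",
    method="microarray",
    sample="yeast total RNA",
    protocol="two-color hybridization",
    raw_data_files=["array_scan.tif"],
    method_parameters={"channels": "2"},
)

# Both records expose the same attribute names, so one query can span methods.
core_fields = [f.name for f in fields(InformationObject)]
methods_with_tiff = [
    obj.method
    for obj in (gel, array)
    if any(name.endswith(".tif") for name in obj.raw_data_files)
]
print(core_fields)
print(methods_with_tiff)
```

Keeping method-specific settings in a generic mapping is only one possible design choice; a fuller schema would more likely define typed sub-objects per method.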

Results

A first implementation of this work is called ExperiBase.  It can store and query data generated by the leading experimental protocols used in biology within a single database.  ExperiBase also has provisions to store derived data from analysis as part of an expanded definition of the information object.  Raw data and analytical results are transported between ExperiBase and external analysis packages using web-based network technologies and an XML representation of the data.  The information object model defines the form of the XML data document.  Import and export of data in spreadsheet format are also supported.  ExperiBase has been ported to three leading database platforms: Oracle, DB2 and Informix.  There are no platform-specific dependencies.
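As a rough illustration of how an information object, including derived data, might be exchanged as an XML document, consider the sketch below.  The element and attribute names (InformationObject, DerivedData, Result) and the sample values are assumptions made for this example and do not reproduce the actual ExperiBase document format; only the Python standard library is used.

```python
import xml.etree.ElementTree as ET

CORE_FIELDS = ("experiment_id", "sample", "protocol")  # hypothetical core fields


def to_xml(obj: dict) -> str:
    """Serialize a hypothetical information object to an XML string."""
    root = ET.Element("InformationObject", method=obj["method"])
    for name in CORE_FIELDS:
        ET.SubElement(root, name).text = obj[name]
    # Derived analysis results travel with the raw-data description.
    derived = ET.SubElement(root, "DerivedData")
    for key, value in obj.get("derived", {}).items():
        ET.SubElement(derived, "Result", name=key).text = str(value)
    return ET.tostring(root, encoding="unicode")


def from_xml(text: str) -> dict:
    """Parse the XML produced by to_xml back into a dictionary."""
    root = ET.fromstring(text)
    obj = {"method": root.get("method")}
    for name in CORE_FIELDS:
        obj[name] = root.findtext(name)
    obj["derived"] = {r.get("name"): r.text for r in root.find("DerivedData")}
    return obj


record = {
    "method": "gel_electrophoresis",
    "experiment_id": "EXP-001",
    "sample": "yeast lysate",
    "protocol": "2D-PAGE",
    "derived": {"spot_count": "412"},
}

document = to_xml(record)
restored = from_xml(document)
print(document)
```

An external analysis package could read such a document, append its own Result elements, and return it, which is one way the expanded information object described above could carry analysis output.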

Discussion

We have submitted this work to the Interoperable Informatics Infrastructure Consortium (the “I3C”) to assist in developing approved methods and to promote international standards.  Participation by standards organizations such as OMG is encouraged and anticipated.

Conclusion

The medical and biological communities are invited to participate in this effort to develop international standards for handling the massive data collections now being created in pharmaceutical companies and academic biology laboratories.  Consistent formats for the information objects will greatly speed the development of analysis tools.

Acknowledgements

This research was supported by the Defense Advanced Research Projects Agency and the Pacific Northwest National Laboratory (Department of Energy).

References

  1. Chris F. Taylor, et al. A systematic approach to modeling, capturing, and disseminating proteomics experimental data. Nature Biotechnology, March 2003, Volume 21, pages 247-254. http://pedro.man.ac.uk. [Proposed ontology for mass spectrometry and 2D gel data that has been used as the basis of the ExperiBase definition for these experimental methods.]

  2. Gollub J, Ball CA, Binkley G, Demeter J, Finkelstein DB, Hebert JM, Hernandez-Boussard T, Jin H, Kaloper M, Matese JC, Schroeder M, Brown PO, Botstein D, Sherlock G. The Stanford Microarray Database: data access and quality assessment tools. Nucleic Acids Res 2003 Jan 1;31(1):94-6. http://genome-www.stanford.edu/microarray. [Proposed ontology for Stanford microarray data that has been used as the basis of the ExperiBase definition for this experimental method.]

  3. MAGE-OM, Gene Expression RFP. [Proposed ontology for microarray data that has been used as the basis of the ExperiBase definition for this experimental method.]

  4. Lao H. Saal, Carl Troein, Johan Vallon-Christersson, Sofia Gruvberger, Åke Borg and Carsten Peterson. BioArray Software Environment: A Platform for Comprehensive Management and Analysis of Microarray Data. Genome Biology 2002 3(8): software0003.1-0003.6. http://base.thep.lu.se/. [Proposed ontology for BASE data that has been used as the basis of the ExperiBase definition for this experimental method.]

  5. CDISC laboratory data interchange standard, http://www.cdisc.org/pdf/Lab1-0-0-Specification.pdf. [Proposed ontology for user and laboratory data that has been used as the basis of the ExperiBase definition for these elements.]

  6. R.C. Leif, S.H. Leif, S.B. Leif, "CytometryML, An XML Format based on DICOM for Analytical Cytology Data", accepted for publication, Cytometry (2003). http://www.newportinstruments.com/cytometryml/cytometryml.html. [Proposed ontology for flow cytometry data that has been used as the basis of the ExperiBase definition for this experimental method.]

  7. Swedlow JR, Goldberg I, Brauner E, Sorger PK. Informatics and quantitative analysis in biological imaging. Science 2003;300(5616):100-102. http://www.openmicroscopy.org/index.html. [Proposed ontology for optical microscope image data that has been used as the basis of the ExperiBase definition for this experimental method.]