EpiInformatics Working Group
National Cancer Institute

Bioinformatics and Computational Biology Roadmap
Grand Challenges

Cassatt, NIGMS

From the amino acid sequence of a protein determine reliably its high resolution structure.

Time frame: 5 to 10 years

Givens: Within the time frame of this grand challenge, we can expect to have (1) through the NIGMS funded protein structure initiative, a library of representative structures of families of soluble proteins sufficiently complete that from the sequence of an unknown protein a low resolution structure (chain tracing) can be obtained through homology modeling and (2) computers will have advanced to the point that petaflop machines will be available for the task.

Task: There is sufficient information contained within the amino acid sequence to determine its three dimensional structure. To date scientists have been unable to decipher this code and it is unlikely, without a major unanticipated discovery, that they will be able to do so in the foreseeable future. However, the availability of a low resolution structure that limits the conformational space through which a computer will have to search makes the problem tractable, especially with the expected availability of the new generation of high performance computers. Thus within the time frame of this "grand challenge", the problem becomes tractable.

Impact: The impact of the success of this "grand challenge" will be enormous. Even now, biologists routinely ask questions that can only be answered at atomic resolution. The current solution is the arduous task of expressing and crystallizing the protein for structure determination, generally a one-year task with no assurance of success. On a more practical level, the success of this grand challenge will revolutionize the discovery phase of new pharmaceuticals.

Mike Waters, NIEHS
1. Understand the molecular basis of environmentally induced toxicity and human disease.
2. Discovery and functional characterization of genetic variation in environmental susceptibility genes.
3. Integrate environmental and genetic factors into our understanding of the etiology of human disease.
4. Develop tools and assays that translate current research activities into methods suitable for driving public health decisions.


Dennis Glanzman, NIMH

Understanding the neural (physiological, biological) basis of cognition. The illnesses for which NIMH takes primary responsibility involve, to a significant degree, errors of cognition -- and the key to finding treatments, cures and preventions is knowing the source of the disorder. It is assumed at present that this may at least be approached through the use of dynamical imaging technologies, e.g., fMRI, PET, etc. The sheer size and complexity of imaging datasets presents a bioinformatics challenge, and an even more elusive goal relates to making sense of this information. Without an adequate theoretical basis, and computational tools tailored to the tasks at hand, we may be hampered by a reliance on the perceptions and analytical capabilities developed over the past century.

The future for understanding complex issues lies in the application of newly evolving analytic tools for framing the interpretation of activity across levels of organization. For example, the 'noise' at one level, such as the stochastics of ion channel openings and closings, may underlie the 'variability' at another, such as gene induction, membrane potentials, and spiking activity in neurons. These dynamics of behavior across scales of time and dimension, and their interaction and predictive power, represent the gold standard of tying together information from molecular, genetic, cellular, systems and whole organism biomedical research.

Huerta, NIMH

Background: Enormous sums have been invested in studying the molecular pathways that underlie cellular physiology and pathophysiology. Such studies are increasingly considered in the context of (and often driven by) sequence information. For sequence data, bioinformatics tools and resources can be used to relate the findings from one study to those of other, often very different, studies (different species, cell types, etc.). These approaches have broadened the impact of individual studies, demonstrated the convergence and divergence of these processes, and provided a sense of how pieces of the puzzle might fit together. Yet, these approaches for sequence data represent only an indirect, first step of a new way to understand the relationship of particular molecular pathways to each other, and to specific cellular functions and dysfunctions.

Need: While biomedicine is awash in information about molecular processes, detailed quantitative information, and important qualitative information, is sparse. And, despite the use of sequence-related informatics to fuel this area of research to date, existing informatics tools and resources are not adequate to accommodate or make full use of the more complex and dynamic molecular data. Just as the use of sequence-based lines of inquiry has brought together disparate findings, moving to this next level of sophistication would be a key step toward a molecules-to-systems integrative understanding. This initiative would have important and immediate implications for therapeutic interventions for disease, including drug discovery and drug development, as well as for illuminating basic biological mechanisms.

Goals: The goals of this initiative would be to conduct research to obtain a quantitative understanding of specific molecular pathways (across species, tissue types, and states of health and disease), to develop informatics tools and resources to share and make sense of these data, and to develop computational biology approaches (models, simulations, etc.) to integrate and use these data to push forward discovery in ways not possible solely through actual experiments. Biomedical research could be constrained by focusing on particular molecular pathways (in particular tissues, species, states of health, etc.) that are of interest to multiple ICs (see "Implementation" below), such as receptor-effector signal transduction pathways common to a variety of different cell types, or could simply be conducted in accord with existing research interests of ICs. Quantitative biomedical data of interest would include reaction rates, concentration parameters, etc., of molecular species involved in specific cascades. Also of interest would be qualitative data currently lacking, such as the subcellular, compartmental localization of particular aspects of particular molecular pathways. Informatics efforts would include community-based development of ontologies, semantic relationships, data models, data schemas, etc., as well as the development of informatics tools for data analysis, visualization, etc. Informatics resources would also be developed, including databases, query approaches, data sharing protocols, etc. The computational biology efforts would develop models and simulations that would bring data in from the informatics resources to integrate those data and to serve as dynamic engines of discovery, exploring parameter space and performing virtual experiments to generate new hypotheses to be tested by actual experiments.

Implementation: This initiative lends itself to (but would not require) phased implementation, from a demonstration project to a much broader and larger effort. The demonstration phase might focus on a small number of specific pathways chosen and prioritized on the basis of their ubiquity and known or suspected importance in physiological and pathophysiological processes. Since molecular pathways are fundamentally similar, the knowledge, informatics, and computational approaches developed for these particular "demonstration" pathways could be used in the study of other pathways. Thus, even if this began as a highly focused effort, it could scale up quickly to encompass broader areas. Expansion might be determined on the basis of the interests of different ICs or clusters of ICs, which might, for example, differentially emphasize their support for work on different molecular pathways. Nevertheless, it is essential that this initiative be conducted in a fully coordinated fashion across ICs.

Links to other roadmap activities: This grand challenge has clear scientific links to the following roadmap groups: Building Blocks, Molecular Libraries, and Molecular Imaging. It would likely also relate scientifically to the Structural Biology and Nanotechnology groups, and would relate to virtually all of the roadmap groups that are focusing on the way science is conducted (e.g., Pathways, Multidisciplinary Teams, Public-Private Partnerships, etc.).

Good, NHGRI

1) Establish a comprehensive knowledgebase of human genes and their function that will assist biologists and clinicians to understand genetic contributions to human biology and disease.

2) Develop computational methods to discover which of the millions of genetic variants are associated with specific phenotypes, in order to find genes contributing to common disease and drug response.

3) Develop new structured vocabularies that enable the interchange of information between basic biologists and clinicians. This new vocabulary will help translate the information from basic research to develop new insights into the cause and treatments of human disease.

4) Develop a computer model of a eukaryotic cell that will predict the behavior of the cell.
Corollary: Model the genetic networks and protein pathways in human cells and predict how they contribute to cellular and organismal phenotypes.

5) Train scientists who have the necessary background in quantitative approaches to science to use computational approaches efficiently to advance biomedical research.

6) Not necessarily a Grand Challenge: Enable the prepublication release of small and large datasets to facilitate computational approaches to biology (See Wellcome Trust/NHGRI 2003 workshop on sharing data).


Farber, NCRR

One way to divide computational biology is to distinguish problems involving modeling and simulation from those that involve data. Examples of modeling and simulation involve molecular dynamics, models of collections of proteins that act together to accomplish a biological function, and models of how tissues or organs function. It is not clear to me that this area should be the subject of a grand challenge since improvements in this area are likely to depend on improvements in algorithms and in descriptions of the specific problem of interest. Improvements in software and in access to computational power will have a major impact on this field. NIH should be concerned about improving the quality and availability of software developed to solve these problems.

Problems involving data seem to be a better area for a grand challenge. There are two major problems in this area. The first involves attempts to use large databases to find patterns that can, in turn, make predictions about the behavior of a system. Tools for effective data mining are few and far between. Many biologists have problems using or creating the necessary tools, and many computational scientists have problems understanding the quality of the data in the database. It is not hard to imagine a grand challenge in the development of tools to analyze data.

An additional problem is to combine data measured by different laboratories, or using different instruments. One could certainly imagine that NIH could collect all of this data in a centralized database and make it available to the public. However, there are a number of drawbacks to this approach. The idea of federating small to medium sized databases to allow an investigator to easily access and combine data from different sources seems to be a much better approach. There are certainly major problems in this area that could be the subject of a grand challenge. Among them are the language needed to describe data (defining ontologies or standards) as well as the development of tools to use these ontologies to make the data readily available. In addition to technical challenges, there are serious problems in making sure that investigators get appropriate credit for creating data as well as for interpreting/combining it. Care also must be taken with data from human subjects.


Guo, NIAAA

Generation of an interactive database (or networking databases) that incorporates all (or most) genetic, genomic, functional genomic, and functional information for the purpose of mining the relationships between all cellular components (DNA, RNA, protein, lipid, etc.), functional modules, "super-modules", and networks under specified conditions.

Mangan, NIDCR

We are in our infancy in this field… almost every level is going to be a grand challenge as the complexity of the system increases. Start with the ability to quantitatively study and describe biology in the cell, move up to tissue, organs and eventually to the organism.

In the microbiology field, the study of microbes as singular organisms, as well as organisms associated with both human tissues, defenses (innate and adaptive host immune factors) and other bacterial species (biofilms).
Public health benefit: new molecular targets for therapy, early diagnosis, prevention, …

In medicine, the ability to accurately predict how a human will respond to external and internal stimuli based on the person's genetic makeup. (Using sophisticated mathematical modeling of genetic principles … reverse biology) A complete computer representation of the human.
Public health benefit: (almost unlimited number of possibilities) early diagnosis of disease, disease prevention, individual pharmacology, and so forth

Voluminous biological data will be open to all researchers worldwide via the Internet. All life science degrees will have to offer training in how to use and extract information from these databases. Various interfaces and shadow software systems will need to be developed to enable inexperienced researchers and scientists to access the data in meaningful ways.

Some individual challenges:

Storage of huge numbers of data sets
Common terminology between databases
Improved high throughput technologies
Graphic representation of molecular interactions (e.g., protein-protein)
Development of biological principles from existing data
Creation and testing of evolutionary theories based on new data

Liu, NINDS

  • Analyze and interpret the enormously complex datasets obtained from biomedical experiments
  • Integrate our knowledge obtained from different levels of biomedical research at genetic, molecular, cellular, system levels to behavioral and clinical research
  • Model biological systems based on experimental data to generate testable hypotheses that will predict the mechanisms underlying normal and abnormal functions

Rosenberg, NCI

BACKGROUND: In our division (NCI/DCEG) we are beginning to put together large-scale datasets on people, the environment, and the genome to identify causes of cancer. Our studies are population-based and include case-control and cohort designs (including trials). A large study of either type is equal in complexity to a large randomized clinical trial. The informatics needs of a traditional epidemiological study are substantial. With the advent of genomic information from SNPs, microarrays, and the proteome, the complexity of our studies is rising to a new level. I believe our problem is becoming widely encountered at NIH.

Broadly speaking, the biomedical informatics issues we face coalesce around two themes:

1. How do we get there without breaking the bank? The "there" is the point that we have a dataset in hand in a format suitable for analysis.
2. What do we do with the information once we get it? How do we analyze it to obtain meaningful conclusions?

For both issues, the "we" has a dual meaning - each individual investigator and also the scientific community.

I believe the first issue has a specific cause that can be remedied. In my opinion, one underlying problem is that biomedical data has few standards - we live in a world of "data chaos." A simple example should make this point clear. Right now, at least six different ways are commonly used to code the variable sex - as character 'M' or 'F', character 'Male' or 'Female', numeric 0 and 1 for Male and Female, respectively, or the converse, and numeric 1 and 2.

It doesn't matter scientifically how the data are coded - but every time the data changes hands people have to get in the loop to transmit the codes and reformat the data. This is very costly, inefficient, and error prone. It also is increasingly infeasible as we begin to measure vast numbers of genomic variables.
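The recoding burden described above can be illustrated with a minimal sketch. The codebook names and the sample values below are hypothetical, invented only for illustration; the five codings themselves are the ones listed in the text.

```python
# Hypothetical codebooks: each maps one source's raw codes to a shared label.
# The five codings are those named in the text; the dictionary keys are invented.
CODEBOOKS = {
    "char_short": {"M": "male", "F": "female"},
    "char_long": {"Male": "male", "Female": "female"},
    "num_01": {0: "male", 1: "female"},
    "num_10": {1: "male", 0: "female"},  # the converse of num_01
    "num_12": {1: "male", 2: "female"},
}


def recode_sex(values, codebook_name):
    """Translate raw codes into shared labels using a named codebook."""
    codebook = CODEBOOKS[codebook_name]
    return [codebook[value] for value in values]


# Made-up data from three sites that each use a different coding.
site_a = recode_sex(["M", "F", "F"], "char_short")
site_b = recode_sex([0, 1, 1], "num_01")
site_c = recode_sex([0, 1, 1], "num_10")
```

Here `site_b` and `site_c` start from identical raw values yet decode to opposite labels, so the raw numbers cannot be interpreted unless the codebook travels with them. That hand-off is the step that currently requires a person in the loop.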

Therefore, I propose the following solution:

Grand Challenge 1: NIH should create a working group to define standard schema for biomedical data, which I propose we call the Data Markup Language or DML.

I outline a proposed overview of this concept below.

With a standard way to represent biomedical data in hand, a number of specific activities would build on the DML standard to create infrastructure that will help scientists make sense of the impending "data avalanche."

Grand Challenge 2: NIH should promote the creation and dissemination of analytical methods compatible with DML. Specifically, NIH should:

  • Develop pilot projects to create a "data warehouse" using DML and browser-based tools to view, subset, manipulate, and download biomedical data to a diversity of formats for analysis.
  • Work with commercial database software companies to build such warehouses.
  • Work with commercial software companies (including the developers of SAS, STATA, S-PLUS, and MATLAB) to make these widely used commercial packages DML compatible. DML compatibility would ensure that each package could import and export data using the DML format.
  • Work with public-domain software development groups, including the group developing the R language, to make these packages DML compliant.
  • Develop server-side analysis software that uses the DML standard.

The informatics landscape would be dramatically different in such a world. Basic biomedical information could flow efficiently without costly human intervention and error. Raw data could be disseminated in a standardized format. Each investigator would have access to basic analytical tools via their browsers, and could obtain data for any other software package in analysis-ready format without the need for recoding.

Summary

In a biomedical world with standards for data, we would be better positioned to get correct answers faster. A common format for data exchange would have many advantages. The Data Markup Language (Grand Challenge 1) would provide a standardized way to disseminate "raw" data to meet any current and future requirements to make biomedical data available, and it would create efficiencies for collaborative studies, including consortia, multi-center studies, and complex studies involving extensive laboratory and clinical analysis. DML-compatible analysis tools (Grand Challenge 2) would give each investigator access to a basic set of analytical tools via server-side analysis; foster openness and transparency in the analysis of data; support efforts to validate analysis results; and help independent teams replicate key findings.

DML Architecture - Concept

DML is a set of schema, not a particular software product. It is not designed to replace existing database or analysis software, but rather to help biomedical investigators make better use of existing software and algorithms. Each schema identifies a class of variables by defining properties of the data along with the metadata that gives meaning to the values. For example, a standard definition of the variable sex could be a categorical with value 1 corresponding to the label 'male' and 2 corresponding to 'female'.

A central DML schema is a dataset. A dataset is an aggregation of variables measured on a set of cases, plus meta-data about the dataset including labels that distinguish the cases.
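As a rough sketch of what a variable schema and a dataset schema might look like in practice, the following uses plain Python data classes. The class names, field names, and case labels are all assumptions made for illustration; the text does not define a concrete DML syntax.

```python
from dataclasses import dataclass, field


@dataclass
class CategoricalVariable:
    """A variable schema: the values plus the metadata that gives them meaning."""
    name: str
    labels: dict   # code -> label, e.g. {1: "male", 2: "female"}
    values: list   # one code per case

    def decoded(self):
        """Return the human-readable label for each case."""
        return [self.labels[value] for value in self.values]


@dataclass
class Dataset:
    """A dataset schema: variables measured on a set of labeled cases."""
    title: str
    case_ids: list
    variables: dict = field(default_factory=dict)

    def add(self, variable):
        """Attach a variable, insisting on exactly one value per case."""
        if len(variable.values) != len(self.case_ids):
            raise ValueError("one value per case is required")
        self.variables[variable.name] = variable


# The coding for sex follows the example in the text; the data are made up.
sex = CategoricalVariable("sex", {1: "male", 2: "female"}, [1, 2, 2])
study = Dataset("pilot study", ["case-001", "case-002", "case-003"])
study.add(sex)
```

Because the labels are stored with the values, `study.variables["sex"].decoded()` can be computed by any receiving program without a separate codebook.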

DML would contain a set of 'generic' schema for statistical variables, including but not limited to continuous, categorical, and string variables. A minimum of seven types of statistical variables would be needed to support the basic types of biostatistical data analysis (not defined here).

DML would include a set of variables that could be called 'core variables for human subjects research' or HSC variables. HSC variables would include schema for sex, age, race/ethnicity, date of birth, time, date, SNPs, each component of the complete blood count (CBC), specific CLIE assays, etc.

Each schema would define the relevant data and meta-data. A schema for a SNP, for example, could include gene name, variant, Unigene ID, assay, primers, reaction conditions, etc., plus a convention that 0, 1, or 2 meant homozygous for minor allele (defined by the meta-data), heterozygous, and homozygous for the variant allele, respectively. To transmit a SNP variable, one would transmit not only the raw data (a series of 0's, 1's, and 2's) but also the meta-data.
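The idea of transmitting meta-data together with the raw 0/1/2 series can be sketched as follows. JSON is used here only as a convenient stand-in for whatever markup DML would adopt, and the function names, field names, and every metadata value are placeholders rather than part of any proposed standard.

```python
import json


def pack_snp(name, genotypes, metadata):
    """Bundle raw genotype codes with their meta-data into one message."""
    if not set(genotypes) <= {0, 1, 2}:
        raise ValueError("genotypes must be coded 0, 1, or 2")
    return json.dumps({"variable": name, "metadata": metadata, "values": genotypes})


def unpack_snp(message):
    """Recover the variable name, raw codes, and meta-data from a message."""
    record = json.loads(message)
    return record["variable"], record["values"], record["metadata"]


# Placeholder meta-data; a real schema would carry gene name, variant,
# assay, primers, reaction conditions, and so on.
metadata = {
    "gene_name": "EXAMPLE_GENE",
    "assay": "placeholder assay",
    "coding": {
        "0": "homozygous, first allele",
        "1": "heterozygous",
        "2": "homozygous, second allele",
    },
}

message = pack_snp("snp_example", [0, 1, 2, 1], metadata)
name, values, received_metadata = unpack_snp(message)
```

The receiver gets the coding convention in the same message as the data, so which allele each code refers to is stated by the meta-data rather than assumed.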

Whenever possible, DML would adopt existing standards, e.g., standards for image data, emerging standards for microarrays, etc.

The combination of generic statistical variables, HSC variables, and existing standards would provide containers for every type of data - everything could be stored, processed, and transmitted.


Lyster, NIBIB

(1) An overview of grand challenge efforts in Computational Biology is presented in the article "Trends in Computational Biology: A Summary Based on a RECOMB Plenary Lecture, 1999" by John Wooley, J. Comp. Biol, 6, 3/4, 1999. Wooley is a good general outside resource http://medicine.ucsd.edu/pharmaco/jwooley.html. In addition to this, the following pages describe grand challenges in computational structural biology http://cbcg.lbl.gov/ssi-csb/Program.html

(2) (a) Analysis, data mining, epidemiology, informatics, and knowledge bases that relate genotype to phenotype:
In order to facilitate analysis and understanding of the relationship between genomic and phenotypic data, NIH needs to foster research and tools for data analysis, data mining, epidemiology, informatics, and knowledge bases. Most of the grand challenge here is analytical tools and methods as well as information technology guided by scientific judgment; there is little high end computing. Suggested outside expert for consultation: Russ Altman, Pharmacogenomic network (PharmGKB)
http://smi-web.stanford.edu/people/altman/

(2) (b) Population to Ecosystem modeling:
This involves modeling the interaction between populations and the ecosystem. Hence it involves elements of biostatistics, epidemiology, and informatics, as well as earth system modeling approaches such as climate modeling. This is somewhat speculative, and may be outside the near-term goals of NIH research. It is not clear whether there are many legitimate outside experts(?). Try Anthony Busalacchi http://www.essic.umd.edu/

(3) Multiscale physiologic modeling:
This involves modeling systems whose spatial and temporal scales vary from subcellular to organismal. This covers all organ systems. As a case study, for whole heart electromechanics you need a minimum of about 50 million grid points, each grid point has 100 variables, and a 1 ms time step. This requires 5 teraflop/s (5 x 10^12 floating point operations per second). In addition to this high end computing requirement, much research is needed in the development of scientific algorithms to handle subgridscale and other unresolved processes (e.g., mechanics, ion channel responses, G protein-coupled receptors, tyrosine kinase pathways, and protein expression changes). Algorithm and software development is at least as important as hardware development for speedups. Suggested outside experts for consultation: Andrew McCulloch http://cardiome.ucsd.edu/ Rai Winslow http://www.cmbl.jhu.edu/ and Peter Hunter http://www.esc.auckland.ac.nz/People/Staff/Hunter/
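The whole-heart figure can be reproduced with a short back-of-the-envelope calculation. Two assumptions are made here that the text does not state: each variable costs about one floating point operation per time step, and the simulation is expected to keep pace with real time. Realistic models would likely need many more operations per variable, so this is best read as an order-of-magnitude check.

```python
# Figures quoted in the text.
grid_points = 50_000_000
variables_per_point = 100
time_step_s = 1e-3  # 1 ms

# Assumption (not from the text): about one operation per variable per step.
flops_per_update = 1

updates_per_step = grid_points * variables_per_point
steps_per_simulated_second = round(1 / time_step_s)

# Operations needed to advance the model by one second of simulated time.
flops_per_simulated_second = (
    updates_per_step * steps_per_simulated_second * flops_per_update
)

# Assumption (not from the text): the model runs in real time, so this is
# also the sustained rate required of the machine.
teraflops_for_real_time = flops_per_simulated_second / 1e12
print(teraflops_for_real_time)
```

Under these assumptions the result matches the 5 teraflop/s quoted above; raising `flops_per_update` scales the requirement linearly.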

(4) Computer Assisted Surgery:
Computer Assisted Surgery may include computational elements from Multiscale Physiological Modeling (item 3) with the emphasis on Real Time computation update. This is expected to scale to Petaflop/s (10^15 floating point operations per second). In addition there are considerable infrastructural and networking issues to be dealt with such as the time to feed segmented image information back to the models: Suggested outside expert for consultation: Ron Kikinis http://splweb.bwh.harvard.edu:8000/pages/ppl/kikinis/

Twery, NHLBI

Simulation of Biological Systems: Genome to Function
Current biological approaches based on characterizations of molecular, cellular, and organ processes in one or two dimensions must be greatly enhanced to collect and interpret data with complexity approximating that of physiological interactions. Teams of biologists, informaticists, and computational biologists will be needed to advance competitive strategies for close collaboration; systematic large scale collection of data; models to visualize and interpret the cellular behavior as a function of component pathways; elucidate pathophysiological abnormalities; and provide plans for full integration of project data and models with other teams in the Grand Challenges program. The heart, lung, blood, and sleep fields offer rich opportunities for pioneering the development of new standards for physiological modeling. Detailed models will be useful in predicting responses to experimental and environmental manipulations and challenges, and in testing potential therapies related to cell function, blood flow, lung ventilation and particle deposition, intensive patient management, and sleep disorders.

The Translation of Clinical Research to Clinical Practice
The NHLBI has vigorous programs on the development and assessment of treatments, preventions, and diagnostics. Fundamental research in genetics, genomics, proteomics, tissue engineering, instrumentation, cell and developmental biology, epidemiology, and clinical trials also contributes an abundance of data, resources, and tools that could be applied to clinical application. Biomedical informatic approaches are needed to enhance knowledge building through improved clinical trial data management tools allowing data-sharing and analysis across studies, and mechanisms to facilitate quality control. Bioinformatic and computational approaches facilitating the integration of diverse information from clinical and basic research will help to identify potential targets for therapy and enhance the overall ability to translate fundamental research findings to clinical application.

Virtual Reality Systems for Research and Medicine
There is a need for virtual reality interfaces that emulate biological systems for remote robotic delivery of surgical care; virtual experimental research; and training of investigators and medical caregivers. Semi-autonomous simulations of biological systems are essential to enhance the quality of the interface, improve reliability, and decrease the amount of data transferred. An in depth systematic understanding of underlying biological systems and tissue mechanics will be needed to model these processes for application in virtual reality interfaces. New instrumentation and informatic capabilities will also be needed to monitor the tissue response to manipulations and dynamically adjust the emulation as needed.



Suggested Consultants

Lindberg, NLM

David Botstein, PhD Princeton U
Ted Shortliffe, MD, PhD, Columbia P&S

Cassatt, NIGMS

Bioinformatics:
Phil Bourne-UCSD
Shankar Subramaniam-UCSD

Computational Biology:
Michael Levitt-Stanford; one of the broadest thinking computational biologists I know. Focuses on proteins
Michael Klein-U Penn; Interfaces molecular dynamics with high end computers
Mel Simon-Cal Tech; Cellular Modeling
Stan Leibler-Rockefeller; One of the best in the business of cellular modeling

Waters, NIEHS

Trey Ideker, Whitehead Institute, MIT. After his Ph.D., Trey remained in Seattle as a research scientist at the Institute for Systems Biology, working on methods to integrate the large amount of diverse biological data generated by genomic sequencing, mRNA microarrays, and proteomics. He continues this work at the Whitehead Institute for Biomedical Research as Pfizer Fellow of Computational Biology. Ideker also serves on the advisory board of Genstruct, has served as Bioinformatics Lecturer for ISTR, Inc., and holds several patents in the fields of microarray analysis and systems biology.

John Quackenbush, TIGR. In January 1997, John Quackenbush joined the faculty of The Institute for Genomic Research (TIGR) in Rockville, MD. He is currently an Investigator at TIGR, where he leads a number of research projects in DNA microarray analysis and bioinformatics. He is currently working on an analysis of gene expression in rodent models of human disease, human colon cancer metastasis and in Arabidopsis thaliana.

Mark Miller, San Diego Supercomputer Center. Mark Miller is Program Coordinator of Integrative BioSciences at the San Diego Supercomputer Center (SDSC) -- a research unit of the University of California, San Diego, and the leading-edge site of the National Partnership for Advanced Computational Infrastructure. SDSC researchers conduct studies in computational science and develop high-performance computing and networking technologies.

Edward Marcotte, UT Austin. Ed Marcotte's research group combines computational and bioinformatics approaches with experimental approaches to study protein function and protein-protein interactions. Using these techniques and information from over 30 fully sequenced genomes, his group was able to calculate the first genome-wide predictions of protein function, finding very preliminary function for over half the 2,500 uncharacterized genes of yeast. Now, with over 80 genomes in hand, they're extending these techniques, as well as asking fundamental questions about the evolution of protein interactions and the evolution of genomes.

HT Banks, NC State University and The Statistical and Applied Mathematical Sciences Institute (SAMSI). Dr. Banks has 30 years of experience in advancing the use of computational mathematics and statistics as tools in the analysis of biological data. Dr. Banks' expertise is in the modeling of response at the level of the tissue/organ. Dr. Banks is also the Associate Director of SAMSI, a national institute whose vision is to forge a new synthesis of the statistical sciences and the applied mathematical sciences with disciplinary science to confront the very hardest and most important data- and model-driven scientific challenges.

Glanzman, NIMH

Dr. Arthur Toga, Ph.D., University of California at Los Angeles. Dr. Toga is a neuroscientist with extensive experience in computer science and engineering. His professional interests are centered around the development of extensive neuroanatomical tools using MRI and fMRI for the creation of a "probabilistic map" of the human brain. He recently reported that he has successfully localized fMRI signals to within the width of a single cortical barrel (about 300 microns). In addition, he heads the first team to record Optical Intrinsic Signals and fMRI in the same (human) subjects, showing a not-surprising high degree of correlation between these two imaging techniques. http://neurology.medsch.ucla.edu/faculty/TogaA.htm

Dr. Lawrence Abbott, Ph.D., Brandeis University. Dr. Abbott is a theoretical physicist who has worked in neuroscience for the past twenty years. His research involves the mathematical modeling and analysis of neurons and neural networks. Analytic techniques and computer simulations are used to study how different conductances contribute to the electrical characteristics of a neuron, how neurons interact to produce functioning neural circuits, and how large populations of neurons represent, store, and process information. He is especially interested in the mechanisms that control the development and maintenance of neural circuits, dynamic properties of large neural networks, and methods used by populations of spiking neurons to represent and process information. http://www.bio.brandeis.edu/faculty01/abbott.html

Dr. Henry Abarbanel, Ph.D., University of California at San Diego. Dr. Abarbanel is a physicist who had 30 years of experience in laser communications before he began studying the dynamics of small assemblies of neurons which have the job of translating sensory inputs into nearly rhythmic output to muscles. He has investigated the oscillations of these neurons individually as well as in subcircuits of the whole system, where he has shown that the oscillations of neurons require some additional slow dynamical process in addition to the usual ion channel dynamics of the Hodgkin-Huxley models. His work is highly theoretical, and computational, and his breadth of knowledge in many fields would make him a valuable member of an advisory team.
http://inls.ucsd.edu/~hdia/

Huerta, NIMH

Dr Peter D. Karp
Director, Bioinformatics Research Group
Artificial Intelligence Center
SRI International
Room EK207
333 Ravenswood Avenue
Menlo Park, CA 94025-3493
Tel: 650-859-4358
Email: pkarp@ai.sri.com

Eric Neumann, Ph.D.
Vice President of Informatics
Beyond Genomics
40 Bear Hill Road
Waltham, MA 02451
Tel: 781-890-1199
Email: ENeumann@BeyondGenomics.com

Frank Olken
Lawrence Berkeley National Laboratory
Computational Science Research Division
Bldg. 50B3238
1 Cyclotron Road
Berkeley, CA 94720-8147
Tel: 510-486-5891
Email: olken@lbl.gov

James Schwaber, Ph.D.
Director, Daniel Baugh Institute of Functional Genomics/Computational Biology
Pathology
1020 Locust Street, Room 520 JAH
Thomas Jefferson University Medical College
Philadelphia PA 19107
Tel: 215-503-7823
Email: james.schwaber@mail.tju.edu

Good, NHGRI

Sean Eddy, computational biology at Washington University
David Haussler, computational biology and the human genome browser at U. California Santa Cruz
Judy Blake, ontologies and the mouse genome databases at JAX labs
Rolf Apweiler, databases and data standards, EBI

Farber, NCRR

Ralph Roskies, Pittsburgh Supercomputer Center
Michael Klein, University of Pennsylvania
Larry Smarr, UCSD
Mark Ellisman, UCSD
Peter Arzberger, UCSD
Peter A. Freeman, Assistant Director, CISE, NSF

Guo, NIAAA

Lee Hartwell (U. of Washington) or Andrew Murray (Harvard) - "Modular Biology"
Aravinda Chakravarti (Hopkins) - bioinformatics in genotyping
Richard Young (MIT) - Functional Genomics, biology
Eric Lander (MIT) - Sequencing, genotyping, and database
Neil Risch (Stanford) - Statistics in gene mapping and linkage study
David Botstein (Stanford): Functional genomics, gene mapping, biology

Mangan, NIDCR

Leslie Loew (Univ of Connecticut): The Virtual Cell Project; Systems Biology Markup Language - biochemical network models

Richard Young (MIT): Genome transcriptional regulatory networks; Validation of metabolic data using Saccharomyces.

David Gifford (MIT): predictive modeling in biology using voluminous data from high throughput technologies.

Liu, NINDS

Perry L. Miller MD/PhD
Director, Center for Medical Informatics, Professor of Anesthesiology & Molecular, Cellular, and Developmental Biology, Yale University School of Medicine. Dr. Miller served on the Board of Directors of American Medical Informatics; Scientific Advisory Committee and Central Committee on Education of the American College of Medical Informatics (ACMI); and as Session Chair on the Bioinformatics Workshop Planning Committee of National Academies of Science. (http://ycmi.med.yale.edu/people/miller.html)
Phone: 203-764-6715 Email: perry.miller@yale.edu

Mark H. Ellisman PhD
Director, Biomedical Informatics Research Network (BIRN); Council Member, NCRR; Director, NCRR's National Center for Microscopy and Imaging Research at San Diego; Professor of Neurosciences and Bioengineering, Director, Center for Research in Biological Structure UCSD.
(http://medicine.ucsd.edu/neurosci/the-faculty/ellisman.html)
Phone: 853-534-2251 Email: mark@ncmir.ucsd.edu

Terrence J. Sejnowski PhD
Professor, Head, Computational Neurobiology Laboratory, Salk Institute.
Dr Sejnowski is a pioneer in the field of computational neuroscience. (http://www.salk.edu/faculty/faculty/details.php?id=48)
Phone: 858-453-4100 Email: tsejnowski@ucsd.edu

John Miller PhD
Professor of Neuroscience and Director, Center for Computational Biology, Montana State University
Dr. Miller served on the President's Information Technology Advisory Committee (PITAC)
(http://www.nervana.montana.edu/people/jpmbio.html)
Phone: 406-994-7332 Email: jpm@cns.montana.edu

Shankar Subramaniam PhD
Professor, Department of Bioengineering, Department of Chemistry, Biochemistry and Biology, San Diego Supercomputer Center, UCSD.
Dr. Subramaniam served on the original committee that developed the BISTI Report in June 1999
(http://www-bioeng.ucsd.edu/research/research_groups/compbio/shankar.html)
Phone: 858-822-0986 Email: shankar@sdsc.edu

David B. Searls PhD
Director, Bioinformatics Division, Genetics Research, GlaxoSmithKline Pharmaceuticals.
Dr. Searls was the first to use sophisticated linguistic methods in the analysis of DNA sequences. He served on the BISTI Workshop in January 2003
(http://www.gsk.com/press_archive/sb/1996/press_19960610.htm)
Phone: 610-270-4551 Email: david_b_searle@gsk.com

Rosenberg, NCI

It might be very helpful to hear from the CEO of Quintiles, a large, premier contract research organization that conducts clinical trials for industry. It might also be worthwhile to hear from Peter Goodfellow, who recently was Senior Vice President for Discovery and Research at GlaxoSmithKline. Dr. Goodfellow was invited to speak at the 2001 Jackson Lab Short Course, where I heard him talk about how well (or not) the big pharma companies are able to make drug discoveries.

Twery, NHLBI

Howard J. Jacob, Ph.D.
Medical College of Wisconsin (MCW)
phone (414) 456-4887; fax (414) 456-6516
Email: JACOB@MCW.EDU
Biographical page - http://hmgc.mcw.edu/laboratories/jacob/jacoblabpage.html

Dr. Jacob is a leading figure in the field of functional genomics. He chairs a consortium of ten NHLBI-supported centers in the Program for Genomic Applications, and serves the MCW as Director of the Human and Molecular Genetics Center and as the Warren P. Knowles endowed chair in genetics. Dr. Jacob has led development of genomic strategies to elucidate disease pathophysiology in rat and human. Specifically, Dr. Jacob is elucidating the role of genetic factors in hypertension, cardiovascular disease, cancer, and diabetes. His approach is based on the premise that the onset and later progression of disease may not represent a linear cascade of events linked simply by a chronological exacerbation of the pathological features, and that the genetic factors that determine the severity of the final pathological outcome may be different from those that triggered the disease process.

Christopher R. Johnson, Ph.D.
University of Utah
phone (801) 581-7705; fax (801) 585-6513
Email: crj@sci.utah.edu
Biographical page - http://www.sci.utah.edu/personnel/crj.html

Dr. Johnson is the founding Director of the Scientific Computing and Imaging Institute at Utah and is widely recognized for his application of scientific computing approaches to biomedical problems including imaging problems, adaptive methods for partial differential equations, automatic mesh generation, numerical analysis, large scale computational problems in medicine, and scientific visualization. He currently directs several center programs supported by NHLBI and NCRR to enhance biomedical informatic approaches.

Leonard Zon, Ph.D.
Harvard University
phone 617-355-7707; fax 617-738-5922
zon@rascal.med.harvard.edu
Biographical page - http://cbr.med.harvard.edu/jpitm/mentors/zon.html; http://zon.tchlab.org/

Dr. Zon is President of the International Society for Stem Cell Research, and a Howard Hughes Investigator. His use of genomic and developmental approaches in zebrafish to study hematopoiesis and neoplasia is a powerful tool leading to the development of novel interventions and chemotherapeutic agents. Dr. Zon is also an active participant in the zebrafish genome mapping project.

Shankar Subramaniam, Ph.D.
University of California at San Diego
phone (858) 822-0986
E-mail shankar@sdsc.edu

Dr. Subramaniam is a leading member of the San Diego Supercomputer Center faculty and of the center program, Alliance for Cellular Signaling, that is pioneering informatic approaches for the collaborative study of cellular pathways in cardiomyocytes and the immune system. His research includes the analysis of protein structure and the development of microarray strategies for detecting genes and pathways.

Perry Miller, M.D., Ph.D.
Yale University School of Medicine
phone 203-764-6715; fax 203-764-6717
Biographical page - http://fondue.med.yale.edu/people/miller.html

Dr. Miller has broad experience ranging from computer-based decision support for public health to computational biology for genomic and proteomic research. He is Director of the Center for Medical Informatics at Yale University School of Medicine, and is widely recognized as a leader in biomedical informatics research and training. His research includes work on informatic approaches for the study of infectious agents, neurobiology, and cardiovascular disease.

Andrew D. McCulloch, Ph.D.
University of California, San Diego
phone (858) 534-2547; fax (858) 534-5722
Biographical page - http://cardiome.ucsd.edu/Research.html

Dr. McCulloch is Director of the BioNOME Resource, a web-based repository of bio-computational models and observational data, at the San Diego Supercomputer Center.
His approach is to integrate experimental and computational models to investigate the cellular structure of cardiac muscle and the electrical and mechanical function of the heart. He is involved in a broad range of informatics-dependent activities including high-throughput cardiac phenotyping; tissue engineering of the cell microenvironment using microlithography to study cell-matrix interactions; computational modeling applied to fluorescence optical mapping of mechanoelectric action potential propagation; and in-silico modeling of signal transduction pathways related to calcium cycling.

IC Inventories

NIGMS

Databases:

PharmGKB The database associated with the NIGMS pharmacogenomics effort.

HIV RT & Protease Sequence Database. ($280,000)

HIV Protease Database. ($50,000)

Protein Data Bank: Administered by NSF. Co-funded by NSF, NIGMS, NLM, and DOE. NIGMS provides about 40% ($1.7 million) of the support.

Uniprot: Administered by NHGRI. NIGMS provides $1,000,000 per year.

Databases Associated with other Large Efforts (ca. $750,000 each) :
Glue grants: Alliance for Cellular Signaling
Consortium for Functional Glycomics
Cell Migration Consortium
Inflammation and the Host Response
Protein Structure Initiative
Each of the nine projects has associated with it informatics and data management efforts.

Computational Biology/Bioinformatics Research Grants:

(Not part of the Center for Bioinformatics and Computational Biology)
Theoretical Studies of Biological Macromolecules: 70 Grants ca. 15 million dollars.
Population Biology: 70 grants ca. 16 million dollars.

Within the Center for Bioinformatics and Computational Biology:

Research Grants:
Bioinformatics: 15; ca. $3,000,000
Computational Biology: 49; ca. $11,000,000

Centers of Excellence: 2; ca. $5,000,000
Precenters
NIGMS Announcement: 2; ca. $400,000
BISTI Precenters: 8; $3,000,000

NIEHS

The mission of the National Institute of Environmental Health Sciences (NIEHS) is to reduce the burden of environmentally associated disease and dysfunction by defining how environmental exposures affect our health, how individuals differ in their susceptibility to these exposures, and how these susceptibilities change over time. The Institute fulfills its mission through three main channels: 1) research conducted by NIEHS personnel on site (intramural research); 2) support of research by groups external to NIEHS (extramural research) and through training grants; and 3) a leadership role in the National Toxicology Program (NTP), an interagency program within DHHS designed to broaden toxicological characterization of environmental chemicals and to develop and validate tests for toxic environmental agents. All three of these research channels make substantial use of and contributions to bioinformatics and computational biology.

The studies conducted by the Division of Intramural Research (DIR) are often long-term, high-risk basic research efforts and involve unique components, such as epidemiological studies of environmentally associated diseases, and intervention and prevention studies to reduce the effects of exposures to hazardous environments. The Laboratory of Computational Biology and Risk Analysis is a good example of DIR research combining development of laboratory methods for humans and animals with computational, statistical, and mathematical methods to further our understanding of the mechanisms underlying environmental disease and to apply these methods in a risk assessment framework. The close collaboration between laboratory and computational scientists in this research group has had direct impacts on the use of mechanistic data by regulatory agencies in understanding and quantifying health risks from a number of important environmental and occupational hazards such as dioxin, butadiene, mercury and phthalates.
The Laboratory of Structural Biology is another good example, wherein the relationships among the atomic-level structures of macromolecules and their biochemical properties, their abilities to interact with substrates and other molecules including those of environmental concern, and their functions in vivo are investigated. This requires an integrated approach wherein x-ray crystallography, nuclear magnetic resonance, mass spectrometry and computational chemistry are combined with biochemical and genetic approaches. DIR scientists are also actively involved in translational research. New advances in cell and molecular biology are being extended not only into molecular medicine (from bench to bedside) but also into disease prevention (from bench to longer, healthier lives). In sum, DIR scientists are involved in research that contributes to our basic understanding of biological and chemical processes, to our understanding of the role of environmental agents in human disease and dysfunction, and to the underlying mechanisms of environmentally associated diseases.

The National Toxicology Program is recognized as the most thorough and scientifically comprehensive toxicology testing program in the world. In addition to its mandate to provide critical data for public health decisions, the NTP continuously strives to translate emerging research methods into mainstream use through its testing program. In computational biology, the NTP is developing pharmacokinetic and biochemical models on a routine basis as part of its testing program.

In its extramural research program, NIEHS supports bioinformatics and/or computational biology core facilities at MIT, Harvard, UNC, Oregon Health and Science U (OHSU), Duke, Fred Hutchinson Cancer Research Center/UW, Texas A&M, UT (San Antonio), U Cincinnati, and the Mt. Desert Island Biological Laboratory.
These bioinformatics and computational cores provide direct support to the extramural Toxicogenomics Research Consortium; to the Environmental Genome Project's Comparative Mouse Genomics Centers Consortium, which develops mouse models to determine the functional significance of human DNA polymorphisms; to the Human DNA Polymorphism Discovery and Characterization re-sequencing effort, which involves functional analysis of polymorphic variants in environmentally responsive genes; to the UW-FHCRC Variation Discovery Resource (SeattleSNPs); and to the Comparative Toxicogenomics Database (primarily for marine species) at Mount Desert Island Biological Laboratory.

In September 2000, the NIEHS created the National Center for Toxicogenomics (NCT). This Center will seek to promote new understanding of the mechanisms of biological responses to environmental stressors, including toxic injury, and to identify biomarkers of exposure and disease that can be used to improve and protect human health. New computational and bioinformatics tools together with global gene, protein and metabolite expression analysis methods will play a significant role in improving our understanding of toxicant-related disease. When combined with information on gene/protein groups, functional pathways and networks, and human genetic polymorphisms, these data will confer new knowledge of gene-environment interactions and human health risks. This information will be captured and continually updated in the Chemical Effects in Biological Systems (CEBS) knowledge base. CEBS will be linked extensively to other databases and to Web genomics and proteomics resources, providing users the suite of information and bioinformatics and computational tools needed to fully interpret global molecular expression datasets.


NIMH

The Theoretical and Computational Neuroscience Research Program supports research investigating the development and application of realistic models for the analysis and understanding of brain function. It focuses on research projects combining mathematical and computational tools with neurophysiological, neuroanatomical, or neurochemical techniques in order to decipher the mechanisms which underlie specific neuronal and behavioral systems. It also supports research projects focusing on understanding the computations made by nerve cells and groups of nerve cells in orchestrating behaviors.

The Neurotechnology Program supports research and development of new technologies and approaches for studying the brain and behavior, including projects in basic and applied informatics. Tools supported include software for analysis of behavior, images, molecular data, etc. Resources supported include databases for gene expression in the brain (and other spatial information), protein structure, genetic sequence data, etc. Basic research supported includes the novel application of mathematical and statistical approaches to advance informatics related to brain and behavioral research. It is the NIMH home for several initiatives that include multiple Institutes and Centers of NIH, including those of BECON and BISTIC.

The Neuroimaging Informatics Technology Initiative is jointly sponsored with NINDS, and provides coordinated service, training, and research to develop and enhance the utility of informatics tools related to functional magnetic resonance imaging used in brain and behavioral research.

The Office on Neuroinformatics plans, directs, coordinates, and supports activities of the Human Brain Project. The Office gives grants that will lead to new digital and electronic tools for all domains of brain and behavioral research. The approaches and technologies studied under this grant funding initiative are being utilized to generate information that is generalizable, scalable, extensible, and interoperable.

NHGRI

The Genome Informatics program supports research in computational biology that will enable the development of tools for sequence analysis, gene mapping, complex trait mapping and genetic variation. These tools include mathematical and statistical methods for the identification of functional elements in complex genomes; the identification of patterns in large datasets (for example, microarray data); and the mapping of complex traits and genetic variations (for example, single nucleotide polymorphisms, or SNPs).

The program also encourages development and maintenance of databases of genomic and genetic data. Of particular importance is the continued maintenance of genome databases that link the genome sequence of model organisms with the biology of the organisms. This emphasis also includes new tools for annotating complex genomes so as to expand their utility. The program also supports the production of robust, exportable software that can be widely shared among different databases in order to facilitate database interoperability. These bioinformatics resources will allow the scientific community efficient access to genomic data, which will enable new types of analyses. The analyses, in turn, will allow for the computer modeling and subsequent experimental validation of the complex pathways and networks that ultimately determine the phenotype of a cell or the causes of many human diseases.

NCRR

NCRR supports the development of computational infrastructure in a number of different ways. We fund the following P41 centers in modeling/simulation: Leslie Loew (systems biology), Ralph Roskies (supercomputer - database analysis - genetics and models), Peter Arzberger (supercomputer - database analysis - genetics and models), Thomas Ferrin (molecular graphics), Charlie Brooks (molecular simulations), and Klaus Schulten (molecular simulations).

A major area of emphasis at NCRR is the Biomedical Informatics Research Network (BIRN). This program is focused on solving problems involved with the federation of databases. A number of pilot projects are underway in this initiative.

NCRR has sponsored a program announcement in the area of collaborative science that is just making its way through ENS and involves computational biology. The purpose of this program announcement is to invite proposals to develop tools and techniques to harness the unprecedented volume of data generated by collaborations between researchers. Proposals dealing with data from either research laboratories or clinical laboratories are welcome. Using these new tools and techniques, it is expected that two or more laboratories will be able to collaborate productively in ways that are not currently possible.

NCRR has also recently initiated a program announcement in software maintenance. The goal of this PA is to support the continued development, maintenance, testing and evaluation of existing software. The proposed work should apply best practices and proven methods for software design, construction and implementation to extend the applicability of existing bioinformatics/computational biology software to a broader biomedical research community.

Finally, NCRR supports the development of software for instrumentation through PAR-03-075.


NIAAA

I. Bioinformatics and computational biology for analysis of sequences and sequence-related resources

  • Interest: Tools that can better explore the information embedded in DNA and protein sequences. For example, regulatory elements in promoters and introns, alternatively spliced RNAs, identification of functional domains or signaling sequences, and prediction of protein structures. Generation of sequence-related databases and tools for mining these resources.
  • Ongoing activity: (1) A bioinformatics core is developing software that can analyze the 5'-UTRs of differentially expressed genes. (2) Trans-NIH involvement: rat genome database, MGC, HapMap.

II. Bioinformatics and computational biology for studies of genetic linkage and epidemiology

  • Interest: Bioinformatics, statistical, and computational tools that can assist the analysis of genetic linkage or epidemiological studies in complex diseases, for example, QTL analysis.
  • Ongoing activity: An NIAAA-funded program, "Collaborative Studies on the Genetics of Alcoholism" (COGA), has been using and developing tools and databases for genetic linkage analysis.

III. Bioinformatics and computational biology for studies in genomics and functional genomics

  • Interest: Bioinformatics, statistical, and computational tools or methods for image analysis, data analysis, data comparison, data storage, data sharing, and data publishing.
  • Ongoing activity: (1) Two microarray centers, two bioinformatics cores, and several microarray projects are currently funded. (2) Three relevant RFAs (microarray, proteomics, and ENU targeted random mutagenesis) have been issued. (3) Two relevant PAs will be issued (proteomics, novel technologies). (4) Four potential initiatives are being discussed: functional genomics alliance, genetical genomics, metabolomics, and computational biology.

IV. Bioinformatics and computational biology for deciphering biochemical or signal pathways, and their interactions (networks)

  • Ongoing activity: Bioinformatics cores, neuroinformatics resource facilities, and some initiatives under discussion.

V. Bioinformatics and computational biology for studies in structural biology

  • Interest: Databases for protein structure, tools to compare and predict structures, and tools to identify structure-fitting compounds.
  • Ongoing activity: Ethanol binding site of glycine, GABA, nicotinic acetylcholine, and serotonin receptors.

VI. Bioinformatics and computational biology used in imaging

  • Interest: Computational tools for better analyzing and visualizing imaging data.
  • Ongoing activity: Many NIAAA-funded projects involve neuroimaging approaches.

VII. Computational or mathematical modeling to study "functional modules" in the cell

VIII. Computational or mathematical modeling for biological systems, whole organisms, or behavior

  • Ongoing activity: One neuro-computational research project.

NIDCR

  • Microbial genomic sequences - full and draft (8x) coverage
  • Microarrays
  • Proteomics - function and structure
  • Metabolome
  • Use of bioinformatics tools (e.g., comparative genomics) to identify genes with unknown functions
  • Databases - Oragen (Oral Pathogens Relational Database at Los Alamos)
  • SNPs
  • Registries - cell, tissue, saliva
  • Animal model systems
  • Biological imaging
  • Biological structure - proteins, carbohydrates, tissues; microbial biofilms
  • Computer assisted training … virtual head

NINDS

A. Trans Government Agency Activities

1. RFA: Joint NSF-NIH Initiative to Support Collaborative Research in Computational Neuroscience (NSF-02-18/NS-02-501)
NINDS has led the Trans-Agency Working Group of seven NIH Institutes and four Directorates at NSF that developed this initiative. The review was conducted at NSF and the two agencies jointly funded 31 grants out of 157 applications. Currently, the Working Group is developing a PA (FY2004) as a follow-up to the RFA.

2. The Human Brain Project (HBP)
NINDS is participating in HBP, a trans-agency (NIH, NSF and DOE) initiative led by NIMH that supports research and development of advanced neuroinformatics technologies and infrastructure through cooperative efforts among neuroscientists and information scientists. Currently, four PAs are active under HBP. (http://www.nimh.nih.gov/neuroinformatics/index.cfm)


B. Trans NIH Activities

1. Biomedical Information Sciences and Technology Initiative (BISTI)
NINDS has been participating in BISTI. This initiative is aimed at making optimal use of computer science and technology to address problems in biology and medicine. Currently, three PAs are active under BISTI. (http://www.bisti.nih.gov)

2. Neuroimaging Informatics Technology Initiative (NIfTI)
NINDS has joined NIMH to sponsor NIfTI, which provides coordinated and targeted service, training, and research to facilitate the development and enhance the utility of informatics tools related to neuroimaging. The current focus is on fMRI-related informatics tools. Under the NIfTI program, four workshops were held and one RFA (RFA-MH-02-008 Characterizing, Validating, and Comparing Neuroimaging Informatics Tools) was funded. (http://nifti.nimh.nih.gov)

3. Nature Neuroscience Special Supplement Computational Approaches to Brain Function
In 2000, the NINDS, NIMH, NIDA, and NIAAA Working Group supported the publication of this special supplement. It contains a preview of the NIH and eight review articles. It was widely distributed in the neuroscience community and well received. (http://www.nature.com/neuro/supplements)


C. NINDS Activities

1. NANDS Council Subcommittee on Computational Neuroscience, Neuroinformatics and Infrastructure
This subcommittee was established in February 2000. It meets prior to and reports to the NANDS Council. The role of the subcommittee is to assess the needs of the computational neuroscience and neuroinformatics field and to monitor the related activities supported by NINDS.

2. Workshop: Computational and Theoretical Neuroscience - From Synapse to Circuitry (April 28, 2000)
Participants in this workshop were theoreticians and computational scientists who have been working very closely with experimental neuroscientists. The workshop identified research areas of emphasis for the NINDS. It also gave the following general recommendations: establish cross-cultural collaboration; promote interdisciplinary training; create career opportunities for computational scientists in biological research fields; educate computational scientists to become successful NIH applicants.
(http://www.ninds.nih.gov/news_and_events/computationalwkshp_technical.htm)

3. Grant Funding
The NINDS is currently supporting 125 grants related to computational biology and 58 related to bioinformatics. The grants can be broken down by level (molecular, cellular, system, and behavioral) and by disease, such as epilepsy and Parkinson's disease. Some of them are related to neuroimaging or neural prostheses.

Peng, NIBIB

The mission of the NIBIB is to focus on new and novel technology, including algorithm and model development. The areas of bioinformatics and computational biology, therefore, focus specifically on new and novel mechanistic developments in the algorithm or model, rather than the application of existing algorithms or models to a particular organ or disease area.

Bioinformatics Areas

Includes imaging informatics such as the study, invention, and implementation of structures and algorithms to improve communication, understanding and management of information related to biomedical images. Includes knowledge-based systems, deformable atlases, integration of imaging information from diverse imaging methods, automation of image-guided treatment, outcome studies and meta-analyses.

Includes development of new technologies to collect, store, retrieve, and integrate quantitative data ranging from the genome to the organism and to elucidate functional dynamics in living cells and tissues with sensitivity down to the level of single molecules.

Includes large-scale data-driven knowledge base and database methods that support data mining, statistical analysis, systems biology and modeling efforts. Integration of different data types, especially time-variant data; development of database and software infrastructure standards; design of experiments to build mesoscopic databases, including those describing the physico-chemical properties of gene products and databases on physiological function; development of tools for information management and dissemination to cope with the large amount of data generated by combinatorial approaches are all included.

Includes development and application of classification systems and standard terminology to improve health care and reduce costs and creation of better techniques to facilitate the accurate and efficient collection of information from physicians, other health professionals, and patients. Methods may include user-friendly remote sensing devices for home and community use.

Includes improvement of computer science methods to protect confidentiality of patient data; development of methods for structuring, managing, and analyzing large, distributed, networked, adaptive databases; development of methods for acquiring patient data and knowledge; and development of methods for sharing knowledge for multiple purposes and updating disseminated information as it is superseded by more recent data.

Basic research in computer science (such as software engineering methods, high-end computing (software and hardware), high-performance networking, grid computing, knowledge representations, ontology development, basic algorithms and solvers, and methods and standards for data and image manipulation) that is expected to have long-term biomedical impact should be referred to NIBIB.

Includes software and hardware for image display, visualization, and computer-aided interpretation. Studies of image perception and psychophysics related to imaging devices; development of display systems and methodologies that facilitate the review of large volumes of image data; enhanced methods for early detection of diseases and disorders; image computation, including hardware and software for image reconstruction and processing. Computer-aided image analysis methodologies for increasing the specificity of current clinical imaging methods; incorporation of biostatistical components in image analysis programs; and development of human-computer interfaces for simulation of medical procedures or virtual manipulation of physical and quantitative models. Image processing including the segmentation, filtering, reformatting, augmentation, registration, etc., of images for improved detection, diagnosis, and treatment of disease and injury.

Surgical tools and techniques include the development of new medical technologies, including image-guided therapies, computer-assisted surgeries and large-scale simulation modeling to improve surgical outcomes.

Computational Biology Areas

Includes studies that focus on the development of algorithms, mathematical models, simulations and analysis of complex biological, physiological, and biomechanical systems and that use genomics and proteomics (examples include studies in systems biology and multi-scale modeling approaches).

Systems Biology approaches, which include development of data-driven, genome-based, organism-scale models for the analysis, interpretation, and prediction of the genotype-phenotype relationship; development of modeling, simulation, and statistical theory to describe single cell behavior to parallel empirical observation; and development and testing of comprehensive informatic-based multivariate physiological models that span scales from the cell to the organism, e.g., to uncover the rules of nonlinear cellular and systemic regulation. Includes development of physiologically based mathematical models to predict therapy delivery or in vivo remodeling; development of knowledge-based modeling which incorporates analysis for validation and testing; and construction of methods for visualizing and interpreting large and possibly heterogeneous data sets and the results of multivariate, time-dependent simulations of biological and biomolecular systems.

NHLBI

The NHLBI is involved in a broad range of biomedical informatic and computational biology activities addressing the need to improve our understanding of genomic, proteomic, and physiological functions in health and disease. Generally, specific informatic activities are integrated with ongoing scientific programs. The Programs for Genomic Applications (PGAs) are a new, ongoing initiative to advance functional genomic research related to heart, lung, blood, and sleep health and disorders. The PGA centers have developed web-based approaches for collaboration and the open dissemination of databased research information including mutant and transgenic rodent strains and phenotypes, DNA sequence comparisons, cDNA libraries, protein sequences, and large-scale microarray expression profile data of human disease models. Coordination on cross-cutting bioinformatic issues such as database nomenclature has enhanced inter-operability and facilitated comparisons across species. A unique feature of the PGA program is that the bioinformatic and computational tools are freely distributed, and that the external use of these bioinformatic and computational resources is facilitated through a regular schedule of training workshops, courses, and visiting scientist programs. A new NHLBI Proteomics Initiative will converge results from different platform technologies that analyze intracellular and secreted proteins related to heart, lung, blood, and sleep disease processes. A key component is the development of relational software and statistical analysis regimens that will allow the comparison and correlation of different datasets generated by common but diverse technologies such as fluorescence activated cell sorters (FACS), robotic microarray printers and scanners, and microfabrication design for capillary electrophoresis equipment. The goal is to facilitate interdisciplinary research across diseases and model organisms.

The Institute specializes in computational and database platforms necessary to facilitate gene finding, data mining, combinatorial partitioning and neural network models for heart, lung and blood disease. Genelink is a new program that facilitates data sharing, information exchange and meta-analysis in ongoing NHLBI family studies. An important activity is examining the efficacy and safety of new data gathering methodologies, monitoring nationwide health trends, and developing innovative diagnostic and informatic technologies in the epidemiologic setting to promote diagnosis and information gathering. The NHLBI has a limited role in the curation and limited-access dissemination of epidemiological datasets for research.

A diverse array of informatic and computational activities is also included as core subprojects within a relatively large portfolio of interactive center and program project grants in heart, lung, and blood diseases. The NHLBI participates in the program announcements of the NIH Biomedical Information Science and Technology Initiative Consortium and funds a portfolio of phased innovative awards (R21/R33) and pre-center bioinformatic planning grants. These activities produce mutant libraries and gene expression microarrays; phenotype and gene databases for animal models; computer simulation of molecular and tissue mechanics in disease processes; and enhanced treatment management systems to facilitate physician decision-making. The NHLBI Division of Intramural Research (DIR) supports a spectrum of tools ranging from web-accessible clinical and laboratory databases with data warehousing to dedicated clusters of computers and custom software supporting molecular modeling and simulation, real-time MRI imaging of the heart, or pattern recognition analysis of large genomic and proteomic data sets. The DIR has centralized information technologists for tools to support research, as well as tenured and career scientists whose research interests require computational methods to address scientific questions.