NCBI Databases: GenBank, PubMed, Protein

Learn about the NCBI's GenBank, PubMed, and Protein databases as part of Bioinformatics and Computational Biology.

Navigating the NCBI: GenBank, PubMed, and Protein Databases

The National Center for Biotechnology Information (NCBI) is a cornerstone of bioinformatics, providing a vast array of biological databases and tools. Understanding its key resources, such as GenBank, PubMed, and the Protein database, is essential for anyone working with biological sequence data and scientific literature.

GenBank: The Nucleotide Archive

GenBank is an annotated collection of all publicly available DNA sequences and a primary repository for genetic information. It houses sequences from a wide range of organisms, including bacteria, archaea, eukaryotes, and viruses. Because it is an archive of submitted records, the same sequence can appear in several entries; NCBI's curated, non-redundant counterpart is RefSeq. Each GenBank entry contains not only the nucleotide sequence but also detailed annotations, such as gene names, protein products, and literature references.

GenBank is a genetic sequence database maintained by the National Center for Biotechnology Information (NCBI), part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH), and it is updated continually. Sequences are submitted by researchers worldwide, and GenBank exchanges data daily with the DNA Data Bank of Japan (DDBJ) and the European Nucleotide Archive (ENA) as part of the International Nucleotide Sequence Database Collaboration (INSDC). Each record is identified by a unique accession number; when the sequence itself changes, the record keeps its accession but receives a new version number (written accession.version). Records can be retrieved by accession number or keyword through NCBI's Entrez search system, and found by sequence similarity with BLAST. The data within GenBank is essential for understanding gene function, evolutionary relationships, and developing biotechnological applications.
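Because every GenBank record is keyed by its accession.version, a record can also be retrieved programmatically through NCBI's E-utilities. The Python sketch below uses only the standard library: it splits an accession.version identifier and builds an `efetch` request for the GenBank flat-file format (`db=nuccore`, `rettype=gb`, `retmode=text`). The helper names are hypothetical, and the demo only constructs the URL; the download helper needs network access, and NCBI asks scripts to stay under three requests per second without an API key. The accession `MN908947.3` (the SARS-CoV-2 Wuhan-Hu-1 genome) is used purely as an example.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode
from urllib.request import urlopen

# Real E-utilities base URL; the helper functions below are illustrative.
EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def split_accession_version(identifier: str) -> Tuple[str, Optional[int]]:
    """Split 'MN908947.3' into ('MN908947', 3); a bare accession gives None."""
    accession, sep, version = identifier.partition(".")
    return accession, (int(version) if sep else None)


def genbank_efetch_url(identifier: str) -> str:
    """Build an efetch URL that returns the record as a GenBank flat file."""
    params = {
        "db": "nuccore",    # Entrez nucleotide database
        "id": identifier,   # accession or accession.version
        "rettype": "gb",    # GenBank flat-file format
        "retmode": "text",  # plain text rather than XML
    }
    return EUTILS_BASE + "efetch.fcgi?" + urlencode(params)


def fetch_genbank(identifier: str) -> str:
    """Download the flat file (needs network access; not run in the demo)."""
    with urlopen(genbank_efetch_url(identifier), timeout=30) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    print(split_accession_version("MN908947.3"))  # ('MN908947', 3)
    print(genbank_efetch_url("MN908947.3"))
```

Requesting an explicit accession.version lets a script pin the exact sequence revision it analyzed, even if the record is later updated.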

What type of biological data does GenBank primarily store?

DNA sequences (nucleotides).

PubMed: The Biomedical Literature Gateway

PubMed is a freely accessible database of citations and abstracts for the biomedical literature, and it is part of NCBI's Entrez system. PubMed does not host full-text articles itself; when full text is available, it links to the publisher's website or to PubMed Central (PMC). PubMed is indispensable for literature reviews, staying up to date on research trends, and finding experimental methodologies.

PubMed acts as a search engine for biomedical research papers. It indexes millions of citations from journals, providing abstracts, author information, and links to full-text articles where available. Its advanced search capabilities allow users to find relevant literature based on keywords, authors, journals, and publication dates, making it a critical tool for scientific discovery and knowledge synthesis.
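The search criteria listed above map onto PubMed field tags that can be combined with Boolean operators. As a hedged sketch (Python standard library only; the helper names are hypothetical), the code below assembles a query from keywords, an author (`[au]`), a journal title abbreviation (`[ta]`), and a publication-date range (`[dp]`), then wraps it in an E-utilities `esearch` request that would return matching PubMed IDs as JSON. It does not contact NCBI.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def pubmed_query(keywords: str,
                 author: Optional[str] = None,
                 journal: Optional[str] = None,
                 years: Optional[Tuple[int, int]] = None) -> str:
    """Combine search criteria into one PubMed query using field tags."""
    parts = [f"({keywords})"]               # untagged: automatic term mapping
    if author:
        parts.append(f"{author}[au]")       # author, e.g. "Smith J"
    if journal:
        parts.append(f'"{journal}"[ta]')    # journal title abbreviation
    if years:
        start, end = years
        parts.append(f"{start}:{end}[dp]")  # publication-date range
    return " AND ".join(parts)


def pubmed_esearch_url(term: str, retmax: int = 20) -> str:
    """Build an esearch URL that returns up to `retmax` PMIDs as JSON."""
    params = {"db": "pubmed", "term": term, "retmax": retmax, "retmode": "json"}
    return EUTILS_BASE + "esearch.fcgi?" + urlencode(params)


if __name__ == "__main__":
    term = pubmed_query("sequence database", journal="Nucleic Acids Res",
                        years=(2015, 2020))
    print(term)
    print(pubmed_esearch_url(term, retmax=5))
```

The same query string can be pasted into the PubMed web search box, which makes it easy to refine a search interactively before scripting it.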

What is the primary purpose of PubMed?

To provide access to citations and abstracts of biomedical literature.

Protein Database: The World of Proteins

The NCBI Protein database is a collection of protein sequences from several sources, including translations of coding regions annotated in GenBank and RefSeq nucleotide records, as well as records from UniProtKB/Swiss-Prot and the Protein Data Bank (PDB). It contains information about protein sequences, their functions, structures, and related literature. This database is crucial for understanding protein function, identifying protein domains, and studying protein-protein interactions.

The NCBI Protein database aggregates protein sequences from several sources, including GenBank, RefSeq (the NCBI Reference Sequence database), and UniProtKB/Swiss-Prot. The RefSeq and Swiss-Prot records are curated, while translations from GenBank submissions carry the submitters' own annotations. Each protein record typically includes the amino acid sequence, functional annotations, domain information, taxonomic classification, and links to related nucleotide sequences and literature. This comprehensive data allows researchers to explore protein families, predict protein functions, and identify conserved regions across different species. Records can be searched by keyword or protein family through Entrez, and by sequence similarity with protein BLAST (BLASTP).
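The amino acid sequence in a Protein record can be downloaded in FASTA format, for example with an E-utilities `efetch` request using `db=protein` and `rettype=fasta`; NCBI FASTA headers typically begin with the accession.version followed by a description. The Python sketch below parses FASTA text into (header, sequence) pairs and counts residues. The two input records are made-up toy sequences, not real NCBI entries, and the function names are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Toy input for illustration only; these are not real NCBI records.
TOY_FASTA = """>toy_protein_1 hypothetical example
MKTAYIAKQR
QISFVKSHFS
>toy_protein_2 hypothetical example
MGSSHHHHHH
"""


def parse_fasta(text: str) -> List[Tuple[str, str]]:
    """Return (header, sequence) pairs; wrapped sequence lines are joined."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue                      # skip blank lines
        if line.startswith(">"):
            if header is not None:        # close the previous record
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)           # continuation of the sequence
    if header is not None:                # close the final record
        records.append((header, "".join(chunks)))
    return records


def residue_counts(sequence: str) -> Dict[str, int]:
    """Count each amino acid letter in a sequence."""
    return dict(Counter(sequence.upper()))


if __name__ == "__main__":
    for header, seq in parse_fasta(TOY_FASTA):
        print(header, len(seq), residue_counts(seq))
```

For real projects, Biopython's `Bio.SeqIO.parse(handle, "fasta")` covers the same task along with many other sequence formats.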

What kind of information can be found in the NCBI Protein database besides the amino acid sequence?

Functional annotations, domain information, taxonomic classification, and links to related data.

Interconnectivity and Tools

These databases are not isolated; they are interconnected through NCBI's Entrez system. This integration allows users to seamlessly navigate between related nucleotide sequences, protein sequences, and literature. NCBI also provides a suite of tools, such as BLAST (Basic Local Alignment Search Tool), which are essential for comparing biological sequences and identifying similarities.
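The cross-database links that Entrez shows in its web interface are also available programmatically through the E-utilities `elink` endpoint, which takes a source database (`dbfrom`), a target database (`db`), and record IDs. The sketch below, in Python with the standard library, only builds the request URLs; the function name is hypothetical, and the numeric ID `12345` is a placeholder rather than a real record.

```python
from typing import Iterable
from urllib.parse import urlencode

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def elink_url(dbfrom: str, db: str, ids: Iterable[str]) -> str:
    """Build an ELink URL listing records in `db` linked to `ids` in `dbfrom`."""
    params = {"dbfrom": dbfrom, "db": db, "id": ",".join(ids)}
    return EUTILS_BASE + "elink.fcgi?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical UID; substitute a real Entrez UID before sending a request.
    for target in ("nuccore", "pubmed"):
        print(elink_url("protein", target, ["12345"]))
```

A `protein` to `nuccore` link mirrors the protein-to-nucleotide connections described in the Protein section, while `pubmed` to `pmc` is one way to look for freely available full text.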

Database | Primary content | Key use cases
-------- | --------------- | -------------
GenBank | DNA sequences | Storing and retrieving genetic information; evolutionary studies
PubMed | Biomedical literature citations and abstracts | Literature reviews; staying up to date on research; finding methodologies
Protein | Protein sequences and annotations | Understanding protein function; identifying domains; studying protein interactions

Think of GenBank as the library's archive of DNA sequence records, PubMed as the catalog of scientific articles, and the Protein database as the detailed encyclopedia entries for proteins.

Learning Resources

NCBI Overview: A Guide to the National Center for Biotechnology Information (documentation)

Provides a comprehensive overview of NCBI's mission, resources, and tools, setting the context for its databases.

GenBank: The NCBI's DNA Sequence Database (documentation)

The official entry point to GenBank, offering search capabilities and information about the database's structure and content.

PubMed: Search for Biomedical Literature (documentation)

The primary interface for searching the vast collection of biomedical literature, abstracts, and citations.

NCBI Protein Database (documentation)

Access the NCBI's comprehensive collection of protein sequences and related annotations.

NCBI BLAST: Basic Local Alignment Search Tool (tutorial)

Learn how to use BLAST, a fundamental tool for comparing biological sequences against databases like GenBank and Protein.

NCBI Bookshelf: Bioinformatics and Computational Biology (documentation)

Explore a collection of authoritative books and documents on bioinformatics, providing deeper context for database usage.

Understanding NCBI Databases: A Tutorial (video)

A video tutorial explaining the core NCBI databases and how to navigate them effectively.

NCBI E-utilities: Programmatic Access to NCBI Data (documentation)

Learn how to programmatically access and retrieve data from NCBI databases, essential for advanced bioinformatics workflows.

What is Bioinformatics? (Wikipedia)

A foundational explanation of bioinformatics, highlighting the role of databases like those at NCBI.

The NCBI Handbook (documentation)

A detailed guide to NCBI's suite of resources, including in-depth explanations of GenBank, PubMed, and Protein.