Find your protein

Name: Automated annotation in UniProt
Start: 2025-1-30
End: 2025-1-30
Location: Automated annotation in UniProt

UniProt Knowledgebase

Proteomes

Protein sets for species with sequenced genomes from across the tree of life

UniRef

Clusters of protein sequences at 100%, 90% & 50% identity

UniParc

Non-redundant archive of publicly available protein sequences seen across different databases

Supporting Data

Cross-referenced databases

Subcellular locations

Automatic annotations: UniRule & ARBA

Latest News

View release archive

Forthcoming changes
Planned changes for UniProt
UniProt release 2024_06
What happens when ribosomes crash | Cross-references to FunFam | Cross-references to AntiFam
UniProt release 2024_05
Plasma membrane rupture during cell death: from a passive hypothesis to an active process | Changes to the controlled vocabulary of human diseases
UniProt release 2024_04
Oocyte waste disposal strategy: 'store to degrade later' | Removal of the cross-references to CLAE | Removal of the cross-references to COMPLUYEAST-2DPAGE
UniProt release 2024_03
The culprit for extreme morning sickness identified | Removal of the cross-references to Genevisible | Removal of the cross-references to SWISS-2DPAGE
UniProt release 2024_02
CMV infections: plants beaten at their own game | Changes to the controlled vocabulary of human diseases | Changes to the controlled vocabulary for PTMs
UniProt release 2024_01
Vitamin K beyond coagulation | Cross-references to EMDB | Cross-references to JaponicusDB | Changes to the controlled vocabulary...

#UsingUniProt - DisCanVis interpreting genomic variation data

In recent years a wealth of information has become available about genetic variations that underlie various diseases, especially cancer.

How artificial intelligence can help us annotate protein names

A conversation with machine learning engineer Andreea Gane. At UniProt we are very interested in engaging with the machine learning community

Unconventional

There are about 8 billion people living on our planet today. It's a lot. But consider the following: one human body harbours about 380 trillion viruses and 39 trillion bacteria - both on our skin and underneath it...

Analysis Tools

View dashboard

BLAST: Search with a sequence to find homologs through pairwise sequence alignment

Align: Align two or more protein sequences with Clustal Omega to find conserved regions

Search with Lists / Map IDs: Find proteins with lists of UniProt IDs or convert from/to other database IDs

Search Peptides: Search with a peptide sequence to find all UniProt proteins that contain exact matches

Need help?

Find answers through our help center or get in touch

Help center | Contact us

Attend training

European Bioinformatics Institute (EBI) | Protein Information Resource (PIR) | Swiss Institute of Bioinformatics (SIB)

Tutorial & videos | Online courses

Live webinar

Open | 30 January 2025 | Online

Automated annotation in UniProt

UniProt is a high-quality, comprehensive protein resource whose core activity is the expert review and annotation of proteins whose function has been experimentally investigated. The UniProt database also contains large numbers of proteins that are predicted to exist from gene models but lack experimental evidence for their function. UniProt commits significant resources to developing computational methods for the functional annotation of these predicted proteins, based on the data in entries that have gone through the expert review process.

We will describe the two main automated annotation systems currently in use. The first, UniRule, is an established UniProt system in which curators manually develop annotation rules. The second, ARBA (Association-Rule-Based Annotator), is a multi-class learning system that uses rule mining techniques to generate concise annotation models. ARBA employs a data exclusion algorithm that censors data not suitable for computational annotation, and it generates human-readable rules for each UniProt release.

As part of our interest in engaging with the machine learning community, we will also introduce the contribution of ProtNLM (Protein Natural Language Model) from Google Research, which annotates proteins that have "uncharacterised" names.

Finally, we will introduce UniFIRE, open-source software that enables researchers to annotate their own protein datasets using the annotation systems described above. To make the tool easy to download and set up, we have containerised UniFIRE together with all its dependencies and the latest set of UniRule and ARBA rules. In this webinar, we will show how to create functional predictions for protein sequences using this container image.
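To give a feel for what "rule mining" over reviewed entries can mean, here is a minimal Python sketch of generic association-rule mining with support and confidence thresholds. It is not ARBA's actual algorithm, data model, or exclusion step: the attribute names (`signature:A`, `taxon:Bacteria`), the annotations, the thresholds, and the functions `mine_rules` and `annotate` are all hypothetical and exist only for illustration.

```python
from collections import Counter
from itertools import combinations

# Toy stand-in for reviewed entries: (attributes, annotation).
# Every value here is invented for illustration.
REVIEWED = [
    ({"signature:A", "taxon:Bacteria"}, "kinase"),
    ({"signature:A", "taxon:Bacteria"}, "kinase"),
    ({"signature:A", "taxon:Eukaryota"}, "kinase"),
    ({"signature:B", "taxon:Bacteria"}, "transporter"),
    ({"signature:B", "taxon:Eukaryota"}, "transporter"),
    ({"signature:B", "taxon:Bacteria"}, "kinase"),
]


def mine_rules(entries, min_support=2, min_confidence=0.9, max_size=2):
    """Return human-readable rules (condition, annotation, support, confidence).

    A rule "condition -> annotation" is kept when at least `min_support`
    entries match it and the fraction of entries with that condition that
    carry the annotation is at least `min_confidence`.
    """
    condition_counts = Counter()
    rule_counts = Counter()
    for attributes, annotation in entries:
        for size in range(1, max_size + 1):
            for condition in combinations(sorted(attributes), size):
                condition_counts[condition] += 1
                rule_counts[(condition, annotation)] += 1

    rules = []
    for (condition, annotation), hits in rule_counts.items():
        confidence = hits / condition_counts[condition]
        if hits >= min_support and confidence >= min_confidence:
            rules.append((condition, annotation, hits, confidence))
    return sorted(rules)


def annotate(attributes, rules):
    """Apply mined rules to an unreviewed entry's attribute set."""
    return sorted(
        {annotation for condition, annotation, _, _ in rules
         if set(condition) <= attributes}
    )


rules = mine_rules(REVIEWED)
for condition, annotation, support, confidence in rules:
    print(" AND ".join(condition), "->", annotation, support, confidence)

print(annotate({"signature:A", "taxon:Eukaryota"}, rules))  # ['kinase']
```

In this toy data, `signature:B` does not yield a rule because its entries disagree on the annotation, which is the kind of ambiguity a confidence threshold filters out.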

UniProt data

FTP Download: Download UniProt release data

Technical Documentation: Manuals, schemas and ontology descriptions

Programmatic Access: Query UniProt data using APIs providing REST, SPARQL and Java services

Submit Data: Submit your sequences, publications and annotation updates
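For the REST option mentioned above, the following Python sketch builds a search URL for the UniProtKB endpoint at `rest.uniprot.org` without sending a request. The helper name `build_search_url`, the example query, and the chosen fields are assumptions for illustration; check the programmatic access documentation for the currently supported query syntax and return fields.

```python
from urllib.parse import urlencode

# UniProtKB search endpoint of the UniProt REST API.
BASE_URL = "https://rest.uniprot.org/uniprotkb/search"


def build_search_url(query, fields=("accession", "protein_name"),
                     size=5, fmt="tsv"):
    """Build a search URL (hypothetical helper; no network call is made)."""
    params = {
        "query": query,              # search expression
        "fields": ",".join(fields),  # columns to return
        "format": fmt,               # e.g. "tsv" or "json"
        "size": size,                # number of results per page
    }
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("gene:BRCA1 AND reviewed:true")
print(url)
```

The resulting URL can be opened in a browser or passed to any HTTP client; keeping URL construction separate from the request makes the query easy to inspect before it is sent.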
