UniProt website home page

  • UniProt release 2024_06

    What happens when ribosomes crash | Cross-references to FunFam | Cross-references to AntiFam

  • UniProt release 2024_05

    Plasma membrane rupture during cell death: from a passive hypothesis to an active process | Changes to the controlled vocabulary of human diseases

  • UniProt release 2024_04

    Oocyte waste disposal strategy: 'store to degrade later' | Removal of the cross-references to CLAE | Removal of the cross-references to COMPLUYEAST-2DPAGE

  • UniProt release 2024_03

    The culprit for extreme morning sickness identified | Removal of the cross-references to Genevisible | Removal of the cross-references to SWISS-2DPAGE

  • UniProt release 2024_02

    CMV infections: plants beaten at their own game | Changes to the controlled vocabulary of human diseases | Changes to the controlled vocabulary for PTMs

  • UniProt release 2024_01

    Vitamin K beyond coagulation | Cross-references to EMDB | Cross-references to JaponicusDB | Changes to the controlled vocabulary...

Unconventional

There are about 8 billion people living on our planet today. It's a lot. But consider the following: one human body harbours about 380 trillion viruses and 39 trillion bacteria, both on our skin and underneath it...

Live webinar (online)

Automated annotation in UniProt

UniProt is a high-quality, comprehensive protein resource in which the core activity is the expert review and annotation of proteins whose function has been experimentally investigated. However, the UniProt database also contains large numbers of proteins that are predicted to exist from gene models but have no associated experimental evidence of their function. UniProt commits significant resources to developing computational methods that functionally annotate these predicted proteins, based on the data in entries that have gone through the expert review process.

We will describe the two main automated annotation systems currently in use. The first is UniRule, an established UniProt system in which curators manually develop annotation rules. The second is ARBA (Association-Rule-Based Annotator), a multi-class learning system that uses rule-mining techniques to generate concise annotation models. ARBA employs a data exclusion algorithm that censors data unsuitable for computational annotation, and it generates human-readable rules for each UniProt release. As part of our engagement with the machine learning community, we will also introduce ProtNLM (Protein Natural Language Model), a contribution from Google Research that annotates proteins with "uncharacterised" names.

Finally, we will introduce UniFIRE, open-source software that enables researchers to annotate their own protein datasets using the above-mentioned annotation systems. To make the tool easy to download and set up, we have containerised UniFIRE together with all its dependencies and the latest set of UniRule and ARBA rules. In this webinar, we will show how to create functional predictions for protein sequences using this container image.
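To make the idea of rule-based annotation concrete, here is a minimal toy sketch of how a set of "if conditions, then annotations" rules might be applied to proteins. It is not UniRule's or ARBA's actual rule format, nor UniFIRE's behaviour. The conditions it checks (matching protein signatures plus a taxonomic scope), the class and field names, the signature identifiers SIG_A, SIG_B and SIG_C, the accessions and the "earlier rule wins for a given field" policy are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    """Hypothetical toy rule: if every condition holds, propose the annotations."""
    rule_id: str
    required_signatures: frozenset  # hypothetical signature IDs that must all match
    taxon_scope: str                # taxon that must appear in the protein's lineage
    annotations: dict               # field name -> proposed value


@dataclass
class Protein:
    accession: str
    signatures: set
    lineage: list
    annotations: dict = field(default_factory=dict)


def annotate(protein, rules):
    """Apply rules in order; a value set by an earlier rule is never overwritten."""
    for rule in rules:
        has_signatures = rule.required_signatures <= protein.signatures  # subset test
        in_scope = rule.taxon_scope in protein.lineage
        if has_signatures and in_scope:
            for name, value in rule.annotations.items():
                protein.annotations.setdefault(name, value)
    return protein


# Usage with made-up rules and proteins (all identifiers are hypothetical).
rules = [
    Rule("TOY-RULE-1", frozenset({"SIG_A", "SIG_B"}), "Bacteria",
         {"protein_name": "Toy kinase", "keyword": "Kinase"}),
    Rule("TOY-RULE-2", frozenset({"SIG_C"}), "Eukaryota",
         {"protein_name": "Toy transporter"}),
]
p = annotate(Protein("TOY0001", {"SIG_A", "SIG_B", "SIG_X"},
                     ["cellular organisms", "Bacteria"]), rules)
q = annotate(Protein("TOY0002", {"SIG_C"},
                     ["cellular organisms", "Bacteria"]), rules)
print(p.annotations)  # {'protein_name': 'Toy kinase', 'keyword': 'Kinase'}
print(q.annotations)  # {} because TOY-RULE-2 is restricted to Eukaryota
```

The taxonomic check shows why such rules are scoped: a signature match alone (as for TOY0002) is not enough if the protein falls outside the lineage the rule was built for. Real systems add many more condition types and evidence tracking, which this sketch deliberately leaves out.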
