How to change sequence names coming from Prokaryotic GeneFinding

Mariana Monteiro | April 3, 2025 | November 12, 2018

When GeneFinding is run, each predicted gene receives an automatically generated sequence name.

The first part of the sequence identifier is the name of the genome reference sequence (de novo assembly), followed by the suffix _orfx, where x is a number.

Sometimes this name is not useful for downstream analysis or for comparing results with other experiments.
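To make the naming scheme concrete, here is a minimal Python sketch that splits such an identifier into its reference name and ORF number. The exact pattern is an assumption based only on the description above, and the example identifier is hypothetical.

```python
import re

# Assumed pattern: "<reference sequence name>_orf<number>", as described above.
ORF_ID = re.compile(r"^(?P<reference>.+)_orf(?P<index>\d+)$")


def split_orf_id(seq_id):
    """Return (reference_name, orf_number), or None if the ID does not match."""
    match = ORF_ID.match(seq_id)
    if match is None:
        return None
    return match.group("reference"), int(match.group("index"))


# Hypothetical identifier, for illustration only.
print(split_orf_id("contig_12_orf7"))
```

Identifiers that do not end in _orf followed by digits return None, so names that have already been replaced are easy to tell apart from generated ones.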

Is there any way to map the 4,357 gene names to more standard gene IDs, such as RefSeq gene IDs or ENSG IDs?

One approach is to replace each sequence name with the name of its top hit in a reference (.fasta), retrieved by similarity search.

OmicsBox/Blast2GO offers a feature for this under Tools > Retrieve Blat Top-Hit, which searches for similar sequences against a reference genome.

If the reference genome is available at NCBI, it can be downloaded and then used to replace the names.

1. Download the reference genome (e.g. genes) from NCBI.
  1. This is usually done via Send to, in the top right corner of the page, e.g. Gene Features (Fasta Nucleotide).
2. Under Tools > Retrieve Blat Top-Hit, choose the parameters as shown in Figure 1.

Figure 1: Retrieve Blat Top-Hit Parameters

The result is a new project in which the sequences themselves come from the gene finding project and the sequence names come from the reference.
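The renaming idea can also be sketched outside OmicsBox. The Python example below is not the tool's implementation. It assumes you already have a mapping from each original sequence ID to the name of its top hit (here a plain dictionary), and the function name, IDs and sequences are all hypothetical. As a design choice of this sketch, the original ID is kept as the header description so that renamed sequences remain traceable.

```python
def rename_fasta(fasta_lines, top_hits):
    """Yield FASTA lines with header IDs replaced by their top-hit name.

    fasta_lines: iterable of lines from the gene finding FASTA file.
    top_hits: dict mapping original sequence ID -> reference sequence name.
    Sequences without a hit keep their original header unchanged.
    """
    for line in fasta_lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            parts = line[1:].split(maxsplit=1)
            seq_id = parts[0] if parts else ""
            if seq_id in top_hits:
                # New name first, original ID kept as the description.
                yield f">{top_hits[seq_id]} {seq_id}"
                continue
        yield line


# Hypothetical input, for illustration only.
fasta = [">contig_1_orf1", "ATGAAA", ">contig_1_orf2", "ATGCCC"]
hits = {"contig_1_orf1": "geneA"}
renamed = list(rename_fasta(fasta, hits))
print("\n".join(renamed))
```

Note that several predicted genes can share the same top hit, in which case this sketch produces duplicate names. Check for duplicates before using the renamed file downstream.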

Note: The reference genome (genes) used by this feature can also be retrieved from BioMart within OmicsBox/Blast2GO (see Load Sequences/Annotation from a list of identifiers with Blast2GO) or via Load Fasta from Reference + GFF/GTF.
