How useful are predicted sequences in NCBI?

June 5, 2010

I'm a bit new to bioinformatics and just wondered how useful PREDICTED amino acid sequences derived from the genome are (I have found several in NCBI). If I BLAST these sequences to find conserved sequences, will the results be of any use, or is it better not to bother with these sequences at all?

Thanks,

Caz

June 5, 2010

Well, of course there are problems knowing whether you've got the right open reading frame, as well as knowing whether you're actually in a gene at all. But other than that, I have never had any problems (though my experience is limited).
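To make the reading-frame problem concrete, here is a minimal Python sketch (an illustration only, not how NCBI's annotation pipeline works) that lists ATG-to-stop open reading frames on both strands in all three frames. It assumes the standard start and stop codons and ignores introns. On a realistic genomic sequence it usually returns many candidates across frames, and deciding which one (if any) is a real gene is exactly the hard part.

```python
# Minimal six-frame ORF scan (assumes standard ATG start, TAA/TAG/TGA stops, no introns).
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq: str, min_codons: int = 2):
    """Yield (strand, frame, start, end) for ATG...stop ORFs in all six frames.

    Coordinates are 0-based, end-exclusive, relative to the strand scanned.
    An ATG inside an open ORF is ignored, so each ORF starts at its first ATG.
    """
    seq = seq.upper()
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        yield strand, frame, start, i + 3
                    start = None


print(list(find_orfs("ATGAAATAGCCC")))  # made-up sequence -> [('+', 0, 0, 9)]
```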

June 7, 2010

It depends on what you are looking for and what kind of database you use.

Sequences for already well-characterized proteins tend to be useful in most cases. However, because of the automated annotation pipelines used nowadays, errors can still be present. Swiss-Prot, for instance, is a better-curated database, though it contains fewer sequences overall.

As a rule of thumb, reality-checking against well-characterized reference genomes is helpful, especially with regard to functional assignments.
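One way to start that reality check after a BLAST search is to see whether the top hits are themselves predicted models. The sketch below parses BLAST+ tabular output (`-outfmt 6`, default 12 columns) and flags subjects whose RefSeq accession starts with `XP_`, NCBI's prefix for predicted protein models. The thresholds (`max_evalue`, `min_pident`) and the example rows are made up for illustration and are not recommended cut-offs.

```python
import re
from typing import Iterable, List, NamedTuple

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore")
# Two-letter RefSeq prefix, underscore, digits, e.g. "XP_" in "ref|XP_000001.1|".
REFSEQ_PREFIX = re.compile(r"\b([A-Z]{2})_\d+")


class Hit(NamedTuple):
    subject: str
    pident: float
    evalue: float
    bitscore: float
    predicted_model: bool  # True if the subject is an XP_ (predicted) RefSeq protein


def parse_hits(lines: Iterable[str], max_evalue: float = 1e-10,
               min_pident: float = 30.0) -> List[Hit]:
    """Parse -outfmt 6 rows, drop weak hits, and sort by bit score (best first)."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines (e.g. from -outfmt 7)
        row = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        evalue, pident = float(row["evalue"]), float(row["pident"])
        if evalue > max_evalue or pident < min_pident:
            continue
        match = REFSEQ_PREFIX.search(row["sseqid"])
        predicted = match is not None and match.group(1) == "XP"
        hits.append(Hit(row["sseqid"], pident, evalue, float(row["bitscore"]), predicted))
    return sorted(hits, key=lambda h: h.bitscore, reverse=True)


# Made-up example rows (hypothetical accessions and scores).
example = [
    "q1\tref|XP_000001.1|\t85.0\t200\t30\t0\t1\t200\t1\t200\t1e-50\t300\n",
    "q1\tNP_000002.1\t60.0\t180\t72\t2\t5\t185\t3\t183\t1e-20\t150\n",
    "q1\tNP_000003.1\t25.0\t100\t75\t3\t1\t100\t1\t100\t1e-3\t40\n",
]
for hit in parse_hits(example):
    print(hit.subject, hit.bitscore, "predicted model" if hit.predicted_model else "")
```

With these rows the third hit is dropped by the e-value filter, and the best remaining hit is a predicted model, which would be a cue to also compare against better-curated entries before trusting a functional assignment.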

But again, it really depends on what you are looking for (e.g. single-protein vs. whole-genome analyses, intergenic regions, etc.).

June 14, 2010

Author

Hiya!

Wow, thanks so much for the replies. Very useful stuff so far, and it has served to illuminate my own lack of knowledge of the subject! I think I am getting a little confused with these things. The PREDICTED sequences I am looking at are nucleotide sequences, and I am not sure how I would cross-reference them against a genome. It just doesn't seem to make sense to me, so I assume that my lack of experience in the field means I don't have all the facts! For example, NCBI accession number XM_001120951 says this nucleotide sequence has been predicted from the genomic sequence. I find this quite confusing, as surely the nucleotide sequence either exists or it doesn't. I want to understand how these predicted sequences differ from (let's say) "normal" sequences.
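The two-letter prefix of a RefSeq accession already tells you whether a record is a predicted model: NCBI uses `XM_`, `XR_` and `XP_` for model mRNA, non-coding RNA and protein records computed by its genome annotation pipeline, while `NM_`, `NR_` and `NP_` are the corresponding non-model records. The minimal Python sketch below covers only a subset of prefixes (an assumption for brevity; NCBI's RefSeq documentation has the full table).

```python
import re

# Subset of NCBI RefSeq accession prefixes -> (molecule type, is a predicted model).
REFSEQ_PREFIXES = {
    "NC": ("genomic", False),
    "NG": ("genomic region", False),
    "NM": ("mRNA", False),
    "NR": ("non-coding RNA", False),
    "NP": ("protein", False),
    "XM": ("mRNA", True),
    "XR": ("non-coding RNA", True),
    "XP": ("protein", True),
}
# Prefix, underscore, digits, optional ".version" suffix.
ACCESSION = re.compile(r"^([A-Z]{2})_\d+(?:\.\d+)?$")


def classify_accession(accession: str) -> tuple[str, bool]:
    """Return (molecule type, predicted?) for a RefSeq accession in the table above."""
    match = ACCESSION.match(accession.strip())
    if match is None or match.group(1) not in REFSEQ_PREFIXES:
        raise ValueError(f"unrecognised RefSeq accession: {accession!r}")
    return REFSEQ_PREFIXES[match.group(1)]


print(classify_accession("XM_001120951"))  # the accession from the question -> ('mRNA', True)
```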

I understand this may be a little in-depth to expect an answer on, but if anyone could recommend a book that covers this aspect of genomes, that would be just as helpful for me.

Thanks so much for your help,

Caroline

June 14, 2010

Predicted does not mean that the sequence itself is predicted (it has been sequenced), but that it has been predicted to be an open reading frame. This topic should be covered by most molecular genetics textbooks (e.g. Genes).

Again, the function of a locus is predicted, but the sequence itself is based on data (though depending on the source it may contain errors, but that is another issue).
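A toy example may help illustrate this: the bases in a predicted transcript come from the sequenced genome, and what is predicted is the gene model (which stretches are exons, where the coding sequence starts and stops). The minimal Python sketch below splices exons out of a made-up genomic sequence and translates the result with the standard genetic code. The sequence and exon coordinates are hypothetical, and real gene models also have to handle minus-strand genes, UTRs and other genetic codes, which this ignores.

```python
from itertools import product

# Standard genetic code (NCBI table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def splice(genome: str, exons: list[tuple[int, int]]) -> str:
    """Concatenate exon intervals (0-based, end-exclusive, + strand) from the genome."""
    return "".join(genome[start:end] for start, end in sorted(exons))


def translate(cds: str) -> str:
    """Translate codon by codon, stopping at the first stop codon ('X' if unknown)."""
    protein = []
    for i in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE.get(cds[i:i + 3].upper(), "X")
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


# Hypothetical genomic sequence: exon 1, intron (lowercase), exon 2.
genome = "ATGGCC" + "gtaagtttcag" + "GAATAA"
print(translate(splice(genome, [(0, 6), (17, 23)])))  # one gene model     -> MAE
print(translate(splice(genome, [(0, 6), (14, 23)])))  # shifted exon start -> MAQE
```

The genomic bases are identical in both calls; only the predicted exon boundary differs, and that alone changes the predicted protein.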

June 14, 2010

Author

That's great, thanks so much for your help. It's making a lot more sense now.

Best,

Caroline

October 18, 2021

Good day, CharonY.

Following up on what cazmantis asked, I would also like to know whether it's appropriate to design primers using 'predicted' nucleotide sequences obtained from NCBI.

(I am carrying out a project that involves primer design.)

Please reply soon, CharonY.

Thanks
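When primers are designed from a predicted (XM_) transcript, one sanity check is to confirm where each primer actually lands, both on the predicted transcript and on the genomic sequence it was derived from: a primer spanning a predicted exon-exon junction will not match the genome contiguously, and it relies on that junction being predicted correctly. The minimal Python sketch below does exact-match searches on both strands and adds GC content plus the rough Wallace-rule melting temperature, 2(A+T) + 4(G+C), which is only meant for short oligos; dedicated primer-design tools use more accurate models. All sequences are hypothetical.

```python
# Exact-match primer check plus rough GC content / Wallace-rule Tm (sketch only).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def gc_fraction(primer: str) -> float:
    p = primer.upper()
    return (p.count("G") + p.count("C")) / len(p)


def wallace_tm(primer: str) -> int:
    """Rough melting temperature in degrees C: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def primer_sites(primer: str, template: str) -> dict[str, list[int]]:
    """0-based start positions of exact matches of the primer on the template's
    + strand, and of its reverse complement (i.e. the primer binding the - strand)."""
    p, t = primer.upper(), template.upper()
    rc = reverse_complement(p)
    positions = range(len(t) - len(p) + 1)
    return {
        "+": [i for i in positions if t.startswith(p, i)],
        "-": [i for i in positions if t.startswith(rc, i)],
    }


# Hypothetical sequences: a predicted transcript and the genomic region it was built from.
transcript = "ATGGCCGAATAA"
genome = "ATGGCCgtaagtttcagGAATAA"
primer = "GGCCGA"  # spans the predicted exon-exon junction
print(primer_sites(primer, transcript))  # {'+': [2], '-': []}
print(primer_sites(primer, genome))      # {'+': [], '-': []}
print(round(gc_fraction(primer), 2), wallace_tm(primer))  # 0.83 22
```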
