Bioinformatic differential expression analysis


Mr Nobody

To do this type of analysis, the RNA samples first have to be extracted and a library constructed. The library is then sequenced and the resulting reads are assembled.

Remember that before any assembly can occur, the quality of the reads needs to be checked for adapter content and poor-quality regions. These can be trimmed using several different programmes. Once the reads are trimmed, assembly can begin. On the command line, the Tuxedo Suite is the most commonly used toolset for assembly and transcriptome analysis.

Since a transcriptome analysis works with many RNA sequences, expect the number of overrepresented sequences and duplicates in the quality check to be high.

Tuxedo Suite:

Bowtie - A fast and simple read aligner. Tophat uses it as the base of its alignment.

Needs a reference genome (.fa), which has to be indexed before Tophat can use it

Tophat - Uses the Bowtie index of the reference genome and aligns RNA sequences in a splice-aware way. It allows for the discovery of new splice junctions. This step is repeated for every sample (read file) you have.

(Eg. tophat2 -p 5 --library-type fr-firststrand -o outputDirectory referenceIndexName inputFile.fq, where referenceIndexName is the name of the Bowtie index built from the reference genome)
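
Because the Tophat step is repeated for every sample, a small shell loop can save some typing. This is only a sketch: the reads folder, the ref_genome index name, the tophat_<sample> output folders and the DRY_RUN switch are all assumptions made for illustration. With DRY_RUN=1 (the default here) the commands are printed rather than run.

```shell
# Sketch: one Tophat run per FASTQ file. All paths below are hypothetical.
READS_DIR="${READS_DIR:-reads}"   # assumed folder holding the .fq files
INDEX="${INDEX:-ref_genome}"      # assumed Bowtie index name
DRY_RUN="${DRY_RUN:-1}"           # 1 = only print the commands, 0 = run them

tophat_all() {
    for fq in "$READS_DIR"/*.fq; do
        [ -e "$fq" ] || continue              # nothing matched the pattern
        sample=$(basename "$fq" .fq)          # reads/case1.fq -> case1
        # Build the command once, then either print it or execute it.
        set -- tophat2 -p 5 --library-type fr-firststrand \
            -o "tophat_$sample" "$INDEX" "$fq"
        if [ "$DRY_RUN" = 1 ]; then
            echo "$*"
        else
            "$@"
        fi
    done
}

tophat_all
```

Check the printed commands first, then set DRY_RUN=0 to run them for real.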

Cufflinks - Assembles the aligned reads into transcripts

(Eg. cufflinks -g reference.gtf -b reference.fa -u --library-type fr-firststrand -o outputDirectory inputfile.bam)

Cuffmerge - Merges multiple transcript assemblies into one file

To keep the command short, Cuffmerge reads the assemblies from a text file that lists the path to each .gtf file to be merged, one per line

(Eg. cuffmerge -o outputDirectory -g reference.gtf -s reference.fa pathfile.txt)
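
The path file can be generated instead of typed by hand. A minimal sketch, assuming each Cufflinks run wrote its output into a folder named cufflinks_<sample> (that naming is hypothetical; transcripts.gtf is the assembly file Cufflinks writes into its output directory):

```shell
# Sketch: write one assembly path per line into the file Cuffmerge reads.
# The cufflinks_* folder naming is an assumption for illustration.
make_pathfile() {
    out_file="${1:-pathfile.txt}"
    : > "$out_file"                           # start from an empty file
    for gtf in cufflinks_*/transcripts.gtf; do
        [ -e "$gtf" ] || continue             # no assemblies found
        echo "$gtf" >> "$out_file"
    done
}

make_pathfile pathfile.txt
```

The resulting pathfile.txt is then passed to cuffmerge as the last argument, as in the example above.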

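Cuffdiff - Differential expression analysis for transcriptome analysis

(Eg. cuffdiff -p 5 -o outputDirectory -b reference.fa -u mergedfile.gtf CaseInputfiles.bam ControlInputfiles.bam, where the replicate .bam files within each condition are separated by commas and the two conditions are separated by a space)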
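
The comma-separated replicate lists are easy to get wrong by hand, so they can be built with a small helper. This is a sketch only: the .bam file names are made up, and the final command is printed rather than run.

```shell
# Sketch: join replicate BAM files with commas for each condition.
join_by_comma() {
    joined=""
    for f in "$@"; do
        joined="${joined:+$joined,}$f"        # add a comma only between items
    done
    echo "$joined"
}

CASE=$(join_by_comma case1.bam case2.bam)     # hypothetical file names
CONTROL=$(join_by_comma ctrl1.bam ctrl2.bam)  # hypothetical file names

# Printed, not executed; remove "echo" to run Cuffdiff for real.
echo cuffdiff -p 5 -o outputDirectory -b reference.fa -u mergedfile.gtf "$CASE" "$CONTROL"
```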

 

For assistance with Tophat - https://www.illumina.com/documents/products/technotes/RNASeqAnalysisTopHat.pdf

https://ccb.jhu.edu/software/tophat/manual.shtml

 

For assistance with Cufflinks - https://www.google.co.za/url?sa=t&rct=j&q=&esrc=s&source=web&cd=5&cad=rja&uact=8&ved=0ahUKEwin5rqd1pfUAhWIAMAKHUEfAH4QFgg2MAQ&url=https%3A%2F%2Fwww.researchgate.net%2Ffile.PostFileLoader.html%3Fid%3D544651e5d3df3edb2b8b463a%26assetKey%3DAS%253A273626954174476%25401442249157340&usg=AFQjCNHlfzwAeAOVgHNwH4gfae3r_YPjig&sig2=n5gn1ZMpiTTMiAojbmxadw


Just a note, Bowtie is very specific about its input: before you run the tophat command, you need to "repackage" (index) the reference genome file with Bowtie. For example, in my lab, we use this command:

 

$ bowtie2-build ref_genome.fa ref_genome

  • The fasta file is basically being repackaged (indexed) so that tophat can use it. The "package" (ref_genome) should have the same name as the fasta file (ref_genome.fa) without the .fa part, and this is the name you then give to tophat.

Then, you can follow this with your tophat command (all in the same shell).
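
One way to keep the two names in sync is to derive the index name from the fasta file name. A small sketch, using the ref_genome.fa name from above; the tophat options, outputDirectory and inputFile.fq are placeholders, and both commands are only printed here:

```shell
# Sketch: derive the index name from the fasta name so the two always match.
FASTA="ref_genome.fa"
INDEX="${FASTA%.fa}"                          # strip the .fa part -> ref_genome

# Printed, not executed; remove "echo" to run the commands for real.
echo bowtie2-build "$FASTA" "$INDEX"
echo tophat2 -p 5 -o outputDirectory "$INDEX" inputFile.fq
```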

The Tuxedo Genome Guided Transcriptome Assembly Workshop site also gives a good explanation of the workflow for this type of analysis:

 

https://github.com/trinityrnaseq/RNASeq_Trinity_Tuxedo_Workshop/wiki/Tuxedo-Genome-Guided-Transcriptome-Assembly-Workshop

 

Also, you can find a helpful flow diagram in the Cufflinks manual:

http://cole-trapnell-lab.github.io/cufflinks/manual/
