I'm having some trouble executing the code and getting results on Windows. Some screenshots of the results would be appreciated.
About the data: You will use whole-genome data for SARS and Covid (SARS-CoV-2) from the National Center for Biotechnology Information (NCBI) RefSeq genome collection. Specifically, you will download the coding sequence (CDS) data for each genome using the biomartr package in R.
Approach: You will use the biomartr package to download the CDS sequences for SARS and Covid. You will then use the orthologr R package to identify orthologs with the reciprocal best hit (RBH) approach, align them with Clustal Omega (clustalo), run pal2nal to convert the protein alignments back to nucleotide (codon) alignments, and finally estimate dN, dS, and dN/dS from the multiple sequence alignments. All of this can be done with a few commands executed in RStudio.
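To make the pal2nal step concrete, here is a minimal Python sketch of the idea only (it is not pal2nal and not what orthologr calls internally). It assumes each CDS is gap-free, starts in frame, and translates to the aligned protein once gaps are ignored; the function name is hypothetical. Each aligned residue is replaced by its codon and each gap by `---`, so codons stay in register for dN/dS estimation.

```python
def protein_to_codon_alignment(aligned_protein: str, cds: str) -> str:
    """Thread codons from `cds` onto an aligned protein (hypothetical helper).

    Assumes `cds` is in frame and its translation equals `aligned_protein`
    with '-' gaps removed. Real pal2nal also validates the translation and
    handles stop codons and frameshifts; this sketch does not.
    """
    # Split the CDS into complete codons; any trailing partial codon is ignored.
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    out = []
    k = 0  # index of the next unused codon
    for residue in aligned_protein:
        if residue == "-":
            out.append("---")  # one protein gap becomes one codon-sized gap
        else:
            if k >= len(codons):
                raise ValueError("aligned protein is longer than the CDS")
            out.append(codons[k])
            k += 1
    return "".join(out)


if __name__ == "__main__":
    # Toy, made-up sequences: protein "MK-W" aligned with a 3-codon CDS.
    print(protein_to_codon_alignment("MK-W", "ATGAAATGG"))  # ATGAAA---TGG
```

A trailing stop codon in the CDS is simply left unused here, because the protein has no residue for it.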
Configuring your computer for orthologr: Configuring your computer may take some time, depending on your OS and your personal computer setup. Your instructor will dedicate a session to working with students and answering questions. If you are not familiar with compiling and installing software, this is an excellent opportunity to learn more.
You will need to configure your computer with two pieces of software: BLAST and a specific version of KaKs_Calculator. This is relatively straightforward on macOS and Linux, but can be more challenging for Windows users.
The instructions below are only for Linux/macOS users. It should be possible to complete the project on Windows with WSL or another Linux environment. However, with WSL we have encountered difficulties compiling KaKs_Calculator from source in the past. Please reach out to your instructor if you are using WSL and run into problems.
Basic instructions for configuring your computer for orthologr are available on the drostlab GitHub page. Note that you can skip all software there except the tools mentioned above.
BLAST should have precompiled executables available for your operating system; in other words, you do not need to compile it from source.
KaKs_Calculator must be compiled from source and can be found by searching for "kakscalculator". Please download the version with the description "KaKs_Calculator Version Command Line for Linux/Mac". Instructions for installing it can be found on the drostlab GitHub page listed above.
Your instructor recommends placing all binaries (executable files) in your /usr/local/bin directory. This is the standard location for third-party software on Unix/Linux-based operating systems and will help you avoid issues when running orthologr. For BLAST, you only need to copy the makeblastdb and blastp executables to /usr/local/bin; the others are not necessary for this assignment.
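Before running orthologr, it can help to confirm that the executables are actually visible on your PATH. The following Python sketch uses only the standard library (`shutil.which`). It assumes the compiled KaKs_Calculator binary is named `KaKs_Calculator`; adjust the list if your build produced a different name. Under WSL, run it from the WSL shell rather than from Windows.

```python
import shutil

# Executables this assignment needs on the PATH. "KaKs_Calculator" is the
# assumed name of the binary produced by compiling KaKs_Calculator from source.
REQUIRED_TOOLS = ["makeblastdb", "blastp", "KaKs_Calculator"]


def find_tools(names):
    """Map each executable name to its full path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_tools(REQUIRED_TOOLS).items():
        print(f"{name}: {path if path else 'NOT FOUND on PATH'}")
```

If everything was copied to /usr/local/bin, each line should print a path inside that directory.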
Install orthologr: See the instructions on the drostlab GitHub page above.
Install biomartr: See the instructions on the biomartr GitHub page.
Tasks: The commands you need to execute are listed on the drostlab GitHub page in the section "Example: Computing dNdS values for all orthologous genes between two genomes" (search for: github drostlab orthologr #examplecomputingdndsvaluesforallorthologousgenesbetweentwogenomes).
Instead of comparing the human and mouse genomes as in that example, we will compare SARS and Covid, so the example code must be modified for this purpose.
Please modify the biomartr::getCDS commands to download the organism given by the RefSeq GCF accession for Covid and the GCF accession for SARS, and define sensible variable names.
After running the biomartr code, it is common to see the error "The download session seems to have timed out at the FTP site". This can appear even when the files downloaded correctly. To be sure, you can compare MD5 hashes using the md5sum (Linux) or md5 (macOS) command from your terminal and confirm that they match those in the downloaded file md5checksums.txt, which contains the hashes of the files on the NCBI FTP site.
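If md5sum or md5 is not available (for example, in a plain Windows setup), the same check can be sketched in Python with `hashlib`. This assumes md5checksums.txt uses lines of the form `<md5>  ./<file name>` and that the downloaded files keep the names listed there; if biomartr renamed a file when saving it, compare that file's hash against the matching entry by hand. The function names are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_checksums(text):
    """Parse NCBI-style lines '<md5>  ./<file name>' into {file name: md5}."""
    expected = {}
    for line in text.splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        digest, name = parts
        expected[Path(name.strip()).name] = digest.lower()
    return expected


def md5_of_file(path, chunk_size=1 << 20):
    """Compute the MD5 hex digest of a file, reading it in chunks."""
    h = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_downloads(folder, checksum_name="md5checksums.txt"):
    """Return {file name: hashes match?} for listed files present in `folder`."""
    folder = Path(folder)
    expected = parse_checksums((folder / checksum_name).read_text())
    return {
        name: md5_of_file(folder / name) == digest
        for name, digest in expected.items()
        if (folder / name).is_file()
    }
```

For example, `verify_downloads("path/to/CDS/folder")` (a placeholder path) returns `True` for each listed file whose local hash matches; files listed in md5checksums.txt but not present locally are skipped.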
Modify the query_file argument of the dNdS command to specify the Covid CDS file from the biomartr::getCDS step.
Modify the subject_file argument of the dNdS command to specify the SARS CDS file.
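The query/subject distinction matters because RBH orthology is defined by searching in both directions. This Python sketch shows only the RBH rule, not how orthologr runs BLAST. It assumes you already have hit scores (for example bit scores) in each direction; the gene IDs and scores in the usage example are made up.

```python
def best_hits(scores):
    """{query_id: {subject_id: score}} -> {query_id: highest-scoring subject_id}.

    Ties are broken by dictionary order (first maximum wins).
    """
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(query_vs_subject, subject_vs_query):
    """Return sorted (query, subject) pairs that are each other's best hit."""
    forward = best_hits(query_vs_subject)
    backward = best_hits(subject_vs_query)
    return sorted((q, s) for q, s in forward.items() if backward.get(s) == q)


if __name__ == "__main__":
    # Hypothetical scores: Covid genes as query, SARS genes as subject.
    covid_vs_sars = {
        "covid_1": {"sars_1": 900.0, "sars_2": 40.0},
        "covid_2": {"sars_2": 700.0},
        "covid_3": {"sars_3": 60.0},
    }
    sars_vs_covid = {
        "sars_1": {"covid_1": 905.0},
        "sars_2": {"covid_2": 698.0, "covid_1": 35.0},
        "sars_3": {"covid_2": 70.0, "covid_3": 58.0},
    }
    print(reciprocal_best_hits(covid_vs_sars, sars_vs_covid))
```

Here `covid_3` is not reported: its best hit is `sars_3`, but the best hit of `sars_3` is `covid_2`, so the pair is not reciprocal.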
You can use the pairwise alignment approach with the NW (Needleman-Wunsch) algorithm (aa_aln_type = "pairwise", aa_aln_tool = "NW"); otherwise you will need to install additional standalone alignment software. The Needleman-Wunsch approach produces results very similar to aligning with Clustal Omega and is fine for this assignment.
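For intuition about what the NW option does, here is a minimal Python sketch of Needleman-Wunsch global alignment. It uses a simple match/mismatch/linear-gap scoring scheme, which is an assumption for illustration; orthologr's protein alignments presumably use an amino-acid substitution matrix instead, so scores will differ.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-1):
    """Global alignment of strings a and b; returns (score, aligned_a, aligned_b)."""

    def sub(x, y):
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] = best score aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )
    # Traceback from the bottom-right cell, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```

Because NW aligns each ortholog pair end to end, it fits the pairwise setting here without needing a standalone multiple-alignment tool.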
When you are ready, you can run the code.