Question

I'm having some trouble executing the code and getting results on Windows. Some screenshots of the results would be appreciated.

About the data
You will use whole-genome data for SARS and Covid-19 from the National Center for Biotechnology Information (NCBI) RefSeq genome collection. Specifically, you will download the coding sequence (CDS) data for each genome using the biomartr package in R.
Approach
You will use the biomartr package to download the CDS sequences for SARS and Covid-19. You will then use the orthologr R package to identify orthologs using the reciprocal best hit (RBH) approach, align them with Clustal Omega (clustalo), run pal2nal to convert the protein alignments back to nucleotide (codon) alignments, and finally estimate dN, dS and dN/dS from the multiple sequence alignments. This can all be done with 3 simple commands executed in RStudio.
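The reciprocal best hit idea can be sketched in a few lines. The Python below is a hypothetical illustration, not orthologr's code: it assumes you already have BLAST-style hits of each genome against the other, given as (subject ID, bit score) pairs, and it pairs two genes only when each is the other's top-scoring hit. Ties go to the first hit listed. Gene names and scores in the demo are made up.

```python
from typing import Dict, List, Tuple

# Hypothetical input shape: query ID -> list of (subject ID, bit score).
Hits = Dict[str, List[Tuple[str, float]]]


def best_hits(hits: Hits) -> Dict[str, str]:
    """Return the highest-scoring subject for every query that has any hit."""
    return {
        query: max(candidates, key=lambda hit: hit[1])[0]
        for query, candidates in hits.items()
        if candidates
    }


def reciprocal_best_hits(a_vs_b: Hits, b_vs_a: Hits) -> List[Tuple[str, str]]:
    """Pair gene a with gene b only if each is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Made-up toy hits for illustration only.
    covid_vs_sars = {
        "cov_S": [("sars_S", 950.0), ("sars_ORF3a", 40.0)],
        "cov_N": [("sars_N", 700.0)],
        "cov_ORF8": [("sars_ORF7a", 35.0)],
    }
    sars_vs_covid = {
        "sars_S": [("cov_S", 948.0)],
        "sars_N": [("cov_N", 702.0)],
        "sars_ORF7a": [("cov_ORF7a", 300.0), ("cov_ORF8", 35.0)],
    }
    # cov_ORF8 -> sars_ORF7a is not reciprocal, so it is dropped.
    print(reciprocal_best_hits(covid_vs_sars, sars_vs_covid))
    # [('cov_N', 'sars_N'), ('cov_S', 'sars_S')]
```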
Configuring your computer for orthologr
Configuring your computer could take some time, depending on your OS and personal computer configuration. Your instructor will dedicate a session to working with students and answering questions. If you are not familiar with software compilation and installation, this is an excellent opportunity to learn more.
You will need to configure your computer with two pieces of software: Blast+ and KaKs_Calculator (version 1.2). This is relatively straightforward on macOS and Linux, but can be more challenging on Windows.
The instructions provided below are only for Linux/macOS users. It should be possible to complete the project on Windows using WSL or another Linux emulator. However, we have encountered difficulties compiling KaKs_Calculator from source under WSL in the past. Please reach out to your instructor if you are using WSL and run into problems.
Basic instructions for configuring your computer for orthologr are available on the drostlab GitHub page. Note that you can skip all software except the ones mentioned above (Blast+ and KaKs_Calculator).
Blast+ should have pre-compiled executables available for your operating system; in other words, you do not need to compile this software from source.
KaKs_Calculator must be compiled from source (search for "kaks_calculator" to find the download page). Please download version 1.2, listed as "KaKs_Calculator Version 1.2 - Command Line for Linux/Mac". Instructions for installing it can be found on the drostlab GitHub page listed above.
Your instructor recommends placing all binaries (executable files) in your /usr/local/bin directory. This is the standard location for third-party software on Unix/Linux-based operating systems and will help you avoid issues when running orthologr. For Blast+, you only need to copy the makeblastdb and blastp executables to /usr/local/bin (the others are not necessary for this assignment).
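A quick way to confirm that the binaries are discoverable is to look them up the same way a shell would. This is a minimal Python sketch using the standard library's shutil.which; the executable name KaKs_Calculator is an assumption based on the text above, so adjust it if your build produced a different name. A tool reported as NOT FOUND after copying it to /usr/local/bin usually means that directory is not on your PATH.

```python
import shutil
from typing import Dict, Iterable, Optional

# Executables the assignment relies on. "KaKs_Calculator" is the assumed
# name of the compiled binary; change it if yours differs.
REQUIRED_TOOLS = ("makeblastdb", "blastp", "KaKs_Calculator")


def locate_tools(names: Iterable[str], search_path: Optional[str] = None) -> Dict[str, Optional[str]]:
    """Map each executable name to its full path, or None if it is not found.

    With search_path=None the lookup uses the PATH environment variable,
    i.e. the same directories a shell would search.
    """
    return {name: shutil.which(name, path=search_path) for name in names}


if __name__ == "__main__":
    for tool, location in locate_tools(REQUIRED_TOOLS).items():
        print(f"{tool:16} {location or 'NOT FOUND'}")
```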
Install orthologr (see the instructions on the drostlab GitHub page above).
Install biomartr (see the instructions on the biomartr GitHub page).
Tasks
The commands you need to execute are listed on the drostlab GitHub page in the section "Example: Computing dN/dS values for all orthologous genes between two genomes" (google github drostlab orthologr #example-computing-dnds-values-for-all-orthologous-genes-between-two-genomes).
Instead of comparing the whole human and mouse gene sets, we will compare SARS and Covid-19, so the code needs to be modified for this purpose.
Please modify the biomartr::getCDS commands to download organism GCF_009858895.2 for Covid-19 and GCF_000864885.1 for SARS, and define sensible variable names.
After running the biomartr code, it is common to see the error "The download session seems to have timed out at the FTP site ...". This can appear even when the files downloaded correctly. To be sure, you can compare MD5 hashes using the md5sum (Linux) or md5 (macOS) command in your terminal and confirm that they match those in the downloaded *md5checksums.txt file, which contains the hashes of the files on the NCBI FTP site.
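On Windows, `certutil -hashfile <file> MD5` prints the same hash. The check can also be scripted; this Python sketch assumes the NCBI checksum file lists one `<md5>  ./<file name>` entry per line. It matches on the hash rather than the file name, in case the local copy was saved under a different name than the one NCBI lists. The script name check_md5.py is hypothetical.

```python
import hashlib
import sys
from pathlib import Path
from typing import Dict, List


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def read_checksums(checksum_file: Path) -> Dict[str, str]:
    """Parse '<md5>  ./<file name>' lines into {file name: md5}."""
    listed: Dict[str, str] = {}
    for line in checksum_file.read_text().splitlines():
        parts = line.split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        md5_hex, name = parts
        listed[Path(name.strip()).name] = md5_hex.lower()
    return listed


def matching_entries(data_file: Path, checksum_file: Path) -> List[str]:
    """Names in checksum_file whose MD5 equals data_file's MD5 (empty = no match)."""
    actual = md5_of_file(data_file)
    return [name for name, md5_hex in read_checksums(checksum_file).items() if md5_hex == actual]


if __name__ == "__main__":
    if len(sys.argv) == 3:
        hits = matching_entries(Path(sys.argv[1]), Path(sys.argv[2]))
        print(f"OK, matches {', '.join(hits)}" if hits else "MISMATCH: hash not listed")
    else:
        print("usage: python check_md5.py <downloaded file> <md5checksums file>")
```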
Modify the query_file argument of the dNdS command to specify the Covid-19 CDS file from the biomartr::getCDS step.
Modify the subject_file argument of the dNdS command to specify the SARS CDS file.
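Before pointing query_file and subject_file at the downloaded files, a quick look at their contents can help. The sketch below (Python, hypothetical helper names) reads a plain or gzip-compressed FASTA file and lists CDS whose length is not a multiple of 3. Such sequences cannot be split cleanly into codons, and codon-based dN/dS estimation depends on that split.

```python
import gzip
import sys
from pathlib import Path
from typing import Dict, List, Optional


def read_fasta(path: Path) -> Dict[str, str]:
    """Read a FASTA file (plain or .gz) into {sequence ID: sequence}.

    The ID is the first whitespace-separated word of each header line.
    """
    opener = gzip.open if path.suffix == ".gz" else open
    records: Dict[str, str] = {}
    seq_id: Optional[str] = None
    chunks: List[str] = []
    with opener(path, "rt") as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith(">"):
                if seq_id is not None:
                    records[seq_id] = "".join(chunks)
                seq_id = (line[1:].split() or [""])[0]
                chunks = []
            else:
                chunks.append(line.upper())
    if seq_id is not None:
        records[seq_id] = "".join(chunks)
    return records


def incomplete_codon_ids(records: Dict[str, str]) -> List[str]:
    """Return IDs of sequences whose length is not a multiple of 3."""
    return [seq_id for seq_id, seq in records.items() if len(seq) % 3 != 0]


if __name__ == "__main__":
    # Pass one or more CDS file paths on the command line.
    for arg in sys.argv[1:]:
        cds = read_fasta(Path(arg))
        bad = incomplete_codon_ids(cds)
        print(f"{arg}: {len(cds)} CDS, {len(bad)} with length not divisible by 3")
```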
You can use the pairwise alignment approach with the NW (Needleman-Wunsch) algorithm; otherwise you will need to install additional standalone alignment software. The Needleman-Wunsch approach produces results very similar to aligning with Clustal Omega and is fine for this assignment.
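To show what the Needleman-Wunsch algorithm does, here is a minimal global-alignment sketch in Python. It is not orthologr's implementation: it assumes a simple scoring scheme (match +1, mismatch -1, linear gap -2) instead of a protein substitution matrix such as BLOSUM62, and returns one optimal alignment.

```python
from typing import List, Tuple


def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> Tuple[int, str, str]:
    """Globally align sequences a and b; return (score, aligned_a, aligned_b)."""

    def pair_score(x: str, y: str) -> int:
        return match if x == y else mismatch

    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]),  # (mis)match
                score[i - 1][j] + gap,  # gap in b
                score[i][j - 1] + gap,  # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a: List[str] = []
    out_b: List[str] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(a[i - 1], b[j - 1]):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    # Made-up protein fragments for illustration.
    print(needleman_wunsch("MKTV", "MKV"))  # (1, 'MKTV', 'MK-V')
```

Unlike local alignment, every residue of both sequences appears in the result, which suits comparing full-length orthologous proteins.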
When you are ready, you can run the code.
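For intuition about the pal2nal step from the approach above: it maps an aligned protein back onto its original CDS so that dN and dS can be counted codon by codon. A minimal Python sketch of that core idea follows (hypothetical function name); it is not pal2nal itself, which performs additional consistency checks between the protein and DNA sequences.

```python
def protein_to_codon_alignment(aligned_protein: str, cds: str) -> str:
    """Thread an unaligned CDS through its aligned protein sequence.

    Each residue consumes the next codon (3 nt) of the CDS and each gap '-'
    becomes a codon gap '---'. Nucleotides left over after the last residue
    (for example a trailing stop codon) are dropped.
    """
    residues = sum(1 for ch in aligned_protein if ch != "-")
    if len(cds) < 3 * residues:
        raise ValueError("CDS is shorter than 3 nt per protein residue")
    codons = []
    pos = 0
    for ch in aligned_protein:
        if ch == "-":
            codons.append("---")
        else:
            codons.append(cds[pos:pos + 3])
            pos += 3
    return "".join(codons)


if __name__ == "__main__":
    # Made-up example: aligned protein "MK-V", CDS ATG AAA GTT TAA.
    print(protein_to_codon_alignment("MK-V", "ATGAAAGTTTAA"))  # ATGAAA---GTT
```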
