Question

I would just like screenshots of the executed code and the installed software on Linux for this assignment, please. I have been unable to get it to run properly.
##### About the data
You will use whole-genome data for SARS and Covid-19 from the National Center for Biotechnology Information (NCBI) RefSeq genome collection. Specifically, you will download the coding sequence (CDS) data for each genome using the biomartr package in R.
##### Completing your assignment
##### Approach
You will use the biomartr package to download the CDS sequences for SARS and Covid-19. You will then use the orthologr R package to identify orthologs using the reciprocal best hit ("RBH") approach, align with Clustal Omega (clustalo), run pal2nal to convert protein alignments back to nucleotide alignments, and finally estimate dN, dS and dN/dS from the sequence alignments. This can all be done with 3 simple commands executed in RStudio.
##### Configuring your computer for orthologr
Configuring your computer could take some time, depending on your OS and personal computer configuration. Your instructor will dedicate a session to working with students and answering questions. If you are not familiar with software compilation and installation, this is an excellent opportunity to learn more.
You will need to install two programs: Blast+ and KaKs_Calculator (version 1.2). This is relatively straightforward on MacOS and Linux, but could be more challenging for Windows users.
The instructions provided below **are only for Linux/MacOS users**. It should be possible to complete the project on Windows with WSL or some other Linux emulator. However, with WSL we have encountered difficulties compiling KaKs_Calculator from source in the past. Please reach out to your instructor if you are using WSL and encounter difficulties.
Some basic instructions for configuring your computer for orthologr are here:
Note that you can skip all software except the programs mentioned above.
Blast+ should have pre-compiled executables available for your operating system. In other words, you do not need to compile it from source.
KaKs_Calculator can be downloaded from here and must be compiled from source:
Please download version 1.2 with description "KaKs_Calculator Version 1.2- Command Line for Linux/Mac".
Instructions for how to install can be found at the drostlab github page listed above.
Your instructor recommends placing all binaries (executable files) in your /usr/local/bin directory. This is the standard location for third-party software on Unix/Linux-based operating systems and will help you avoid issues when running orthologr. For Blast+, you only need to copy the `makeblastdb` and `blastp` executables to /usr/local/bin (the others are not necessary for your assignment).
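Once the executables are copied, it can help to confirm that they are actually visible on your PATH before starting R. The following is a minimal sketch, not part of the official instructions: `check_tools` is a hypothetical helper, and the binary name `KaKs_Calculator` is an assumption, so substitute whatever name your compiled build produces.

```shell
# check_tools NAME...
# Report, for each name, whether an executable with that name is on PATH.
# The return status is the number of missing tools (0 means all were found).
check_tools() {
    missing=0
    for tool in "$@"; do
        if tool_path=$(command -v "$tool"); then
            printf 'found    %s -> %s\n' "$tool" "$tool_path"
        else
            printf 'MISSING  %s\n' "$tool"
            missing=$((missing + 1))
        fi
    done
    return "$missing"
}

# "KaKs_Calculator" is an assumed binary name; adjust it to match your build.
check_tools makeblastdb blastp KaKs_Calculator ||
    echo "Copy the missing executables into /usr/local/bin (this may require sudo)."
```

If a tool is reported as missing even though you copied it, check that the file is executable and that /usr/local/bin appears in the output of `echo "$PATH"`.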
Install 'orthologr'. See instructions at the drostlab github page above.
Install 'biomartr'. See instructions here:
##### Tasks
The commands you need to execute are listed at the drostlab github page in the section
"Example: Computing dN/dS values for all orthologous genes between two genomes":
Instead of comparing mouse and human genes, we will compare SARS and Covid-19, so the code needs to be modified for this purpose.
Please modify the `biomartr::getCDS` commands to download organism "GCF_009858895.2" for Covid-19 and "GCF_000864885.1" for SARS, and define sensible variable names.
After running the biomartr code, it is common to see the error "The download session seems to have timed out at the FTP site ..." This can appear even when the files downloaded correctly. To be sure, you can compare the md5 hash of each downloaded file, computed with the md5sum (Linux) or md5 (MacOS) command from your terminal, against the hashes in the downloaded file "*md5checksums.txt", which contains the hashes of the files on the NCBI FTP site.
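The md5 comparison can be scripted so that it works on both Linux and MacOS. The sketch below is one possible approach, under the assumption that each line of the checksum file begins with the hash followed by whitespace and a file name. It searches for the computed hash rather than the file name, in case the local file name differs from the name listed on the server. `md5_of` and `verify_md5` are hypothetical helper names, and the paths in the usage comment are placeholders.

```shell
# md5_of FILE
# Print only the MD5 hash of FILE, using md5sum (Linux) or md5 (MacOS).
md5_of() {
    if command -v md5sum >/dev/null 2>&1; then
        md5sum "$1" | awk '{ print $1 }'
    else
        md5 -q "$1"
    fi
}

# verify_md5 FILE CHECKSUMS
# Succeed if FILE's hash starts any line of CHECKSUMS.
# Returns 1 on a mismatch and 2 if either file does not exist.
verify_md5() {
    [ -f "$1" ] || { echo "no such file: $1" >&2; return 2; }
    [ -f "$2" ] || { echo "no such file: $2" >&2; return 2; }
    actual=$(md5_of "$1")
    if grep -q "^${actual}[[:space:]]" "$2"; then
        echo "OK       $1 ($actual)"
    else
        echo "MISMATCH $1 ($actual is not listed in $2)"
        return 1
    fi
}

# Placeholder usage; substitute the paths reported by getCDS:
# verify_md5 path/to/covid19_cds.fna.gz path/to/md5checksums.txt
```

A mismatch means the download should be repeated before running the dN/dS step.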
Modify the dNdS command `query_file` argument to specify the Covid-19 CDS file from the `biomartr::getCDS` step.
Modify the dNdS command `subject_file` argument to specify the SARS CDS file.
You can use the pairwise alignment approach with the "NW" (Needleman-Wunsch) algorithm; otherwise, you will need to install additional standalone alignment software. The Needleman-Wunsch approach produces results very similar to those from aligning with Clustal Omega and is fine for this assignment.
When you are ready, you can run the code.
