Question
I would just like screenshots of the executed code and the installed software on Linux for this assignment, please. I have been unable to get the code to execute properly.
##### About the data
You will use whole-genome data for SARS and Covid from the National Center for Biotechnology Information (NCBI) RefSeq genome collection. Specifically, you will download the coding sequence (CDS) data for each genome using the biomartr package in R.
##### Completing your assignment
##### Approach
You will use the biomartr package to download the CDS sequences for SARS and Covid. You will then use the orthologr R package to identify orthologs using the reciprocal best hit (RBH) approach, align them with Clustal Omega (clustalo), run pal2nal to convert the protein alignments back to nucleotide alignments, and finally estimate dN, dS, and dN/dS from the multiple sequence alignments. This can all be done with simple commands executed in RStudio.
##### Configuring your computer for orthologr
Configuring your computer could take some time, depending on your OS and personal computer configuration. Your instructor will dedicate a session to working with students and answering questions. If you are not familiar with software compilation and installation, this is an excellent opportunity to learn more.
You will need to configure your computer with two pieces of software: Blast and a specific version of KaKsCalculator (see the download instructions below). This is relatively straightforward on macOS and Linux, but could be more challenging for Windows users.
The instructions provided below are only for Linux/macOS users. It should be possible to complete the project on Windows with WSL or some other Linux emulator. However, with WSL we have encountered difficulties compiling KaKsCalculator from source in the past. Please reach out to your instructor if you are using WSL and encounter difficulties.
Some basic instructions for configuring your computer for orthologr are here:
Note that you can skip all software except the ones mentioned above.
Blast should have precompiled executables available for your operating system. In other words, you do not need to compile it from source.
KaKsCalculator can be downloaded from here and must be compiled from source:
Please download the version with the description "KaKsCalculator Version Command Line for Linux/Mac".
Installation instructions can be found at the drostlab GitHub page listed above.
Your instructor recommends placing all binaries (executable files) in your /usr/local/bin directory. This is the standard location for third-party software on Unix/Linux-based operating systems and will help you avoid issues when running orthologr. For Blast, you only need to copy the makeblastdb and blastp executables to /usr/local/bin; the others are not necessary for your assignment.
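Once the binaries are copied, it can help to confirm from R that they are actually visible on the search path before running orthologr. The following is a minimal sketch using base R only; the executable name `KaKs_Calculator` is an assumption and may differ depending on how your copy was compiled.

```r
# Executables this assignment relies on. "makeblastdb" and "blastp" come from
# the text above; "KaKs_Calculator" is an assumed binary name.
required_tools <- c("makeblastdb", "blastp", "KaKs_Calculator")

# Sys.which() returns the full path of each executable, or "" if it is not found.
tool_paths <- Sys.which(required_tools)
missing_tools <- names(tool_paths)[!nzchar(tool_paths)]

if (length(missing_tools) > 0) {
  message("Not found on PATH: ", paste(missing_tools, collapse = ", "))
} else {
  message("All required tools were found:")
  print(tool_paths)
}
```

If a tool is reported as missing even though it sits in /usr/local/bin, check that the file is executable and that the R session sees the same PATH as your terminal (`Sys.getenv("PATH")`).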
Install 'orthologr'. See the instructions at the drostlab GitHub page above.
Install 'biomartr'. See instructions here:
##### Tasks
The commands you need to execute are listed at the drostlab github page in the section
"Example: Computing dNdS values for all orthologous genes between two genomes":
Rather than comparing mouse and human genes, we will compare SARS and Covid, so the code must be modified for this purpose.
Please modify the biomartr::getCDS commands to download organism "GCF…" for Covid and "GCF…" for SARS, and define sensible variable names.
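The download step might look roughly like the sketch below. It wraps `biomartr::getCDS` in a small helper, `download_cds`, which is a hypothetical name introduced here for illustration. The two accession strings are placeholders, since the full RefSeq accessions are not reproduced in this text; replace them with the ones given in your assignment. The output directory is also an assumption.

```r
# Placeholders: replace with the full RefSeq assembly accessions
# from the assignment. They are NOT real accessions.
covid_accession <- "<Covid GCF accession>"
sars_accession  <- "<SARS GCF accession>"

# Hypothetical helper: download the CDS file for one RefSeq assembly and
# return the path that biomartr reports for the downloaded file.
download_cds <- function(accession,
                         out_dir = file.path("_ncbi_downloads", "CDS")) {
  stopifnot(is.character(accession), length(accession) == 1, nzchar(accession))
  if (!requireNamespace("biomartr", quietly = TRUE)) {
    stop("The 'biomartr' package is not installed.")
  }
  biomartr::getCDS(db = "refseq", organism = accession, path = out_dir)
}

# Uncomment once the real accessions are filled in (requires network access):
# covid_cds_file <- download_cds(covid_accession)
# sars_cds_file  <- download_cds(sars_accession)
```

Keeping the returned paths in clearly named variables such as `covid_cds_file` and `sars_cds_file` makes the later dNdS call easier to read.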
After running the biomartr code, it is common to see the error "The download session seems to have timed out at the FTP site". This can appear even when the files downloaded correctly. To be sure, you can compare the MD5 hash using the md5sum (Linux) or md5 (macOS) command from your terminal and confirm that the hashes match those in the downloaded file md5checksums.txt, which contains the hashes of the files on the NCBI FTP site.
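The same check can also be done without leaving R, using `tools::md5sum` from base R. This is a sketch under two assumptions: that each line of the checksum file starts with the hash followed by whitespace and a file name, and that checking whether the local hash appears anywhere in that list is sufficient (the downloaded file may have been renamed locally, so the sketch does not match on file names). `verify_md5` is a hypothetical helper name.

```r
# Hypothetical helper: TRUE if the MD5 hash of `data_file` appears among the
# hashes listed in `checksum_file`.
verify_md5 <- function(data_file, checksum_file) {
  local_hash <- unname(tools::md5sum(data_file))
  # Keep only the first whitespace-delimited field (the hash) of each line.
  listed_hashes <- sub("\\s.*$", "", readLines(checksum_file, warn = FALSE))
  !is.na(local_hash) && local_hash %in% listed_hashes
}

# Example usage; both paths are illustrative and depend on where biomartr
# stored your files:
# verify_md5(covid_cds_file, file.path("_ncbi_downloads", "CDS", "md5checksums.txt"))
```

A result of `TRUE` suggests the file arrived intact despite the timeout message; `FALSE` suggests the download should be repeated.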
Modify the dNdS command's query_file argument to specify the Covid CDS file from the biomartr::getCDS step.
Modify the dNdS command's subject_file argument to specify the SARS CDS file.
You can use the pairwise alignment approach with the NW (Needleman-Wunsch) algorithm; otherwise, you will need to install additional standalone alignment software. The Needleman-Wunsch approach produces results very similar to those from aligning with Clustal Omega and is fine for this assignment.
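Putting the three modifications together, the dNdS call might look like the sketch below. The argument names and values follow the orthologr example as I understand it and should be confirmed against `?orthologr::dNdS` for your installed version. In particular, `dnds_est.method = "Comeron"` and `comp_cores = 1` are assumptions, not requirements stated in the assignment. `run_dnds` is a hypothetical wrapper name.

```r
# Hypothetical wrapper around orthologr::dNdS for the Covid vs. SARS comparison.
run_dnds <- function(covid_cds_file, sars_cds_file) {
  # Fail early if either CDS file is missing.
  stopifnot(file.exists(covid_cds_file), file.exists(sars_cds_file))
  if (!requireNamespace("orthologr", quietly = TRUE)) {
    stop("The 'orthologr' package is not installed.")
  }
  orthologr::dNdS(
    query_file      = covid_cds_file,  # Covid CDS file from biomartr::getCDS
    subject_file    = sars_cds_file,   # SARS CDS file from biomartr::getCDS
    ortho_detection = "RBH",           # reciprocal best hit
    aa_aln_type     = "pairwise",      # pairwise rather than multiple alignment
    aa_aln_tool     = "NW",            # Needleman-Wunsch
    codon_aln_tool  = "pal2nal",       # protein alignment back to codons
    dnds_est.method = "Comeron",       # assumed estimation method
    comp_cores      = 1                # assumed: single core
  )
}

# Example usage, once both CDS files have been downloaded:
# dnds_table <- run_dnds(covid_cds_file, sars_cds_file)
# head(dnds_table)
```

Choosing `aa_aln_type = "pairwise"` with `aa_aln_tool = "NW"` is what avoids the need for a standalone aligner such as Clustal Omega.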
When you are ready, you can run the code.