Question
Code in C++.
Write code to read, store, and analyze the latest human genome assembly (found at /common/contrib/classroom/inf503/genomes/human.txt). At minimum, your code must contain (10pts):
A character array to store the entire human genome in a single data structure
A separate function to read the human genome file
A function to compute the number of A, C, G, or T characters in the human genome
Comments describing major code blocks and control structures
(20pts) Read in and store the human genome. There will be multiple scaffolds (each with a separate header line starting with >). Concatenate the entire genome (discarding headers) into a single character array data structure. Collect the following statistics (see below) as you are reading the file. Hint: you can keep running totals or store scaffold sizes / names in a separate set of arrays.
How many scaffolds were there?
What was the longest and shortest scaffold? Provide names of scaffolds and lengths.
What was the average scaffold length?
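One possible shape for the reading step is sketched below. It is only an illustration under stated assumptions, not a reference solution: the struct GenomeData and the functions recordScaffold and readGenome are hypothetical names, a std::vector<char> stands in for the required character array, and the whole header line after > is taken as the scaffold name. Running totals are kept while reading, as the hint suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical result container; all names here are illustrative only.
struct GenomeData {
    std::vector<char> genome;      // every scaffold concatenated, headers discarded
    std::size_t scaffoldCount = 0; // number of '>' headers seen
    std::string longestName, shortestName;
    std::size_t longestLen = 0, shortestLen = 0;
    double averageLen = 0.0;       // total bases / scaffoldCount
};

// Fold one finished scaffold into the running statistics.
// On ties, the scaffold seen first is kept.
static void recordScaffold(GenomeData& d, const std::string& name, std::size_t len) {
    if (d.scaffoldCount == 0 || len > d.longestLen) { d.longestLen = len; d.longestName = name; }
    if (d.scaffoldCount == 0 || len < d.shortestLen) { d.shortestLen = len; d.shortestName = name; }
    ++d.scaffoldCount;
}

// Read a FASTA-style file: lines starting with '>' are headers,
// every other non-empty line is sequence data.
GenomeData readGenome(const std::string& path) {
    std::ifstream in(path);
    if (!in) throw std::runtime_error("cannot open " + path);

    GenomeData d;
    std::string line, name;
    std::size_t len = 0;   // length of the scaffold currently being read
    std::size_t total = 0; // running total of all scaffold lengths
    bool open = false;     // true once the first header has been seen

    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back(); // tolerate CRLF files
        if (line.empty()) continue;
        if (line[0] == '>') {
            if (open) recordScaffold(d, name, len); // close the previous scaffold
            name = line.substr(1);
            len = 0;
            open = true;
        } else if (open) { // sequence text before the first header is ignored
            d.genome.insert(d.genome.end(), line.begin(), line.end());
            len += line.size();
            total += line.size();
        }
    }
    if (open) recordScaffold(d, name, len); // close the last scaffold

    assert(total == d.genome.size()); // sanity check on the running total
    if (d.scaffoldCount > 0)
        d.averageLen = static_cast<double>(total) / static_cast<double>(d.scaffoldCount);
    return d;
}
```

A caller would pass the path from the question, for example readGenome("/common/contrib/classroom/inf503/genomes/human.txt"), and print the fields of the returned struct. For a file of this size, calling genome.reserve with an estimate of roughly 3 billion characters before the loop would likely reduce reallocation cost; that figure is an assumption about the assembly, not something stated in the question.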
(20pts) Write a function to assess the content of the human genome: count the total number of a given character (A, C, G, or T) in the whole genome.
What is the big O notation of your search (linear / quadratic / cubic / etc)?
How long does it take (in seconds) to execute this function? Hint: You will need to use system time within your code to get accurate time estimates.
What was the GC content of the human genome (percent of Cs and Gs in the genome)?
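The counting, timing, and GC questions above could be approached along the lines of the sketch below. The function names countBase, gcPercent, and timeCountSeconds are hypothetical, and several choices are assumptions rather than requirements from the question: matching is case-sensitive (so lowercase bases in the file would need separate handling), GC percent is computed over every stored character including any N, and std::chrono::steady_clock is used in place of raw system time because it is monotonic. The count is a single pass over the array, so it is linear, O(n), in the genome length.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Count how often `base` occurs in genome[0..length).
// One pass over the array, so the running time is O(n).
std::size_t countBase(const char* genome, std::size_t length, char base) {
    assert(genome != nullptr || length == 0); // a null array is only valid when empty
    std::size_t count = 0;
    for (std::size_t i = 0; i < length; ++i) {
        if (genome[i] == base) ++count; // case-sensitive comparison (assumption)
    }
    return count;
}

// GC content as a percentage of all stored characters.
// Assumption: the denominator is the full array length, including any N.
double gcPercent(const char* genome, std::size_t length) {
    if (length == 0) return 0.0;
    const std::size_t gc = countBase(genome, length, 'G') + countBase(genome, length, 'C');
    return 100.0 * static_cast<double>(gc) / static_cast<double>(length);
}

// Time a single call to countBase and return the elapsed seconds.
// The count itself is handed back through countOut.
double timeCountSeconds(const char* genome, std::size_t length, char base,
                        std::size_t& countOut) {
    const auto start = std::chrono::steady_clock::now();
    countOut = countBase(genome, length, base);
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

With the genome held in a std::vector<char> named genome (a hypothetical variable), the calls would look like countBase(genome.data(), genome.size(), 'A'). Timing a single run gives only a rough figure; averaging a few runs would likely be more stable.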