
For this assignment, you will be simulating the manipulation of the nucleotides in strands of DNA. You do not need to know much about DNA to do this assignment, but if you would like to learn more about the process, this webpage has more information.

Background

A nucleotide base is a fundamental unit of the genetic code found in a DNA strand (RNA will be saved for a future project). These bases are represented by four letters:

A for adenine
C for cytosine
G for guanine
T for thymine

A chain of these nucleotide bases creates a nucleic acid, which contains information used by living cells to construct proteins, which in turn contain the information that living organisms need to survive and reproduce. The representation of these chains can be quite long. An example of such chains can be found here. As part of genetic engineering, these chains are cut, and a sequence from another organism is inserted, or perhaps a section of the chain is excised (removed) from the larger chain.

The program you write will simulate the splicing of chains of nucleotide bases. Since there are four different letters, each letter can be represented by two bits:

A 00
C 01
G 10
T 11

So, for example, the nucleotide sequence GATTACA can be represented by the base-4 digit sequence 2033010, which is 9156 in decimal or 10 00 11 11 00 01 00 in binary.

You will be using bytes (unsigned char) to hold the nucleotide bases. Each byte can hold up to four letters. You will use bitwise operators (shifts, bitwise &, |, ^, ~) to manipulate the bits in a byte to add, move, or change nucleotide letters.
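The shifts and masks mentioned above can be sketched as follows. This is a minimal illustration, not part of the assignment: the function names (base_to_bits, bits_to_base, set_slot, get_slot) are hypothetical, and slot 0 is assumed to be the two most significant bits of the byte, matching the MSB-first layout required by the specification.

```c
#include <stddef.h>

/* Map a nucleotide letter to its 2-bit code (A=00, C=01, G=10, T=11).
   Returns -1 for any other character. All names here are hypothetical. */
int base_to_bits(char base)
{
    switch (base) {
    case 'A': return 0;
    case 'C': return 1;
    case 'G': return 2;
    case 'T': return 3;
    default:  return -1;
    }
}

/* Inverse mapping; only the low two bits of code are used. */
char bits_to_base(unsigned code)
{
    static const char letters[] = "ACGT";
    return letters[code & 3u];
}

/* Store a 2-bit code in slot 0..3 of a byte and return the new byte.
   Assumption: slot 0 occupies the two most significant bits. */
unsigned char set_slot(unsigned char byte, unsigned slot, unsigned code)
{
    unsigned shift = 6u - 2u * slot;            /* 6, 4, 2, or 0 */
    unsigned cleared = byte & ~(3u << shift);   /* zero the slot */
    return (unsigned char)(cleared | ((code & 3u) << shift));
}

/* Read the 2-bit code stored in slot 0..3 of a byte. */
unsigned get_slot(unsigned char byte, unsigned slot)
{
    return (byte >> (6u - 2u * slot)) & 3u;
}
```

Clearing the slot with & and ~ before OR-ing in the new code lets the same function overwrite a letter as well as add one.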

Specifications

Your program will use the following structure to hold a nucleotide chain:

#define MAX_CHAIN_BYTES 100

typedef struct _Chain {
    size_t SeqLen;  // Number of letters in sequence
    unsigned char Sequence[MAX_CHAIN_BYTES];
} Chain;

With this structure, a sequence can hold up to 400 letters (four letters in each of the 100 bytes). The letters are stored in order from the lowest-indexed byte of the Sequence array to the highest. However, within a single byte, the letters will be ordered from Most Significant Bit (MSB - the leftmost bit) to Least Significant Bit (LSB - the rightmost bit). In other words, if a sequence contains just the two letters CT, then the representation of this sequence will be found in the first byte of the Sequence array (Sequence[0]), and will have the bit pattern 0111xxxx, where the x's can be either 0 or 1 since they are not part of the sequence. (Note: Even though the unused bits can be either 0 or 1, you may find it advantageous to set them to 0.)
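One way to convert between this packed layout and an ordinary C string is sketched below. The helper names chain_from_string and chain_to_string are hypothetical, the input string is assumed to contain only the letters A, C, G, and T, and unused bits are set to 0 as the note above suggests.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHAIN_BYTES 100

typedef struct _Chain {
    size_t SeqLen;  /* Number of letters in sequence */
    unsigned char Sequence[MAX_CHAIN_BYTES];
} Chain;

/* Pack a string of A/C/G/T letters into a Chain (hypothetical helper).
   Assumes str holds only valid letters; anything beyond
   4 * MAX_CHAIN_BYTES letters is dropped. Unused bits are left at 0. */
void chain_from_string(Chain *chain, const char *str)
{
    static const char letters[] = "ACGT";
    size_t len = strlen(str);

    if (len > 4 * MAX_CHAIN_BYTES)
        len = 4 * MAX_CHAIN_BYTES;

    memset(chain, 0, sizeof *chain);
    chain->SeqLen = len;
    for (size_t i = 0; i < len; i++) {
        /* The index of the letter within "ACGT" is its 2-bit code. */
        unsigned code = (unsigned)(strchr(letters, str[i]) - letters);
        unsigned shift = 6u - 2u * (unsigned)(i % 4);  /* MSB pair first */
        chain->Sequence[i / 4] |= (unsigned char)(code << shift);
    }
}

/* Unpack a Chain into a string; out needs room for SeqLen + 1 chars. */
void chain_to_string(const Chain *chain, char *out)
{
    static const char letters[] = "ACGT";

    for (size_t i = 0; i < chain->SeqLen; i++) {
        unsigned shift = 6u - 2u * (unsigned)(i % 4);
        out[i] = letters[(chain->Sequence[i / 4] >> shift) & 3u];
    }
    out[chain->SeqLen] = '\0';
}
```

With these helpers, packing "CT" yields Sequence[0] == 0x70 (binary 0111 0000), which matches the bit pattern 0111xxxx described above with the unused bits cleared.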

You are to write a menu driven program to alter (splice, excise, or replace) sequences. The menu will contain the following options:

1. Read a DNA sequence from a file
2. Save the current sequence to a file
3. Print the current sequence
4. Splice and insert a sub-sequence
5. Remove a sub-sequence
6. Replace a sub-sequence with another sub-sequence
7. Exit the program

User input to the menu will consist of the numbers 1-7. Do not use any additional or different input values. Descriptions of the functionality of each option are as follows:

1. When this option is selected, prompt the user to enter a filename containing a DNA sequence. This file will be in binary format, and contain the data for a single Chain structure. If the file can be opened successfully, the data in the file is used to initialize a variable of type Chain. Otherwise, an error message is written to the screen, and the program returns to the main menu. Note: This option must be chosen before choosing options 2-6. You should check that a file was successfully read before allowing options 2-6 to be executed.
2. When this option is selected, prompt the user to enter a filename in which to save the data in the Chain struct containing the current sequence. The format of the file should be the same as described for option 1 (a single Chain structure in binary format). If the output file cannot be opened successfully, an error message is written to the screen, and the program returns to the main menu.
3. When this option is selected, the sequence of letters is printed to the screen.
4. When this option is selected, your program will prompt the user for two things:
   1. The sub-sequence to insert. This will be a string of nucleotide letters, and will be read from the console (not a file). (Note: Use strlen() to find the length of the input string, and DON'T forget to remove the trailing '\n'.)
   2. The place to insert the sequence

The sub-sequence will be placed after ALL instances of the given place to insert. For example:

Suppose the current sequence is: CATAGGTACCAGGTACA

The sequence to insert is: ACATGA

The place to insert is: GGT

Your program will search for all instances of GGT (marked with brackets here) in the current sequence

CATA[GGT]ACCA[GGT]ACA

and insert the sub-sequence (shown in lower case for this example) after each of those instances:

CATAGGTacatgaACCAGGTacatgaACA

So, the result after this insertion will be:

CATAGGTACATGAACCAGGTACATGAACA

Note: If the inserted sub-sequence itself contains the sub-sequence after which to insert, those new occurrences are NOT treated as places to insert. (What can happen if they are?)
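The insertion rule can be sketched on ordinary C strings as follows. This is one possible approach under stated assumptions, not a required design: the names insert_after_all, append_capped, and MAX_LETTERS are hypothetical, matches are taken left to right without overlapping, and seq is assumed to point to a buffer with room for MAX_LETTERS + 1 characters. Because the search runs only over the original text and the result is built in a separate buffer, inserted letters are never rescanned.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LETTERS 400   /* 4 * MAX_CHAIN_BYTES */

/* Copy up to n chars of src to out[used...], never passing MAX_LETTERS.
   Returns the new number of chars used. (Hypothetical helper.) */
static size_t append_capped(char *out, size_t used, const char *src, size_t n)
{
    if (n > MAX_LETTERS - used)
        n = MAX_LETTERS - used;
    memcpy(out + used, src, n);
    return used + n;
}

/* Insert `insert` after every occurrence of `target` in seq.
   Returns the number of occurrences found; 0 leaves seq unchanged.
   The result is truncated to MAX_LETTERS letters if necessary. */
int insert_after_all(char *seq, const char *target, const char *insert)
{
    char out[MAX_LETTERS + 1];
    size_t used = 0;
    size_t tlen = strlen(target);
    size_t ilen = strlen(insert);
    const char *cur = seq;
    const char *hit;
    int count = 0;

    if (tlen == 0)
        return 0;

    while ((hit = strstr(cur, target)) != NULL) {
        const char *end = hit + tlen;
        /* Copy everything up to and including the match, then the insert. */
        used = append_capped(out, used, cur, (size_t)(end - cur));
        used = append_capped(out, used, insert, ilen);
        cur = end;            /* keep searching in the ORIGINAL text only */
        count++;
    }
    if (count == 0)
        return 0;

    used = append_capped(out, used, cur, strlen(cur));
    out[used] = '\0';
    strcpy(seq, out);
    return count;
}
```

Applied to CATAGGTACCAGGTACA with target GGT and insert ACATGA, the function finds two occurrences and produces CATAGGTACATGAACCAGGTACATGAACA, as in the example above.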

5. When this option is selected, your program will prompt the user for a sub-sequence. This sub-sequence will be entered from the console (not a file). Your program will then search for the given sub-sequence throughout the entire current sequence and remove ALL instances of the given sub-sequence. For example, if the current sequence is:

CATAGGTACATGAACCAGGTACATGAACA

and the entered sub-sequence is:

GGTA

the resulting sequence is:

CATACATGAACCACATGAACA
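A minimal sketch of the removal step on a C string follows. The name remove_all is hypothetical, and one behavior is an assumption rather than something the assignment states: only occurrences present in the original text are removed, so a new match formed by joining the pieces on either side of a deletion is left alone.

```c
#include <stddef.h>
#include <string.h>

/* Remove every occurrence of target from seq, in place.
   Returns the number of occurrences removed (0 leaves seq unchanged).
   Assumption: matches are found left to right in the original text. */
int remove_all(char *seq, const char *target)
{
    size_t tlen = strlen(target);
    char *dst = seq;          /* where the next kept letters are written */
    const char *src = seq;    /* where the next search starts */
    const char *hit;
    int count = 0;

    if (tlen == 0)
        return 0;

    while ((hit = strstr(src, target)) != NULL) {
        size_t keep = (size_t)(hit - src);
        memmove(dst, src, keep);   /* regions may overlap, so memmove */
        dst += keep;
        src = hit + tlen;          /* skip over the removed letters */
        count++;
    }
    memmove(dst, src, strlen(src) + 1);   /* tail plus the terminator */
    return count;
}
```

The write position never passes the read position, so the text that strstr() still has to scan is never overwritten.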

6. When this option is selected, your program will prompt the user for a sub-sequence to remove. This sub-sequence will be entered from the console (not a file). The program will then prompt the user for a sub-sequence to replace the removed sub-sequence. This sub-sequence will also be entered from the console. Note that the two sequences do not necessarily have to be the same length. Your program will then search for the first sub-sequence throughout the entire current sequence and replace ALL instances of this sub-sequence with the second sub-sequence. For example, if the current sequence is:

CATAGGTACATGAACCAGGTACATGAACA

the sub-sequence to remove is:

GGTA

and the replacement sub-sequence is:

AACGTGA

the resulting sequence is:

CATAAACGTGACATGAACCAAACGTGACATGAACA
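Replacement can be sketched in the same spirit as a single left-to-right pass that builds the result in a temporary buffer. The names replace_all, copy_capped, and MAX_LETTERS are hypothetical, seq is assumed to have room for MAX_LETTERS + 1 characters, and the replacement text is never rescanned for further matches.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LETTERS 400   /* 4 * MAX_CHAIN_BYTES */

/* Copy up to n chars of src to out[used...], never passing MAX_LETTERS.
   Returns the new number of chars used. (Hypothetical helper.) */
static size_t copy_capped(char *out, size_t used, const char *src, size_t n)
{
    if (n > MAX_LETTERS - used)
        n = MAX_LETTERS - used;
    memcpy(out + used, src, n);
    return used + n;
}

/* Replace every occurrence of old_sub in seq with new_sub.
   Returns the number of replacements; 0 leaves seq unchanged.
   The result is truncated to MAX_LETTERS letters if necessary. */
int replace_all(char *seq, const char *old_sub, const char *new_sub)
{
    char out[MAX_LETTERS + 1];
    size_t used = 0;
    size_t olen = strlen(old_sub);
    size_t nlen = strlen(new_sub);
    const char *cur = seq;
    const char *hit;
    int count = 0;

    if (olen == 0)
        return 0;

    while ((hit = strstr(cur, old_sub)) != NULL) {
        used = copy_capped(out, used, cur, (size_t)(hit - cur)); /* text before match */
        used = copy_capped(out, used, new_sub, nlen);            /* the replacement */
        cur = hit + olen;                                        /* skip the old text */
        count++;
    }
    if (count == 0)
        return 0;

    used = copy_capped(out, used, cur, strlen(cur));
    out[used] = '\0';
    strcpy(seq, out);
    return count;
}
```

Because the two sub-sequences may differ in length, building the result in a separate buffer avoids shifting the remainder of the string once per match.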

7. If this option is selected, print an appropriate closing message, and exit the program.
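The file handling for options 1 and 2 might look like the sketch below. The function names read_chain and write_chain are hypothetical, and the code assumes the data files were written on a machine with the same size_t width, struct padding, and byte order as the one reading them. The SeqLen range check is an extra precaution that the assignment does not require.

```c
#include <stddef.h>
#include <stdio.h>

#define MAX_CHAIN_BYTES 100

typedef struct _Chain {
    size_t SeqLen;  /* Number of letters in sequence */
    unsigned char Sequence[MAX_CHAIN_BYTES];
} Chain;

/* Read one Chain from a binary file (hypothetical helper).
   Returns 1 on success, 0 on failure; *chain changes only on success. */
int read_chain(const char *filename, Chain *chain)
{
    Chain tmp;
    size_t got;
    FILE *fp = fopen(filename, "rb");

    if (fp == NULL)
        return 0;
    got = fread(&tmp, sizeof tmp, 1, fp);
    fclose(fp);

    /* Extra sanity check, not required by the assignment. */
    if (got != 1 || tmp.SeqLen > 4 * MAX_CHAIN_BYTES)
        return 0;
    *chain = tmp;
    return 1;
}

/* Write one Chain to a binary file. Returns 1 on success, 0 on failure. */
int write_chain(const char *filename, const Chain *chain)
{
    size_t put;
    FILE *fp = fopen(filename, "wb");

    if (fp == NULL)
        return 0;
    put = fwrite(chain, sizeof *chain, 1, fp);
    if (fclose(fp) != 0)
        return 0;
    return put == 1;
}
```

The return value of read_chain can be stored in a local flag in main() and checked before options 2-6 are allowed to run, which avoids any need for a global variable.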

As in past assignments, part of the grading of your code will be performed by a script (sample scripts will be provided), and any additional or missing prompts will cause your program to fail to run correctly.

Other Specifications and Additional Information

- You must include a Makefile to compile your project.
- Portions of the project will be graded using a script (a sample will be provided). It is important that your program works with any sample scripts. Otherwise, your overall score for the project may be lowered considerably.
- If the size of the original sequence plus a modification is greater than the maximum length of a sequence, truncate the end of the sequence so that it is no longer than 4 * MAX_CHAIN_BYTES letters.
- For options 4, 5, and 6, if the entered sub-sequence (for replacement, removal, or insertion after) cannot be found in the current sequence, the sequence should not be modified, and an informational message (such as "sub-sequence not found") should be printed to the screen.
- If an input sequence that is read from the console contains any characters other than A, C, G, or T, print an error message and re-prompt for a new sequence until a correct sequence is entered.
- Data in the binary files is assumed to be correct (by nature of the previous two bullets).
- As always, the use of global variables and variable length arrays is forbidden.
- All source and tarfiles should follow the standard CS262 naming conventions and contain appropriate comments.
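The console input check described in the bullets above can be sketched as follows. The function names are hypothetical, the prompt and error text are placeholders that must be matched to the grading script, and treating an empty line or lower-case letters as invalid is an assumption.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Return 1 if s is non-empty and contains only A, C, G, and T. */
int is_valid_sequence(const char *s)
{
    /* strspn() counts the leading run of allowed characters; the string
       is valid only if that run reaches the terminator. */
    return s[0] != '\0' && s[strspn(s, "ACGT")] == '\0';
}

/* Remove the trailing '\n' left by fgets(), if there is one. */
void strip_newline(char *s)
{
    s[strcspn(s, "\n")] = '\0';
}

/* Prompt until a valid sequence is entered.
   Returns 1 on success, 0 if input ends (EOF or read error). */
int read_sequence(const char *prompt, char *buf, size_t size)
{
    for (;;) {
        printf("%s", prompt);
        if (fgets(buf, (int)size, stdin) == NULL)
            return 0;
        strip_newline(buf);
        if (is_valid_sequence(buf))
            return 1;
        /* Placeholder message; match it to the sample script. */
        printf("Error: a sequence may contain only A, C, G, and T.\n");
    }
}
```

A fixed-size buffer such as char input[4 * MAX_CHAIN_BYTES + 2] leaves room for 400 letters, the newline, and the terminator without resorting to a variable length array.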

Some Helpful Hints:

Although the data must be read and written from/to the files in binary format, you can perform most of the other operations using cstrings. However, you will have to convert the data from or to binary when reading or writing the files.
You may find the following String Library functions useful (check man pages for proper usage):
- strstr()
- strcpy()
- strlen()
- strcat()
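You can index within portions of a string by using the & (address-of) operator. For example, to remove the 7th and 8th characters from a string named str using strcpy, you can use the following function call:

strcpy(&str[6], &str[8]);

Be aware that the C standard leaves the behavior of strcpy undefined when the source and destination overlap, as they do here. The strictly safe equivalent uses memmove:

memmove(&str[6], &str[8], strlen(&str[8]) + 1);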

Strategic use of the null character ('\0') after copying sequences to temporary cstrings may help with insertion and deletion of sub-sequences.
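The two hints above can be sketched as a pair of small helpers. Both names (cut_chars and insert_at) are hypothetical. cut_chars uses memmove because the source and destination overlap, and insert_at assumes that sequences are at most 400 letters long and that str has room for the combined result.

```c
#include <stddef.h>
#include <string.h>

/* Remove count characters starting at 0-based index pos. */
void cut_chars(char *str, size_t pos, size_t count)
{
    size_t len = strlen(str);

    if (pos >= len)
        return;
    if (count > len - pos)
        count = len - pos;
    /* Slide the tail (including its '\0') down over the removed part. */
    memmove(&str[pos], &str[pos + count], len - pos - count + 1);
}

/* Insert sub at 0-based index pos. Assumes str holds at most 400
   letters and has room for strlen(str) + strlen(sub) + 1 chars. */
void insert_at(char *str, size_t pos, const char *sub)
{
    char tail[401];

    if (pos > strlen(str))
        return;
    strcpy(tail, &str[pos]);   /* save everything from pos onward */
    str[pos] = '\0';           /* cut the string short at pos */
    strcat(str, sub);          /* append the new letters */
    strcat(str, tail);         /* then put the saved tail back */
}
```

For instance, cut_chars(str, 6, 2) removes the 7th and 8th characters, and insert_at(str, 4, "GGT") turns CATAACA into CATAGGTACA.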

chain1.dat

https://www.dropbox.com/s/w9n2rz790e171al/chain1.dat?dl=0

chain2.dat

https://www.dropbox.com/s/i7qf303vzgut2l3/chain2.dat?dl=0
