Question

Please help! I am trying to find potential genes using the start and stop codons, whose positions I have already found. A potential gene must start with ATG and can end with TAG, TAA, or TGA. Now I just need to combine the start codon, the substring between, and the stop codon into one string. My code so far:

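#include <cstdlib>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
using namespace std;

// Defined in gene.cpp
int countGenes(string str, char gene);
void startCodon(string str, vector<size_t>& vec, string codon);

// Note: the cout statements were cut off when this code was pasted. They are
// reconstructed from the sample output, so wording and number formatting are assumptions.
int main()
{
    string fileName;
    cout << "Enter a filename of the genome: ";
    cin >> fileName;

    fstream fileStream(fileName, ios::in);
    if (fileStream.fail())
    {
        cout << "Could not open " << fileName << endl; // placeholder text; the original message is not shown
        exit(1);
    }
    cout << "Reading " << fileName << "..." << endl;

    // Skip the leading '>' and read the rest of the FASTA header line.
    char dummyChar = '>';
    fileStream >> dummyChar;
    string dummyString;
    getline(fileStream, dummyString);
    cout << "The file contains the genome of " << dummyString << endl;
    cout << "Loading the genome sequence..." << endl;

    string genome, fasta;
    int length;
    float cTotal, gTotal, gcTotal, gcPercent;

    // Testing getline directly avoids processing a failed read at end of file.
    while (getline(fileStream, genome))
    {
        fasta += genome; // Appends each line to the full sequence
    }

    length = fasta.length();
    cTotal = countGenes(fasta, 'C'); // Number of 'C' bases
    gTotal = countGenes(fasta, 'G'); // Number of 'G' bases
    gcTotal = (gTotal + cTotal);     // Number of 'C' and 'G' bases combined
    gcPercent = ((gcTotal / fasta.size()) * 100); // Percentage of 'C' and 'G' among all bases
    cout << "The genome sequence has a length of " << length
         << " with a GC content of " << gcPercent << "%" << endl;

    cout << "Finding potential genes..." << endl;
    vector<size_t> startCodonPositions;
    startCodon(fasta, startCodonPositions, "ATG");
    cout << "Number of start codons: " << startCodonPositions.size() << endl;

    vector<size_t> stopCodonPositions;
    startCodon(fasta, stopCodonPositions, "TAG");
    startCodon(fasta, stopCodonPositions, "TGA");
    startCodon(fasta, stopCodonPositions, "TAA");
    cout << "Number of stop codons: " << stopCodonPositions.size() << endl;

    return 0;
}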

My gene.cpp

#include <string>
#include <vector>
using namespace std;

// Counts how many times the base `gene` occurs in `str`.
// The count depends only on the string actually passed in, not on a fixed genome length.
int countGenes(string str, char gene)
{
    int count = 0;
    for (size_t i = 0; i < str.size(); i++)
    {
        if (str[i] == gene)
            count++;
    }
    return count;
}

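// Appends the position of every occurrence of `codon` in `str` to `vec`,
// including overlapping occurrences.
void startCodon(string str, vector<size_t>& vec, string codon)
{
    size_t start = 0;
    while (true)
    {
        start = str.find(codon, start);
        if (start == string::npos)
            break;
        vec.push_back(start);
        start++; // Resume the search one character past this match
    }
}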
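The GC percentage computed in main can be cross-checked with the standard library. The following is only a minimal sketch: the helper name gcContentPercent is hypothetical and not part of the original gene.cpp, and it assumes the sequence uses uppercase letters as in the file excerpt.

```cpp
#include <algorithm>
#include <string>

// Hypothetical helper (not in the original code): percentage of 'G' and 'C'
// characters in seq. Returns 0.0 for an empty sequence to avoid dividing by zero.
double gcContentPercent(const std::string& seq)
{
    if (seq.empty())
        return 0.0;
    const auto gc = std::count(seq.begin(), seq.end(), 'G')
                  + std::count(seq.begin(), seq.end(), 'C');
    return 100.0 * static_cast<double>(gc) / static_cast<double>(seq.size());
}
```

Calling gcContentPercent(fasta) after the sequence has been loaded should agree with the value printed by main, up to rounding differences between float and double.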

Part of the genome file (nc.fna):

>NC_003988.1 Simian enterovirus A, complete genome GAGTGTTCCCACCCAACAGGCCCACTGGGTGTTGTACTCTGGTATTACGGTACCTTTGTACGCCTATTTT ATTTCCCCCCCCTTTTTGAAACTTAGAAGTTAATAATAAACACGCTCACTAGGTGCACTACATCCAGTAG TGTAATGAGCAAGCACTTCTGTCTYCCCCGGGAGGGATATATGGTACGCTGTGCAAACGGCGGAAATTAA TCCTACCGTTAACCGCCCACCTACTCCGAGAAGCCTAGTACCTAATTGGATTTATCAATGGAGTTGCGCT CAGCAGGTGACCCTGACCTGCCAGCTCCGGCTGATGGACCTGGGCTTTCCCCACAGGCGACTGTGGCCCA GGTCGCGTGGCGGCCGGCCCACCCCCCTGGGTGGGACGCCTTGATAATGACAAGGTGGGAAGAGCCTATT GGGCTAGCTGGTTTCCTCCGGCCTCCTGAATGCGGCTAACCTTAACCCCAGAGCATATGGTAGCAACCCA GCTACTAGTATGTCATAATGCGTAAGTCTGGGATGGGACCGACTACTTTGGAGAGTCCGTGTTTCTATTG TTTCTTTAATCAATCTTATGGTGACAATTTATAGTGCCCTGAGTATTGATTGGTTGTTGCTTTTGACAAT TATTGAGACATCACATAGACATAATGGGAGCTCAAGTAAGCAGGCAAACGTCTGGTGCGCATGACACCCG GATACGGGCTGAACAGGGCGCAAACATTCATTATACTAATATCAATTATTATAGAGATGCAGCTAGCAAT GCAGCAAGCAAAATGGACTATTCCCAGGATCCGGACAAGTTCACGAAACCAGTACTTGATGCTATAACTG AACCATTACCCACGCTGAAGTCCCCTAGTGCTGAGGCATGTGGGTACAGCGACCGAGTTGCACAACTGAC AATTGGCAATTCCACTATCACTACTCAGGAAGCCGCCAATGTGGTGGTCGCATATGGACAATGGCCTGAA TATTTAGATTCGAAGGATGCAACTGCCGTGGATAAGCCCACACAGCCCGATGTAGCCTCAAATAGATTTT ACACTCTTAAGACAGTGTCTTGGGAGAAGAGTTCAACTGGCTGGTATTGGAAATTCTCGGATTGTCTGGC TTCTGTTGGATTATTTGGACAGAATGTACAGTATCATTATTTAGGCCGTTATGGGTTAGCGGTTCATGTG CAATGTAATGCTTCAAAATTTCATCAGGGCACTCTACTGGTCTTAGCAATACCAGAATGGGAGATTGGGG TGTCTAATGCTGATAGGGCATCCTTTAATCTAACAAACCCCGATAAGAACGGGCATACTATGACTGGTCA AGAAGCTTATTGCTTACATAATGGGACTAACATCCATTCTTCACTGGTATTTCCACATCAATTCATCAAT CTTAGGACAAACAATTGTGCTACGTTAGTCTTGCCCTATGTGGGAGCAACACCACTGGACACACCGATCA AGCATAATGTTTGGTCATTGGTAGTAATACCGGTGGTCCCGTTGGATTACACCACTGGTGCAACTACACA AGTGCCTATAACAATAACAATGGCTCCAATGGCGTGCGAGTTTAACGGACTGCGCAATGCCATCACCCAA GGGCTGCCAGTACTCAATACACCCGGCTCTGGGCAGTTTGTGACTACAGATAATTTCCAATCACCAAACT TGATTCCAAATTTTGATGTGACACAAGTCTTTAATAGTCCAGGTGAAATTATTAATTTACAGCAGTATGT CCAGATTGAGGGCATTATGGAAATCAATAATGTAGCAAGTGCAAATAATTTGGAGAGAATTCGCATTCCA ATATCAGTCCAGAGTGGAATTGATGAGATGTTATTTGCAATCAACTGCAACCCAGGAACAGCCCAGGAGT 
TTAGACGCACACCCCTGGGAGATGTGTGTAGGTATTATACACAGTGGTCAGGTAGCATACAAATTACATT TACATTTTGTGGTTCATTTATGACAACAGGAAAATTATTAATTTGCTACACCCCTCCGGGTGGTCGAGTA CCACAAAATAGAGAGGAGGCAATGCTAGGGACTAATGTGATCTGGGATTTTGGTTTACAATCCAGCGTTA CGCTGAACATACCGTGGATAAGTGGAGCCCATTTTAGAAACACTTCTGTTAATGTCGATGGTTTTGATAA CACAGGGTATGTATCTGCTTGGTTTCAAACGAACATGGTAGTTCCTCCCGATGCTCCAACGACTGCTTAT ATATTGGCTTTTACATCAGCCAAGGATGATTTCTCGATGCGCTTGTTGCGGGATACAGCAGAGATTTCGC AAGACGGATTTCTGCAAGGACCAATAGATCAAGCAATAGAAAAAGTAATCACTGATGTAGTGTCTGACAC GCGTGAGTCTAGTAGTGACTTTAGCATTGGGGCTGTTCCAGCATTGAATGCGGTGGAAACTGGAGCCACT TCGCAAGCTAGTGTTGAGTCCACCATTGAGACGCGGGCCGTGCAGAATCGTCATCGCACTTCTGAGATGA GCGTGGAAAGCTTTTTGGGCCGCTCTAGTTTAGTAACTCGCTTTACCATTAATAATGGAGGAACAAATAA TGCCACGAAGTTTCGTAACTGGAAAATAAACTTAAAGGAAGTGGTGCAGCTGCGGCGTAAATTAGAAATG TTTACTTACGTGCGCTTTGATCTTGAGGTGACTATAGTGGCTGTGAATTTGACTGGAAATGGAGGAGTGC GTTACATGTACCAAGCAATGTACTGCCCCCCAGGTGCCCCCCTCCCCACCAATGCTGATCAATATCTGTG GCAATCCTCGACAAATCCCTCCATAATCGGAGCAGTTGGTGAAGTCCCAGGCAGAGTATCAGTGCCTTTT GTGTCAAATGCTAATATGTATGCCACCTTTTATGATGGATATCCATCCTTTGGAAGCATAAATGGACAGG GAAATGGCTCTGATTACGGTGCATTCATACCAAATGATATGGGTACATTGTGTTTCCGATTACTCAATAT CTTTAATAATGGTCCACAAATTCAATTTAGAGTGTTCATGAAACCCAAGCATGTACGAGTATGGTGCCCA

Thank you.

Output:

Enter a filename of the genome: nc.fna
Reading nc.fna...
The file contains the genome of NC_003988.1 Simian enterovirus A, complete genome
Loading the genome sequence...
The genome sequence has a length of 7374 with a GC content of 42.935%
Finding potential genes...
Number of start codons: 190
Number of stop codons: 386

Assignment instructions:

Using the string sequence, look for the start codons (start codon = ATG), look for stop codons (TAG, TAA, and TGA), and find all potential genes (defined by start and stop codons).

How to find a potential gene:

A potential gene MUST start with ATG and can end with TAG, TAA, or TGA.

Even though a sub-sequence (or substring) follows the previous rule, it is NOT directly considered a potential gene. It MUST also follow the rules below:

The potential gene length must be divisible by 3. This is because the gene MUST consist of a whole number of 3-character codons, each representing an amino acid.

A potential gene length of 6 is EXCLUDED. This happens when the gene has only the start and stop codons. In a bioinformatics research setting, this could be a configuration variable so that researchers can easily change the minimum length.

[Figure: one sequence read in three reading frames, labeled Frame 1, Frame 2, and Frame 3. Frame 3 is marked with Met (the start codon) and STOP. Caption: the start codon's position ensures that this frame is chosen.]
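One possible way to combine a start codon, the sequence between, and a stop codon into a single string is sketched below. This is a sketch under stated assumptions, not the assignment's reference solution: the names isStopCodon, findPotentialGenes, and minLength are hypothetical, and it assumes each start codon is paired with the first stop codon found in the same reading frame. Stepping forward 3 characters at a time from the start position keeps the scan in that frame, so every reported length is divisible by 3 without needing the separate vector of stop positions.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: true if the 3 characters beginning at pos are a stop codon.
// The caller must guarantee that pos + 3 <= seq.size().
bool isStopCodon(const std::string& seq, std::size_t pos)
{
    return seq.compare(pos, 3, "TAG") == 0
        || seq.compare(pos, 3, "TAA") == 0
        || seq.compare(pos, 3, "TGA") == 0;
}

// Hypothetical helper: for every start position, walk forward one codon at a
// time and stop at the first stop codon in that reading frame.
// minLength is the shortest gene to keep; the default of 9 excludes a bare
// start codon followed immediately by a stop codon (length 6).
std::vector<std::string> findPotentialGenes(const std::string& seq,
                                            const std::vector<std::size_t>& startPositions,
                                            std::size_t minLength = 9)
{
    std::vector<std::string> genes;
    for (std::size_t start : startPositions)
    {
        for (std::size_t pos = start + 3; pos + 3 <= seq.size(); pos += 3)
        {
            if (isStopCodon(seq, pos))
            {
                // Start codon + middle + stop codon; always a multiple of 3.
                const std::size_t length = pos + 3 - start;
                if (length >= minLength)
                    genes.push_back(seq.substr(start, length));
                break; // The first in-frame stop ends this candidate.
            }
        }
    }
    return genes;
}
```

With the code from the question, a call such as findPotentialGenes(fasta, startCodonPositions) would return the candidate gene strings. If the assignment instead expects every start and stop pair whose distance is divisible by 3, the inner loop would need to continue past the first stop rather than break.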
