Question
Create a Java program based on the following information.
In a FASTA format DNA sequence file, each sequence record starts with a header line that begins with a ">" sign, followed by a sequence identifier (such as a GenBank accession number) and a description of the sequence. Develop a Java program to read in a sequence file. This is the sequence file:
>by21f03.y1|BF727444
CACCAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAATTCACCCC
TCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGAC
CACCCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGA
CAGCGGCTGCTGGATGCTCTGGAATTCCAGCCCAACTACTCGGGCCTCCA
ACTTCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTC
AGCGACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAG
GATCAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCA
CTGAGGACTGCTCCTGGAATTCAGGACCGCT
>by05e12.y1|BF726365
CCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCTACGAGGACCG
GGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCACCCCAACCTGC
AGCCCTACTTGAGGAATTCGAACTCGGCGCGCGTGGACAGCGGCTGCTGG
ATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACTTCCTGCGCCG
CGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTCAGCGACTCGGTCC
GCTCCTGCCGCCTC
>by09f05.y1|BF726635
CACCAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCC
TCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGAC
CACCCCAACCTGCGGAATTCCTTGAGCCGCTGCAACTCGGCGCGCGTGGA
CAGCGGCTGCTGGATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGT
ACTTCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTC
AGCGACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAG
GATCAGACTCTATGAGGAATTCCCCTACAGAGGCCAGATGATAGAGTTCA
CTGAGGACTGCTCCTGTCTTCAGGACCGCTTCCGCTTCAATGAAATCCAC
TCCCTCAACGTGCTGGAGGGCTCCTGGGTCCTCTACGAGCTGTCCAACTA
CCGAGGACGGCAGTACCTG
>by14f12.y1|BF726960
CAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCT
ACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCAC
CCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGACAG
CGGCTGCTGGATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACT
TCCTGCGCCGCGGCGACTATGGAATTCGGCAGCAGTGGATGGGCCTCAGC
GACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAGGAT
CAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCACTG
AGGAC
>by20g06.y1|BF727389
CAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCT
ACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCAC
CCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGACAG
CGGCTGCTGGATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACT
TCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTCAGC
GACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAGGAT
CAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCACTG
AGGACTGCTCCTGTC
>by18g06.y1|BF727241
CGCGAGCCTCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCA
GCAGCGACCACCCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCG
CGCGTGGACAGCGGCTGC
The program should find out how many sequences are in the file by counting the header lines. It should prompt the user for the sequence file name, and then print a message stating how many sequences the file contains, such as:
Enter the name of the sequence file: seq.fasta
File seq.fasta contains 6 sequences
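One possible approach to this first part, as a minimal sketch rather than a definitive solution: read all lines of the file and count those that start with ">". The class name `FastaCounter` and the helper `countSequences` are hypothetical names chosen for illustration, and the sketch assumes the file is small enough to read into memory at once.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name; not part of the original exercise.
class FastaCounter {

    // Counts the lines that begin with '>' (one header line per sequence record).
    static int countSequences(List<String> lines) {
        int count = 0;
        for (String line : lines) {
            if (line.startsWith(">")) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter the name of the sequence file: ");
        String fileName = in.nextLine().trim();
        try {
            // Assumption: the whole file fits comfortably in memory.
            List<String> lines = Files.readAllLines(Paths.get(fileName));
            System.out.println("File " + fileName + " contains "
                    + countSequences(lines) + " sequences");
        } catch (IOException e) {
            System.err.println("Could not read " + fileName + ": " + e.getMessage());
        }
    }
}
```

Keeping the counting logic in its own method, separate from the prompt and file handling, makes it easy to reuse when the program is extended to process each record.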
In the in-class exercise above, you read through the whole file to determine the number of header lines, so you can already separate the actual sequence from the header line of each sequence record. Modify the program to search the sequence of each record for a restriction site, and underline each restriction site with "*"s. See the sample output below:
Enter the name of the sequence file: seq.fasta
Enter the sequence of a restriction site: GAATTC
>by21f03.y1|BF727444
CACCAGCTCAGCACCGCCGTGCGCCCAGCCAGCCATGGGGAATTCACCCC
                                       ******
TCTACGAGGACCGGGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGAC
CACCCCAACCTGCAGCCCTACTTGAGCCGCTGCAACTCGGCGCGCGTGGA
CAGCGGCTGCTGGATGCTCTGGAATTCCAGCCCAACTACTCGGGCCTCCA
                     ******
ACTTCCTGCGCCGCGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTC
AGCGACTCGGTCCGCTCCTGCCGCCTCATCCCCCACTCTGGCTCTCACAG
GATCAGACTCTATGAGAGGGAGGACTACAGAGGCCAGATGATAGAGTTCA
CTGAGGACTGCTCCTGGAATTCAGGACCGCT
                ******
>by05e12.y1|BF726365
CCGCCGTGCGCCCAGCCAGCCATGGGGAAGATCACCCTCTACGAGGACCG
GGGCTTCCAGGGCCGCCACTACGAATGCAGCAGCGACCACCCCAACCTGC
AGCCCTACTTGAGGAATTCGAACTCGGCGCGCGTGGACAGCGGCTGCTGG
             ******
ATGCTCTATGAGCAGCCCAACTACTCGGGCCTCCAGTACTTCCTGCGCCG
CGGCGACTATGCCGACCACCAGCAGTGGATGGGCCTCAGCGACTCGGTCC
GCTCCTGCCGCCTC
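The second part could be sketched as follows. This is one possible design, not the expected answer: it joins the sequence lines of a record, records every position covered by the restriction site, and then prints each original line followed by a marker line with "*" under the matched bases. The names `RestrictionSiteFinder`, `annotate`, and `printRecord` are hypothetical. Marking a site that spans a line break on both lines is an assumption, since the sample output only shows sites that lie within a single line.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

// Hypothetical class name; not part of the original exercise.
class RestrictionSiteFinder {

    // Returns the record's sequence lines, each followed by a marker line with
    // '*' under every matched base. Lines without a match get no marker line.
    static List<String> annotate(List<String> seqLines, String site) {
        String whole = String.join("", seqLines);
        boolean[] hit = new boolean[whole.length()];
        if (!site.isEmpty()) {
            // Advance by one after each match so overlapping sites are found too.
            for (int i = whole.indexOf(site); i >= 0; i = whole.indexOf(site, i + 1)) {
                for (int j = i; j < i + site.length(); j++) {
                    hit[j] = true;
                }
            }
        }
        List<String> out = new ArrayList<>();
        int offset = 0; // position of the current line within the joined sequence
        for (String line : seqLines) {
            out.add(line);
            StringBuilder marks = new StringBuilder();
            int lastStar = -1;
            for (int k = 0; k < line.length(); k++) {
                boolean matched = hit[offset + k];
                marks.append(matched ? '*' : ' ');
                if (matched) {
                    lastStar = k;
                }
            }
            if (lastStar >= 0) {
                out.add(marks.substring(0, lastStar + 1)); // drop trailing spaces
            }
            offset += line.length();
        }
        return out;
    }

    // Prints one annotated record and clears the buffer for the next record.
    private static void printRecord(List<String> record, String site) {
        for (String s : annotate(record, site)) {
            System.out.println(s);
        }
        record.clear();
    }

    public static void main(String[] args) throws IOException {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter the name of the sequence file: ");
        String fileName = in.nextLine().trim();
        System.out.print("Enter the sequence of a restriction site: ");
        String site = in.nextLine().trim();

        List<String> record = new ArrayList<>();
        for (String line : Files.readAllLines(Paths.get(fileName))) {
            if (line.startsWith(">")) {
                printRecord(record, site); // finish the previous record first
                System.out.println(line);
            } else if (!line.trim().isEmpty()) {
                record.add(line.trim());
            }
        }
        printRecord(record, site); // the last record has no following header
    }
}
```

The match is case-sensitive, so this sketch assumes the restriction site is entered in the same case as the sequences in the file (uppercase in the sample data).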
Step by Step Solution
Step: 1
Sure, here is a Java program to read in a FASTA format DNA sequence file, find out how many sequences are in the file, and search through the sequence of ...