Question
The below question references:
Compose the Perl script shown below ("dump-coords.pl" in our files), which uses an array (@genes) to hold information for the two genes in the ../data/mys.coord2 file. Each element of the array is a reference to a hash, and each hash holds the information for one gene in the following format: e.g., $gene = {'id'=>'0001', 'start'=>16, 'end'=>324, 'frame'=>'+1', 'score'=>0.929}. Print the array using the Dumper function.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Input : A coord file
# Output : Coordinates and reading frame for each gene
# ----------------------------------------
my @genes;                            # declare the array
while (<>) {                          # keep going as long as lines come from the file or pipe
    my $line = $_;                    # the current line (we go line by line)
    next unless $line =~ /^\d+/;      # skip lines except those reporting genes

    # Split the line and store the contents in an array
    my @words = split(' ', $line);

    # Construct an anonymous hash from the values in the array
    my $anonymoushash = {
        id    => $words[0],
        start => $words[1],
        end   => $words[2],
        frame => $words[3],
        score => $words[4],
    };

    # Push the hash reference onto @genes
    push(@genes, $anonymoushash);
}
print Dumper(\@genes);
exit;
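To show how the structure built by dump-coords.pl can be read back, here is a minimal sketch that fills @genes by hand with the two gene records from this question and walks it with the arrow operator. The @summary array and the length calculation are illustrative additions, not part of the original script; the length formula assumes that coordinates are inclusive and that no gene wraps around the origin of the circular genome.

```perl
use strict;
use warnings;

# Two records typed in by hand, using the values from the question's coord lines
my @genes = (
    { id => '00001', start => 16,  end => 324, frame => '+1', score => 0.929 },
    { id => '00002', start => 751, end => 308, frame => '-2', score => 0.911 },
);

# Each element is a hash reference, so fields are reached with ->{key}
my @summary;    # hypothetical helper array, for illustration only
foreach my $gene (@genes) {
    # Inclusive length; abs() handles reverse-strand genes where start > end
    my $length = abs($gene->{end} - $gene->{start}) + 1;
    push @summary, sprintf("%s\t%s\t%d bp", $gene->{id}, $gene->{frame}, $length);
}
print "$_\n" for @summary;
```

Both lengths come out as multiples of three, which is what one would expect if the coordinates span whole codons, stop codon included.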
___________________________________
1. Write a script, "assign.pl", using Perl and BioPerl that extracts coding sequences based on a "coord" file. The script will use Bio::SeqIO to read the genome FASTA file ("file1.fas") given as the 1st argument, and it will read "file2.coord" as the 2nd argument. Use Bio::Seq to obtain the coding sequences and translate them. The protein sequences in your output should not contain stop codons except as the last codon. A template to help you get started is given after the two input files.
The input files are as follows:
file1.fas is
TTTAAAACTTTTCTATTGGATAGATTTTATACAAAGAAGGTAATAATGTATAAACAACAA
TATTTTATTTCTGGCAAGGTGCAAGGTGTTGGTTTTAGATTTTTCACAGAGCAAATAGCA
AATAATATGAAACTAAAAGGATTTGTAAAAAATCTCAACGATGGAAGGGTAGAAATTGTA
GCTTTCTTTAATACTAAAGAACAAATGAAAAAATTTGAAAAATTATTAAATGGGAATAAG
TATTCAAACATTAAAAACATTGAAAAAATAGTTTTAGATGAAAATTATCCTTTTCAATTT
AATGATTTTAAAATTTATTATTAGGGCTTGCCTCTCGTTTAACAAGTACCTTAACCTTAT
TTTTTGGTTTAATATTGTGCACATAAGAATTGTTATTCTTATAGCAAGACACACTAGTTC
TAAAAAAATGTTCGACTTTAAATTTCAAAAACTCTAAAGACTTTCTGTTTCTACAAAAAA
TATTCAAATTGCCATCAGAGAATTTAATATCAAGCCCACAATTACAACTTGTTTTTAAAA
AACTGCACGAAATAAAACTACCATTTGAACTATTCCTATTAAAAAGAATAAAATATTTTT
TACCTCTTTTGGAGTCTTTAAAATAAATTCTAAGCCATTCTCCTGCTGGCTTAACTGAAG
CCAATCTATCATAAAGATAAGAAATGTAAATCGTTCTATTGTCATTAGATTTTAAAATTT
TTTTAAAAAAATAATGAGGACCTATTTTCATACAAATTATATAATATCTTATTAATAAAA
TATTTCCTAATATTTCCCAAATATATTGATAATGCCTGAATTTAAAAAAACAAAAACTAT
and
file2.coord is
Sequence file = mystery_seq1.fas
Excluded regions file = none
Circular genome = true
Initial minimum gene length = 90 bp
Determine optimal min gene length to maximize number of genes
Maximum overlap bases = 30
Start codons = atg,gtg,ttg
Stop codons = taa,tag,tga
Sequence length = 840
Final minimum gene length = 232
Putative Genes:
00001 16 324 +1 0.929
00002 751 308 -2 0.911
______________________________________
#!/usr/bin/perl
use strict;
use warnings;
use lib '/data/biocs/b/bio425/bioperl-live';
use Bio::SeqIO;

# Input : A FASTA file with 1 DNA seq and coord file from LONG-ORF
# Output : A FASTA file with translated protein sequences
# ----------------------------------------
die "Usage: $0 <fasta_file> <coord_file>\n" unless @ARGV > 0;
my ($fasta_file, $coord_file) = @ARGV;
my $fasta_input = Bio::SeqIO->new();    # create a file handle to read sequences (arguments left blank in the template)
my $output = Bio::SeqIO->new(-file=>">$fasta_file".".out", -format=>'fasta'); # create a file handle to output sequences into a file
my $seq_obj = $fasta_input->next_seq(); # get sequence object from FASTA file

# Read COORD file & extract sequences
open COORD, "<" . $coord_file or die "Cannot open $coord_file: $!";
while (<COORD>) {
    my $line = $_;
    chomp $line;
    next unless $line =~ /^\d+/;        # skip lines except those reporting genes
    my ($seq_id, $cor1, $cor2, $strand, $score) = split /\s+/, $line; # split line on white spaces
    if () {       # condition and body left blank in the template
    } else {      # body left blank in the template
    }
    $output->write_seq($pro_obj);
}
close COORD;
exit;
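One way the blank if/else branch of the template might be filled is sketched below. It is not the instructor's solution. It assumes BioPerl is installed on the default module path (rather than under the site-specific `use lib` directory in the template), that a frame starting with "+" means the forward strand, that reverse-strand genes list the larger coordinate first, and that no gene wraps around the origin. The helper name translate_gene and the 13-bp toy sequence are made up for illustration; trunc, revcom, translate, seq and display_id are standard Bio::Seq methods.

```perl
use strict;
use warnings;
use Bio::Seq;

# Hypothetical helper (not part of the template): build the protein object
# for one coord line, or return nothing if it contains an internal stop.
sub translate_gene {
    my ($seq_obj, $seq_id, $cor1, $cor2, $frame) = @_;
    my $cds_obj;
    if ($frame =~ /^\+/) {
        # Forward strand: coordinates already run left to right
        $cds_obj = $seq_obj->trunc($cor1, $cor2);
    } else {
        # Reverse strand: start > end, so swap them and reverse-complement
        $cds_obj = $seq_obj->trunc($cor2, $cor1)->revcom();
    }
    my $pro_obj = $cds_obj->translate();
    $pro_obj->display_id($seq_id);
    # '*' marks a stop codon; accept it only as the final residue
    return if $pro_obj->seq() =~ /\*./;
    return $pro_obj;
}

# Toy sequence invented for this example (not taken from file1.fas):
# bases 3..11 are ATG AAA TAG
my $toy = Bio::Seq->new(-id => 'toy', -seq => 'CCATGAAATAGGG', -alphabet => 'dna');
my $pro = translate_gene($toy, '00001', 3, 11, '+1');
print $pro->display_id(), "\t", $pro->seq(), "\n" if defined $pro;
```

Inside the template's while loop, the same logic would assign to $pro_obj before the call to $output->write_seq($pro_obj), skipping the write when the helper returns nothing.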