[Solved] The below question references: Compose a

Question

1 Approved Answer

Posted on Sep 21, 2024

The below question references:

Compose a Perl script shown below ("dump-coords.pl" in our files), which uses an array (@genes) to hold information for two genes in the ../data/mys.coord2 file. Each element of the array would be a reference to a hash. The hash reference itself would hold gene information in the following format: e.g., $gene={'id'=>'0001', 'start'=>16, 'end'=>324, 'frame'=>'+1', 'score'=>0.929}. Print the array by using the Dumper function.

#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Input  : A coord file
# Output : Coordinates and reading frame for each gene
# ----------------------------------------
my @genes;                          # declare the array
while (<>) {                        # read line by line from the files named on the command line, or from the pipe
    my $line = $_;                  # the current line
    next unless $line =~ /^\d+/;    # skip lines except those reporting genes

    # Split the line on whitespace and store the fields in an array
    my @words = split(' ', $line);

    # Construct an anonymous hash from the values in the array
    my $anonymoushash = {
        id    => $words[0],
        start => $words[1],
        end   => $words[2],
        frame => $words[3],
        score => $words[4],
    };

    # Push the hash reference onto @genes
    push(@genes, $anonymoushash);
}
print Dumper(\@genes);
exit;
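To make the array-of-hash-references layout concrete, here is a minimal sketch that builds the same structure from the two gene lines listed in the coord file and then reads individual fields back out. The in-memory @coord_lines array is an assumption made for illustration; the script above reads the lines from a file or pipe instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Assumption: the two gene lines are held in memory instead of being read from a file
my @coord_lines = ('00001 16 324 +1 0.929', '00002 751 308 -2 0.911');

my @genes;
for my $line (@coord_lines) {
    next unless $line =~ /^\d+/;    # keep only lines reporting genes
    my ($id, $start, $end, $frame, $score) = split ' ', $line;
    push @genes, { id => $id, start => $start, end => $end, frame => $frame, score => $score };
}

# Each element of @genes is a hash reference; a hash slice pulls several fields at once
for my $gene (@genes) {
    printf "%s\t%d\t%d\t%s\t%.3f\n", @{$gene}{qw(id start end frame score)};
}

# A single field of a single gene
print "Frame of second gene: $genes[1]{frame}\n";

# Sorting the keys makes the Dumper output stable from run to run
$Data::Dumper::Sortkeys = 1;
print Dumper(\@genes);
```

Setting $Data::Dumper::Sortkeys is optional; without it the keys of each hash may print in a different order on each run.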

___________________________________

1. Write a script, "assign.pl", using Perl and BioPerl that extracts coding sequences based on "coord" files. The script will use Bio::SeqIO to read the genome FASTA file ("file1.fas") given as the first argument, and it will read "file2.coord" as the second argument. Use Bio::Seq to obtain the coding sequences and translate them. The protein sequences in your output should not contain stop codons except as the last codon. The template at the end helps you get started,

where:

file1.fas is

TTTAAAACTTTTCTATTGGATAGATTTTATACAAAGAAGGTAATAATGTATAAACAACAA

TATTTTATTTCTGGCAAGGTGCAAGGTGTTGGTTTTAGATTTTTCACAGAGCAAATAGCA

AATAATATGAAACTAAAAGGATTTGTAAAAAATCTCAACGATGGAAGGGTAGAAATTGTA

GCTTTCTTTAATACTAAAGAACAAATGAAAAAATTTGAAAAATTATTAAATGGGAATAAG

TATTCAAACATTAAAAACATTGAAAAAATAGTTTTAGATGAAAATTATCCTTTTCAATTT

AATGATTTTAAAATTTATTATTAGGGCTTGCCTCTCGTTTAACAAGTACCTTAACCTTAT

TTTTTGGTTTAATATTGTGCACATAAGAATTGTTATTCTTATAGCAAGACACACTAGTTC

TAAAAAAATGTTCGACTTTAAATTTCAAAAACTCTAAAGACTTTCTGTTTCTACAAAAAA

TATTCAAATTGCCATCAGAGAATTTAATATCAAGCCCACAATTACAACTTGTTTTTAAAA

AACTGCACGAAATAAAACTACCATTTGAACTATTCCTATTAAAAAGAATAAAATATTTTT

TACCTCTTTTGGAGTCTTTAAAATAAATTCTAAGCCATTCTCCTGCTGGCTTAACTGAAG

CCAATCTATCATAAAGATAAGAAATGTAAATCGTTCTATTGTCATTAGATTTTAAAATTT

TTTTAAAAAAATAATGAGGACCTATTTTCATACAAATTATATAATATCTTATTAATAAAA

TATTTCCTAATATTTCCCAAATATATTGATAATGCCTGAATTTAAAAAAACAAAAACTAT

and

file2.coord is

Sequence file = mystery_seq1.fas

Excluded regions file = none

Circular genome = true

Initial minimum gene length = 90 bp

Determine optimal min gene length to maximize number of genes

Maximum overlap bases = 30

Start codons = atg,gtg,ttg

Stop codons = taa,tag,tga

Sequence length = 840

Final minimum gene length = 232

Putative Genes:

00001 16 324 +1 0.929

00002 751 308 -2 0.911

______________________________________

#!/usr/bin/perl
use strict;
use warnings;
use lib '/data/biocs/b/bio425/bioperl-live';
use Bio::SeqIO;

# Input  : A FASTA file with 1 DNA seq and coord file from LONG-ORF
# Output : A FASTA file with translated protein sequences
# ----------------------------------------
die "Usage: $0 " unless @ARGV > 0;
my ($fasta_file, $coord_file) = @ARGV;
my $fasta_input = Bio::SeqIO->new();    # create a file handle to read sequences (arguments left blank in the template)
my $output = Bio::SeqIO->new(-file => ">$fasta_file" . ".out", -format => 'fasta');    # create a file handle to output sequences into a file
my $seq_obj = $fasta_input->next_seq(); # get sequence object from FASTA file

# Read COORD file & extract sequences
open COORD, "<" . $coord_file or die "Cannot open $coord_file: $!";
while (<COORD>) {
    my $line = $_;
    chomp $line;
    next unless $line =~ /^\d+/;    # skip lines except those reporting genes
    my ($seq_id, $cor1, $cor2, $strand, $score) = split /\s+/, $line;    # split line on white spaces
    if () {        # condition left blank in the template
    } else {
    }
    $output->write_seq($pro_obj);   # $pro_obj is to be defined in the branches above
}
close COORD;
exit;
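The blank branches in the template come down to one decision per gene: which strand it lies on, and therefore whether the extracted span must be reverse-complemented before translation. The following is a minimal sketch of that logic in core Perl only, so it runs without BioPerl. The helper names revcomp, extract_cds and translate are hypothetical, the strand is assumed to be given by the sign of the frame column, the standard genetic code is assumed, and genes wrapping around the origin of the circular genome are not handled. In BioPerl, the analogous steps would presumably use the trunc, revcom and translate methods of the sequence object.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# The 840 bp sequence from file1.fas, joined into one string
my $genome = join '', qw(
    TTTAAAACTTTTCTATTGGATAGATTTTATACAAAGAAGGTAATAATGTATAAACAACAA
    TATTTTATTTCTGGCAAGGTGCAAGGTGTTGGTTTTAGATTTTTCACAGAGCAAATAGCA
    AATAATATGAAACTAAAAGGATTTGTAAAAAATCTCAACGATGGAAGGGTAGAAATTGTA
    GCTTTCTTTAATACTAAAGAACAAATGAAAAAATTTGAAAAATTATTAAATGGGAATAAG
    TATTCAAACATTAAAAACATTGAAAAAATAGTTTTAGATGAAAATTATCCTTTTCAATTT
    AATGATTTTAAAATTTATTATTAGGGCTTGCCTCTCGTTTAACAAGTACCTTAACCTTAT
    TTTTTGGTTTAATATTGTGCACATAAGAATTGTTATTCTTATAGCAAGACACACTAGTTC
    TAAAAAAATGTTCGACTTTAAATTTCAAAAACTCTAAAGACTTTCTGTTTCTACAAAAAA
    TATTCAAATTGCCATCAGAGAATTTAATATCAAGCCCACAATTACAACTTGTTTTTAAAA
    AACTGCACGAAATAAAACTACCATTTGAACTATTCCTATTAAAAAGAATAAAATATTTTT
    TACCTCTTTTGGAGTCTTTAAAATAAATTCTAAGCCATTCTCCTGCTGGCTTAACTGAAG
    CCAATCTATCATAAAGATAAGAAATGTAAATCGTTCTATTGTCATTAGATTTTAAAATTT
    TTTTAAAAAAATAATGAGGACCTATTTTCATACAAATTATATAATATCTTATTAATAAAA
    TATTTCCTAATATTTCCCAAATATATTGATAATGCCTGAATTTAAAAAAACAAAAACTAT
);

# Standard genetic code, built from the usual TCAG-ordered amino acid string
my %codon_table;
{
    my @bases = qw(T C A G);
    my @aas   = split //, 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG';
    for my $b1 (@bases) {
        for my $b2 (@bases) {
            for my $b3 (@bases) {
                $codon_table{"$b1$b2$b3"} = shift @aas;
            }
        }
    }
}

# Hypothetical helper: reverse-complement a DNA string
sub revcomp {
    my ($dna) = @_;
    my $rc = reverse $dna;
    $rc =~ tr/ACGTacgt/TGCAtgca/;
    return $rc;
}

# Hypothetical helper: cut out a gene from 1-based coordinates.
# Assumption: a frame starting with '-' means the gene is on the reverse strand.
sub extract_cds {
    my ($seq, $cor1, $cor2, $frame) = @_;
    my ($lo, $hi) = $cor1 <= $cor2 ? ($cor1, $cor2) : ($cor2, $cor1);
    my $span = substr($seq, $lo - 1, $hi - $lo + 1);
    return $frame =~ /^-/ ? revcomp($span) : $span;
}

# Hypothetical helper: translate codon by codon; unknown codons become 'X'
sub translate {
    my ($cds) = @_;
    my $protein = '';
    for (my $i = 0; $i + 3 <= length $cds; $i += 3) {
        my $codon = uc substr($cds, $i, 3);
        $protein .= $codon_table{$codon} // 'X';
    }
    return $protein;
}

# The two putative genes from file2.coord
for my $gene (['00001', 16, 324, '+1'], ['00002', 751, 308, '-2']) {
    my ($id, $cor1, $cor2, $frame) = @$gene;
    my $cds     = extract_cds($genome, $cor1, $cor2, $frame);
    my $protein = translate($cds);
    my $stops   = () = $protein =~ /\*/g;    # count stop codons in the translation
    print ">$id length=", length($cds), " stops=$stops\n$protein\n";
}
```

With these coordinates each extracted span begins with one of the listed start codons and ends with a stop codon (TTG...TAG for gene 00001, ATG...TAA for gene 00002), which suggests the coordinates include the stop codon. The stop count printed for each gene is a quick check against the requirement that a stop appears only as the last codon.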