Question

A FASTA file consists of one header line, the so-called definition line (which starts with the > symbol), followed by lines containing only sequence data (usually 60 characters per line, except perhaps the last one). A multi-FASTA file contains multiple sequences: it starts with the header line for the first sequence, followed by that sequence spread across multiple lines, then the header for the second sequence, followed by the second sequence, and so on.

You should write a Python script that does the following. It should ask the user for the name of an input FASTA file (say the user enters stuff.fasta). The script should then read the input file and display information about each sequence: the header line, the first 10 characters of the sequence, and the number of amino acids in the protein. This serves as a useful tool for summarizing the contents of a multi-FASTA file.

(Using Biopython)
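Biopython handles this parsing for you, but it can help to see what the format rules above mean in practice. The sketch below is plain Python with no Biopython; `read_multi_fasta` is a hypothetical helper name and the two-record input is made-up toy data. It treats every line starting with ">" as the start of a new record and joins the wrapped lines that follow into one sequence string.

```python
def read_multi_fasta(lines):
    """Group FASTA lines into (header, sequence) pairs.

    A new record starts at every line beginning with ">"; the lines after it,
    up to the next ">", are concatenated into a single sequence string.
    Sequence lines that appear before the first header are ignored.
    """
    records = []
    header, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))  # close previous record
            header, chunks = line, []
        else:
            chunks.append(line)  # wrapped sequence line of the current record
    if header is not None:
        records.append((header, "".join(chunks)))  # flush the last record
    return records


# Made-up two-record example: the first sequence is wrapped over two lines.
example = """>seq1 first protein
MPQLDTSTWFITIISS
PLSL
>seq2 second protein
MVTG
""".splitlines()

for header, seq in read_multi_fasta(example):
    print(header, seq[:10], len(seq))
# >seq1 first protein MPQLDTSTWF 20
# >seq2 second protein MVTG 4
```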

Here is multiprotein.fasta file content:

>1433G_HUMAN (P61981) 14-3-3 protein gamma (Protein kinase C inhibitor protein 1) (KCIP-1) [Homo sapiens]
VDREQLVQKARLAEQAERYDDMAAAMKNVTELNEPLSNEERNLLSVAYKNVVGARRSSWR
VISSIEQKTSADGNEKKIEMVRAYREKIEKELEAVCQDVLSLLDNYLIKNCSETQYESKV
FYLKMKGDYYRYLAEVATGEKRATVVESSEKAYSEAHEISKEHMQPTHPIRLGLALNYSV
FYYEIQNAPEQACHLAKTAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDDD
GGEGNN
>ATP8_RAT (P11608) ATP synthase protein 8 (EC 3.6.3.14) (ATPase subunit 8) (A6L) (Chargerin II) [Rattus norvegicus]
MPQLDTSTWFITIISSMATLFILFQLKISSQTFPAPPSPKTMATEKTNNPWESKWTKIYL
PLSLPPQ
>ALR_LISIN (Q92DC9) Alanine racemase (EC 5.1.1.1)
MVTGWHRPTWIEIDRAAIRENIKNEQNKLPDKVALWAVVKANAYGHGIIETAKIAKEAGA
KGFCVAILDEALALREAGFRNEFILVLGATRKEDANLAAKNNISVTVFREDWLDDLTLEA
PLRIHLKVDSGMGRLGIRSREEAQRIETTIAIDHQMILEGIYTHFATADQLETSYFEQQL
AKFQAILSSLTTRPTFVHTANSAASLLQPQIDFDAIRFGISMYGLTPSTEIKNSLPFELK
PALALYTEMVHVKELAPGDSVSYGATYTATEKEWVATLPIGYADGLIRHYSGFHVLVEGE
RAPIIGRICMDQTIIKLPREFQTGTKVTIIGSDHGNKVTADDAAEYLGTINYEVTCLLTE
RIPRKYIN
>CDCA4_HUMAN (Q9BXL8) Cell division cycle-associated protein 4 (Hematopoietic progenitor protein) [Homo sapiens]
MFARGLKRKCVGHEEDVEGALAGLKTVSSYSLQRQSLLDMSLVKLQLCHMLVEPNLCRSV
LIANTVRQIQEEMTQDGTWRTVAPQAAERAPLDRLVSTEILCRAAWGQEGAHPAPGLGDG
HTQGPVSDLCPVTSAQAPRHLQSSAWEMDGPRENRGSFHKSLDQIFETLETKNPSCMEEL
FSDVDSPYYDLDTVLTGMMGGARPGPCEGLEGLAPATPGPSSSCKSDLGELDHVVEILVE
T
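One possible Biopython solution is sketched below; it is not the blurred "expert" answer from the original page. It assumes Biopython is installed and relies on `Bio.SeqIO.parse(path, "fasta")`, which yields one `SeqRecord` per header with the wrapped sequence lines already joined. `record.description` holds the header line without its leading ">". The helper names `summarize_fasta` and `main` are my own, and the file-not-found and Ctrl-D handling are small additions beyond what the question requires.

```python
import sys

from Bio import SeqIO


def summarize_fasta(path):
    """Return a (header, first_ten, length) tuple for every record in a FASTA file."""
    summaries = []
    # SeqIO.parse yields one SeqRecord per ">" line, with the wrapped
    # sequence lines already joined into a single Seq object.
    for record in SeqIO.parse(path, "fasta"):
        header = ">" + record.description  # description is the header without ">"
        first_ten = str(record.seq[:10])   # slice the Seq, convert for printing
        summaries.append((header, first_ten, len(record.seq)))
    return summaries


def main():
    try:
        filename = input("Enter the name of the FASTA file: ").strip()
    except EOFError:
        return  # no input available (e.g. the user pressed Ctrl-D)
    try:
        summaries = summarize_fasta(filename)
    except FileNotFoundError:
        sys.exit(f"Cannot find file: {filename}")
    for header, first_ten, length in summaries:
        print(header)
        print(f"  First 10 residues: {first_ten}")
        print(f"  Number of amino acids: {length}")


if __name__ == "__main__":
    main()
```

Keeping the parsing in `summarize_fasta`, separate from the `input()` prompt, makes the summary logic easy to reuse or test without a user at the keyboard. Run against the multiprotein.fasta content above, it should report 246, 67, 368 and 241 amino acids for the four records.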
