Question
A FASTA file consists of one header line, the so-called definition line (which starts with the > symbol), followed by lines consisting only of sequence data (usually 60 characters per line, except perhaps the last one). A multi-FASTA file contains multiple sequences: it starts with the header line for the first sequence, followed by that sequence spread across multiple lines, then the header for the second sequence, followed by the second sequence, and so on.

You should write a Python script that does the following. It should ask the user for the name of an input FASTA file; let's say the filename they enter is stuff.fasta. The script should then read the input file and display information about each sequence on the screen: the header line, the first 10 characters of the sequence, and the number of amino acids in the protein. This will serve as a useful tool to summarize the contents of a multi-FASTA file.
(Using Biopython)
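To make the multi-FASTA layout described above concrete, here is a minimal plain-Python sketch of what any FASTA reader has to do: start a new record at each `>` line and concatenate the sequence lines that follow it. The helper name `parse_multifasta` and the demo records are hypothetical, used only for illustration; it assumes the input follows the layout described in the question.

```python
def parse_multifasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA text lines.

    The header is the definition line without its leading '>'; the sequence
    is every following non-blank line, joined, up to the next header.
    """
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                # A new header closes the previous record.
                yield header, "".join(chunks)
            header = line[1:]
            chunks = []
        elif header is not None:
            # Sequence lines that appear before the first header are ignored.
            chunks.append(line)
    if header is not None:
        # Emit the last record, which has no following header to close it.
        yield header, "".join(chunks)


# Small illustrative input (hypothetical records, not from multiprotein.fasta).
example = """>seq1 demo protein
ACDEFGHIKL
MNPQ
>seq2 another
MKV
""".splitlines()

for header, seq in parse_multifasta(example):
    print(header, seq[:10], len(seq))
# prints:
# seq1 demo protein ACDEFGHIKL 14
# seq2 another MKV 3
```

Because the sequence of one record is only known to be complete when the next `>` line (or the end of the file) is reached, the last record has to be emitted after the loop.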
Here is the content of the multiprotein.fasta file:
>1433G_HUMAN (P61981) 14-3-3 protein gamma (Protein kinase C inhibitor protein 1) (KCIP-1) [Homo sapiens]
VDREQLVQKARLAEQAERYDDMAAAMKNVTELNEPLSNEERNLLSVAYKNVVGARRSSWR
VISSIEQKTSADGNEKKIEMVRAYREKIEKELEAVCQDVLSLLDNYLIKNCSETQYESKV
FYLKMKGDYYRYLAEVATGEKRATVVESSEKAYSEAHEISKEHMQPTHPIRLGLALNYSV
FYYEIQNAPEQACHLAKTAFDDAIAELDTLNEDSYKDSTLIMQLLRDNLTLWTSDQQDDD
GGEGNN
>ATP8_RAT (P11608) ATP synthase protein 8 (EC 3.6.3.14) (ATPase subunit 8) (A6L) (Chargerin II) [Rattus norvegicus]
MPQLDTSTWFITIISSMATLFILFQLKISSQTFPAPPSPKTMATEKTNNPWESKWTKIYL
PLSLPPQ
>ALR_LISIN (Q92DC9) Alanine racemase (EC 5.1.1.1)
MVTGWHRPTWIEIDRAAIRENIKNEQNKLPDKVALWAVVKANAYGHGIIETAKIAKEAGA
KGFCVAILDEALALREAGFRNEFILVLGATRKEDANLAAKNNISVTVFREDWLDDLTLEA
PLRIHLKVDSGMGRLGIRSREEAQRIETTIAIDHQMILEGIYTHFATADQLETSYFEQQL
AKFQAILSSLTTRPTFVHTANSAASLLQPQIDFDAIRFGISMYGLTPSTEIKNSLPFELK
PALALYTEMVHVKELAPGDSVSYGATYTATEKEWVATLPIGYADGLIRHYSGFHVLVEGE
RAPIIGRICMDQTIIKLPREFQTGTKVTIIGSDHGNKVTADDAAEYLGTINYEVTCLLTE
RIPRKYIN
>CDCA4_HUMAN (Q9BXL8) Cell division cycle-associated protein 4 (Hematopoietic progenitor protein) [Homo sapiens]
MFARGLKRKCVGHEEDVEGALAGLKTVSSYSLQRQSLLDMSLVKLQLCHMLVEPNLCRSV
LIANTVRQIQEEMTQDGTWRTVAPQAAERAPLDRLVSTEILCRAAWGQEGAHPAPGLGDG
HTQGPVSDLCPVTSAQAPRHLQSSAWEMDGPRENRGSFHKSLDQIFETLETKNPSCMEEL
FSDVDSPYYDLDTVLTGMMGGARPGPCEGLEGLAPATPGPSSSCKSDLGELDHVVEILVE
T
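A minimal Biopython sketch of the requested script follows, assuming Biopython is installed (`pip install biopython`). `Bio.SeqIO.parse(source, "fasta")` yields one `SeqRecord` per `>` header; `record.description` holds the full header line without the `>`, and `record.seq` holds the sequence lines joined together. The function names `summarize_fasta` and `main`, the prompt text, and the output layout are illustrative choices, not something the question prescribes.

```python
from Bio import SeqIO


def summarize_fasta(source, preview=10):
    """Return (header, first `preview` residues, length) for each record.

    `source` may be a filename or an open text handle, since SeqIO.parse
    accepts either.
    """
    summary = []
    for record in SeqIO.parse(source, "fasta"):
        seq = str(record.seq)
        # record.description is the whole definition line minus the '>'
        summary.append((record.description, seq[:preview], len(seq)))
    return summary


def main():
    try:
        filename = input("Enter the name of the input FASTA file: ").strip()
    except EOFError:
        # No interactive input available (e.g. stdin is closed).
        return
    try:
        rows = summarize_fasta(filename)
    except FileNotFoundError:
        print(f"File not found: {filename}")
        return
    for header, first10, length in rows:
        print(header)
        print(f"  First 10 residues: {first10}")
        print(f"  Length: {length} amino acids")
        print()


if __name__ == "__main__":
    main()
```

Entering `multiprotein.fasta` at the prompt prints one block per protein. Keeping the parsing in a function that returns tuples, separate from the interactive prompt, makes the summary easy to reuse or test with an in-memory handle such as `io.StringIO`.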