
Question


The data

This file already contains the import statement needed to import the phylogenetic tree we will be using: from hiv1Mgroup import *. This statement brings the variable mtree into the environment, which is the HIV1 tree we will be working with.

mtree is a rooted, tuple-based tree of the type we have seen already. Each node has four things: (name, leftTree, rightTree, branchLength), where branchLength is the length of the branch leading to the node. Here is an image of the tree represented by mtree:

[Figure: the tree represented by mtree, with each leaf labelled by its collection year (1983 to 2012).]

As in the trees we saw last week, mtree was made from a collection of HIV1 sequences using the neighbor joining method, and rooted using a chimpanzee SIV outgroup. However, there is one difference between mtree and the trees we've seen up till now. In the trees from last week, the name value for each leaf was the name of the sample sequence. In mtree it is instead the date when the sample was collected. For example, the topmost leaf in the image (marked with a red dot) was collected in the year 2001. That leaf looks like this in our tuple-based tree representation:

    (2001, (), (), 0.06047)

Notice that the 2001 is put in the "name" position (index 0). The left and right branches are empty, as is always the case with a leaf. The final position (index 3) contains the branch length. That branch length value tells you how much sequence change has occurred in this lineage since it descended from its parent node (the parent is also marked with a red dot).

One of the things that we'll want to extract from mtree is these name values from the leaves. The other thing we'll need to get is the total branch length from a leaf back to the root node. In the image above, observe the leaf labelled "2008" whose branch is colored red. We are interested in the sum of the lengths of all the branches leading from this leaf back to the root, which are illustrated in red.

extractData(tree)

Your task is to write a function extractData which takes a tree as input and extracts from it the data we are interested in: the date of collection of each sample, and the total branch length from root to leaf for that sample. This is a problem which requires us to "get into the guts" of the tree, and since trees are recursive structures, it is necessary to write a recursive function to solve the problem. The overall format will be similar to the recursive functions on trees we discussed in lecture.

extractData(tree) returns a list containing as many elements as there are leaves in its input tree. Each element of this output list is itself a list of the form [collectionDate, branchLengthSum]. We obtain collectionDate from the name position of each leaf node, and branchLengthSum is the sum of all the branch lengths from the leaf back to the root of the input tree.

The branchLen.py starter file contains two example trees, exTree and exTree2. Here is extractData operating on these.

exTree: [Figure: a small tree with leaves 2007, 1993 and 1999.]

    >>> extractData(exTree)
    [[2007, 9.0], [1993, 5.0], [1999, 7.0]]

exTree2: [Figure: a tree with leaves 2010, 2007, 1994, 2001, 2006, 2004 and 2006, with a branch length written on each edge.]

    >>> extractData(exTree2)
    [[2010, 0.04], [2007, 0.045], [1994, 0.02], [2001, 0.05], [2006, 0.07], [2004, 0.065], [2006, 0.06]]
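One possible shape for the recursion described above is sketched here. It is not the course's reference solution, and it rests on several assumptions that the question does not confirm: empty subtrees are written as the empty tuple (), every internal node has exactly two children, and the root's own branchLength (0.0 in the demo) is counted in each sum. The tree demoTree and its node names 'root' and 'anc1' are made up for illustration and are not taken from branchLen.py.

```python
def extractData(tree):
    """Return [[collectionDate, branchLengthSum], ...], one entry per leaf.

    Assumes each node is (name, leftTree, rightTree, branchLength) and that
    a leaf has () for both subtrees.
    """
    name, left, right, branchLength = tree

    # Base case: a leaf contributes its date and its own branch length.
    if left == () and right == ():
        return [[name, branchLength]]

    # Recursive case: collect the leaves below this node (left subtree
    # first), then add this node's branch to every path passing through it.
    below = extractData(left) + extractData(right)
    return [[date, total + branchLength] for date, total in below]


# Hypothetical tree for illustration only (not exTree or exTree2).
demoTree = ('root',
            ('anc1',
             (2005, (), (), 1.0),
             (1998, (), (), 2.0),
             3.0),
            (2011, (), (), 5.0),
            0.0)

print(extractData(demoTree))
# [[2005, 4.0], [1998, 5.0], [2011, 5.0]]
```

Adding each node's branch length on the way back up avoids passing a running total down as an extra argument, so the function keeps the single-parameter signature extractData(tree) that the assignment asks for. If the root's branch should not be counted, that would need a small wrapper, which is left out here.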

