7. Array-Oriented Programming with NumPy
Objectives
In this chapter, you'll:
  • Learn what arrays are and how they differ from lists.
  • Use the numpy module's high-performance ndarrays.
  • Compare list and ndarray performance with the IPython %timeit magic.
  • Use ndarrays to store and retrieve data efficiently.
  • Create and initialize ndarrays.
  • Refer to individual ndarray elements.
  • Iterate through ndarrays.
  • Create and manipulate multidimensional ndarrays.
  • Perform common ndarray manipulations.
  • Create and manipulate pandas one-dimensional Series and two-dimensional DataFrames.
  • Customize Series and DataFrame indices.
  • Calculate basic descriptive statistics for data in a Series and a DataFrame.
  • Customize floating-point number precision in pandas output formatting.
7.1 Introduction
NumPy (Numerical Python) Library
  • First appeared in 2006 and is the preferred Python array implementation.
  • Provides a high-performance, richly functional n-dimensional array type called ndarray.
  • Written in C and up to 100 times faster than lists.
  • Critical in big-data processing, AI applications and much more.
  • According to libraries.io, over 450 Python libraries depend on NumPy.
  • Many popular data science libraries such as Pandas, SciPy (Scientific Python) and Keras (for deep learning) are built on or depend on NumPy.
Array-Oriented Programming
  • Functional-style programming with internal iteration makes array-oriented manipulations concise and straightforward, and reduces the possibility of error.
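The difference between external and internal iteration can be sketched as follows. The sample values and variable names are illustrative assumptions, not taken from the chapter.

```python
import numpy as np

values = [1, 2, 3, 4, 5]  # hypothetical sample data

# External iteration: the loop and the result list are managed by hand.
squares_loop = []
for value in values:
    squares_loop.append(value ** 2)

# Internal iteration: NumPy applies the operation to every element itself.
squares_array = np.array(values) ** 2

print(squares_loop)   # [1, 4, 9, 16, 25]
print(squares_array)  # [ 1  4  9 16 25]
```

Both versions compute the same values; the array version simply has no loop in which to make an indexing or accumulation mistake.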

© 1992-2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud. DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.
7.2 Creating arrays from Existing Data
  • Create an array with the array function
  • Argument is an array or other iterable
  • Returns a new array containing the argument's elements
In [1]: import numpy as np
In [2]: numbers = np.array([2, 3, 5, 7, 11])
In [3]: type(numbers)
Out[3]: numpy.ndarray
In [4]: numbers
Out[4]: array([ 2,  3,  5,  7, 11])
Multidimensional Arguments
In [5]: np.array([[1, 2, 3], [4, 5, 6]])
Out[5]:
array([[1, 2, 3],
       [4, 5, 6]])
7.3 array Attributes
  • array attributes enable you to discover information about an array's structure and contents
In [1]: import numpy as np
In [2]: integers = np.array([[1, 2, 3], [4, 5, 6]])
In [3]: integers
Out[3]:
array([[1, 2, 3],
       [4, 5, 6]])
In [4]: floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
In [5]: floats
Out[5]: array([0. , 0.1, 0.2, 0.3, 0.4])
  • NumPy does not display trailing 0s to the right of the decimal point in floating-point values
Determining an array's Element Type
In [6]: integers.dtype
Out[6]: dtype('int64')
In [7]: floats.dtype
Out[7]: dtype('float64')
  • For performance reasons, NumPy is written in the C programming language and uses C's data types
  • The NumPy documentation lists the other NumPy types
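As a small sketch of how an element type can be requested explicitly, the snippet below passes the dtype keyword argument to np.array. The variable names are illustrative, and the default integer type NumPy picks may vary by platform.

```python
import numpy as np

# Without dtype, NumPy chooses a default integer type for integer input
# (often int64, but this depends on the platform and NumPy version).
integers = np.array([1, 2, 3])

# The dtype keyword argument requests a specific element type.
small_integers = np.array([1, 2, 3], dtype=np.int8)
floats = np.array([1, 2, 3], dtype=np.float32)

print(integers.dtype)
print(small_integers.dtype, small_integers.itemsize)  # int8 1
print(floats.dtype, floats.itemsize)                  # float32 4
```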
Determining an array's Dimensions
  • ndim contains an array's number of dimensions
  • shape contains a tuple specifying an array's dimensions
In [8]: integers.ndim
Out[8]: 2
In [9]: floats.ndim
Out[9]: 1
In [10]: integers.shape
Out[10]: (2, 3)
In [11]: floats.shape
Out[11]: (5,)
Determining an array's Number of Elements and Element Size
  • View an array's total number of elements with size
  • View the number of bytes required to store each element with itemsize
In [12]: integers.size
Out[12]: 6
In [13]: integers.itemsize
Out[13]: 8
In [14]: floats.size
Out[14]: 5
In [15]: floats.itemsize
Out[15]: 8
Iterating through a Multidimensional array's Elements
In [16]: for row in integers:
    ...:     for column in row:
    ...:         print(column, end=' ')
    ...:     print()
    ...:
1 2 3
4 5 6
  • Iterate through a multidimensional array as if it were one-dimensional by using flat
In [17]: for i in integers.flat:
    ...:     print(i, end=' ')
    ...:
1 2 3 4 5 6
7.4 Filling arrays with Specific Values
  • Functions zeros, ones and full create arrays containing 0s, 1s or a specified value, respectively
In [1]: import numpy as np
In [2]: np.zeros(5)
Out[2]: array([0., 0., 0., 0., 0.])
  • For a tuple of integers, these functions return a multidimensional array with the specified dimensions
In [3]: np.ones((2, 4), dtype=int)
Out[3]:
array([[1, 1, 1, 1],
       [1, 1, 1, 1]])
In [4]: np.full((3, 5), 13)
Out[4]:
array([[13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13],
       [13, 13, 13, 13, 13]])
7.5 Creating arrays from Ranges
  • NumPy provides optimized functions for creating arrays from ranges
Creating Integer Ranges with arange
In [1]: import numpy as np
In [2]: np.arange(5)
Out[2]: array([0, 1, 2, 3, 4])
In [3]: np.arange(5, 10)
Out[3]: array([5, 6, 7, 8, 9])
In [4]: np.arange(10, 1, -2)
Out[4]: array([10,  8,  6,  4,  2])
Creating Floating-Point Ranges with linspace
  • Produce evenly spaced floating-point ranges with NumPy's linspace function
  • Ending value is included in the array
In [5]: np.linspace(0.0, 1.0, num=5)
Out[5]: array([0.  , 0.25, 0.5 , 0.75, 1.  ])
Reshaping an array
  • array method reshape transforms an array into a different number of dimensions
  • New shape must have the same number of elements as the original
In [6]: np.arange(1, 21).reshape(4, 5)
Out[6]:
array([[ 1,  2,  3,  4,  5],
       [ 6,  7,  8,  9, 10],
       [11, 12, 13, 14, 15],
       [16, 17, 18, 19, 20]])
Displaying Large arrays
  • When displaying an array with more than 1000 items (the default print threshold), NumPy drops the middle rows, columns or both from the output
In [7]: np.arange(1, 100001).reshape(4, 25000)
Out[7]:
array([[     1,      2,      3, ...,  24998,  24999,  25000],
       [ 25001,  25002,  25003, ...,  49998,  49999,  50000],
       [ 50001,  50002,  50003, ...,  74998,  74999,  75000],
       [ 75001,  75002,  75003, ...,  99998,  99999, 100000]])
In [8]: np.arange(1, 100001).reshape(100, 1000)
Out[8]:
array([[     1,      2,      3, ...,    998,    999,   1000],
       [  1001,   1002,   1003, ...,   1998,   1999,   2000],
       [  2001,   2002,   2003, ...,   2998,   2999,   3000],
       ...,
       [ 97001,  97002,  97003, ...,  97998,  97999,  98000],
       [ 98001,  98002,  98003, ...,  98998,  98999,  99000],
       [ 99001,  99002,  99003, ...,  99998,  99999, 100000]])
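The abbreviated display is controlled by NumPy's print options. The following sketch reads the current threshold and checks where abbreviation starts; it assumes the print options have not been changed from their defaults, and the variable names are illustrative.

```python
import numpy as np

# Number of elements above which NumPy abbreviates the display
# (1000 unless the print options were changed).
threshold = np.get_printoptions()['threshold']
print(threshold)

exactly_at = repr(np.arange(threshold))      # size == threshold
just_over = repr(np.arange(threshold + 1))   # size > threshold

print('...' in exactly_at)  # False: shown in full
print('...' in just_over)   # True: middle elements dropped
```

To display a large array in full, np.set_printoptions accepts a larger threshold value.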
7.6 List vs. array Performance: Introducing %timeit
  • Most array operations execute significantly faster than corresponding list operations
  • IPython %timeit magic command times the average duration of operations
Timing the Creation of a List Containing Results of 6,000,000 Die Rolls
In [1]: import random
In [2]: %timeit rolls_list = \
   ...:     [random.randrange(1, 7) for i in range(0, 6_000_000)]
6.88 s ± 276 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
  • By default, %timeit executes a statement in a loop, and it runs the loop seven times
  • If you do not indicate the number of loops, %timeit chooses an appropriate value
  • After executing the statement, %timeit displays the statement's average execution time, as well as the standard deviation of all the executions
Timing the Creation of an array Containing Results of 6,000,000 Die Rolls
In [3]: import numpy as np
In [4]: %timeit rolls_array = np.random.randint(1, 7, 6_000_000)
75.2 ms ± 2.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
60,000,000 and 600,000,000 Die Rolls
In [5]: %timeit rolls_array = np.random.randint(1, 7, 60_000_000)
916 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %timeit rolls_array = np.random.randint(1, 7, 600_000_000)
10.3 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Customizing the %timeit Iterations
In [7]: %timeit -n3 -r2 rolls_array = np.random.randint(1, 7, 6_000_000)
74.5 ms ± 7.58 ms per loop (mean ± std. dev. of 2 runs, 3 loops each)
Other IPython Magics
IPython provides dozens of magics for a variety of tasks. For a complete list, see the IPython magics documentation. Here are a few helpful ones:
  • %load to read code into IPython from a local file or URL.
  • %save to save snippets to a file.
  • %run to execute a .py file from IPython.
  • %precision to change the default floating-point precision for IPython outputs.
  • %cd to change directories without having to exit IPython first.
  • %edit to launch an external editor, which is handy if you need to modify more complex snippets.
  • %history to view a list of all snippets and commands you've executed in the current IPython session.
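Outside IPython, a similar comparison can be sketched with the standard library's timeit module. The function names and the reduced roll count below are assumptions chosen so the sketch finishes quickly; the timings it prints will differ from those shown above.

```python
import random
import timeit

import numpy as np

ROLLS = 100_000  # far fewer rolls than in the text, to keep the sketch fast

def roll_list():
    """Build a list of die rolls with a list comprehension."""
    return [random.randrange(1, 7) for _ in range(ROLLS)]

def roll_array():
    """Build an ndarray of die rolls with a single NumPy call."""
    return np.random.randint(1, 7, ROLLS)

# Mirrors %timeit -n3 -r2: two runs of three calls each.
# timeit.repeat returns the total time of each run.
list_times = timeit.repeat(roll_list, number=3, repeat=2)
array_times = timeit.repeat(roll_array, number=3, repeat=2)

print(f'list:  {min(list_times) / 3:.4f} s per call')
print(f'array: {min(array_times) / 3:.4f} s per call')
```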

7.7 array Operators
  • array operators perform operations on entire arrays.
  • Can perform arithmetic between arrays and scalar numeric values, and between arrays of the same shape.
In [1]: import numpy as np
In [2]: numbers = np.arange(1, 6)
In [3]: numbers
Out[3]: array([1, 2, 3, 4, 5])
In [4]: numbers * 2
Out[4]: array([ 2,  4,  6,  8, 10])
In [5]: numbers ** 3
Out[5]: array([  1,   8,  27,  64, 125])
In [6]: numbers  # numbers is unchanged by the arithmetic operators
Out[6]: array([1, 2, 3, 4, 5])
In [7]: numbers += 10
In [8]: numbers
Out[8]: array([11, 12, 13, 14, 15])
Broadcasting
  • Normally, arithmetic operations require as operands two arrays of the same size and shape.
  • When one operand is a scalar, NumPy treats it as an array of the other operand's shape: numbers * 2 is equivalent to numbers * [2, 2, 2, 2, 2] for a 5-element array.
  • Applying the operation to every element in this way is called broadcasting.
  • Broadcasting also can be applied between arrays of different sizes and shapes, enabling some concise and powerful manipulations.
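Broadcasting between arrays of different shapes can be sketched as follows. The two small arrays are hypothetical examples, not data from the chapter.

```python
import numpy as np

row = np.array([1, 2, 3])        # shape (3,)
column = np.array([[10], [20]])  # shape (2, 1)

# The shapes (2, 1) and (3,) are compatible, so NumPy stretches both
# operands to shape (2, 3) before adding element-wise.
result = column + row

print(result.shape)  # (2, 3)
print(result)
# [[11 12 13]
#  [21 22 23]]
```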
Arithmetic Operations Between arrays
  • Can perform arithmetic operations and augmented assignments between arrays of the same shape
In [9]: numbers2 = np.linspace(1.1, 5.5, 5)
In [10]: numbers2
Out[10]: array([1.1, 2.2, 3.3, 4.4, 5.5])
In [11]: numbers * numbers2
Out[11]: array([12.1, 26.4, 42.9, 61.6, 82.5])
Comparing arrays
  • Can compare arrays with individual values and with other arrays
  • Comparisons are performed element-wise
  • Produce arrays of Boolean values in which each element's True or False value indicates the comparison result
In [12]: numbers
Out[12]: array([11, 12, 13, 14, 15])
In [13]: numbers >= 13
Out[13]: array([False, False,  True,  True,  True])
In [14]: numbers2
Out[14]: array([1.1, 2.2, 3.3, 4.4, 5.5])
In [15]: numbers2 < numbers
Out[15]: array([ True,  True,  True,  True,  True])
In [16]: numbers == numbers2
Out[16]: array([False, False, False, False, False])
In [17]: numbers == numbers
Out[17]: array([ True,  True,  True,  True,  True])
7.8 NumPy Calculation Methods
  • By default, these methods ignore the array's shape and use all the elements in the calculations.
  • Consider an array representing four students' grades on three exams:
In [1]: import numpy as np
In [2]: grades = np.array([[87, 96, 70], [100, 87, 90],
   ...:                    [94, 77, 90], [100, 81, 82]])
In [3]: grades
Out[3]:
array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])
  • Can use methods to calculate sum, min, max, mean, std (standard deviation) and var (variance)
  • Each is a functional-style programming reduction
In [4]: grades.sum()
Out[4]: 1054
In [5]: grades.min()
Out[5]: 70
In [6]: grades.max()
Out[6]: 100
In [7]: grades.mean()
Out[7]: 87.83333333333333
In [8]: grades.std()
Out[8]: 8.792357792739987
In [9]: grades.var()
Out[9]: 77.30555555555556
Calculations by Row or Column
  • You can perform calculations by column or row (or other dimensions in arrays with more than two dimensions)
  • Each 2D+ array has one axis per dimension
  • In a 2D array, axis=0 indicates calculations should be column-by-column
In [10]: grades.mean(axis=0)
Out[10]: array([95.25, 85.25, 83.  ])
  • In a 2D array, axis=1 indicates calculations should be row-by-row
In [11]: grades.mean(axis=1)
Out[11]: array([84.33333333, 92.33333333, 87.        , 87.66666667])
  • The NumPy documentation lists the other array calculation methods
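The axis argument works the same way with other calculation methods. The sketch below reuses the grades data from this section with max and argmax; the result variable names are illustrative.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])

# axis=0 collapses the rows, leaving one result per column (exam).
best_per_exam = grades.max(axis=0)
# axis=1 collapses the columns, leaving one result per row (student).
best_per_student = grades.max(axis=1)
# argmax reports the position of the maximum instead of its value.
best_exam_index = grades.argmax(axis=1)

print(best_per_exam)     # [100  96  90]
print(best_per_student)  # [ 96 100  94 100]
print(best_exam_index)   # [1 0 0 0]
```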

7.9 Universal Functions
  • Standalone universal functions (ufuncs) perform element-wise operations using one or two array or array-like arguments (like lists)
  • Each returns a new array containing the results
  • Some ufuncs are called when you use array operators like + and *
  • Create an array and calculate the square root of its values, using the sqrt universal function
In [1]: import numpy as np
In [2]: numbers = np.array([1, 4, 9, 16, 25, 36])
In [3]: np.sqrt(numbers)
Out[3]: array([1., 2., 3., 4., 5., 6.])
  • Add two arrays with the same shape, using the add universal function
  • Equivalent to numbers + numbers2
In [4]: numbers2 = np.arange(1, 7) * 10
In [5]: numbers2
Out[5]: array([10, 20, 30, 40, 50, 60])
In [6]: np.add(numbers, numbers2)
Out[6]: array([11, 24, 39, 56, 75, 96])
Broadcasting with Universal Functions
  • Universal functions can use broadcasting, just like NumPy array operators
In [7]: np.multiply(numbers2, 5)
Out[7]: array([ 50, 100, 150, 200, 250, 300])
In [8]: numbers3 = numbers2.reshape(2, 3)
In [9]: numbers3
Out[9]:
array([[10, 20, 30],
       [40, 50, 60]])
In [10]: numbers4 = np.array([2, 4, 6])
In [11]: np.multiply(numbers3, numbers4)
Out[11]:
array([[ 20,  80, 180],
       [ 80, 200, 360]])
  • For details, see the broadcasting rules in the NumPy documentation
Other Universal Functions
NumPy universal functions
  • Math: add, subtract, multiply, divide, remainder, exp, log, sqrt, power, and more.
  • Trigonometry: sin, cos, tan, hypot, arcsin, arccos, arctan, and more.
  • Bit manipulation: bitwise_and, bitwise_or, bitwise_xor, invert, left_shift and right_shift.
  • Comparison: greater, greater_equal, less, less_equal, equal, not_equal, logical_and, logical_or, logical_xor, logical_not, minimum, maximum, and more.
  • Floating point: floor, ceil, isinf, isnan, fabs, trunc, and more.
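A few of the listed universal functions in use, one from each of three groups. The input values are hypothetical.

```python
import numpy as np

sides_a = np.array([3.0, 5.0, 8.0])
sides_b = np.array([4.0, 12.0, 15.0])

# Trigonometry group: hypot computes sqrt(a**2 + b**2) element-wise.
hypotenuses = np.hypot(sides_a, sides_b)

# Comparison group: maximum picks the larger element of each pair.
# A plain list is accepted as an array-like argument.
larger = np.maximum(sides_a, [4.0, 4.0, 4.0])

# Floating-point group: isnan flags not-a-number values.
flags = np.isnan(np.array([1.0, np.nan, 3.0]))

print(hypotenuses)  # [ 5. 13. 17.]
print(larger)       # [4. 5. 8.]
print(flags)        # [False  True False]
```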

7.10 Indexing and Slicing
  • One-dimensional arrays can be indexed and sliced like lists.
Indexing with Two-Dimensional arrays
  • To select an element in a two-dimensional array, specify a tuple containing the element's row and column indices in square brackets
In [1]: import numpy as np
In [2]: grades = np.array([[87, 96, 70], [100, 87, 90],
   ...:                    [94, 77, 90], [100, 81, 82]])
In [3]: grades
Out[3]:
array([[ 87,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])
In [4]: grades[0, 1]  # row 0, column 1
Out[4]: 96
Selecting a Subset of a Two-Dimensional array's Rows
  • To select a single row, specify only one index in square brackets
In [5]: grades[1]
Out[5]: array([100,  87,  90])
  • Select multiple sequential rows with slice notation
In [6]: grades[0:2]
Out[6]:
array([[ 87,  96,  70],
       [100,  87,  90]])
  • Select multiple non-sequential rows with a list of row indices
In [7]: grades[[1, 3]]
Out[7]:
array([[100,  87,  90],
       [100,  81,  82]])
Selecting a Subset of a Two-Dimensional array's Columns
  • The column index also can be a specific index, a slice or a list
In [8]: grades[:, 0]
Out[8]: array([ 87, 100,  94, 100])
In [9]: grades[:, 1:3]
Out[9]:
array([[96, 70],
       [87, 90],
       [77, 90],
       [81, 82]])
In [10]: grades[:, [0, 2]]
Out[10]:
array([[ 87,  70],
       [100,  90],
       [ 94,  90],
       [100,  82]])
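Row and column selections can also be combined in a single indexing expression. A minimal sketch using the same grades data, with an illustrative variable name:

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])

# Rows 1 and 2 (a slice) combined with columns 0 and 2 (a list).
subset = grades[1:3, [0, 2]]

print(subset)
# [[100  90]
#  [ 94  90]]
```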
7.11 Views: Shallow Copies
  • Views see the data in other objects, rather than having their own copies of the data
  • Views are shallow copies
  • array method view returns a new array object with a view of the original array object's data
In [1]: import numpy as np
In [2]: numbers = np.arange(1, 6)
In [3]: numbers
Out[3]: array([1, 2, 3, 4, 5])
In [4]: numbers2 = numbers.view()
In [5]: numbers2
Out[5]: array([1, 2, 3, 4, 5])
  • Use the built-in id function to see that numbers and numbers2 are different objects
In [6]: id(numbers)
Out[6]: 4431803056
In [7]: id(numbers2)
Out[7]: 4430398928
  • Modifying an element in the original array also modifies the view, and vice versa
In [8]: numbers[1] *= 10
In [9]: numbers2
Out[9]: array([ 1, 20,  3,  4,  5])
In [10]: numbers
Out[10]: array([ 1, 20,  3,  4,  5])
In [11]: numbers2[1] /= 10
In [12]: numbers
Out[12]: array([1, 2, 3, 4, 5])
In [13]: numbers2
Out[13]: array([1, 2, 3, 4, 5])
Slice Views
  • Slices also create views
In [14]: numbers2 = numbers[0:3]
In [15]: numbers2
Out[15]: array([1, 2, 3])
In [16]: id(numbers)
Out[16]: 4431803056
In [17]: id(numbers2)
Out[17]: 4451350368
  • Confirm that numbers2 is a view of only the first three numbers elements
In [18]: numbers2[3]
------------------------------------------------------------------------
IndexError                             Traceback (most recent call last)
----> 1 numbers2[3]

IndexError: index 3 is out of bounds for axis 0 with size 3
  • Modify an element both arrays share to show both are updated
In [19]: numbers[1] *= 20
In [20]: numbers
Out[20]: array([ 1, 40,  3,  4,  5])
In [21]: numbers2
Out[21]: array([ 1, 40,  3])
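Comparing id values shows only that the objects differ. To check whether two arrays actually share data, one option is the base attribute together with np.shares_memory, as in this sketch (the variable names are illustrative):

```python
import numpy as np

numbers = np.arange(1, 6)
view = numbers.view()
sliced = numbers[0:3]
copied = numbers.copy()

# A view's base attribute refers to the array that owns the data.
print(view.base is numbers)    # True
print(sliced.base is numbers)  # True
print(copied.base is None)     # True: a copy owns its own data

# np.shares_memory reports whether two arrays overlap in memory.
print(np.shares_memory(numbers, sliced))  # True
print(np.shares_memory(numbers, copied))  # False
```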
7.12 Deep Copies
  • When sharing mutable values, sometimes it's necessary to create a deep copy of the original data
  • Especially important in multi-core programming, where separate parts of your program could attempt to modify your data at the same time, possibly corrupting it
  • array method copy returns a new array object with an independent copy of the original array's data
In [1]: import numpy as np
In [2]: numbers = np.arange(1, 6)
In [3]: numbers
Out[3]: array([1, 2, 3, 4, 5])
In [4]: numbers2 = numbers.copy()
In [5]: numbers2
Out[5]: array([1, 2, 3, 4, 5])
In [6]: numbers[1] *= 10
In [7]: numbers
Out[7]: array([ 1, 20,  3,  4,  5])
In [8]: numbers2
Out[8]: array([1, 2, 3, 4, 5])
Module copy: Shallow vs. Deep Copies for Other Types of Python Objects
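For other mutable Python objects such as nested lists, the standard library's copy module makes the same shallow-versus-deep distinction. A minimal sketch, using a hypothetical nested list:

```python
import copy

nested = [[1, 2], [3, 4]]  # hypothetical data

shallow = copy.copy(nested)   # new outer list, same inner lists
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0][0] = 99

print(shallow[0][0])  # 99: the inner list is shared with the original
print(deep[0][0])     # 1: the deep copy is independent
```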
7.13 Reshaping and Transposing
reshape vs. resize
  • Method reshape returns a view (shallow copy) of the original array with new dimensions whenever the array's memory layout allows it
  • Does not modify the original array
In [1]: import numpy as np
In [2]: grades = np.array([[87, 96, 70], [100, 87, 90]])
In [3]: grades
Out[3]:
array([[ 87,  96,  70],
       [100,  87,  90]])
In [4]: grades.reshape(1, 6)
Out[4]: array([[ 87,  96,  70, 100,  87,  90]])
In [5]: grades
Out[5]:
array([[ 87,  96,  70],
       [100,  87,  90]])
  • Method resize modifies the original array's shape
In [6]: grades.resize(1, 6)
In [7]: grades
Out[7]: array([[ 87,  96,  70, 100,  87,  90]])
flatten vs. ravel
  • Can flatten a multidimensional array into a single dimension with methods flatten and ravel
  • flatten deep copies the original array's data
In [8]: grades = np.array([[87, 96, 70], [100, 87, 90]])
In [9]: grades
Out[9]:
array([[ 87,  96,  70],
       [100,  87,  90]])
In [10]: flattened = grades.flatten()
In [11]: flattened
Out[11]: array([ 87,  96,  70, 100,  87,  90])
In [12]: grades
Out[12]:
array([[ 87,  96,  70],
       [100,  87,  90]])
In [13]: flattened[0] = 100
In [14]: flattened
Out[14]: array([100,  96,  70, 100,  87,  90])
In [15]: grades
Out[15]:
array([[ 87,  96,  70],
       [100,  87,  90]])
  • Method ravel produces a view of the original array when possible, as it does here, so the result shares the grades array's data
In [16]: raveled = grades.ravel()
In [17]: raveled
Out[17]: array([ 87,  96,  70, 100,  87,  90])
In [18]: grades
Out[18]:
array([[ 87,  96,  70],
       [100,  87,  90]])
In [19]: raveled[0] = 100
In [20]: raveled
Out[20]: array([100,  96,  70, 100,  87,  90])
In [21]: grades
Out[21]:
array([[100,  96,  70],
       [100,  87,  90]])
Transposing Rows and Columns
  • Can quickly transpose an array's rows and columns
    • Flips the array, so the rows become the columns and the columns become the rows
  • T attribute returns a transposed view (shallow copy) of the array
In [22]: grades.T
Out[22]:
array([[100, 100],
       [ 96,  87],
       [ 70,  90]])
In [23]: grades
Out[23]:
array([[100,  96,  70],
       [100,  87,  90]])
Horizontal and Vertical Stacking
  • Can combine arrays by adding more columns or more rows, known as horizontal stacking and vertical stacking
In [24]: grades2 = np.array([[94, 77, 90], [100, 81, 82]])
  • Combine grades and grades2 with NumPy's hstack (horizontal stack) function by passing a tuple containing the arrays to combine
  • The extra parentheses are required because hstack expects one argument
  • Adds more columns
In [25]: np.hstack((grades, grades2))
Out[25]:
array([[100,  96,  70,  94,  77,  90],
       [100,  87,  90, 100,  81,  82]])
  • Combine grades and grades2 with NumPy's vstack (vertical stack) function
  • Adds more rows
In [26]: np.vstack((grades, grades2))
Out[26]:
array([[100,  96,  70],
       [100,  87,  90],
       [ 94,  77,  90],
       [100,  81,  82]])
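Whether reshape and ravel return a view depends on the array's memory layout. The sketch below uses np.shares_memory to check each case for the grades data; the transposed case is an added illustration of when NumPy has to copy instead.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90]])

# Contiguous data: reshape and ravel can reuse the existing buffer.
reshaped = grades.reshape(1, 6)
print(np.shares_memory(grades, reshaped))        # True
print(np.shares_memory(grades, grades.ravel()))  # True

# flatten always copies.
print(np.shares_memory(grades, grades.flatten()))  # False

# The transposed view is not contiguous in row-major order, so
# reshaping it to one dimension forces NumPy to copy the data.
transposed_flat = grades.T.reshape(6)
print(np.shares_memory(grades, transposed_flat))  # False
print(transposed_flat)  # [ 87 100  96  87  70  90]
```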
7.14.1 pandas Series
  • An enhanced one-dimensional array
  • Supports custom indexing, including even non-integer indices like strings
  • Offers additional capabilities that make it more convenient for many data-science oriented tasks
    • Series may have missing data
    • Many Series operations ignore missing data by default
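How missing data is skipped can be sketched with a Series containing np.nan. The grades below are hypothetical.

```python
import numpy as np
import pandas as pd

# np.nan marks the missing second grade (hypothetical data).
grades = pd.Series([87, np.nan, 94])

print(grades.count())  # 2: count skips the missing value
print(grades.mean())   # 90.5: the mean of 87 and 94

# Pass skipna=False to make the missing value propagate instead.
print(grades.mean(skipna=False))  # nan
```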

Creating a Series with Default Indices

  • By default, a Series has integer indices numbered sequentially from 0
In [1]: import pandas as pd
In [2]: grades = pd.Series([87, 100, 94])

Creating a Series with All Elements Having the Same Value

  • Second argument is a one-dimensional iterable object (such as a list, an array or a range) containing the Series' indices
  • Number of indices determines the number of elements
In [149]: pd.Series(98.6, range(3))
Out[149]:
0    98.6
1    98.6
2    98.6
dtype: float64

Accessing a Series' Elements

In [150]: grades[0]
Out[150]: 87

Producing Descriptive Statistics for a Series

  • Series provides many methods for common tasks including producing various descriptive statistics
  • Each of these is a functional-style reduction
In [151]: grades.count()
Out[151]: 3
In [152]: grades.mean()
Out[152]: 93.66666666666667
In [153]: grades.min()
Out[153]: 87
In [154]: grades.max()
Out[154]: 100
In [155]: grades.std()
Out[155]: 6.506407098647712
  • Series method describe produces all these stats and more
  • The 25%, 50% and 75% are quartiles:
    • 50% represents the median of the sorted values.
    • 25% represents the median of the first half of the sorted values.
    • 75% represents the median of the second half of the sorted values.
  • For the quartiles, if there are two middle elements, then their average is that quartile's median
In [156]: grades.describe()
Out[156]:
count      3.000000
mean      93.666667
std        6.506407
min       87.000000
25%       90.500000
50%       94.000000
75%       97.000000
max      100.000000
dtype: float64
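The quartile values can be reproduced one at a time with Series method quantile, which by default interpolates linearly between neighboring sorted values. A short sketch with the same three grades:

```python
import pandas as pd

grades = pd.Series([87, 100, 94])

# Sorted values are 87, 94, 100; quantile interpolates linearly by default.
print(grades.quantile(0.25))  # 90.5
print(grades.quantile(0.50))  # 94.0
print(grades.quantile(0.75))  # 97.0

# The same values appear in describe's 25%, 50% and 75% rows.
summary = grades.describe()
print(summary['25%'], summary['50%'], summary['75%'])
```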

Creating aSerieswith Custom Indices

Can specify custom indices with theindexkeyword argument In[157]: grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam']) In[158]: grades Out[158]: Wally 87 Eva 100 Sam 94 dtype: int64

Dictionary Initializers

  • If you initialize aSerieswith a dictionary, its keys are the indices, and its values become theSeries element values
In[159]: grades = pd.Series({'Wally': 87, 'Eva': 100, 'Sam': 94}) In[160]: grades Out[160]: Wally 87 Eva 100 Sam 94 dtype: int64

Accessing Elements of a Series Via Custom Indices

  • Can access individual elements via square brackets containing a custom index value
In[161]: grades['Eva']
Out[161]: 100
  • If custom indices are strings that could represent valid Python identifiers, pandas automatically adds them to the Series as attributes
In[162]: grades.Wally
Out[162]: 87
  • dtype attribute returns the underlying array's element type
In[163]: grades.dtype
Out[163]: dtype('int64')
  • values attribute returns the underlying array
In[164]: grades.values
Out[164]: array([ 87, 100,  94])

Creating a Series of Strings

  • In a Series of strings, you can use the str attribute to call string methods on the elements
In[165]: hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])
In[166]: hardware
Out[166]:
0    Hammer
1       Saw
2    Wrench
dtype: object
  • Call string method contains on each element
  • Returns a Series containing bool values indicating the contains method's result for each element
  • The str attribute provides many string-processing methods that are similar to those in Python's string type
    • https://pandas.pydata.org/pandas-docs/stable/api.html#string-handling
In[167]: hardware.str.contains('a')
Out[167]:
0     True
1     True
2    False
dtype: bool
  • Use string method upper to produce a new Series containing the uppercase versions of each element in hardware
In[168]: hardware.str.upper()
Out[168]:
0    HAMMER
1       SAW
2    WRENCH
dtype: object
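The bool Series returned by contains can also be used to select the matching elements. The following is a small sketch, assuming the same hardware Series as above; the names mask and with_a are illustrative and not part of the original notes.

```python
import pandas as pd

hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

# One bool per element: does the string contain a lowercase 'a'?
mask = hardware.str.contains('a')

# Indexing with the bool Series keeps only the elements where mask is True
with_a = hardware[mask]
print(with_a)
```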

7.14.2 DataFrames

  • Enhanced two-dimensional array
  • Can have custom row and column indices
  • Offers additional operations and capabilities that make them more convenient for many data-science-oriented tasks
  • Support missing data
  • Each column in a DataFrame is a Series

Creating a DataFrame from a Dictionary

  • Create a DataFrame from a dictionary that represents student grades on three exams
In[1]: import pandas as pd
In[2]: grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                      'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                      'Bob': [83, 65, 85]}
In[3]: grades = pd.DataFrame(grades_dict)
  • Pandas displays DataFrames in tabular format with indices left aligned in the index column and the remaining columns' values right aligned
In[4]: grades
Out[4]:
   Wally  Eva  Sam  Katie  Bob
0     87  100   94    100   83
1     96   87   77     81   65
2     70   90   90     82   85

Customizing a DataFrame's Indices with the index Attribute

  • Can use the index attribute to change the DataFrame's indices from sequential integers to labels
  • Must provide a one-dimensional collection that has the same number of elements as there are rows in the DataFrame
In[5]: grades.index = ['Test1', 'Test2', 'Test3']
In[6]: grades
Out[6]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85

Accessing a DataFrame's Columns

  • Can quickly and conveniently look at your data in many different ways, including selecting portions of the data
  • Get Eva's grades by name
  • Displays her column as a Series
In[7]: grades['Eva']
Out[7]:
Test1    100
Test2     87
Test3     90
Name: Eva, dtype: int64
  • If a DataFrame's column-name strings are valid Python identifiers, you can use them as attributes
In[8]: grades.Sam
Out[8]:
Test1    94
Test2    77
Test3    90
Name: Sam, dtype: int64

Selecting Rows via the loc and iloc Attributes

  • DataFrames support indexing capabilities with [], but the pandas documentation recommends using the attributes loc, iloc, at and iat
    • Optimized to access DataFrames and also provide additional capabilities
  • Access a row by its label via the DataFrame's loc attribute
In[9]: grades.loc['Test1']
Out[9]:
Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64
  • Access rows by integer zero-based indices using the iloc attribute (the i in iloc means that it's used with integer indices)
In[10]: grades.iloc[1]
Out[10]:
Wally    96
Eva      87
Sam      77
Katie    81
Bob      65
Name: Test2, dtype: int64

Selecting Rows via Slices and Lists with the loc and iloc Attributes

  • Index can be a slice
  • When using slices containing labels with loc, the range specified includes the high index ('Test3'):
In[11]: grades.loc['Test1':'Test3']
Out[11]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85
  • When using slices containing integer indices with iloc, the range you specify excludes the high index (2):
In[12]: grades.iloc[0:2]
Out[12]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
  • Select specific rows with a list
In[13]: grades.loc[['Test1', 'Test3']]
Out[13]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test3     70   90   90     82   85
In[14]: grades.iloc[[0, 2]]
Out[14]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test3     70   90   90     82   85

Selecting Subsets of the Rows and Columns

  • View only Eva's and Katie's grades on Test1 and Test2
In[15]: grades.loc['Test1':'Test2', ['Eva', 'Katie']]
Out[15]:
       Eva  Katie
Test1  100    100
Test2   87     81
  • Use iloc with a list and a slice to select the first and third tests and the first three columns for those tests
In[16]: grades.iloc[[0, 2], 0:3]
Out[16]:
       Wally  Eva  Sam
Test1     87  100   94
Test3     70   90   90

Boolean Indexing

  • One of pandas' more powerful selection capabilities is Boolean indexing
  • Select all the A grades (that is, those that are greater than or equal to 90):
    • Pandas checks every grade to determine whether its value is greater than or equal to 90 and, if so, includes it in the new DataFrame.
    • Grades for which the condition is False are represented as NaN (not a number) in the new DataFrame
    • NaN is pandas' notation for missing values
In[17]: grades[grades >= 90]
Out[17]:
       Wally    Eva   Sam  Katie  Bob
Test1    NaN  100.0  94.0  100.0  NaN
Test2   96.0    NaN   NaN    NaN  NaN
Test3    NaN   90.0  90.0    NaN  NaN
  • Select all the B grades in the range 80 to 89
In[18]: grades[(grades >= 80) & (grades < 90)]
Out[18]:
       Wally   Eva  Sam  Katie   Bob
Test1   87.0   NaN  NaN    NaN  83.0
Test2    NaN  87.0  NaN   81.0   NaN
Test3    NaN   NaN  NaN   82.0  85.0
  • Pandas Boolean indices combine multiple conditions with the Python operator & (bitwise AND), not the and Boolean operator
  • Each condition must be parenthesized, because & binds more tightly than the comparison operators
  • For or conditions, use | (bitwise OR)
  • NumPy also supports Boolean indexing for arrays, but always returns a one-dimensional array containing only the values that satisfy the condition
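The difference between NumPy and pandas Boolean indexing can be seen side by side. The following is a minimal sketch using a small, made-up 2-by-3 array rather than the full grades data; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: two rows of three grades
data = np.array([[87, 100, 94],
                 [96, 87, 77]])

# NumPy: the result is flattened to one dimension and holds only the matches
flat = data[data >= 90]
print(flat)

# pandas: the result keeps the original shape; non-matching cells become NaN
df = pd.DataFrame(data, columns=['Wally', 'Eva', 'Sam'])
masked = df[df >= 90]
print(masked)

# Combine conditions with & and parenthesize each comparison
b_grades = df[(df >= 80) & (df < 90)]
print(b_grades)
```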

Accessing a Specific DataFrame Cell by Row and Column

  • DataFrame attributes at and iat get a single value from a DataFrame
In[19]: grades.at['Test2', 'Eva']
Out[19]: 87
In[20]: grades.iat[2, 0]
Out[20]: 70
  • Can assign new values to specific elements
In[21]: grades.at['Test2', 'Eva'] = 100
In[22]: grades.at['Test2', 'Eva']
Out[22]: 100
In[23]: grades.iat[1, 2] = 87
In[24]: grades.iat[1, 2]
Out[24]: 87

Descriptive Statistics

  • DataFrame's describe method calculates basic descriptive statistics for the data and returns them as a DataFrame
  • Statistics are calculated by column
In[25]: grades.describe()
Out[25]:
           Wally         Eva        Sam       Katie        Bob
count   3.000000    3.000000   3.000000    3.000000   3.000000
mean   84.333333   96.666667  90.333333   87.666667  77.666667
std    13.203535    5.773503   3.511885   10.692677  11.015141
min    70.000000   90.000000  87.000000   81.000000  65.000000
25%    78.500000   95.000000  88.500000   81.500000  74.000000
50%    87.000000  100.000000  90.000000   82.000000  83.000000
75%    91.500000  100.000000  92.000000   91.000000  84.000000
max    96.000000  100.000000  94.000000  100.000000  85.000000
  • Quick way to summarize your data
  • Nicely demonstrates the power of array-oriented programming with a clean, concise functional-style call
  • Can control the precision and other default settings with pandas' set_option function
    • Use the full option name 'display.precision'; recent pandas releases may reject the abbreviated 'precision' as ambiguous
In[26]: pd.set_option('display.precision', 2)
In[27]: grades.describe()
Out[27]:
       Wally     Eva    Sam   Katie    Bob
count   3.00    3.00   3.00    3.00   3.00
mean   84.33   96.67  90.33   87.67  77.67
std    13.20    5.77   3.51   10.69  11.02
min    70.00   90.00  87.00   81.00  65.00
25%    78.50   95.00  88.50   81.50  74.00
50%    87.00  100.00  90.00   82.00  83.00
75%    91.50  100.00  92.00   91.00  84.00
max    96.00  100.00  94.00  100.00  85.00
  • For student grades, the most important of these statistics is probably the mean
  • Can calculate that for each student simply by calling mean on the DataFrame
In[28]: grades.mean()
Out[28]:
Wally    84.33
Eva      96.67
Sam      90.33
Katie    87.67
Bob      77.67
dtype: float64
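set_option changes the display precision for the rest of the session. When the change should apply to only a few statements, pandas also provides option_context, which restores the previous setting on exit. The following is a sketch under that assumption, using the full option name 'display.precision'; the grades data repeats the values shown above after the two cell assignments.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 100, 90], 'Sam': [94, 87, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

# Remember the current setting so we can confirm it is restored
before = pd.get_option('display.precision')

# Inside the with block, floating-point values display with two decimals
with pd.option_context('display.precision', 2):
    text = repr(grades.describe())
    print(text)

# Outside the block, the earlier precision is back in effect
after = pd.get_option('display.precision')
```

Only the displayed text is affected; the values stored in the DataFrame keep their full precision.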

Transposing the DataFrame with the T Attribute

  • Can quickly transpose rows and columns, so the rows become the columns and the columns become the rows, by using the T attribute to get a view
In[29]: grades.T
Out[29]:
       Test1  Test2  Test3
Wally     87     96     70
Eva      100    100     90
Sam       94     87     90
Katie    100     81     82
Bob       83     65     85
  • Assume that rather than getting the summary statistics by student, you want to get them by test
  • Call describe on grades.T
In[30]: grades.T.describe()
Out[30]:
        Test1   Test2  Test3
count    5.00    5.00   5.00
mean    92.80   85.80  83.40
std      7.66   13.81   8.23
min     83.00   65.00  70.00
25%     87.00   81.00  82.00
50%     94.00   87.00  85.00
75%    100.00   96.00  90.00
max    100.00  100.00  90.00
  • Get the average of all the students' grades on each test
In[31]: grades.T.mean()
Out[31]:
Test1    92.8
Test2    85.8
Test3    83.4
dtype: float64
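The per-test averages can also be computed without transposing, by asking mean to reduce across the columns. The following is a sketch that assumes the grades values shown above; the axis=1 argument to mean is not used in the original notes and is offered only as an alternative.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 100, 90], 'Sam': [94, 87, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

# Approach from the notes: transpose, then average each (former) row
by_test_transposed = grades.T.mean()

# Alternative: average across the columns of each row directly
by_test_axis = grades.mean(axis=1)

print(by_test_axis)
```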

Sorting Rows by Their Indices

  • Can sort a DataFrame by its rows or columns, based on their indices or values
  • Sort the rows by their indices in descending order using sort_index and its keyword argument ascending=False
In[32]: grades.sort_index(ascending=False)
Out[32]:
       Wally  Eva  Sam  Katie  Bob
Test3     70   90   90     82   85
Test2     96  100   87     81   65
Test1     87  100   94    100   83

Sorting by Column Indices

  • Sort columns into ascending order (left-to-right) by their column names
  • axis=1 keyword argument indicates that we wish to sort the column indices, rather than the row indices
    • axis=0 (the default) sorts the row indices
In[33]: grades.sort_index(axis=1)
Out[33]:
       Bob  Eva  Katie  Sam  Wally
Test1   83  100    100   94     87
Test2   65  100     81   87     96
Test3   85   90     82   90     70

Sorting by Column Values

  • To view Test1's grades in descending order so we can see the students' names in highest-to-lowest grade order, call method sort_values
  • by and axis arguments work together to determine which values will be sorted
    • In this case, we sort based on the column values (axis=1) for Test1
In[34]: grades.sort_values(by='Test1', axis=1, ascending=False)
Out[34]:
       Eva  Katie  Sam  Wally  Bob
Test1  100    100   94     87   83
Test2  100     81   87     96   65
Test3   90     82   90     70   85
  • Might be easier to read the grades and names if they were in a column
  • Sort the transposed DataFrame instead
In[35]: grades.T.sort_values(by='Test1', ascending=False)
Out[35]:
       Test1  Test2  Test3
Eva      100    100     90
Katie    100     81     82
Sam       94     87     90
Wally     87     96     70
Bob       83     65     85
  • Since we're sorting only Test1's grades, we might not want to see the other tests at all
  • Combine selection with sorting
In[36]: grades.loc['Test1'].sort_values(ascending=False)
Out[36]:
Katie    100
Eva      100
Sam       94
Wally     87
Bob       83
Name: Test1, dtype: int64

Copy vs. In-Place Sorting

  • sort_index and sort_values return a copy of the original DataFrame
  • Could require substantial memory in a big data application
  • Can sort in place by passing the keyword argument inplace=True
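The difference between the two modes can be sketched as follows, assuming the grades values shown above. With the default, the original DataFrame is left untouched; with inplace=True, the method returns None and reorders the original.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 100, 90], 'Sam': [94, 87, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

# Default: a sorted copy is returned and grades keeps its original order
sorted_copy = grades.sort_index(ascending=False)
order_after_copy = list(grades.index)

# In place: nothing is returned and grades itself is reordered
result = grades.sort_index(ascending=False, inplace=True)
order_after_inplace = list(grades.index)

print(order_after_copy)
print(order_after_inplace)
```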
