7. Array-Oriented Programming with NumPy
Objectives
In this chapter, you'll:
- Learn what arrays are and how they differ from lists.
- Use the numpy module's high-performance ndarrays.
- Compare list and ndarray performance with the IPython %timeit magic.
- Use ndarrays to store and retrieve data efficiently.
- Create and initialize ndarrays.
- Refer to individual ndarray elements.
- Iterate through ndarrays.
- Create and manipulate multidimensional ndarrays.
- Perform common ndarray manipulations.
- Create and manipulate pandas one-dimensional Series and two-dimensional DataFrames.
- Customize Series and DataFrame indices.
- Calculate basic descriptive statistics for data in a Series and a DataFrame.
- Customize floating-point number precision in pandas output formatting.
7.1 Introduction
NumPy (Numerical Python) Library
- First appeared in 2006 and is the preferred Python array implementation.
- High-performance, richly functional n-dimensional array type called ndarray.
- Written in C and up to 100 times faster than lists.
- Critical in big-data processing, AI applications and much more.
- According to libraries.io, over 450 Python libraries depend on NumPy.
- Many popular data science libraries such as Pandas, SciPy (Scientific Python) and Keras (for deep learning) are built on or depend on NumPy.
Array-Oriented Programming
- Functional-style programming with internal iteration makes array-oriented manipulations concise and straightforward, and reduces the possibility of error.
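To illustrate internal iteration, here is a minimal sketch that doubles five values twice: once with an explicit Python loop over a list, and once with a single array expression. The variable names are illustrative and do not come from the chapter.

```python
import numpy as np

values = [2, 3, 5, 7, 11]

# External iteration: we write the loop ourselves
doubled_list = []
for value in values:
    doubled_list.append(value * 2)

# Internal iteration: NumPy loops over the elements for us
doubled_array = np.array(values) * 2

print(doubled_list)
print(doubled_array)
```

The array version has no loop counter or append call to get wrong, which is the sense in which internal iteration reduces the possibility of error.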
© 1992-2020 by Pearson Education, Inc. All Rights Reserved. This content is based on Chapter 5 of the book Intro to Python for Computer Science and Data Science: Learning to Program with AI, Big Data and the Cloud. DISCLAIMER: The authors and publisher of this book have used their best efforts in preparing the book. These efforts include the development, research, and testing of the theories and programs to determine their effectiveness. The authors and publisher make no warranty of any kind, expressed or implied, with regard to these programs or to the documentation contained in these books. The authors and publisher shall not be liable in any event for incidental or consequential damages in connection with, or arising out of, the furnishing, performance, or use of these programs.
7.2 Creating arrays from Existing Data
- Creating an array with the array function
- Argument is an array or other iterable
- Returns a new array containing the argument's elements
In [1]: import numpy as np
In [2]: numbers = np.array([2, 3, 5, 7, 11])
In [3]: type(numbers)
Out[3]: numpy.ndarray
In [4]: numbers
Out[4]: array([ 2,  3,  5,  7, 11])
Multidimensional Arguments
In [5]: np.array([[1, 2, 3], [4, 5, 6]])
Out[5]: array([[1, 2, 3],
               [4, 5, 6]])
7.3 array Attributes
- Attributes enable you to discover information about an array's structure and contents
In [1]: import numpy as np
In [2]: integers = np.array([[1, 2, 3], [4, 5, 6]])
In [3]: integers
Out[3]: array([[1, 2, 3],
               [4, 5, 6]])
In [4]: floats = np.array([0.0, 0.1, 0.2, 0.3, 0.4])
In [5]: floats
Out[5]: array([0. , 0.1, 0.2, 0.3, 0.4])
- NumPy does not display trailing 0s
Determining an array's Element Type
In [6]: integers.dtype
Out[6]: dtype('int64')
In [7]: floats.dtype
Out[7]: dtype('float64')
- For performance reasons, NumPy is written in the C programming language and uses C's data types
- Other NumPy types
Determining an array's Dimensions
- ndim contains an array's number of dimensions
- shape contains a tuple specifying an array's dimensions
In [8]: integers.ndim
Out[8]: 2
In [9]: floats.ndim
Out[9]: 1
In [10]: integers.shape
Out[10]: (2, 3)
In [11]: floats.shape
Out[11]: (5,)
Determining an array's Number of Elements and Element Size
- View an array's total number of elements with size
- View the number of bytes required to store each element with itemsize
In [12]: integers.size
Out[12]: 6
In [13]: integers.itemsize
Out[13]: 8
In [14]: floats.size
Out[14]: 5
In [15]: floats.itemsize
Out[15]: 8
Iterating through a Multidimensional array's Elements
In [16]: for row in integers:
    ...:     for column in row:
    ...:         print(column, end=' ')
    ...:     print()
    ...:
1 2 3
4 5 6
- Iterate through a multidimensional array as if it were one-dimensional by using flat
In [17]: for i in integers.flat:
    ...:     print(i, end=' ')
    ...:
1 2 3 4 5 6
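As a small sketch tying these attributes together (the variable name grid is illustrative, not from the chapter), an array's size equals the product of the dimensions in its shape, ndim equals the length of the shape tuple, and flat visits the elements row by row:

```python
import math
import numpy as np

grid = np.array([[1, 2, 3], [4, 5, 6]])

# size is the product of the dimensions in shape
assert grid.size == math.prod(grid.shape)

# ndim is the length of the shape tuple
assert grid.ndim == len(grid.shape)

# flat iterates row by row, as if the array were one-dimensional
flat_values = [int(value) for value in grid.flat]
print(flat_values)  # [1, 2, 3, 4, 5, 6]
```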
7.4 Filling arrays with Specific Values
- Functions zeros, ones and full create arrays containing 0s, 1s or a specified value, respectively
In [1]: import numpy as np
In [2]: np.zeros(5)
Out[2]: array([0., 0., 0., 0., 0.])
- For a tuple of integers, these functions return a multidimensional array with the specified dimensions
In [3]: np.ones((2, 4), dtype=int)
Out[3]: array([[1, 1, 1, 1],
               [1, 1, 1, 1]])
In [4]: np.full((3, 5), 13)
Out[4]: array([[13, 13, 13, 13, 13],
               [13, 13, 13, 13, 13],
               [13, 13, 13, 13, 13]])
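The sessions above suggest how each function picks its element type. The following sketch, with illustrative variable names, checks that directly: zeros produces floats unless told otherwise, the dtype keyword overrides that default, and full takes its element type from the fill value.

```python
import numpy as np

zeros = np.zeros(5)                 # float64 by default
ones = np.ones((2, 4), dtype=int)   # dtype overrides the default element type
filled = np.full((3, 5), 13)        # element type inferred from the fill value

print(zeros.dtype)   # float64
print(ones.shape)    # (2, 4)
print(filled.sum())  # 195
```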
7.5 Creating arrays from Ranges
- NumPy provides optimized functions for creating arrays from ranges
Creating Integer Ranges with arange
In [1]: import numpy as np
In [2]: np.arange(5)
Out[2]: array([0, 1, 2, 3, 4])
In [3]: np.arange(5, 10)
Out[3]: array([5, 6, 7, 8, 9])
In [4]: np.arange(10, 1, -2)
Out[4]: array([10,  8,  6,  4,  2])
Creating Floating-Point Ranges with linspace
- Produce evenly spaced floating-point ranges with NumPy's linspace function
- Ending value is included in the array
In [5]: np.linspace(0.0, 1.0, num=5)
Out[5]: array([0.  , 0.25, 0.5 , 0.75, 1.  ])
Reshaping an array
- array method reshape transforms an array into a different number of dimensions
- New shape must have the same number of elements as the original
In [6]: np.arange(1, 21).reshape(4, 5)
Out[6]: array([[ 1,  2,  3,  4,  5],
               [ 6,  7,  8,  9, 10],
               [11, 12, 13, 14, 15],
               [16, 17, 18, 19, 20]])
Displaying Large arrays
- When displaying an array with more than 1000 items (NumPy's default print threshold), NumPy drops the middle rows, columns or both from the output
In [7]: np.arange(1, 100001).reshape(4, 25000)
Out[7]: array([[     1,      2,      3, ...,  24998,  24999,  25000],
               [ 25001,  25002,  25003, ...,  49998,  49999,  50000],
               [ 50001,  50002,  50003, ...,  74998,  74999,  75000],
               [ 75001,  75002,  75003, ...,  99998,  99999, 100000]])
In [8]: np.arange(1, 100001).reshape(100, 1000)
Out[8]: array([[     1,      2,      3, ...,    998,    999,   1000],
               [  1001,   1002,   1003, ...,   1998,   1999,   2000],
               [  2001,   2002,   2003, ...,   2998,   2999,   3000],
               ...,
               [ 97001,  97002,  97003, ...,  97998,  97999,  98000],
               [ 98001,  98002,  98003, ...,  98998,  98999,  99000],
               [ 99001,  99002,  99003, ...,  99998,  99999, 100000]])
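The rules above (linspace includes its ending value, arange excludes it, and reshape must preserve the element count) can be sketched as follows. The 3-by-7 shape is a deliberately wrong choice made up for this illustration.

```python
import numpy as np

# linspace includes the ending value; arange excludes it
points = np.linspace(0.0, 1.0, num=5)
steps = np.arange(0, 10, 2)

# reshape must preserve the element count: 20 elements fit 4 x 5
table = np.arange(1, 21).reshape(4, 5)

# 20 elements cannot fill a 3 x 7 array, so NumPy raises ValueError
try:
    np.arange(1, 21).reshape(3, 7)
    mismatch_rejected = False
except ValueError:
    mismatch_rejected = True

print(points, steps, table.shape, mismatch_rejected)
```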
7.6 List vs. array Performance: Introducing %timeit
- Most array operations execute significantly faster than corresponding list operations
- IPython %timeit magic command times the average duration of operations
Timing the Creation of a List Containing Results of 6,000,000 Die Rolls
In [1]: import random
In [2]: %timeit rolls_list = \
   ...:     [random.randrange(1, 7) for i in range(0, 6_000_000)]
6.88 s ± 276 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
- By default, %timeit executes a statement in a loop, and it runs the loop seven times
- If you do not indicate the number of loops, %timeit chooses an appropriate value
- After executing the statement, %timeit displays the statement's average execution time, as well as the standard deviation of all the executions
Timing the Creation of an array Containing Results of 6,000,000 Die Rolls
In [3]: import numpy as np
In [4]: %timeit rolls_array = np.random.randint(1, 7, 6_000_000)
75.2 ms ± 2.33 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
60,000,000 and 600,000,000 Die Rolls
In [5]: %timeit rolls_array = np.random.randint(1, 7, 60_000_000)
916 ms ± 26.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
In [6]: %timeit rolls_array = np.random.randint(1, 7, 600_000_000)
10.3 s ± 180 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Customizing the %timeit Iterations
In [7]: %timeit -n3 -r2 rolls_array = np.random.randint(1, 7, 6_000_000)
74.5 ms ± 7.58 ms per loop (mean ± std. dev. of 2 runs, 3 loops each)
Other IPython Magics
IPython provides dozens of magics for a variety of tasks. For a complete list, see the IPython magics documentation. Here are a few helpful ones:
- %load to read code into IPython from a local file or URL.
- %save to save snippets to a file.
- %run to execute a .py file from IPython.
- %precision to change the default floating-point precision for IPython outputs.
- %cd to change directories without having to exit IPython first.
- %edit to launch an external editor, which is handy if you need to modify more complex snippets.
- %history to view a list of all snippets and commands you've executed in the current IPython session.
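%timeit is available only inside IPython. As a rough sketch of a similar comparison in a plain Python script, the standard library's timeit module can be used instead. The function names and the reduced roll count below are assumptions made for this illustration, and the sketch reports the best of three runs rather than the mean and standard deviation that %timeit displays.

```python
import random
import timeit

import numpy as np

ROLLS = 100_000  # far fewer than the chapter's 6,000,000 so the sketch runs quickly

def roll_with_list():
    """Roll a die ROLLS times using a list comprehension."""
    return [random.randrange(1, 7) for _ in range(ROLLS)]

def roll_with_array():
    """Roll a die ROLLS times using NumPy."""
    return np.random.randint(1, 7, ROLLS)

# repeat=3, number=1: three runs of one call each; keep the best time
list_seconds = min(timeit.repeat(roll_with_list, repeat=3, number=1))
array_seconds = min(timeit.repeat(roll_with_array, repeat=3, number=1))

print(f'list:  {list_seconds:.4f} s')
print(f'array: {array_seconds:.4f} s')
```

Actual timings depend on the machine, so no expected output is shown.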
7.7 array Operators
- array operators perform operations on entire arrays.
- Can perform arithmetic between arrays and scalar numeric values, and between arrays of the same shape.
In [1]: import numpy as np
In [2]: numbers = np.arange(1, 6)
In [3]: numbers
Out[3]: array([1, 2, 3, 4, 5])
In [4]: numbers * 2
Out[4]: array([ 2,  4,  6,  8, 10])
In [5]: numbers ** 3
Out[5]: array([  1,   8,  27,  64, 125])
In [6]: numbers  # numbers is unchanged by the arithmetic operators
Out[6]: array([1, 2, 3, 4, 5])
In [7]: numbers += 10
In [8]: numbers
Out[8]: array([11, 12, 13, 14, 15])
Broadcasting
- Normally, arithmetic operations require as operands two arrays of the same size and shape.
- When one operand is a scalar, NumPy treats it as if it were an array of the other operand's shape, so numbers * 2 is equivalent to numbers * [2, 2, 2, 2, 2] for a 5-element array.
- Applying the operation to every element is called broadcasting.
- Broadcasting also can be applied between arrays of different sizes and shapes, enabling some concise and powerful manipulations.
Arithmetic Operations Between arrays
- Can perform arithmetic operations and augmented assignments between arrays of the same shape
In [9]: numbers2 = np.linspace(1.1, 5.5, 5)
In [10]: numbers2
Out[10]: array([1.1, 2.2, 3.3, 4.4, 5.5])
In [11]: numbers * numbers2
Out[11]: array([12.1, 26.4, 42.9, 61.6, 82.5])
Comparing arrays
- Can compare arrays with individual values and with other arrays
- Comparisons are performed element-wise
- Produce arrays of Boolean values in which each element's True or False value indicates the comparison result
In [12]: numbers
Out[12]: array([11, 12, 13, 14, 15])
In [13]: numbers >= 13
Out[13]: array([False, False,  True,  True,  True])
In [14]: numbers2
Out[14]: array([1.1, 2.2, 3.3, 4.4, 5.5])
In [15]: numbers2 < numbers
Out[15]: array([ True,  True,  True,  True,  True])
In [16]: numbers == numbers2
Out[16]: array([False, False, False, False, False])
In [17]: numbers == numbers
Out[17]: array([ True,  True,  True,  True,  True])
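A brief sketch of the scalar-broadcasting equivalence and of element-wise comparison described in this section. The names scaled, expanded and at_least_three are illustrative.

```python
import numpy as np

numbers = np.arange(1, 6)

# A scalar operand behaves like an array of the same shape
scaled = numbers * 2
expanded = numbers * np.array([2, 2, 2, 2, 2])
assert np.array_equal(scaled, expanded)

# Comparisons are element-wise and yield a Boolean array
at_least_three = numbers >= 3

print(scaled)          # [ 2  4  6  8 10]
print(at_least_three)  # [False False  True  True  True]
```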
7.8 NumPy Calculation Methods
- These methods ignore the array's shape and use all the elements in the calculations.
- Consider an array representing four students' grades on three exams:
In [1]: import numpy as np
In [2]: grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])
In [3]: grades
Out[3]: array([[ 87,  96,  70],
               [100,  87,  90],
               [ 94,  77,  90],
               [100,  81,  82]])
- Can use methods to calculate sum, min, max, mean, std (standard deviation) and var (variance)
- Each is a functional-style programming reduction
In [4]: grades.sum()
Out[4]: 1054
In [5]: grades.min()
Out[5]: 70
In [6]: grades.max()
Out[6]: 100
In [7]: grades.mean()
Out[7]: 87.83333333333333
In [8]: grades.std()
Out[8]: 8.792357792739987
In [9]: grades.var()
Out[9]: 77.30555555555556
Calculations by Row or Column
- You can perform calculations by column or row (or other dimensions in arrays with more than two dimensions)
- Each 2D+ array has one axis per dimension
- In a 2D array, axis=0 indicates calculations should be column-by-column
In [10]: grades.mean(axis=0)
Out[10]: array([95.25, 85.25, 83.  ])
- In a 2D array, axis=1 indicates calculations should be row-by-row
In [11]: grades.mean(axis=1)
Out[11]: array([84.33333333, 92.33333333, 87.        , 87.66666667])
- Other NumPy array Calculation Methods
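One way to remember the axis argument is that it names the dimension that gets collapsed. The sketch below reuses the chapter's grades data. The names exam_averages and student_averages are illustrative.

```python
import numpy as np

# Four students (rows) by three exams (columns), as in the chapter
grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

# axis=0 collapses the rows: one result per column (per exam)
exam_averages = grades.mean(axis=0)

# axis=1 collapses the columns: one result per row (per student)
student_averages = grades.mean(axis=1)

print(exam_averages.shape)     # (3,)
print(student_averages.shape)  # (4,)
```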
7.9 Universal Functions
- Standalone universal functions (ufuncs) perform element-wise operations using one or two array or array-like arguments (like lists)
- Each returns a new array containing the results
- Some ufuncs are called when you use array operators like + and *
- Create an array and calculate the square root of its values, using the sqrt universal function
In [1]: import numpy as np
In [2]: numbers = np.array([1, 4, 9, 16, 25, 36])
In [3]: np.sqrt(numbers)
Out[3]: array([1., 2., 3., 4., 5., 6.])
- Add two arrays with the same shape, using the add universal function
- Equivalent to: numbers + numbers2
In [4]: numbers2 = np.arange(1, 7) * 10
In [5]: numbers2
Out[5]: array([10, 20, 30, 40, 50, 60])
In [6]: np.add(numbers, numbers2)
Out[6]: array([11, 24, 39, 56, 75, 96])
Broadcasting with Universal Functions
- Universal functions can use broadcasting, just like NumPy array operators
In [7]: np.multiply(numbers2, 5)
Out[7]: array([ 50, 100, 150, 200, 250, 300])
In [8]: numbers3 = numbers2.reshape(2, 3)
In [9]: numbers3
Out[9]: array([[10, 20, 30],
               [40, 50, 60]])
In [10]: numbers4 = np.array([2, 4, 6])
In [11]: np.multiply(numbers3, numbers4)
Out[11]: array([[ 20,  80, 180],
                [ 80, 200, 360]])
- Broadcasting rules documentation
Other Universal Functions
NumPy universal functions
- Math: add, subtract, multiply, divide, remainder, exp, log, sqrt, power, and more.
- Trigonometry: sin, cos, tan, hypot, arcsin, arccos, arctan, and more.
- Bit manipulation: bitwise_and, bitwise_or, bitwise_xor, invert, left_shift and right_shift.
- Comparison: greater, greater_equal, less, less_equal, equal, not_equal, logical_and, logical_or, logical_xor, logical_not, minimum, maximum, and more.
- Floating point: floor, ceil, isinf, isnan, fabs, trunc, and more.
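The following sketch gathers three points from this section: an operator and its ufunc give the same result, ufuncs accept array-like arguments such as lists, and a one-dimensional array can be broadcast across each row of a two-dimensional one. The names roots and products are illustrative.

```python
import numpy as np

numbers = np.array([1, 4, 9, 16, 25, 36])
numbers2 = np.arange(1, 7) * 10

# The + operator and the add ufunc produce the same array
assert np.array_equal(np.add(numbers, numbers2), numbers + numbers2)

# ufuncs also accept array-like arguments such as lists
roots = np.sqrt([1, 4, 9])

# A (2, 3) array times a (3,) array: [2, 4, 6] is applied to each row
products = np.multiply(numbers2.reshape(2, 3), np.array([2, 4, 6]))

print(roots)
print(products)
```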
7.10 Indexing and Slicing
- One-dimensional arrays can be indexed and sliced like lists.
Indexing with Two-Dimensional arrays
- To select an element in a two-dimensional array, specify a tuple containing the element's row and column indices in square brackets
In [1]: import numpy as np
In [2]: grades = np.array([[87, 96, 70], [100, 87, 90], [94, 77, 90], [100, 81, 82]])
In [3]: grades
Out[3]: array([[ 87,  96,  70],
               [100,  87,  90],
               [ 94,  77,  90],
               [100,  81,  82]])
In [4]: grades[0, 1]  # row 0, column 1
Out[4]: 96
Selecting a Subset of a Two-Dimensional array's Rows
- To select a single row, specify only one index in square brackets
In [5]: grades[1]
Out[5]: array([100,  87,  90])
- Select multiple sequential rows with slice notation
In [6]: grades[0:2]
Out[6]: array([[ 87,  96,  70],
               [100,  87,  90]])
- Select multiple non-sequential rows with a list of row indices
In [7]: grades[[1, 3]]
Out[7]: array([[100,  87,  90],
               [100,  81,  82]])
Selecting a Subset of a Two-Dimensional array's Columns
- The column index also can be a specific index, a slice or a list
In [8]: grades[:, 0]
Out[8]: array([ 87, 100,  94, 100])
In [9]: grades[:, 1:3]
Out[9]: array([[96, 70],
               [87, 90],
               [77, 90],
               [81, 82]])
In [10]: grades[:, [0, 2]]
Out[10]: array([[ 87,  70],
                [100,  90],
                [ 94,  90],
                [100,  82]])
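As a compact recap of the selection forms above, this sketch applies each one to the same grades array and reports the shape of what comes back. The variable names are illustrative.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90],
                   [94, 77, 90], [100, 81, 82]])

single = grades[0, 1]                # row 0, column 1
first_two_rows = grades[0:2]         # slice of sequential rows
rows_1_and_3 = grades[[1, 3]]        # list of non-sequential row indices
first_column = grades[:, 0]          # every row, column 0
columns_0_and_2 = grades[:, [0, 2]]  # every row, columns 0 and 2

print(single, first_two_rows.shape, rows_1_and_3.shape,
      first_column.shape, columns_0_and_2.shape)
```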
7.11 Views: Shallow Copies
- Views see the data in other objects, rather than having their own copies of the data
- Views are shallow copies
- array method view returns a new array object with a view of the original array object's data
In [1]: import numpy as np
In [2]: numbers = np.arange(1, 6)
In [3]: numbers
Out[3]: array([1, 2, 3, 4, 5])
In [4]: numbers2 = numbers.view()
In [5]: numbers2
Out[5]: array([1, 2, 3, 4, 5])
- Use built-in id function to see that numbers and numbers2 are different objects
In [6]: id(numbers)
Out[6]: 4431803056
In [7]: id(numbers2)
Out[7]: 4430398928
- Modifying an element in the original array also modifies the view and vice versa
In [8]: numbers[1] *= 10
In [9]: numbers2
Out[9]: array([ 1, 20,  3,  4,  5])
In [10]: numbers
Out[10]: array([ 1, 20,  3,  4,  5])
In [11]: numbers2[1] /= 10
In [12]: numbers
Out[12]: array([1, 2, 3, 4, 5])
In [13]: numbers2
Out[13]: array([1, 2, 3, 4, 5])
Slice Views
- Slices also create views
In [14]: numbers2 = numbers[0:3]
In [15]: numbers2
Out[15]: array([1, 2, 3])
In [16]: id(numbers)
Out[16]: 4431803056
In [17]: id(numbers2)
Out[17]: 4451350368
- Confirm that numbers2 is a view of only the first three numbers elements
In [18]: numbers2[3]
------------------------------------------------------------------------
IndexError                             Traceback (most recent call last)
----> 1 numbers2[3]
IndexError: index 3 is out of bounds for axis 0 with size 3
- Modify an element both arrays share to show both are updated
In [19]: numbers[1] *= 20
In [20]: numbers
Out[20]: array([ 1, 40,  3,  4,  5])
In [21]: numbers2
Out[21]: array([ 1, 40,  3])
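Comparing id values shows only that two objects differ. To check that they also share data, one option is NumPy's np.shares_memory function, which the chapter does not use and which is introduced here as an addition. A minimal sketch with illustrative names:

```python
import numpy as np

numbers = np.arange(1, 6)
whole_view = numbers.view()
slice_view = numbers[0:3]

# Different Python objects...
assert whole_view is not numbers
# ...that share one underlying data buffer
assert np.shares_memory(numbers, whole_view)
assert np.shares_memory(numbers, slice_view)

# A change made through the original is visible through both views
numbers[1] *= 10
print(whole_view)  # [ 1 20  3  4  5]
print(slice_view)  # [ 1 20  3]
```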
7.12 Deep Copies
- When sharing mutable values, sometimes it's necessary to create a deep copy of the original data
- Especially important in multi-core programming, where separate parts of your program could attempt to modify your data at the same time, possibly corrupting it
- array method copy returns a new array object with an independent copy of the original array's data
In [1]: import numpy as np
In [2]: numbers = np.arange(1, 6)
In [3]: numbers
Out[3]: array([1, 2, 3, 4, 5])
In [4]: numbers2 = numbers.copy()
In [5]: numbers2
Out[5]: array([1, 2, 3, 4, 5])
In [6]: numbers[1] *= 10
In [7]: numbers
Out[7]: array([ 1, 20,  3,  4,  5])
In [8]: numbers2
Out[8]: array([1, 2, 3, 4, 5])
Module copy: Shallow vs. Deep Copies for Other Types of Python Objects
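For other Python objects, the standard library's copy module provides copy (shallow) and deepcopy (deep). The sketch below contrasts an array's copy method with those two functions on a nested list. The nested list and the variable names are made up for illustration, and np.shares_memory is an addition not used in the chapter.

```python
import copy
import numpy as np

numbers = np.arange(1, 6)
independent = numbers.copy()

# Changing the original leaves the deep copy untouched
numbers[1] *= 10
assert not np.shares_memory(numbers, independent)
print(independent)  # [1 2 3 4 5]

# For other Python objects, the copy module distinguishes the two kinds
nested = [[1, 2], [3, 4]]
shallow = copy.copy(nested)   # new outer list, same inner lists
deep = copy.deepcopy(nested)  # new outer list and new inner lists

nested[0][0] = 99
print(shallow[0][0])  # 99
print(deep[0][0])     # 1
```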
7.13 Reshaping and Transposing
reshape vs. resize
- Method reshape returns a view (shallow copy) of the original array with new dimensions
- Does not modify the original array
In [1]: import numpy as np
In [2]: grades = np.array([[87, 96, 70], [100, 87, 90]])
In [3]: grades
Out[3]: array([[ 87,  96,  70],
               [100,  87,  90]])
In [4]: grades.reshape(1, 6)
Out[4]: array([[ 87,  96,  70, 100,  87,  90]])
In [5]: grades
Out[5]: array([[ 87,  96,  70],
               [100,  87,  90]])
- Method resize modifies the original array's shape
In [6]: grades.resize(1, 6)
In [7]: grades
Out[7]: array([[ 87,  96,  70, 100,  87,  90]])
flatten vs. ravel
- Can flatten a multidimensional array into a single dimension with methods flatten and ravel
- flatten deep copies the original array's data
In [8]: grades = np.array([[87, 96, 70], [100, 87, 90]])
In [9]: grades
Out[9]: array([[ 87,  96,  70],
               [100,  87,  90]])
In [10]: flattened = grades.flatten()
In [11]: flattened
Out[11]: array([ 87,  96,  70, 100,  87,  90])
In [12]: grades
Out[12]: array([[ 87,  96,  70],
                [100,  87,  90]])
In [13]: flattened[0] = 100
In [14]: flattened
Out[14]: array([100,  96,  70, 100,  87,  90])
In [15]: grades
Out[15]: array([[ 87,  96,  70],
                [100,  87,  90]])
- Method ravel produces a view of the original array, which shares the grades array's data
In [16]: raveled = grades.ravel()
In [17]: raveled
Out[17]: array([ 87,  96,  70, 100,  87,  90])
In [18]: grades
Out[18]: array([[ 87,  96,  70],
                [100,  87,  90]])
In [19]: raveled[0] = 100
In [20]: raveled
Out[20]: array([100,  96,  70, 100,  87,  90])
In [21]: grades
Out[21]: array([[100,  96,  70],
                [100,  87,  90]])
Transposing Rows and Columns
- Can quickly transpose an array's rows and columns
- Flips the array, so the rows become the columns and the columns become the rows
- T attribute returns a transposed view (shallow copy) of the array
In [22]: grades.T
Out[22]: array([[100, 100],
                [ 96,  87],
                [ 70,  90]])
In [23]: grades
Out[23]: array([[100,  96,  70],
                [100,  87,  90]])
Horizontal and Vertical Stacking
- Can combine arrays by adding more columns or more rows, known as horizontal stacking and vertical stacking
In [24]: grades2 = np.array([[94, 77, 90], [100, 81, 82]])
- Combine grades and grades2 with NumPy's hstack (horizontal stack) function by passing a tuple containing the arrays to combine
- The extra parentheses are required because hstack expects one argument
- Adds more columns
In [25]: np.hstack((grades, grades2))
Out[25]: array([[100,  96,  70,  94,  77,  90],
                [100,  87,  90, 100,  81,  82]])
- Combine grades and grades2 with NumPy's vstack (vertical stack) function
- Adds more rows
In [26]: np.vstack((grades, grades2))
Out[26]: array([[100,  96,  70],
                [100,  87,  90],
                [ 94,  77,  90],
                [100,  81,  82]])
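The copy-versus-view distinction between flatten and ravel, and the shapes produced by transposing and stacking, can be checked with a short sketch. np.shares_memory is a NumPy function not used in the chapter, and the names wide and tall are illustrative.

```python
import numpy as np

grades = np.array([[87, 96, 70], [100, 87, 90]])

flattened = grades.flatten()  # independent copy
raveled = grades.ravel()      # view, for a contiguous array like this one

assert not np.shares_memory(grades, flattened)
assert np.shares_memory(grades, raveled)

grades2 = np.array([[94, 77, 90], [100, 81, 82]])
wide = np.hstack((grades, grades2))  # more columns
tall = np.vstack((grades, grades2))  # more rows

print(grades.T.shape, wide.shape, tall.shape)  # (3, 2) (2, 6) (4, 3)
```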
7.14.1 pandas Series
- An enhanced one-dimensional array
- Supports custom indexing, including even non-integer indices like strings
- Offers additional capabilities that make them more convenient for many data-science oriented tasks
- Series may have missing data
- Many Series operations ignore missing data by default
Creating a Series with Default Indices
- By default, a Series has integer indices numbered sequentially from 0
In [1]: import pandas as pd
In [2]: grades = pd.Series([87, 100, 94])
Creating a Series with All Elements Having the Same Value
- Second argument is a one-dimensional iterable object (such as a list, an array or a range) containing the Series' indices
- Number of indices determines the number of elements
In [149]: pd.Series(98.6, range(3))
Out[149]:
0    98.6
1    98.6
2    98.6
dtype: float64
Accessing a Series' Elements
In [150]: grades[0]
Out[150]: 87
Producing Descriptive Statistics for a Series
- Series provides many methods for common tasks including producing various descriptive statistics
- Each of these is a functional-style reduction
In [151]: grades.count()
Out[151]: 3
In [152]: grades.mean()
Out[152]: 93.66666666666667
In [153]: grades.min()
Out[153]: 87
In [154]: grades.max()
Out[154]: 100
In [155]: grades.std()
Out[155]: 6.506407098647712
- Series method describe produces all these stats and more
- The 25%, 50% and 75% are quartiles:
- 50% represents the median of the sorted values.
- 25% represents the median of the first half of the sorted values.
- 75% represents the median of the second half of the sorted values.
- For the quartiles, if there are two middle elements, then their average is that quartile's median
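The quartile rules above can be checked against the three grades used in this section. This is a minimal sketch with an illustrative variable name, summary. By default pandas computes quantiles by linear interpolation between sorted values, which agrees with the description above for this data.

```python
import pandas as pd

grades = pd.Series([87, 100, 94])
summary = grades.describe()

# describe returns a Series, so individual statistics are accessible by label
print(summary['50%'])                  # 94.0
print(summary['25%'], summary['75%'])  # 90.5 97.0

# The same values are available individually via quantile and median
assert summary['25%'] == grades.quantile(0.25)
assert summary['50%'] == grades.median()
```

Sorted, the grades are 87, 94 and 100. The first half is 87 and 94, whose average is 90.5. The second half is 94 and 100, whose average is 97.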
In[156]: grades.describe() Out[156]: count 3.000000 mean 93.666667 std 6.506407 min 87.000000 25% 90.500000 50% 94.000000 75% 97.000000 max 100.000000 dtype: float64 Creating aSeries
with Custom Indices
Can specify custom indices with theindex
keyword argument In[157]: grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam']) In[158]: grades Out[158]: Wally 87 Eva 100 Sam 94 dtype: int64 Dictionary Initializers
- If you initialize a
Series
with a dictionary, its keys are the indices, and its values become theSeries
element values
In[159]: grades = pd.Series({'Wally': 87, 'Eva': 100, 'Sam': 94}) In[160]: grades Out[160]: Wally 87 Eva 100 Sam 94 dtype: int64 Accessing Elements of aSeries
Via Custom Indices
- Can access individual elements via square brackets containing a custom index value
In[161]: grades['Eva'] Out[161]: 100 - If custom indices are strings that could represent valid Python identifiers, pandas automatically adds them to the
Series
as attributes
In[162]: grades.Wally Out[162]: 87 -
dtype
attributereturns the underlyingarray
s element type
In[163]: grades.dtype Out[163]: dtype('int64') -
values
attributereturns the underlyingarray
In[164]: grades.values Out[164]: array([ 87, 100, 94]) Creating a Series of Strings
- In a Series of strings, you can use the str attribute to call string methods on the elements

In[165]: hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])
In[166]: hardware
Out[166]:
0    Hammer
1       Saw
2    Wrench
dtype: object

- Call string method contains on each element
- Returns a Series containing bool values indicating the contains method's result for each element
- The str attribute provides many string-processing methods that are similar to those in Python's string type
- https://pandas.pydata.org/pandas-docs/stable/api.html#string-handling

In[167]: hardware.str.contains('a')
Out[167]:
0     True
1     True
2    False
dtype: bool

- Use string method upper to produce a new Series containing the uppercase versions of each element in hardware

In[168]: hardware.str.upper()
Out[168]:
0    HAMMER
1       SAW
2    WRENCH
dtype: object
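As a small sketch of how the str attribute's results can be used, the Boolean Series returned by contains can filter the original Series. The variable names (`has_a`, `with_a`, `lengths`) are illustrative and not from the original notes; `str.len` is one more of the string-processing methods the str attribute provides.

```python
import pandas as pd

hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

# Each str method is applied element by element and returns a new Series
has_a = hardware.str.contains('a')   # Boolean Series
upper = hardware.str.upper()         # uppercase copy of each element
lengths = hardware.str.len()         # number of characters in each element

# A Boolean Series can select the matching elements of the original Series
with_a = hardware[has_a]

print(with_a.tolist())
print(upper.tolist())
print(lengths.tolist())
```

Because every call returns a new Series, hardware itself is left unchanged.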
7.14.2 DataFrames
- Enhanced two-dimensional array
- Can have custom row and column indices
- Offers additional operations and capabilities that make it more convenient for many data-science-oriented tasks
- Supports missing data
- Each column in a DataFrame is a Series

Creating a DataFrame from a Dictionary
- Create a DataFrame from a dictionary that represents student grades on three exams

In[1]: import pandas as pd
In[2]: grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                      'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                      'Bob': [83, 65, 85]}
In[3]: grades = pd.DataFrame(grades_dict)

- Pandas displays DataFrames in tabular format with indices left aligned in the index column and the remaining columns' values right aligned

In[4]: grades
Out[4]:
   Wally  Eva  Sam  Katie  Bob
0     87  100   94    100   83
1     96   87   77     81   65
2     70   90   90     82   85
Customizing a DataFrame's Indices with the index Attribute
- Can use the index attribute to change the DataFrame's indices from sequential integers to labels
- Must provide a one-dimensional collection that has the same number of elements as there are rows in the DataFrame

In[5]: grades.index = ['Test1', 'Test2', 'Test3']
In[6]: grades
Out[6]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85
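The row labels can also be supplied when the DataFrame is created, using the DataFrame constructor's index keyword argument. The sketch below shows that alternative and what happens when the collection assigned to index has the wrong number of elements; the names `labels_before` and `length_error` are illustrative only.

```python
import pandas as pd

grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}

# Supply the row labels at creation time instead of assigning them afterward
grades = pd.DataFrame(grades_dict, index=['Test1', 'Test2', 'Test3'])
print(grades)

labels_before = list(grades.index)

# The new index must have one label per row; two labels for three rows fails
length_error = None
try:
    grades.index = ['Test1', 'Test2']
except ValueError as err:
    length_error = str(err)
    print('ValueError:', length_error)
```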
Accessing a DataFrame's Columns
- Can quickly and conveniently look at your data in many different ways, including selecting portions of the data
- Get Eva's grades by name
- Displays her column as a Series

In[7]: grades['Eva']
Out[7]:
Test1    100
Test2     87
Test3     90
Name: Eva, dtype: int64

- If a DataFrame's column-name strings are valid Python identifiers, you can use them as attributes

In[8]: grades.Sam
Out[8]:
Test1    94
Test2    77
Test3    90
Name: Sam, dtype: int64

Selecting Rows via the loc and iloc Attributes
- DataFrames support indexing capabilities with [], but pandas documentation recommends using the attributes loc, iloc, at and iat
- Optimized to access DataFrames and also provide additional capabilities
- Access a row by its label via the DataFrame's loc attribute

In[9]: grades.loc['Test1']
Out[9]:
Wally     87
Eva      100
Sam       94
Katie    100
Bob       83
Name: Test1, dtype: int64

- Access rows by integer zero-based indices using the iloc attribute (the i in iloc means that it's used with integer indices)

In[10]: grades.iloc[1]
Out[10]:
Wally    96
Eva      87
Sam      77
Katie    81
Bob      65
Name: Test2, dtype: int64

Selecting Rows via Slices and Lists with the loc and iloc Attributes
- Index can be a slice
- When using slices containing labels with loc, the range specified includes the high index ('Test3'):

In[11]: grades.loc['Test1':'Test3']
Out[11]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65
Test3     70   90   90     82   85

- When using slices containing integer indices with iloc, the range you specify excludes the high index (2):

In[12]: grades.iloc[0:2]
Out[12]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test2     96   87   77     81   65

- Select specific rows with a list

In[13]: grades.loc[['Test1', 'Test3']]
Out[13]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test3     70   90   90     82   85

In[14]: grades.iloc[[0, 2]]
Out[14]:
       Wally  Eva  Sam  Katie  Bob
Test1     87  100   94    100   83
Test3     70   90   90     82   85
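The difference between the two slice conventions is easy to get wrong, so here is a short sketch that selects the same two rows both ways. It rebuilds the grades DataFrame so that it runs on its own; the variable names are illustrative.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

# Label slice with loc: the end label 'Test2' is included
by_label = grades.loc['Test1':'Test2']

# Integer slice with iloc: the end position 2 is excluded
by_position = grades.iloc[0:2]

# Both selections contain the rows Test1 and Test2
print(by_label.equals(by_position))

# A list selects exactly the rows named, in the order given
picked = grades.loc[['Test3', 'Test1']]
print(picked)
```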
Selecting Subsets of the Rows and Columns
- View only Eva's and Katie's grades on Test1 and Test2

In[15]: grades.loc['Test1':'Test2', ['Eva', 'Katie']]
Out[15]:
       Eva  Katie
Test1  100    100
Test2   87     81

- Use iloc with a list and a slice to select the first and third tests and the first three columns for those tests

In[16]: grades.iloc[[0, 2], 0:3]
Out[16]:
       Wally  Eva  Sam
Test1     87  100   94
Test3     70   90   90
Boolean Indexing
- One of pandas' more powerful selection capabilities is Boolean indexing
- Select all the A grades, that is, those that are greater than or equal to 90:
- Pandas checks every grade to determine whether its value is greater than or equal to 90 and, if so, includes it in the new DataFrame
- Grades for which the condition is False are represented as NaN (not a number) in the new DataFrame
- NaN is pandas' notation for missing values

In[17]: grades[grades >= 90]
Out[17]:
       Wally    Eva   Sam  Katie  Bob
Test1    NaN  100.0  94.0  100.0  NaN
Test2   96.0    NaN   NaN    NaN  NaN
Test3    NaN   90.0  90.0    NaN  NaN

- Select all the B grades in the range 80-89

In[18]: grades[(grades >= 80) & (grades < 90)]
Out[18]:
       Wally   Eva  Sam  Katie   Bob
Test1   87.0   NaN  NaN    NaN  83.0
Test2    NaN  87.0  NaN   81.0   NaN
Test3    NaN   NaN  NaN   82.0  85.0

- Pandas Boolean indices combine multiple conditions with the Python operator & (bitwise AND), not the and Boolean operator
- For or conditions, use | (bitwise OR)
- NumPy also supports Boolean indexing for arrays, but always returns a one-dimensional array containing only the values that satisfy the condition
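To make the contrast between pandas and NumPy Boolean indexing concrete, the following sketch applies the same B-grade condition both ways. It rebuilds the grades DataFrame so that it runs on its own, and the names `b_mask`, `b_grades`, `a_or_failing` and `flat` are illustrative. Each comparison is wrapped in parentheses because & and | bind more tightly than >= and <.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

# A Boolean DataFrame with the same shape as grades
b_mask = (grades >= 80) & (grades < 90)

# pandas keeps the original shape and fills non-matching cells with NaN
b_grades = grades[b_mask]
print(b_grades)

# An "or" condition uses | in the same way
a_or_failing = grades[(grades >= 90) | (grades < 70)]
print(a_or_failing)

# NumPy returns only the matching values as a one-dimensional array
flat = grades.to_numpy()[b_mask.to_numpy()]
print(flat)
```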
Accessing a Specific DataFrame Cell by Row and Column
- DataFrame attributes at and iat get a single value from a DataFrame

In[19]: grades.at['Test2', 'Eva']
Out[19]: 87
In[20]: grades.iat[2, 0]
Out[20]: 70

- Can assign new values to specific elements

In[21]: grades.at['Test2', 'Eva'] = 100
In[22]: grades.at['Test2', 'Eva']
Out[22]: 100
In[23]: grades.iat[1, 2] = 87
In[24]: grades.iat[1, 2]
Out[24]: 87

Descriptive Statistics
- DataFrame's describe method calculates basic descriptive statistics for the data and returns them as a DataFrame
- Statistics are calculated by column

In[25]: grades.describe()
Out[25]:
           Wally         Eva        Sam       Katie        Bob
count   3.000000    3.000000   3.000000    3.000000   3.000000
mean   84.333333   96.666667  90.333333   87.666667  77.666667
std    13.203535    5.773503   3.511885   10.692677  11.015141
min    70.000000   90.000000  87.000000   81.000000  65.000000
25%    78.500000   95.000000  88.500000   81.500000  74.000000
50%    87.000000  100.000000  90.000000   82.000000  83.000000
75%    91.500000  100.000000  92.000000   91.000000  84.000000
max    96.000000  100.000000  94.000000  100.000000  85.000000

- Quick way to summarize your data
- Nicely demonstrates the power of array-oriented programming with a clean, concise functional-style call
- Can control the precision and other default settings with pandas' set_option function

In[26]: pd.set_option('display.precision', 2)
In[27]: grades.describe()
Out[27]:
       Wally     Eva    Sam   Katie    Bob
count   3.00    3.00   3.00    3.00   3.00
mean   84.33   96.67  90.33   87.67  77.67
std    13.20    5.77   3.51   10.69  11.02
min    70.00   90.00  87.00   81.00  65.00
25%    78.50   95.00  88.50   81.50  74.00
50%    87.00  100.00  90.00   82.00  83.00
75%    91.50  100.00  92.00   91.00  84.00
max    96.00  100.00  94.00  100.00  85.00

- For student grades, the most important of these statistics is probably the mean
- Can calculate that for each student simply by calling mean on the DataFrame

In[28]: grades.mean()
Out[28]:
Wally    84.33
Eva      96.67
Sam      90.33
Katie    87.67
Bob      77.67
dtype: float64

Transposing the DataFrame with the T Attribute
- Can quickly transpose rows and columns, so the rows become the columns and the columns become the rows, by using the T attribute to get a view

In[29]: grades.T
Out[29]:
       Test1  Test2  Test3
Wally     87     96     70
Eva      100    100     90
Sam       94     87     90
Katie    100     81     82
Bob       83     65     85

- Assume that rather than getting the summary statistics by student, you want to get them by test
- Call describe on grades.T

In[30]: grades.T.describe()
Out[30]:
        Test1   Test2  Test3
count    5.00    5.00   5.00
mean    92.80   85.80  83.40
std      7.66   13.81   8.23
min     83.00   65.00  70.00
25%     87.00   81.00  82.00
50%     94.00   87.00  85.00
75%    100.00   96.00  90.00
max    100.00  100.00  90.00

- Get the average of all the students' grades on each test

In[31]: grades.T.mean()
Out[31]:
Test1    92.8
Test2    85.8
Test3    83.4
dtype: float64

Sorting Rows by Their Indices
- Can sort a DataFrame by its rows or columns, based on their indices or values
- Sort the rows by their indices in descending order using sort_index and its keyword argument ascending=False

In[32]: grades.sort_index(ascending=False)
Out[32]:
       Wally  Eva  Sam  Katie  Bob
Test3     70   90   90     82   85
Test2     96  100   87     81   65
Test1     87  100   94    100   83
Sorting by Column Indices
- Sort columns into ascending order (left-to-right) by their column names
- axis=1 keyword argument indicates that we wish to sort the column indices, rather than the row indices
- axis=0 (the default) sorts the row indices

In[33]: grades.sort_index(axis=1)
Out[33]:
       Bob  Eva  Katie  Sam  Wally
Test1   83  100    100   94     87
Test2   65  100     81   87     96
Test3   85   90     82   90     70
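The effect of the axis argument can be seen by sorting the same DataFrame along each axis and comparing the resulting labels. This sketch rebuilds grades from the original dictionary values so that it runs on its own; the names `rows_desc` and `cols_asc` are illustrative.

```python
import pandas as pd

grades = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3'])

# axis=0 (the default): reorder the rows by their labels
rows_desc = grades.sort_index(ascending=False)
print(list(rows_desc.index))

# axis=1: reorder the columns by their names
cols_asc = grades.sort_index(axis=1)
print(list(cols_asc.columns))

# Each row keeps its own values; only the order of the labels changes
print(cols_asc.loc['Test1', 'Bob'], grades.loc['Test1', 'Bob'])
```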
Sorting by Column Values
- To view Test1's grades in descending order so we can see the students' names in highest-to-lowest grade order, call method sort_values
- by and axis arguments work together to determine which values will be sorted
- In this case, we sort based on the column values (axis=1) for Test1

In[34]: grades.sort_values(by='Test1', axis=1, ascending=False)
Out[34]:
       Eva  Katie  Sam  Wally  Bob
Test1  100    100   94     87   83
Test2  100     81   87     96   65
Test3   90     82   90     70   85

- Might be easier to read the grades and names if they were in a column
- Sort the transposed DataFrame instead

In[35]: grades.T.sort_values(by='Test1', ascending=False)
Out[35]:
       Test1  Test2  Test3
Eva      100    100     90
Katie    100     81     82
Sam       94     87     90
Wally     87     96     70
Bob       83     65     85

- Since we're sorting only Test1's grades, we might not want to see the other tests at all
- Combine selection with sorting

In[36]: grades.loc['Test1'].sort_values(ascending=False)
Out[36]:
Katie    100
Eva      100
Sam       94
Wally     87
Bob       83
Name: Test1, dtype: int64

Copy vs. In-Place Sorting
- sort_index and sort_values return a copy of the original DataFrame
- Could require substantial memory in a big data application
- Can sort in place by passing the keyword argument inplace=True
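A brief sketch of the difference between the two modes, using a transposed copy of the grades so that each student is a row. The names `students`, `sorted_copy`, `order_after_copy` and `result` are illustrative. With inplace=True the method returns None and reorders the existing object.

```python
import pandas as pd

students = pd.DataFrame(
    {'Wally': [87, 96, 70], 'Eva': [100, 87, 90], 'Sam': [94, 77, 90],
     'Katie': [100, 81, 82], 'Bob': [83, 65, 85]},
    index=['Test1', 'Test2', 'Test3']).T

# Default behavior: a new, sorted DataFrame is returned
sorted_copy = students.sort_values(by='Test1', ascending=False)
order_after_copy = list(students.index)   # the original order is untouched
print(order_after_copy)

# In-place sorting: students itself is reordered and None is returned
result = students.sort_values(by='Test1', ascending=False, inplace=True)
print(result)
print(students['Test1'].tolist())
```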
Series
- An enhanced one-dimensional
array
- Supports custom indexing, including even non-integer indices like strings
- Offers additional capabilities that make them more convenient for many data-science oriented tasks
-
Series
may have missing data - Many
Series
operations ignore missing data by default
-
Creating aSeries
with Default Indices
- By default, a
Series
has integer indices numbered sequentially from 0
Creating aSeries
with All Elements Having the Same Value
- Second argument is a one-dimensional iterable object (such as a list, an
array
or arange
) containing theSeries
indices - Number of indices determines the number of elements
Accessing aSeries
Elements
In[150]: grades[0] Out[150]: 87 Producing Descriptive Statistics for a Series
-
Series
provides many methods for common tasks including producing various descriptive statistics - Each of these is a functional-style reduction
-
Series
methoddescribe
produces all these stats and more - The
25%
,50%
and75%
arequartiles:-
50%
represents the median of the sorted values. -
25%
represents the median of the first half of the sorted values. -
75%
represents the median of the second half of the sorted values.
-
- For the quartiles, if there are two middle elements, then their average is that quartiles median
Creating a `Series` with Custom Indices
- Can specify custom indices with the `index` keyword argument
In[157]: grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])
In[158]: grades
Out[158]:
Wally     87
Eva      100
Sam       94
dtype: int64
Dictionary Initializers
- If you initialize a `Series` with a dictionary, its keys are the indices, and its values become the `Series`' element values
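A dictionary initializer could be sketched as follows, reusing the names and grades from the custom-index example above.

```python
import pandas as pd

# Keys become the indices; values become the element values
grades = pd.Series({'Wally': 87, 'Eva': 100, 'Sam': 94})
print(grades)
```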
Accessing Elements of a `Series` Via Custom Indices
- Can access individual elements via square brackets containing a custom index value
- If custom indices are strings that could represent valid Python identifiers, pandas automatically adds them to the `Series` as attributes
- `dtype` attribute returns the underlying `array`'s element type
- `values` attribute returns the underlying `array`
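The access styles listed above can be tried with a short sketch like this one, which rebuilds the `grades` `Series` with the custom indices shown earlier.

```python
import pandas as pd

grades = pd.Series([87, 100, 94], index=['Wally', 'Eva', 'Sam'])

print(grades['Eva'])   # square brackets with a custom index value
print(grades.Wally)    # attribute access; 'Wally' is a valid identifier
print(grades.dtype)    # element type of the underlying array
print(grades.values)   # the underlying NumPy array
```

Attribute access is a convenience only; square brackets are the safer choice when an index label could collide with an existing `Series` method name.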
Creating a `Series` of Strings
- In a `Series` of strings, you can use the `str` attribute to call string methods on the elements
- Call string method `contains` on each element
- Returns a `Series` containing `bool` values indicating the `contains` method's result for each element
- The `str` attribute provides many string-processing methods that are similar to those in Python's string type
  - https://pandas.pydata.org/pandas-docs/stable/api.html#string-handling
- Use string method `upper` to produce a new `Series` containing the uppercase versions of each element in `hardware`
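A small sketch of the `str` attribute in use follows. The name `hardware` comes from the text above, but the three strings and the search letter `'a'` are assumed sample values.

```python
import pandas as pd

# Sample contents are an assumption; only the name hardware is from the text
hardware = pd.Series(['Hammer', 'Saw', 'Wrench'])

# Element-wise test: does each string contain a lowercase 'a'?
has_a = hardware.str.contains('a')
print(has_a)

# Element-wise uppercase; returns a new Series and leaves hardware unchanged
upper = hardware.str.upper()
print(upper)
```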
7.14.2 `DataFrame`s
- Enhanced two-dimensional `array`
- Can have custom row and column indices
- Offers additional operations and capabilities that make them more convenient for many data-science-oriented tasks
- Support missing data
- Each column in a `DataFrame` is a `Series`
Creating a `DataFrame` from a Dictionary
- Create a `DataFrame` from a dictionary that represents student grades on three exams
- Pandas displays `DataFrame`s in tabular format with indices left aligned in the index column and the remaining columns' values right aligned
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| 0 | 87 | 100 | 94 | 100 | 83 |
| 1 | 96 | 87 | 77 | 81 | 65 |
| 2 | 70 | 90 | 90 | 82 | 85 |
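The table above can be reproduced with a sketch along these lines. The grade values are read off the table; the name `grades_dict` is an assumption.

```python
import pandas as pd

# Each key becomes a column name; each list holds that student's three grades
grades_dict = {'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
               'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
               'Bob': [83, 65, 85]}

grades = pd.DataFrame(grades_dict)
print(grades)
```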
Customizing a `DataFrame`'s Indices with the `index` Attribute
- Can use the `index` attribute to change the `DataFrame`'s indices from sequential integers to labels
- Must provide a one-dimensional collection that has the same number of elements as there are rows in the `DataFrame`
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test2 | 96 | 87 | 77 | 81 | 65 |
| Test3 | 70 | 90 | 90 | 82 | 85 |
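Relabeling the rows might be done as in this sketch, which rebuilds the grades from the table values and then assigns the three test labels.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]})

# Three rows, so the replacement collection needs exactly three labels
grades.index = ['Test1', 'Test2', 'Test3']
print(grades)
```

The same labels can also be supplied up front through the `index` keyword argument of `pd.DataFrame`.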
Accessing a `DataFrame`'s Columns
- Can quickly and conveniently look at your data in many different ways, including selecting portions of the data
- Get `Eva`'s grades by name
- Displays her column as a `Series`
- If a `DataFrame`'s column-name strings are valid Python identifiers, you can use them as attributes
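Both column-access styles are shown in this sketch, which uses the grade values from the tables above; the name `eva` is illustrative.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

eva = grades['Eva']   # column selected by name; the result is a Series
print(eva)

print(grades.Sam)     # attribute style; 'Sam' is a valid identifier
```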
Selecting Rows via the `loc` and `iloc` Attributes
- `DataFrame`s support indexing capabilities with `[]`, but pandas documentation recommends using the attributes `loc`, `iloc`, `at` and `iat`
  - Optimized to access `DataFrame`s and also provide additional capabilities
- Access a row by its label via the `DataFrame`'s `loc` attribute
- Access rows by integer zero-based indices using the `iloc` attribute (the `i` in `iloc` means that it's used with integer indices)
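A sketch of single-row selection with both attributes follows; the names `by_label` and `by_position` are illustrative, and the data comes from the tables above.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

by_label = grades.loc['Test1']   # row chosen by its label
print(by_label)

by_position = grades.iloc[1]     # second row, chosen by zero-based position
print(by_position)
```

Each result is a `Series` whose indices are the column names.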
Selecting Rows via Slices and Lists with the `loc` and `iloc` Attributes
- Index can be a slice
- When using slices containing labels with `loc`, the range specified includes the high index (`'Test3'`):
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test2 | 96 | 87 | 77 | 81 | 65 |
| Test3 | 70 | 90 | 90 | 82 | 85 |
- When using slices containing integer indices with `iloc`, the range you specify excludes the high index (`2`):
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test2 | 96 | 87 | 77 | 81 | 65 |
- Select specific rows with a list; this works by label with `loc` and by integer position with `iloc`, and both tables below show rows `Test1` and `Test3`
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test3 | 70 | 90 | 90 | 82 | 85 |
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87 | 100 | 94 | 100 | 83 |
| Test3 | 70 | 90 | 90 | 82 | 85 |
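The four selections shown in the tables above could be produced as sketched here. The variable names are illustrative, and the exact expressions the author used are an assumption.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

label_slice = grades.loc['Test1':'Test3']    # label slice includes 'Test3'
int_slice = grades.iloc[0:2]                 # integer slice excludes position 2

by_labels = grades.loc[['Test1', 'Test3']]   # list of row labels
by_positions = grades.iloc[[0, 2]]           # list of row positions

print(label_slice)
print(int_slice)
print(by_labels)
print(by_positions)
```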
Selecting Subsets of the Rows and Columns
- View only `Eva`'s and `Katie`'s grades on `Test1` and `Test2`
| | Eva | Katie |
|---|---|---|
| Test1 | 100 | 100 |
| Test2 | 87 | 81 |
- Use `iloc` with a list and a slice to select the first and third tests and the first three columns for those tests
| | Wally | Eva | Sam |
|---|---|---|---|
| Test1 | 87 | 100 | 94 |
| Test3 | 70 | 90 | 90 |
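Row and column selectors can be combined in a single indexing expression, as in this sketch. The names `subset` and `first_third` are illustrative.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

# Row label slice plus a list of column labels
subset = grades.loc['Test1':'Test2', ['Eva', 'Katie']]
print(subset)

# List of row positions plus a slice of column positions
first_third = grades.iloc[[0, 2], 0:3]
print(first_third)
```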
Boolean Indexing
- One of pandas' more powerful selection capabilities is Boolean indexing
- Select all the A grades, that is, those that are greater than or equal to 90:
  - Pandas checks every grade to determine whether its value is greater than or equal to 90 and, if so, includes it in the new `DataFrame`.
  - Grades for which the condition is `False` are represented as `NaN` (not a number) in the new `DataFrame`
  - `NaN` is pandas' notation for missing values
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | NaN | 100.0 | 94.0 | 100.0 | NaN |
| Test2 | 96.0 | NaN | NaN | NaN | NaN |
| Test3 | NaN | 90.0 | 90.0 | NaN | NaN |
- Select all the B grades in the range 80 to 89
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test1 | 87.0 | NaN | NaN | NaN | 83.0 |
| Test2 | NaN | 87.0 | NaN | 81.0 | NaN |
| Test3 | NaN | NaN | NaN | 82.0 | 85.0 |
- Pandas Boolean indices combine multiple conditions with the Python operator `&` (bitwise AND), not the `and` Boolean operator
- For `or` conditions, use `|` (bitwise OR)
- NumPy also supports Boolean indexing for `array`s, but always returns a one-dimensional array containing only the values that satisfy the condition
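The two selections and the NumPy comparison could be sketched as follows. The variable names are illustrative; `to_numpy` is used here only to obtain a plain `ndarray` for the comparison.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

# A grades: cells failing the condition become NaN
a_grades = grades[grades >= 90]
print(a_grades)

# B grades: each condition is parenthesized and the two are joined with &
b_grades = grades[(grades >= 80) & (grades < 90)]
print(b_grades)

# The same kind of mask on a NumPy array yields a flat, one-dimensional result
arr = grades.to_numpy()
flat = arr[arr >= 90]
print(flat)
```

The parentheses matter because `&` binds more tightly than the comparison operators.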
Accessing a Specific `DataFrame` Cell by Row and Column
- `DataFrame` attributes `at` and `iat` get a single value from a `DataFrame`
- Can assign new values to specific elements
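A sketch of reading and assigning single cells follows. The replacement values 100 and 87 are chosen to match Eva's and Sam's `Test2` grades as they appear in the later tables of this section; the exact statements the author used are an assumption.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 87, 90],
                       'Sam': [94, 77, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

before = grades.at['Test2', 'Eva']   # at: row label, column label
print(before)
print(grades.iat[2, 0])              # iat: row position, column position

# Assignment through the same attributes changes the DataFrame in place
grades.at['Test2', 'Eva'] = 100
grades.iat[1, 2] = 87                # row 1 is Test2, column 2 is Sam
print(grades)
```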
Descriptive Statistics
- `DataFrame`'s `describe` method calculates basic descriptive statistics for the data and returns them as a `DataFrame`
- Statistics are calculated by column
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| count | 3.000000 | 3.000000 | 3.000000 | 3.000000 | 3.000000 |
| mean | 84.333333 | 96.666667 | 90.333333 | 87.666667 | 77.666667 |
| std | 13.203535 | 5.773503 | 3.511885 | 10.692677 | 11.015141 |
| min | 70.000000 | 90.000000 | 87.000000 | 81.000000 | 65.000000 |
| 25% | 78.500000 | 95.000000 | 88.500000 | 81.500000 | 74.000000 |
| 50% | 87.000000 | 100.000000 | 90.000000 | 82.000000 | 83.000000 |
| 75% | 91.500000 | 100.000000 | 92.000000 | 91.000000 | 84.000000 |
| max | 96.000000 | 100.000000 | 94.000000 | 100.000000 | 85.000000 |
- Quick way to summarize your data
- Nicely demonstrates the power of array-oriented programming with a clean, concise functional-style call
- Can control the precision and other default settings with pandas' `set_option` function
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| count | 3.00 | 3.00 | 3.00 | 3.00 | 3.00 |
| mean | 84.33 | 96.67 | 90.33 | 87.67 | 77.67 |
| std | 13.20 | 5.77 | 3.51 | 10.69 | 11.02 |
| min | 70.00 | 90.00 | 87.00 | 81.00 | 65.00 |
| 25% | 78.50 | 95.00 | 88.50 | 81.50 | 74.00 |
| 50% | 87.00 | 100.00 | 90.00 | 82.00 | 83.00 |
| 75% | 91.50 | 100.00 | 92.00 | 91.00 | 84.00 |
| max | 96.00 | 100.00 | 94.00 | 100.00 | 85.00 |
- For student grades, the most important of these statistics is probably the mean
- Can calculate that for each student simply by calling `mean` on the `DataFrame`
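The statistics tables above might be generated as in this sketch. The grade values are the ones implied by those tables (Eva's and Sam's `Test2` grades are 100 and 87 here); the names `stats` and `means` are illustrative.

```python
import pandas as pd

# Grade values consistent with the statistics tables in this part of the section
grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 100, 90],
                       'Sam': [94, 87, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

stats = grades.describe()               # one column of statistics per student
print(stats)

pd.set_option('display.precision', 2)   # display two digits after the decimal point
print(stats)

means = grades.mean()                   # per-student (per-column) averages
print(means)
```

`set_option` changes only how values are displayed and stays in effect for the rest of the session; the stored values keep full precision.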
Transposing the `DataFrame` with the `T` Attribute
- Can quickly transpose rows and columns, so the rows become the columns and the columns become the rows, by using the `T` attribute to get a view
| | Test1 | Test2 | Test3 |
|---|---|---|---|
| Wally | 87 | 96 | 70 |
| Eva | 100 | 100 | 90 |
| Sam | 94 | 87 | 90 |
| Katie | 100 | 81 | 82 |
| Bob | 83 | 65 | 85 |
- Assume that rather than getting the summary statistics by student, you want to get them by test
- Call `describe` on `grades.T`
| | Test1 | Test2 | Test3 |
|---|---|---|---|
| count | 5.00 | 5.00 | 5.00 |
| mean | 92.80 | 85.80 | 83.40 |
| std | 7.66 | 13.81 | 8.23 |
| min | 83.00 | 65.00 | 70.00 |
| 25% | 87.00 | 81.00 | 82.00 |
| 50% | 94.00 | 87.00 | 85.00 |
| 75% | 100.00 | 96.00 | 90.00 |
| max | 100.00 | 100.00 | 90.00 |
- Get average of all the students' grades on each test
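The transposed views and per-test statistics could be obtained as sketched here, using the grade values shown in the transposed table; the variable names are illustrative.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 100, 90],
                       'Sam': [94, 87, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

by_student = grades.T               # students become rows, tests become columns
print(by_student)

test_stats = grades.T.describe()    # statistics per test instead of per student
print(test_stats)

test_means = grades.T.mean()        # average of all students' grades on each test
print(test_means)
```

Calling `grades.mean(axis=1)` gives the same per-test averages without transposing.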
Sorting Rows by Their Indices
- Can sort a `DataFrame` by its rows or columns, based on their indices or values
- Sort the rows by their indices in descending order using `sort_index` and its keyword argument `ascending=False`
| | Wally | Eva | Sam | Katie | Bob |
|---|---|---|---|---|---|
| Test3 | 70 | 90 | 90 | 82 | 85 |
| Test2 | 96 | 100 | 87 | 81 | 65 |
| Test1 | 87 | 100 | 94 | 100 | 83 |
Sorting by Column Indices
- Sort columns into ascending order (left-to-right) by their column names
- `axis=1` keyword argument indicates that we wish to sort the column indices, rather than the row indices
  - `axis=0` (the default) sorts the row indices
| | Bob | Eva | Katie | Sam | Wally |
|---|---|---|---|---|---|
| Test1 | 83 | 100 | 100 | 94 | 87 |
| Test2 | 65 | 100 | 81 | 87 | 96 |
| Test3 | 85 | 90 | 82 | 90 | 70 |
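Sorting by row indices and by column indices can both be done with `sort_index`, as in this sketch. The grade values match the sorted tables; the names `rows_desc` and `cols_asc` are illustrative.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 100, 90],
                       'Sam': [94, 87, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

# Rows in descending order of their labels: Test3, Test2, Test1
rows_desc = grades.sort_index(ascending=False)
print(rows_desc)

# Columns in ascending order of their names (axis=1 selects the column indices)
cols_asc = grades.sort_index(axis=1)
print(cols_asc)
```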
Sorting by Column Values
- To view `Test1`'s grades in descending order so we can see the students' names in highest-to-lowest grade order, call method `sort_values`
- `by` and `axis` arguments work together to determine which values will be sorted
  - In this case, we sort based on the column values (`axis=1`) for `Test1`
| | Eva | Katie | Sam | Wally | Bob |
|---|---|---|---|---|---|
| Test1 | 100 | 100 | 94 | 87 | 83 |
| Test2 | 100 | 81 | 87 | 96 | 65 |
| Test3 | 90 | 82 | 90 | 70 | 85 |
- Might be easier to read the grades and names if they were in a column
- Sort the transposed `DataFrame` instead
| | Test1 | Test2 | Test3 |
|---|---|---|---|
| Eva | 100 | 100 | 90 |
| Katie | 100 | 81 | 82 |
| Sam | 94 | 87 | 90 |
| Wally | 87 | 96 | 70 |
| Bob | 83 | 65 | 85 |
- Since we're sorting only `Test1`'s grades, we might not want to see the other tests at all
- Combine selection with sorting
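The three variations described above could be sketched like this. The variable names are illustrative and the exact expressions the author used are an assumption; the relative order of the tied grades (Eva and Katie both have 100) is not guaranteed by the default sort.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 100, 90],
                       'Sam': [94, 87, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

# Reorder the columns by the values in row Test1, highest first
by_test1 = grades.sort_values(by='Test1', axis=1, ascending=False)
print(by_test1)

# Same ordering on the transposed DataFrame, so the names run down a column
by_test1_t = grades.T.sort_values(by='Test1', ascending=False)
print(by_test1_t)

# Select only Test1 (a Series), then sort it
test1_only = grades.loc['Test1'].sort_values(ascending=False)
print(test1_only)
```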
Copy vs. In-Place Sorting
- `sort_index` and `sort_values` return a copy of the original `DataFrame`
- Could require substantial memory in a big data application
- Can sort in place by passing the keyword argument `inplace=True`
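The difference between the two modes can be observed with a sketch like the following; the names `copy_sorted`, `unchanged` and `result` are illustrative.

```python
import pandas as pd

grades = pd.DataFrame({'Wally': [87, 96, 70], 'Eva': [100, 100, 90],
                       'Sam': [94, 87, 90], 'Katie': [100, 81, 82],
                       'Bob': [83, 65, 85]},
                      index=['Test1', 'Test2', 'Test3'])

# Default behavior: a new, sorted DataFrame; grades itself keeps its order
copy_sorted = grades.sort_index(ascending=False)
unchanged = list(grades.index)
print(unchanged)

# In-place: grades is reordered and the method returns None
result = grades.sort_index(ascending=False, inplace=True)
print(result)
print(list(grades.index))
```

Because the in-place call returns `None`, its result should not be assigned back to `grades`.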