Question
Please fix this Python code. I am pretty sure the problem is with the getValueAtLine function, but the error is reported at line 50, in main:
    if currMovie[1] == "movie" and currMovie[4] == "0":
IndexError: list index out of range
I think the getValueAtLine function could be to blame, possibly something to do with the escape characters it reads from the files. The three files I am reading from are very large, averaging about 5 million lines each.
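The reported IndexError can be reproduced without the large files. As a minimal sketch (the sample lines and the helper name `is_wanted_movie` are made up for illustration, not taken from the IMDb data or the original code), splitting a blank or short line on tabs yields a list with fewer than five fields, so indexing `[1]` or `[4]` fails. This suggests the cause may be a blank or truncated line, or the empty string that readline() returns at end of file, rather than escape characters as such.

```python
# Hypothetical sample lines; real title.basics rows have nine tab-separated fields.
good = "tt0000001\tmovie\tExample Title\tExample Title\t0\t1999\t\\N\t90\tDrama\n"
blank = "\n"   # e.g. a trailing blank line in the file
eof = ""       # what readline() returns once the file is exhausted


def is_wanted_movie(line):
    """Return True only for well-formed, non-adult 'movie' rows."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:  # guard: short or blank line, so [4] would fail
        return False
    return fields[1] == "movie" and fields[4] == "0"


print(len(good.split("\t")))   # 9
print(blank.split("\t"))       # ['\n']
print(eof.split("\t"))         # ['']
print(is_wanted_movie(good), is_wanted_movie(blank), is_wanted_movie(eof))
```

A length check of this kind before indexing is one way to avoid the exception; whether skipping such lines is the right behavior depends on how the input files were prepared.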
The point is to write Python code that reads files from IMDb (Internet Movie Database), picks 20 random movies (not TV shows and not adult movies), and writes them along with their principal cast members to a text document. Fields should be separated by tab characters, and there should be one record per line.
import random

# open files
# read files
# do some stuff
# close files
open('name.basics1.tsv', encoding="utf8")
titles = open('title.basics1.tsv', encoding="utf8")
#open('titles.txt', encoding="utf8")
# open('title.principals1.tsv', encoding="utf8")

# code to count the lines of the specific file
def countLines(fileObj):
    #fileObj = open(fileName, encoding='utf8')
    counter = 0
    line = " "
    while not line == "":
        counter += 1
        line = fileObj.readline()
    fileObj.seek(0)
    return counter

# gets the value of each line
def getValueAtLine(fileObj, lineNum):
    #fileObj = open(fileName, encoding='utf8')
    counter = 0
    line = " "
    while not counter == lineNum:
        counter += 1
        line = fileObj.readline()
    fileObj.seek(0)
    return line

# main function part 1: starts by getting all the lines in the file, opens,
# and creating an empty array of movies. While the "movies" array is less than 20,
# it will go through the line numbers randomly and grab the value at the line, split it at the tab,
# and if the movie is a movie (currMovie[1]=="movie" and is NOT an adult movie (currmovie[4]=="0")
# then we will append that currMovie value to the movies array.
def main():
    titles = open("title.basics1.tsv", encoding="utf8")
    maxLines = countLines(titles)
    movies = []
    while not len(movies) == 20:
        lineNumber = int(random.uniform(2, maxLines))
        currMovie = getValueAtLine(titles, lineNumber).split("\t")
        if currMovie[1] == "movie" and currMovie[4] == "0":
            movies.append(currMovie)
    titles.close()

    # this code is getting all the lines of the principal cast.
    # getting the cast at a specific line, splitting them at the tabs,
    # and then for currMovie in the array of movies, seeing if the
    # tconst of titles.basics matches the tconst of titles.principals
    # then we split them at the commas(getting principal cast)
    castNum = open("title.principals1.tsv")
    line = " "
    while not line == "":
        line = castNum.readline()[:-1]
        lineSplit = line.split("\t")
        for currMovie in movies:
            if currMovie[0] == lineSplit[0]:
                currMovie.append(lineSplit[1].split(","))
    castNum.close()

    # gets the cast names for everything, splits at tabs, and appends each
    # of the cast in the currMovie[-1] column to the movies array
    # with corresponding numbers
    castName = open("name.basics1.tsv")
    line = " "
    while not line == "":
        line = castName.readline()
        lineSplit = line.split("\t")
        for currMovie in movies:
            for cast in currMovie[-1]:
                if lineSplit[0] == cast:
                    currMovie[-1].append(lineSplit[1])
                    currMovie[-1].remove(lineSplit[0])
    castName.close()

    # writes everything to the titles.txt document
    titleAndCast = open("titles.txt", 'w')
    string = ""
    for currMovie in movies:
        string += currMovie[2]
        for cast in currMovie[-1]:
            string += "\t" + cast
        string += " "
    titleAndCast.write(string)
    titleAndCast.close()

# runs the main function
main()
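Calling getValueAtLine once per random pick re-reads the file from the top each time, which is slow on files of about 5 million lines. One possible alternative, sketched here under assumptions rather than offered as the definitive fix, is to read the title file once and keep a random sample of 20 qualifying rows using reservoir sampling. The function name `sample_movies` and its parameters are hypothetical; the column positions (1 for the title type, 4 for the adult flag) follow the code above, and the input below is made-up data standing in for title.basics1.tsv.

```python
import io
import random


def sample_movies(lines, k=20, rng=random):
    """Pick up to k random non-adult movie rows in one pass (reservoir sampling).

    `lines` is any iterable of tab-separated strings, such as an open file.
    """
    sample = []
    seen = 0  # number of qualifying rows encountered so far
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or short rows instead of raising IndexError.
        if len(fields) < 5:
            continue
        if fields[1] != "movie" or fields[4] != "0":
            continue
        seen += 1
        if len(sample) < k:
            sample.append(fields)
        else:
            # Replace an existing pick with probability k / seen.
            j = rng.randrange(seen)
            if j < k:
                sample[j] = fields
    return sample


# Small made-up input; the header row is skipped because its
# second field is "titleType", not "movie".
fake_file = io.StringIO(
    "tconst\ttitleType\tprimaryTitle\toriginalTitle\tisAdult\n"
    "tt1\tmovie\tA\tA\t0\n"
    "tt2\ttvSeries\tB\tB\t0\n"
    "\n"
    "tt3\tmovie\tC\tC\t1\n"
    "tt4\tmovie\tD\tD\t0\n"
)
picked = sample_movies(fake_file, k=20)
print(sorted(row[0] for row in picked))  # ['tt1', 'tt4']
```

With the real data, the open file object could be passed directly, for example inside `with open("title.basics1.tsv", encoding="utf8") as f:`. Iterating over the file reads it line by line, so the whole file never has to fit in memory.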