Question
I need help cleaning a dataset. Please provide the code.
It can be downloaded from here: https://www.kaggle.com/tmdb/tmdb-movie-metadata/data
What I have done so far is below. Don't mind the imports, because I will use the rest once I have a clean set.
from datetime import timedelta, date
import datetime
import numpy as np
import pandas as pd
import string
import re
import csv
import requests

# Data from https://www.kaggle.com/tmdb/tmdb-movie-metadata/data
df_movies = pd.read_csv('tmdb_5000_movies.csv', delimiter=',', header=0, skipinitialspace=True)

# Drop the columns that are not needed for the analysis.
df_movies.drop(columns=['homepage', 'popularity', 'overview', 'status', 'tagline', 'vote_average', 'vote_count', 'id'], inplace=True)

df_movies.head()
I want the 'genres' column to contain only the genre names (Action, Adventure, and so on) instead of the raw JSON. The same goes for 'production_companies', 'production_countries' and 'spoken_languages'.
Then I need to remove all rows where 'spoken_languages' is not English (en), create a separate column titled 'release_year' containing just the year of the movie's release, and sort the data by 'release_year' and then by 'revenue'.
Thanks!
Step by Step Solution
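One possible approach, sketched under a few assumptions rather than offered as a definitive answer: the columns 'genres', 'production_companies', 'production_countries' and 'spoken_languages' are assumed to hold JSON-encoded lists of objects with a "name" key, and each 'spoken_languages' entry is assumed to carry an "iso_639_1" language code. "Not English" is interpreted here as "en does not appear among the spoken languages", so films with English plus other languages are kept. The helper names extract_values and clean_movies are illustrative and do not come from the question.

```python
import json

import pandas as pd

# Columns assumed to store JSON-encoded lists of dicts.
JSON_COLUMNS = [
    "genres",
    "production_companies",
    "production_countries",
    "spoken_languages",
]


def extract_values(cell, key):
    """Return the values stored under `key` in a JSON list of dicts.

    Missing values (NaN) and cells that are not valid JSON lists
    yield an empty list instead of raising.
    """
    if not isinstance(cell, str):
        return []
    try:
        items = json.loads(cell)
    except json.JSONDecodeError:
        return []
    if not isinstance(items, list):
        return []
    return [item[key] for item in items if isinstance(item, dict) and key in item]


def clean_movies(df):
    """Return a cleaned, sorted copy of the movies frame."""
    df = df.copy()

    # Keep only rows whose spoken languages include English ("en").
    # This must run before the JSON is flattened to plain names.
    is_english = df["spoken_languages"].apply(
        lambda cell: "en" in extract_values(cell, "iso_639_1")
    )
    df = df[is_english].copy()

    # Replace each JSON cell with a comma-separated string of names.
    for col in JSON_COLUMNS:
        df[col] = df[col].apply(
            lambda cell: ", ".join(str(v) for v in extract_values(cell, "name"))
        )

    # Unparseable or missing dates become <NA> rather than raising.
    dates = pd.to_datetime(df["release_date"], errors="coerce")
    df["release_year"] = dates.dt.year.astype("Int64")

    # Ascending sort: oldest year first, lowest revenue first within a year.
    return df.sort_values(["release_year", "revenue"]).reset_index(drop=True)
```

With the frame from the question, this would be called as df_clean = clean_movies(df_movies). To sort from newest and highest-grossing instead, pass ascending=False to sort_values. To keep only films whose sole spoken language is English, the filter could compare extract_values(cell, "iso_639_1") against ["en"] instead of testing membership.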