
Question

Reading multiple files for processing

In this task, you are asked to read and process the contents of a number of JSON data files located in a directory on your computer's file system. How should you go about doing this? And what if the JSON files you want to read are in a directory with other kinds of files?

The assignment data are in the zip file linked below:

https://mega.nz/#!VnAnES7T!Pd1cYATTG6G8qCLXq6nHORBosJy9lCKdUb6IGR1j2B8

There are different ways you could read the files you need to process. You may be familiar with the Python os module. os provides a large number of operating-system-level functions and tools, including some for showing the current working directory, changing the current working directory, listing the files in a directory, and other often useful things.
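
As a minimal sketch of the os functions mentioned above, the following shows os.getcwd, os.chdir, and os.listdir working together. The helper name show_directory_basics and the scratch directory are assumptions made for illustration; they are not part of the assignment.

```python
import os
import tempfile


def show_directory_basics(data_dir):
    """Return (original cwd, cwd after chdir, sorted entries of data_dir).

    Hypothetical helper, used only to illustrate the os calls.
    """
    original_dir = os.getcwd()          # where Python is currently working
    os.chdir(data_dir)                  # move into the data directory
    try:
        current_dir = os.getcwd()       # now reports the data directory
        entries = sorted(os.listdir("."))  # names of everything in it
    finally:
        os.chdir(original_dir)          # always return to where we started
    return original_dir, current_dir, entries


# A throwaway directory stands in for your real data directory.
with tempfile.TemporaryDirectory() as scratch_dir:
    open(os.path.join(scratch_dir, "example.json"), "w").close()
    before, during, entries = show_directory_basics(scratch_dir)
    print("started in:", before)
    print("moved to:  ", during)
    print("entries:   ", entries)
```

Note that os.listdir returns bare names, not full paths, and it makes no distinction between files and subdirectories.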

Another useful module is glob. It provides functions for finding pathnames that match a particular pattern.

Your Python distribution already includes both os and glob, since both are part of the standard library. Put the JSON files in a directory on your computer. Then, using Python, do the following, first using os and then using glob:

1. Read the names of the JSON files from the directory;

2. Print out the JSON file names.

The way you do this should work for an arbitrarily large number of files, and also for a directory that contains files other than JSON files. You can add some other kinds of files (e.g., .txt, .xls, .ipynb) to your directory to be sure that your code selects files correctly. Share your code and your output here on Canvas.
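
One possible way to approach the two steps above is sketched here, once with os and once with glob. The function names json_names_with_os and json_names_with_glob and the sample file names are assumptions chosen for illustration; substitute your own data directory for the temporary one.

```python
import glob
import os
import tempfile


def json_names_with_os(directory):
    """List .json file names by filtering os.listdir output (hypothetical helper)."""
    names = []
    for name in os.listdir(directory):
        full_path = os.path.join(directory, name)
        # Keep regular files only, and only those ending in .json.
        if os.path.isfile(full_path) and name.endswith(".json"):
            names.append(name)
    return sorted(names)


def json_names_with_glob(directory):
    """List .json file names by pattern matching with glob (hypothetical helper)."""
    pattern = os.path.join(directory, "*.json")
    paths = [p for p in glob.glob(pattern) if os.path.isfile(p)]
    # glob returns paths that include the directory; keep just the file names.
    return sorted(os.path.basename(p) for p in paths)


# A throwaway directory with mixed file types stands in for the real data.
with tempfile.TemporaryDirectory() as scratch_dir:
    for name in ("a.json", "b.json", "notes.txt", "sheet.xls", "work.ipynb"):
        open(os.path.join(scratch_dir, name), "w").close()

    os_names = json_names_with_os(scratch_dir)
    glob_names = json_names_with_glob(scratch_dir)

    for name in os_names:
        print("os:  ", name)
    for name in glob_names:
        print("glob:", name)
```

Both versions select by file name only, so neither says anything about whether a file's contents are really JSON. Also note that the *.json pattern in glob skips names that begin with a dot, while the os version above would keep them.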

Note that glob has different functions for producing the names of files that match a specified pattern. One of them returns an iterator, and the other returns a Python list. When you do the above using glob, try both. How do they differ? What conditions would cause you to prefer one over the other?
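
To make the comparison concrete, here is a small sketch using glob.glob (which returns a list) and glob.iglob (which returns an iterator). The scratch directory and file names are assumptions for illustration only.

```python
import glob
import os
import tempfile

with tempfile.TemporaryDirectory() as scratch_dir:
    for name in ("a.json", "b.json", "notes.txt"):
        open(os.path.join(scratch_dir, name), "w").close()
    pattern = os.path.join(scratch_dir, "*.json")

    # glob.glob builds the complete list of matches before returning.
    as_list = glob.glob(pattern)
    print(type(as_list).__name__, "with", len(as_list), "matches")

    # glob.iglob yields the same matches one at a time instead of
    # holding them all in a list.
    as_iterator = glob.iglob(pattern)
    first_pass = sorted(os.path.basename(p) for p in as_iterator)
    print("first pass: ", first_pass)

    # An iterator can be consumed only once; a second pass finds nothing.
    second_pass = list(as_iterator)
    print("second pass:", second_pass)
```

A list is convenient when you need len(), indexing, sorting, or several passes over the names. An iterator may be preferable when the number of matches is very large and you only need to walk through them once.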

As a last tantalizing tidbit, what if your directory contains what appear to be hotel JSON files that actually aren't JSON at all, but are something else, like .xls or .docx, even though they have names and extensions like the hotel files you're working with? That is, what if a file named 1000666.json, mixed in with the real hotel review JSON files, is really an .xls file? Is there a way you can catch a problem like this?
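
One way to catch this, sketched below, is to stop trusting the extension and try to parse each file with the json module, treating a parse or decoding failure as a sign that the file is not JSON. The helper name is_valid_json_file is an assumption, the fake file's bytes are a stand-in for a real spreadsheet, and the sketch assumes the real files are UTF-8 encoded.

```python
import json
import os
import tempfile


def is_valid_json_file(path):
    """Return True if the file at path parses as JSON (hypothetical helper).

    Assumes genuine data files are UTF-8 encoded text.
    """
    try:
        with open(path, "r", encoding="utf-8") as handle:
            json.load(handle)
    except (UnicodeDecodeError, json.JSONDecodeError):
        # Binary content or malformed text: not usable as JSON.
        return False
    return True


with tempfile.TemporaryDirectory() as scratch_dir:
    real_path = os.path.join(scratch_dir, "real.json")
    fake_path = os.path.join(scratch_dir, "1000666.json")

    # A genuine JSON document.
    with open(real_path, "w", encoding="utf-8") as handle:
        json.dump({"hotel": "example", "rating": 4}, handle)

    # Binary bytes saved under a .json name, standing in for a
    # spreadsheet that has been given the wrong extension.
    with open(fake_path, "wb") as handle:
        handle.write(b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1")

    results = {
        os.path.basename(p): is_valid_json_file(p)
        for p in (real_path, fake_path)
    }
    for name, ok in sorted(results.items()):
        print(name, "valid JSON" if ok else "NOT valid JSON")
```

This check confirms only that a file is syntactically valid JSON. It does not confirm that the file has the structure of a hotel review, so a further check on the expected keys may still be worthwhile.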
