Posted on Sep 22, 2024

C PROGRAMMING

Introduction

File formats for this part of the project are the same as in the first. The CSV file with movie metadata will remain the same. The sorting algorithm will also remain the same. If you properly modularized your code in Project 0, you should be able to reuse almost all of your code.

In this project, you will read in a directory name and walk through the directory structure to find all .csv files. There may be multiple levels of directories that you will need to recurse through. You will then fork child processes to sort each of the files and output the results to a different file. You should NOT use exec for this project. You should write one program that, when copied from the parent to the child process, will continue running. You can use the return value of fork() in a conditional to choose different blocks of code to run within your code. You will want to make sure to prevent zombie and orphan child processes. You will also want to make sure not to create fork bombs that will bring down machines. In all cases of bad input, you should fail gracefully (e.g. no segfaults).

You will output metadata about your processes to STDOUT. This metadata will show the total number of processes created and the pids of all created processes.

Methodology

a. Parameters

Your code will read in a set of parameters via the command line. Records will be stored in .csv files in the provided directory. As mentioned above, directory structures may have multiple levels and you must find all .csv files. Your code should ignore non-.csv files and .csv files that do not have the correct format of the movie_metadata csv (e.g. csv files that have other random data in them). Remember, the first record (line) is the column headings and should not be sorted as data.

Your code must take in a command-line parameter to determine which value type (column) to sort on. If that parameter is not present (?-> throw an error, or default behavior). The first argument to your program will be '-c' to indicate sorting by column and the second will be the column name:

    ./sorter -c food

Be sure to check that the arguments are there and that they correspond to a listed value type (column heading) in the CSV.

For this phase you'll extend your flags from one to three. The second parameter to your program will be -d, indicating the directory the program should search for .csv files. This parameter is optional. The default behavior will search the current directory.

    ./sorter -c food -d thisdir/thatdir

The third parameter to your program will be -o, indicating the output directory for the sorted versions of the input file. This parameter is optional. The default behavior will be to output in the same directory as the source file.

    ./sorter -c movie_title -d thisdir -o thatdir

b. Operation

Your code will be reading in and traversing the entire directory. In order to run your code to test it, you will need to open each CSV and read it for processing:

    ./sorter -c movie_title -d thisdir -o thatdir

Your code's output will be a series of new CSV files, each named after the CSV file that was sorted, with "-sorted-" and the field name appended to the name. E.g. 'movie_metadata.csv' sorted on the field 'movie_title' will result in a file named "movie_metadata-sorted-movie_title.csv".

On each new file in a directory you encounter, you should fork() a child process to do the actual sorting. On each new directory you encounter, you should fork() a child process to process the directory.

To STDOUT, output the following metadata in the shown order:

    Initial PID: XXXXX
    PIDS of all child processes: AAA,BBB,CCC,DDD,EEE,FFF, etc
    Total number of processes: ZZZZZ

You may assume the total number of files and directories will not exceed 255.

c. Structure

Your code should use Mergesort to do the actual sorting of records. It is a powerful algorithm with an excellent average case.
Results: Submit your "sorter.c", "sorter.h" and "mergesort.c" as well as any other source files your header file references.

-----------------------------------------------------------

HERE IS THE CODE I GOT SO FAR.

SORTER.C

#include "sorter.h"

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include "mergesort.c"

FILE *file;

void addTail(LL* dlist, data cdata){ //takes a linked list and a data, adds the data to a node and puts it at the end of the linked list

Node* datanode = (Node*) malloc(sizeof(Node)); //allocate space for a node

datanode->next = NULL;

datanode->ndata = cdata; //set the new node's ndata to cdata

if((dlist->head == NULL) && (dlist->tail == NULL))

{ //empty list case

dlist->head = datanode;

dlist->tail = datanode;

dlist->count++;

}

else if((dlist->head == dlist->tail)){ //single node case

dlist->tail = datanode;

dlist->head->next = dlist->tail;

dlist->count++;

}

else{// every other case

dlist->tail->next = datanode;

dlist->tail = datanode;

dlist->count++;

}

}

void Finish(LL* dlist){ //frees everything in a LL

Node* temp = dlist->head;

int n = 0;

while(temp!= NULL){

char** ftemp = temp->ndata.fielddata;

for(n = 0;n < dlist->numfields;n++){

free(ftemp[n]);

}

free(ftemp);

Node* oldtemp = temp;

temp = temp->next;

free(oldtemp);

}

for(n = 0;n < dlist->numfields;n++){

free(dlist->fields[n]);

}

free(dlist->fields);

free(dlist->types);

free(dlist);

}

void initializeList(LL* dlist, char* fields){ //initializes the numfields, types, and fields in the linked list

char* temp = strdup(fields);

char* currvar = (char*) strtok(temp, ",");

int totalfields = 0; //counts total fields, determined by the number of tokens found

while(currvar != 0){ //discovers total number of fields

totalfields++;

currvar = (char*) strtok(NULL,",");

}

dlist->numfields = totalfields;

dlist->types = (int*) malloc(sizeof(int)*dlist->numfields); //allocates fields for keeping track of what type each field is, to be used once all data is received

dlist->fields = (char**) malloc(sizeof(char*)*dlist->numfields); //keeps track of the name of the fields, to be used later when sorting

free(temp);

temp = strdup(fields);

currvar = (char*) strtok(temp, ",");

int n;

for(n = 0;n < dlist->numfields;n++){

if(n == dlist->numfields-1){ //strips the newline character for the last field

int specchar = strcspn(currvar, "\r\n"); //find the location of the first \r or the first \n

if(specchar != 0){ //if the location isn't the start

currvar[specchar] = '\0'; //remove the special characters

}

}

dlist->fields[n] = strdup(currvar);

currvar = (char*) strtok(NULL,",");

}

free(temp); //currvar points into temp (or is NULL), so it must not be freed separately

}

void initializeListTypes(LL* dlist){ //determines what type of data belongs to what field

if(dlist->head == NULL){

return;

}

Node* temp = dlist->head;

int n = 0;

for(n;n < dlist->numfields;n++){

char* tempstr = temp->ndata.fielddata[n];

char* endptr;

long lg = 0;

float fl = 0.0;

if(n == dlist->numfields-1){ //if this field is the last field on the line, remove \r and \n

int specchar = strcspn(tempstr, "\r\n"); //find the location of the first \r or the first \n

if(specchar != 0){ //if the location isn't the start

tempstr[specchar] = '\0'; //remove the special characters

}

}

if(strchr(tempstr, '.')){ //if it has a '.', it's a string or a float/double

fl = strtof(tempstr, &endptr);

if((endptr == tempstr) || (endptr != (tempstr+strlen(tempstr)))){ //if endptr == tempstr, no floats at the start, the other conditional checks to make sure that the end of the read in float is the end of the string inputted

dlist->types[n] = 0;

}

else{

dlist->types[n] = 2;

}

}

else{ //if it has no '.', it's either an int or a string

lg = strtol(tempstr, &endptr, 10);

if((endptr == tempstr) || (endptr != (tempstr+strlen(tempstr)))){ //same check as above: the whole string must parse as an integer

dlist->types[n] = 0;

}

else{

dlist->types[n] = 1;

}

}

}

}
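
The type detection above can also be written as one small helper that looks at a single field. This is a sketch under assumptions, not the posted code: classify_field() and the enum names are invented here, and the rule "only plain decimal notation counts as a number" is a choice made so that titles such as "Infinity" or "nan" stay strings. The numeric codes match the 0 = string, 1 = int, 2 = float convention from sorter.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Same codes as the comments in sorter.h: 0 = string, 1 = int, 2 = float. */
enum field_type { FIELD_STRING = 0, FIELD_INT = 1, FIELD_FLOAT = 2 };

/* Classifies one CSV field by trying to parse all of it as a number. */
enum field_type classify_field(const char *s) {
    char *end;
    assert(s != NULL);
    /* Only plain decimal notation counts as numeric; this rejects "inf",
       "nan", hex literals and values padded with spaces. */
    if (*s == '\0' || s[strspn(s, "0123456789+-.eE")] != '\0') {
        return FIELD_STRING;
    }
    errno = 0;
    (void)strtol(s, &end, 10);
    if (end != s && *end == '\0' && errno == 0) {
        return FIELD_INT;   /* the whole string was consumed as an integer */
    }
    errno = 0;
    (void)strtod(s, &end);
    if (end != s && *end == '\0' && errno == 0) {
        return FIELD_FLOAT; /* the whole string was consumed as a double */
    }
    return FIELD_STRING;
}
```

Note that the posted function only inspects the first data row. If that row happens to have an empty or unusual value in a column, the whole column is typed from it, so checking several rows with a helper like this may be more robust.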

void export(LL* dlist){

// output to a file FOR TESTING PURPOSES

//char const *fileName = "sortedmovies.csv";

//FILE *fp = fopen(fileName, "w");

//if(fp == NULL){

// printf("error");

//}

int n = 0;

for(n = 0;n < dlist->numfields;n++){

if(n != dlist->numfields-1){

fprintf(stdout,"%s,", dlist->fields[n]);

}

else{

fprintf(stdout, "%s\n",dlist->fields[n]);

}

}

Node* temp = dlist->head;

while(temp != NULL){

if(temp->ndata.comma == NULL){ //if there's no commas in the fields

for(n = 0;n < dlist->numfields;n++){

if(n != dlist->numfields-1){

fprintf(stdout,"%s,",temp->ndata.fielddata[n]);

}

else{

fprintf(stdout,"%s\n",temp->ndata.fielddata[n]);

}

}

}

else{

for(n = 0;n < dlist->numfields;n++){

if(temp->ndata.comma[n] == 0){

if(n != dlist->numfields-1){

fprintf(stdout,"%s,",temp->ndata.fielddata[n]);

}

else{

fprintf(stdout,"%s\n",temp->ndata.fielddata[n]);

}

}

else{ //quoted field: put the quotes back

if(n != dlist->numfields-1){

fprintf(stdout,"\"%s\",",temp->ndata.fielddata[n]);

}

else{

fprintf(stdout,"\"%s\"\n",temp->ndata.fielddata[n]);

}

}

}

}

temp = temp->next;

}

//fclose(fp);

}
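
export() above writes to stdout, while the assignment asks for a file named like "movie_metadata-sorted-movie_title.csv", optionally placed in the -o directory. One way to build that path is sketched below. The function name build_sorted_path() and its parameters are hypothetical; it assumes '/' as the path separator and that the caller passes NULL for outdir when -o was not given.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Writes "<dir>/<stem>-sorted-<column>.csv" into out.
 * If outdir is NULL the input file's own directory is kept.
 * Returns 0 on success, -1 if input is not a .csv name or out is too small.
 */
int build_sorted_path(const char *input, const char *column,
                      const char *outdir, char *out, size_t outlen) {
    assert(input != NULL && column != NULL && out != NULL);
    const char *slash = strrchr(input, '/');
    const char *base = (slash != NULL) ? slash + 1 : input;
    size_t baselen = strlen(base);
    int written;

    if (baselen <= 4 || strcmp(base + baselen - 4, ".csv") != 0) {
        return -1; /* not a .csv file name */
    }
    if (outdir != NULL) {
        /* Drop the source directory and the ".csv" suffix, prefix outdir. */
        written = snprintf(out, outlen, "%s/%.*s-sorted-%s.csv",
                           outdir, (int)(baselen - 4), base, column);
    } else {
        /* Keep the full input path, minus the ".csv" suffix. */
        written = snprintf(out, outlen, "%.*s-sorted-%s.csv",
                           (int)(strlen(input) - 4), input, column);
    }
    return (written < 0 || (size_t)written >= outlen) ? -1 : 0;
}
```

The resulting path can be passed to fopen(path, "w"), and export() can then take the FILE* as a parameter instead of always printing to stdout.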

int main(int argc, char *argv[]){

//Constructs an array containing all of the data data into cdata

file = stdin;

LL* dlist = (LL*) calloc(1, sizeof(LL)); //calloc a data linked list to which data nodes will be added; zeroing matters because addTail tests head and tail against NULL and increments count

char* test = (char*) malloc(sizeof(char)*1000); //for getting data from the file

memset(test, 0 , sizeof(char)*1000);

//i should make sure there's no problem when there's more than 1000 chars

fgets(test,1000,file); //skips the column line

initializeList(dlist, test);

int c; //fgetc returns an int; a char cannot reliably hold EOF

int currentvar=0;

int count = 0;

int place = 0;

int quote = 0;

int x = 0;

data cdatanode = {};

cdatanode.fielddata = (char**) malloc(sizeof(char*)*dlist->numfields);

cdatanode.comma = NULL;

memset(test, 0 , sizeof(char)*1000);

while((c = fgetc(file)) != EOF){

if(c < 0 || c > 127){ //skip anything outside the ASCII range

continue;

}

if((c == ',' || c == '\n') && (quote == 0)){

cdatanode.fielddata[currentvar] = strdup(test); //copy test to cdatanode STRDUP IS MALLOC, REMEMBER TO FREE

if(c == '\n'){

addTail(dlist, cdatanode);

count++;

currentvar = 0;

place = 0;

memset(test, 0 , sizeof(char)*1000);

cdatanode.fielddata = (char**) malloc(sizeof(char*)*dlist->numfields); //get new char array for cdatanode

cdatanode.comma = NULL;

}

else{

currentvar++;

place = 0;

memset(test, 0 , sizeof(char)*1000);

}

}

else if(c == '"'){

if(quote == 0){

if(cdatanode.comma == NULL){ //allocate the per-row quote flags once, zero-initialized

cdatanode.comma = (int*) calloc(dlist->numfields, sizeof(int));

}

cdatanode.comma[currentvar] = 1;

quote = 1;

}

else{

quote = 0;

}

}

else if(place < 999){ //leave room for the terminating '\0' in the 1000-char buffer

test[place] = c;

place++;

}

}

free(cdatanode.fielddata);

free(test);

initializeListTypes(dlist);

if(argc >= 3){

if(strcmp("-c", argv[1]) == 0){

mergeSortBegin(dlist, argv[2]);

}

}

export(dlist); //exports the dlist into csv

return 0;

}
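
main() above still reads a single file from stdin. For the fork-per-file part of the assignment, one possible shape is sketched below. All names here (is_csv_name, worker_fn, demo_worker, spawn_worker, reap_all) are hypothetical, and demo_worker is only a stand-in for the real "read, sort, write" routine. The important points are that the child calls _exit() and never returns into the parent's directory loop (which is what prevents a fork bomb), and that the parent waits for every child (which prevents zombies).

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns 1 if name ends in ".csv" and has a non-empty stem, else 0. */
int is_csv_name(const char *name) {
    size_t len = strlen(name);
    return len > 4 && strcmp(name + len - 4, ".csv") == 0;
}

/* Hypothetical worker signature: handles one path, returns 0 on success. */
typedef int (*worker_fn)(const char *path);

/* Stand-in worker; a real one would read, sort and write the file. */
int demo_worker(const char *path) {
    return is_csv_name(path) ? 0 : 1;
}

/*
 * Forks one child that runs work(path) and then exits.
 * The parent gets the child's pid back (or -1 if fork failed).
 * The child never returns from this function.
 */
pid_t spawn_worker(worker_fn work, const char *path) {
    assert(work != NULL && path != NULL);
    fflush(NULL); /* flush stdio so buffered text is not written twice */
    pid_t pid = fork();
    if (pid == 0) {
        int status = work(path);
        fflush(NULL);
        _exit(status == 0 ? 0 : 1);
    }
    return pid;
}

/* Waits for every child of this process; returns how many were reaped. */
int reap_all(void) {
    int reaped = 0;
    for (;;) {
        pid_t done = wait(NULL);
        if (done > 0) {
            reaped++;
        } else if (errno != EINTR) {
            break; /* ECHILD: no children left */
        }
    }
    return reaped;
}
```

A directory walk with opendir()/readdir() would call spawn_worker() once per .csv entry and once per subdirectory, record the returned pids for the "PIDS of all child processes" line, and call reap_all() before it prints the totals.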

SORTER.H

//Suggestion: define a struct that mirrors a record (row) of the data set

#ifndef _SORTER_H_

#define _SORTER_H_

typedef struct _data { //THIS DATA STRUCT IS CURRENTLY BAD, PUT DATA IN NODE

char** fielddata; //2d array to contain string types

int* comma; //does the node have a comma in the field n? 0 = no, 1 = yes

//might want to make the data types a linked list also?

//or make a LL to keep track when creating, then place into array??

} data;

typedef struct _Node {

data ndata; //data in a node

//char** data; use this instead of a data struct

struct _Node* next;

}Node;

typedef struct _LL { //linked list that contains the nodes with the movies

Node* head;

Node* tail;

int count; //count of total nodes in list

int numfields; //count of the total fields in the csv

int sortingfield; //keeps track of the field to be sorted, where sortingfield = # field sorted

int sortingtype; //keeps track of what type is being sorted, 0 = string, 1 = int, 2 = float

int* types; //keeps track of types, 0 = string, 1 = int, 2 = float

char** fields; //keeps track of the name fields

}LL;

//Suggestion: prototype a mergesort function

void mergeSortBegin(LL* dlist, char* field);

Node* mergeSort(LL* dlist, Node* head);

Node* split(Node* head);

Node* merge(LL* dlist, Node* left, Node* right);

#endif

MERGESORT.C

#include "sorter.h"

#include <stdio.h>

#include <stdlib.h>

#include <string.h>

#include <ctype.h>

FILE *file;

void mergeSortBegin(LL* dlist, char* field){ //takes in a data struct array and a field to sort by

if((dlist->head == NULL) || (dlist->head == dlist->tail)){ //if no nodes or one nodes, already solved

printf("There are no entries in the csv, cannot sort.");

return;

}

int n = 0;

int found = 0;

for(n = 0;n < dlist->numfields; n++){ //determines the field to be sorted

if(strcmp(field,dlist->fields[n]) == 0){ //note: 0 = string, 1 = int, 2 = float

dlist->sortingfield = n; //determines the field to sort by

//printf("%d ", n);

//printf("%s ", field);

found = 1;

}

}

if(found == 0){

printf("Field not found. Please sort by one of the fields in the csv file.\n");

return; //sortingfield was never set, so there is nothing valid to sort on

}

dlist->sortingtype = dlist->types[dlist->sortingfield];

//printf("%d ", dlist->sortingtype);

dlist->head = mergeSort(dlist,dlist->head);

}

Node* mergeSort(LL* dlist, Node* head){//note: String's mergesort

if((head == NULL) || (head->next == NULL)){

return head;

}

Node* mid = split(head);

head = mergeSort(dlist, head);

mid = mergeSort(dlist, mid);

head = merge(dlist,head,mid);

return head;

}

Node* split(Node* head){ //given a head node, returns a pointer to the middle node

if((head == NULL) || (head->next == NULL)){

return NULL;

}

Node* temp = head->next;

Node* prev = head;

while(temp != NULL){

temp = temp->next;

if(temp != NULL){

prev = prev->next;

temp = temp->next;

}

}

Node* midpt = prev->next;

prev->next = NULL;

return midpt;

}

Node* merge(LL* dlist, Node* left, Node* right){

if( left == NULL){

return right;

}

else if(right == NULL){

return left;

}

if(dlist->sortingtype == 0){ //if sorting a string

//alphabetic sorting method

char* leftstr = strdup(left->ndata.fielddata[dlist->sortingfield]);

int n = 0;

for(n = 0;leftstr[n] != '\0';n++){

leftstr[n] = tolower((unsigned char) leftstr[n]);

}

char* rightstr = strdup(right->ndata.fielddata[dlist->sortingfield]);

for(n = 0;rightstr[n] != '\0';n++){

rightstr[n] = tolower((unsigned char) rightstr[n]);

}

int cmp = strcmp(leftstr,rightstr); //case-insensitive comparison on the lowercase copies

free(leftstr); //the copies are only needed for the comparison

free(rightstr);

Node* final = NULL;

if(cmp <= 0){

final = left;

final->next = merge(dlist,left->next,right);

}

else{

final = right;

final->next = merge(dlist,left,right->next);

}

return final;

}

// if its an int

else if(dlist->sortingtype == 1) {

// convert the char string to an int

Node* final = NULL;

int leftNumber = atoi(left->ndata.fielddata[dlist->sortingfield]);

int rightNumber = atoi(right->ndata.fielddata[dlist->sortingfield]);

if(leftNumber <= rightNumber) {

final = left;

final->next = merge(dlist,left->next,right);

} else {

final = right;

final->next = merge(dlist,left,right->next);

}

return final;

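}

// otherwise it's a float

else {

Node* final = NULL;

double leftNumber = strtod(left->ndata.fielddata[dlist->sortingfield], NULL);

double rightNumber = strtod(right->ndata.fielddata[dlist->sortingfield], NULL);

if(leftNumber <= rightNumber) {

final = left;

final->next = merge(dlist,left->next,right);

} else {

final = right;

final->next = merge(dlist,left,right->next);

}

return final;

}

}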
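
The merge() above recurses once per node, so its stack depth grows with the number of rows in the file. An iterative merge avoids that. The sketch below is an illustration only: struct rec is a simplified stand-in for the Node in sorter.h (one key string per record), and compare_keys() and merge_lists() are invented names. The type codes follow the 0 = string, 1 = int, 2 = float convention from sorter.h.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-in for the Node in sorter.h: one key string per record. */
struct rec {
    const char *key;
    struct rec *next;
};

/* Returns <0, 0 or >0. type: 0 = string, 1 = int, 2 = float. */
int compare_keys(const char *a, const char *b, int type) {
    assert(a != NULL && b != NULL);
    if (type == 1) {
        long x = strtol(a, NULL, 10), y = strtol(b, NULL, 10);
        return (x > y) - (x < y);
    }
    if (type == 2) {
        double x = strtod(a, NULL), y = strtod(b, NULL);
        return (x > y) - (x < y);
    }
    return strcmp(a, b);
}

/* Merges two already-sorted lists without recursion and returns the new head. */
struct rec *merge_lists(struct rec *left, struct rec *right, int type) {
    struct rec dummy = { NULL, NULL }; /* placeholder node before the real head */
    struct rec *tail = &dummy;

    while (left != NULL && right != NULL) {
        /* "<=" takes from the left list on ties, which keeps the sort stable. */
        if (compare_keys(left->key, right->key, type) <= 0) {
            tail->next = left;
            left = left->next;
        } else {
            tail->next = right;
            right = right->next;
        }
        tail = tail->next;
    }
    tail->next = (left != NULL) ? left : right; /* append whatever remains */
    return dummy.next;
}
```

Keeping the comparison in one function also means the string, int and float cases share a single merge loop instead of three nearly identical branches.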
