Posted on Sep 24, 2024

JAVA

I'm trying to use BufferedReader and BufferedWriter to find invalid tags and values in a RIS data file.

If Ris encounters a duplicate DO value while reading the data file, it must print an error message that reports the duplicate and the title of the publication where the duplicate was found, then move on to reading, processing, and outputting the next publication in the data file. The exact format of the error must be as follows: Duplicate DO: r for Title: n.

Second, if Ris encounters any invalid TAG or TY type, it must mark that RIS entry (publication) as invalid. The exact error must be as follows: Invalid entry at DO or TY: r.

If the DO is not a duplicate and the entry is valid, Ris should calculate an extra tag field, SM. SM must summarize the reference as SM UniqueCount: , NumOfAuthors: , LongestTagValue: , MostKW: ().

UniqueCount is the number of tag fields that have values.

NumOfAuthors is the number of authors for this reference.

LongestTagValue is the tag that has the longest value. AB should not be counted for the LongestTagValue calculation.

MostKW is the most repeated keyword value.
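The duplicate and validity rules above could be sketched as follows. This is only one reading of the requirements: the placeholders in the error formats are taken to be the DO value (r) and the title (n), and the class name RisEntryCheck, the method check, and both whitelists are hypothetical, since the real lists of valid tags and TY values are not shown in the question.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class RisEntryCheck {
    // Hypothetical whitelists: replace them with the lists from the assignment.
    static final Set<String> VALID_TAGS = Set.of("TY", "TI", "T2", "SP", "EP", "AU", "PY", "KW",
            "DO", "JO", "IS", "SN", "VO", "VL", "JA", "Y1", "AB", "ER");
    static final Set<String> VALID_TYPES = Set.of("CONF", "JOUR");

    // DO values of every entry checked so far.
    private final Set<String> seenDois = new HashSet<>();

    /** Returns the error line for one entry, or null when the entry is new and valid. */
    String check(List<String> tags, List<String> values) {
        String doValue = "", title = "", type = "";
        boolean tagsOk = true;
        for (int i = 0; i < tags.size(); i++) {
            String tag = tags.get(i);
            switch (tag) {
                case "DO": doValue = values.get(i); break;
                case "TI": title = values.get(i); break;
                case "TY": type = values.get(i); break;
                default: break;
            }
            if (!VALID_TAGS.contains(tag)) {
                tagsOk = false;
            }
        }
        // Set.add returns false when the value is already present, i.e. a duplicate.
        if (!doValue.isEmpty() && !seenDois.add(doValue)) {
            return "Duplicate DO: " + doValue + " for Title: " + title;
        }
        if (!tagsOk || !VALID_TYPES.contains(type)) {
            return "Invalid entry at DO or TY: " + doValue;
        }
        return null;
    }

    public static void main(String[] args) {
        RisEntryCheck checker = new RisEntryCheck();
        List<String> tags = List.of("TY", "TI", "DO");
        List<String> values = List.of("CONF", "Some title", "10.1000/x");
        System.out.println(checker.check(tags, values)); // first occurrence of this DO
        System.out.println(checker.check(tags, values)); // same DO again
    }
}
```

The duplicate check runs before the validity check here, so a repeated DO is reported as a duplicate even if the entry is also invalid. Whether that order matches the assignment is an assumption.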

Here is what I have so far:

import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class Example {

    public static void main(String[] args) {
        // try-with-resources opens and closes both files, so the separate
        // (never initialized) bf and bw variables are not needed.
        try (BufferedReader in = new BufferedReader(new FileReader("references.ris"));
             BufferedWriter out = new BufferedWriter(new FileWriter("output.txt"))) {

            Set<String> doSet = new HashSet<>(); // DO values seen so far
            String line, doValue = "", tiValue = "", longestTagValue = "", mostKW = "";
            int uniqueCount = 0, numOfAuthors = 0, maxKWCount = 0;
            boolean validEntry = true;
            List<String> validTags = new ArrayList<>();     // still to be filled in
            List<String> validTYValues = new ArrayList<>(); // still to be filled in

            while ((line = in.readLine()) != null) {
                // regionMatches is false for lines shorter than two characters,
                // where substring(0, 2) would throw.
                if (line.regionMatches(true, 0, "TY", 0, 2)) {
                    Map<String, Integer> kwCount = new HashMap<>();
                    List<String> tags = new ArrayList<>();
                    List<String> values = new ArrayList<>();

                    // Collect one entry: every line up to a blank line or end of file.
                    while (line != null && !line.isBlank()) {
                        String[] parts = line.split(" -", 2);
                        tags.add(parts[0].trim());
                        values.add(parts.length == 2 ? parts[1].trim() : "");
                        line = in.readLine(); // was bf.readLine(), but bf was never assigned
                    }

                    for (int i = 0; i < tags.size(); i++) {
                        // Not written yet: duplicate check, validation, and SM calculation.
                    }
                }
            }
        } catch (IOException e) {
            System.err.println("An IO exception occurred: " + e.getMessage());
        }
    }
}
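The loop over the collected tags in the code above is still empty. A minimal sketch of the SM calculation that could go there is shown below. Several points are assumptions rather than part of the assignment text: the names RisSummary and summarize are made up, UniqueCount is read as the number of distinct tags that have a non-empty value, ties for the longest value or most frequent keyword go to the first one seen, and the parentheses after MostKW are assumed to hold the keyword's count.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RisSummary {
    /** Builds the SM line for one entry from parallel tag and value lists. */
    static String summarize(List<String> tags, List<String> values) {
        Set<String> tagsWithValues = new HashSet<>();
        Map<String, Integer> kwCount = new HashMap<>();
        int numOfAuthors = 0, maxKwCount = 0, longestLength = 0;
        String longestTag = "", mostKw = "";

        for (int i = 0; i < tags.size(); i++) {
            String tag = tags.get(i);
            String value = values.get(i);
            if (value.isEmpty()) {
                continue; // tags such as "IS -" carry no value
            }
            tagsWithValues.add(tag);
            if (tag.equals("AU")) {
                numOfAuthors++;
            }
            // AB is excluded from the longest-value comparison, as the question requires.
            if (!tag.equals("AB") && value.length() > longestLength) {
                longestLength = value.length();
                longestTag = tag;
            }
            if (tag.equals("KW")) {
                // merge stores 1 for a new keyword, otherwise adds 1 to its count.
                int count = kwCount.merge(value, 1, Integer::sum);
                if (count > maxKwCount) {
                    maxKwCount = count;
                    mostKw = value;
                }
            }
        }
        return "SM UniqueCount: " + tagsWithValues.size()
                + ", NumOfAuthors: " + numOfAuthors
                + ", LongestTagValue: " + longestTag
                + ", MostKW: " + mostKw + "(" + maxKwCount + ")";
    }

    public static void main(String[] args) {
        List<String> tags = List.of("TY", "TI", "AU", "AU", "KW", "KW", "KW", "AB", "IS");
        List<String> values = List.of("CONF", "A title", "A", "B", "x", "y", "x",
                "a much longer abstract", "");
        System.out.println(summarize(tags, values));
    }
}
```

If UniqueCount is meant to count every tag line that has a value, rather than distinct tags, a plain counter in place of the set would do.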

*****************************************************************************************

Example of tags and values

TY - CONF
TI - Optimal trade-off between accuracy and network cost of distributed learning in Mobile Edge Computing: An analytical approach
T2 - 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM)
SP - 1
EP - 9
AU - L. Valerio
AU - A. Passarella
AU - M. Conti
PY - 2017
KW - Cloud computing
KW - Distributed databases
KW - Data analysis
KW - Analytical models
KW - Mobile communication
KW - Data mining
KW - Data models
DO - 10.1109/WoWMoM.2017.7974310
JO - 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM)
IS -
SN -
VO -
VL -
JA - 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM)
Y1 - 12-15 June 2017
AB - The most widely adopted approach for knowledge extraction from raw data generated at the edges of the Internet (e.g., by IoT or personal mobile devices) is through global cloud platforms, where data is collected from devices, and analysed. However, with the increasing number of devices spread in the physical environment, this approach rises several concerns. The data gravity concept, one of the basis of Fog and Mobile Edge Computing, points towards a decentralisation of computation for data analysis, whereby the latter is performed closer to where data is generated, for both scalability and privacy reasons. Hence, data produced by devices might be processed according to one of the following approaches: (i) directly on devices that collected it (ii) in the cloud, or (iii) through fog/mobile edge computing techniques, i.e., at intermediate nodes in the network, running distributed analytics after collecting subsets of the data. Clearly, (i) and (ii) are the two extreme cases of (iii). It is worth noting that the same analytics task executed at different collection points in the network, comes at different costs in terms of traffic generated over the network.
Precisely, these costs refer to the traffic generated to move data towards the collection point selected (e.g. the Edge or the Cloud) and the one induced by the distributed analytics process. Until now, deciding if to use intermediate collection points, and which one they should be in order to both obtain a target accuracy and minimise the network traffic, is an open question. In this paper, we propose an analytical framework able to cope with this problem. Precisely, we consider learning tasks, and define a model linking the accuracy of the learning task performed with a certain set of collection points, with the corresponding network traffic. The model can be used to identify, given the specification of the learning problem (e.g. binary classification, regression, etc.), and its target accuracy, what is the optimal level for collecting data in order to minimise the total network cost. We validate our model through simulations in order to show that setting, in simulation, the level of intermediate collection indicated by our model, leads to the minimum cost for the target accuracy.
ER -
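In the sample record above, the AB value continues on a second line that has no tag of its own, and several tags (IS, SN, VO, VL) have no value at all. A hedged sketch of reading one record with those two cases in mind is shown below. It assumes one tag per line in the actual file and that each record ends with an ER line. The names RisLineParser and readEntry are hypothetical, and the tag pattern is a guess based on the sample, not on a full RIS specification.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RisLineParser {
    // A capital letter, then a capital letter or digit, whitespace, a hyphen, then the value.
    private static final Pattern TAG_LINE = Pattern.compile("^([A-Z][A-Z0-9])\\s+-\\s*(.*)$");

    /** Reads lines up to and including the next ER tag and returns {tag, value} pairs. */
    static List<String[]> readEntry(BufferedReader in) throws IOException {
        List<String[]> fields = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null) {
            Matcher m = TAG_LINE.matcher(line);
            if (m.matches()) {
                fields.add(new String[] { m.group(1), m.group(2).trim() });
                if (m.group(1).equals("ER")) {
                    break; // end of this record
                }
            } else if (!fields.isEmpty() && !line.isBlank()) {
                // No tag on this line: treat it as a continuation of the previous value.
                String[] last = fields.get(fields.size() - 1);
                last[1] = last[1] + " " + line.trim();
            }
        }
        return fields;
    }

    public static void main(String[] args) throws IOException {
        String sample = "TY - CONF\nAB - first part\nsecond part\nIS -\nER -\n";
        for (String[] field : readEntry(new BufferedReader(new StringReader(sample)))) {
            System.out.println(field[0] + " => " + field[1]);
        }
    }
}
```

A continuation line that happens to start with two capitals and a hyphen would be mistaken for a new tag, so this is not a complete parser.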