Question

1 Approved Answer

Posted on Sep 11, 2024

Convert this to a Java program, specifically for the Eclipse IDE. Any help is appreciated! :)

Description: Ris data was extracted from the IEEE Xplore database. This data is used for citations and references. For this homework, you will put together several techniques and concepts you have learned in CS 210 (or from a similar background) and some new techniques to make Ris, a program that reads, processes, and outputs data from these references. Ris version 0.1 (which satisfies the requirements of this homework) will read through a file of references.ris data, perform simple processing on the data, and output the result.

Operational Issues: Your program will read the publication data file (a ris file) via redirected standard input (use references.ris). Each publication entry is organized as follows. Each entry begins with a TY line: the tag TY, a dash (-), and the value of TY (the type of the entry, such as "CONF"); the next tag then starts on a new line. The publication entry will then contain zero or more lines, each with the same basic form: zero or more spaces and/or tabs, a field designator (such as "KW"), a dash (-), and arbitrary text for the value. Between two publications, there may be one or more empty lines that indicate that a new publication starts.

For example, the publication

L. Valerio, A. Passarella and M. Conti, "Optimal trade-off between accuracy and network cost of distributed learning in Mobile Edge Computing: An analytical approach," 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM), 2017, pp. 1-9, doi: 10.1109/WoWMoM.2017.7974310.

is shown as

Networks (WoWMoM)
JO - 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM)
IS -
SN -
VO -
VL -
JA - 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM)
Y1 - 12-15 June 2017
AB - The most widely adopted approach for knowledge extraction from raw data generated at the edges of the Internet (e.g., by IoT or personal mobile devices) is through global cloud platforms, where data is collected from devices, and analysed. However, with the increasing number of devices spread in the physical environment, this approach rises several concerns. The data gravity concept, one of the basis of Fog and Mobile Edge Computing, points towards a decentralisation of computation for data analysis, whereby the latter is performed closer to where data is generated, for both scalability and privacy reasons. Hence, data produced by devices might be processed according to one of the following approaches: (i) directly on devices that collected it (ii) in the cloud, or (iii) through fog/mobile edge computing techniques, i.e., at intermediate nodes in the network, running distributed analytics after collecting subsets of the data. Clearly, (i) and (ii) are the two extreme cases of (iii). It is worth noting that the same analytics task executed at different collection points in the network, comes at different costs in terms of traffic generated over the network. Precisely, these costs refer to the traffic generated to move data towards the collection point selected (e.g. the Edge or the Cloud) and the one induced by the distributed analytics process. Until now, deciding if to use intermediate collection points, and which one they should be in order to both obtain a target accuracy and minimise the network traffic, is an open question. In this paper, we propose an analytical framework able to cope with this problem.
Precisely, we consider learning tasks, and define a model linking the accuracy of the learning task performed with a certain set of collection points, with the corresponding network traffic. The model can be used to identify, given the specification of the learning problem (e.g. binary classification, regression, etc.), and its target accuracy, what is the optimal level for collecting data in order to minimise the total network cost. We validate our model through simulations in order to show that setting, in simulation, the level of intermediate collection indicated by our model, leads to the minimum cost for the target accuracy.
ER -

The first line indicates that the entry TY has "CONF" as its value, and so on. This entry happens to contain other fields, as explained in the RISExplanations.txt file.

As the Ris program reads the data file, there are correctness checks that it must make on the data.

First, Ris must ensure that the DO (found in the tag list) of each reference is unique (not repeated). If Ris encounters a duplicate DO value while reading the data file, it must print an error message indicating that it encountered a duplicate and the title of the publication at which the duplicate was found, then move on to reading, processing, and outputting the next publication in the data file. The exact format of the error must be as follows:

Duplicate DO: r for Title: n

Naturally, r should be replaced by the duplicated DO value, and n should be replaced by the reference title (TI value) at which the duplicate was found.

Second, if Ris encounters any invalid TAG or TY type, it must mark that ris entry (publication) as invalid. The exact error must be as follows:

Invalid entry at DO or TY: r

Naturally, r should be replaced by the DO value or the title (whichever exists and you prefer). Please see the RISExplanations.txt file for the valid TAG and TY types.

If the DO is not a duplicate and the entry is valid, Ris should calculate an extra tag field SM.
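
As a rough illustration of the tag-line format and the duplicate-DO check described above, here is a minimal Java sketch. The class name RisLineSketch, its method names, and the regular expression are illustrative assumptions, not part of the assignment. The list of valid tags lives in RISExplanations.txt, which is not reproduced here, so the sketch accepts any two-character tag and leaves validation out.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: splits RIS tag lines and remembers the DO values seen so far. */
class RisLineSketch {
    // Optional leading spaces/tabs, a two-character tag, a dash, then the value (possibly empty).
    private static final Pattern TAG_LINE =
            Pattern.compile("^[ \\t]*([A-Z][A-Z0-9])\\s*-\\s*(.*)$");

    private final Set<String> seenDois = new HashSet<>();

    /** Returns {tag, value}, or null when the line does not look like a tag line. */
    static String[] parse(String line) {
        Matcher m = TAG_LINE.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2).trim() };
    }

    /** Returns the duplicate error message in the required format, or null if the DO is new. */
    String checkDuplicate(String doi, String title) {
        if (seenDois.add(doi)) {
            return null; // add() returns true only the first time a value is inserted
        }
        return "Duplicate DO: " + doi + " for Title: " + title;
    }

    public static void main(String[] args) {
        RisLineSketch ris = new RisLineSketch();
        String[] field = parse("DO  - 10.1109/WoWMoM.2017.7974310");
        System.out.println(field[0] + " -> " + field[1]);
        // The first sighting of a DO yields null; a repeat yields the error message.
        System.out.println(ris.checkDuplicate(field[1], "First title"));
        System.out.println(ris.checkDuplicate(field[1], "Second title"));
    }
}
```

Because the TI line may appear before or after the DO line within an entry, a full program would probably buffer all lines of an entry until ER and only then run the duplicate check and print.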
SM must summarize the reference as

SM - UniqueCount: ..., NumOfAuthors: ..., LongestTagValue: ..., MostKW: (...)

UniqueCount is the number of tag fields that have values. NumOfAuthors is the number of authors of this reference. LongestTagValue is the tag that has the longest value; AB should not be counted in this calculation. MostKW is the most repeated word in the keyword (KW) values. MostKW can contain more than one word because several words can have the same frequency. You need to list all of them, separated by commas, inside the parentheses, like (wordA, wordB:x). If words in the keywords are connected by '-' or any other character, they are counted as one word, not multiple words.

Finally, after processing the data, you need to write out each publication. As the data is read in, line by line, and processed to check for duplicates and invalid values, Ris will send it back out again, albeit in a slightly altered form: the AB tag is removed and an SM tag is added for each publication. Lines containing duplicate DOs will not be output; only error messages will be output for those publications. Example output for the above publication is:

TY - CONF
TI - Optimal trade-off between accuracy and network cost of distributed learning in Mobile Edge Computing: An analytical approach
T2 - 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM)
SP - 1
EP - 9
AU - L. Valerio
AU - A. Passarella
AU - M. Conti
PY - 2017
KW - Cloud computing
KW - Distributed databases
KW - Data analysis
KW - Analytical models
KW - Mobile communication
KW - Data mining
KW - Data models
DO - 10.1109/WoWMoM.2017.7974310
JO - 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM)
IS -
SN -
VO -
VL -
JA - 2017 IEEE 18th International Symposium on A World of Wireless, Mobile and Multimedia Networks (WoWMoM)
Y1 - 12-15 June 2017
ER -
SM - UniqueCount:13, NumOfAuthors:3, LongestTagValue:TI, MostKW: (data:3)

Implementation Issues: You must create and submit Java source code to complete this homework (as well as all other homework in this course). You will use the standard Java input and output streams (System.in and System.out, respectively) for all input and output for this assignment. To test your code on the data file, you will want to tell Ris that you wish to redirect with BufferedReader or Scanner to read from your data file
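
The MostKW rule in the description could be sketched in Java roughly as follows. The class and method names (MostKwSketch, mostKw) are hypothetical. The sketch assumes that words are compared case-insensitively and split on whitespace only, which is what the example output (data:3) suggests but the description does not state outright. Splitting on whitespace keeps words joined by '-' or other characters together as one word.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical helper: finds the most frequent word(s) across the KW values of one entry. */
class MostKwSketch {
    static String mostKw(List<String> keywords) {
        // LinkedHashMap keeps first-seen order, so tied words are listed in input order.
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String kw : keywords) {
            // Assumption: lower-case everything and split on whitespace only.
            for (String word : kw.trim().toLowerCase(Locale.ROOT).split("\\s+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        if (counts.isEmpty()) {
            return "()"; // assumption: the description does not say what to print without keywords
        }
        int max = 0;
        for (int c : counts.values()) {
            max = Math.max(max, c);
        }
        List<String> top = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() == max) {
                top.add(e.getKey());
            }
        }
        return "(" + String.join(", ", top) + ":" + max + ")";
    }

    public static void main(String[] args) {
        // KW values taken from the example publication in the question.
        List<String> kws = Arrays.asList(
                "Cloud computing", "Distributed databases", "Data analysis",
                "Analytical models", "Mobile communication", "Data mining", "Data models");
        System.out.println("MostKW: " + mostKw(kws)); // prints "MostKW: (data:3)"
    }
}
```

In a complete program, the KW values would come from lines read through a BufferedReader or Scanner wrapped around System.in, with the data file supplied by redirection (in Eclipse this is usually configured in the Run Configurations dialog).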