Question
This assignment provides you an opportunity to utilize the functions and tools of Excel to solve a basic analytics scenario: data validation.The purpose of this
This assignment provides you an opportunity to utilize the functions and tools of Excel to solve a basic analytics scenario: data validation.The purpose of this assignment is to use Excel to quickly identify the possible errors within the database, and not tovisually examine all6,000+ database records.
The HR tab inHospital.xlsxfile contains the employee records downloaded from the Human Resource database of a fictitious hospital. As with any acquired data, you must be aware of the limitations of your dataset.
The sourcing database for Hospital.xlsx has a limit of twenty (20) characters for the Address 1 and Address 2 fields.Therefore, truncated addresses are not considered errors. The only errors in the Address fields that can be easily identified in Excel would be a missing address or an address of insufficient length to contain a deliverable address. Six characters is the minimum length for an address (e.g. "1 U St".)
To accurately identify the possible errors in the Address 1 field, through Excel, you would need to parsing algorithm to correctly determine the components of an address including the primary address number, pre-directional, street name, suffix, post-directional, secondary address identifier, and secondary address.Compound this with the Address 1 truncation issue and the creation of such an algorithm would be way too advanced for this course.Therefore, there are NO Address 1 fields errors except a missing address or an address of insufficient length to contain a deliverable address.
Data is required in all fields except Address 2 and Phone. If a phone number is entered it must adhere to the 999-999-9999 format.No rules concerning cases exist for any field.
Both five-digit zip codes and zip+4 zip codes are allowed in the Postal Code field.Also, remember that a zip code may have more than one authorized city or spelling. Only one US zip code in Arizona (85142) crosses county lines.
On the Errors tab inHospital.xlsxfile, document each error by identifying the location of the error (ID Number), the type of error (omission, range, data entry, etc.), the correct information if known, and the formula used to identify the error. You receive one point for each error record identified; i.e. even if a record contains more than one error, you receive only one point for that record. At least 100 data error records exist in this file.Extra credit is available.
An additional file nameZipcode.xlsxis available as a reference.
Remember, use Excel to sort and filter the database. Also, use the lookup and reference; text; and logical functions to assignment. If you do not explain how the error was identified, you do not get credit for the error.
The purpose of this assignment is to use Excel to quickly identify the possible errors within the database; andNOTtovisually examine all6000+ database records.NO credit is given for errors identified through visual examination.
Hint: Visual inspection does not catch leading or trailing spaces.Using the TRIM function eliminates that issue.
The general types of errors are omission (data is missing); range (data is outside expected values); and data entry (data is not inputted correctly). Some errors can be found by looking at just the field; while others can other be identified when looking at the relationship between multiple (2+) fields.
Repeating the following steps will help you complete the assignment.
1.Understand the validation rules and conditions for a field
2.Identify possible error(s) that could exist for that field or that field in combination with other fields
3. Excel formula in a new column that performs a specific data validation test for the possible error
4.Copy the formula to all records. Using relative references makes this possible.
5.Filter the column(s) to show only those records that do not return the expected (valid) value.
6.Record the unexpected (invalid /error) records
To illustrate, let's walk though these steps using the postal code field.
1.No restriction in the field format. Both five-digit zip codes, and zip+4 zip codes are allowed. Reference file (zipcode.xlsx) provides relationships between city, state, zip, and county. Only one US zip code (in Arizona (85142)) crosses county lines.
2.Length of a valid postal code entry can be either 5 or 10 character in length.
3.Place excel formula =LEN(i2) in cell (P2) to return character length
4.Copy to all records
5.Filter column P for records without a length of 5 or 10.
6.Record errors
Also, you can use MID(i2,6,1) to determine if the sixth character of a 10-character postal code is a dash. Remember that certain functions may return a different data type then you expect. Functions such as DATEVALUE or VALUE correct the data type.
On to a brief discussion of each file.
ID Number:This is the record key and no conditions or limitations are given. Only errors possible are omission and duplication errors.Duplicates can be found through a pivot table.
First Name: The assignment does not state a minimum name length requirement; so, initials are valid. In addition, you can not assume that any name is misspelled since many names has multiple nicknames and variants. For example, Elizabeth's variants include Lyza, Lyssie, Lysa, Lizzy, Lizzie, Lizzi, Libbi, Leesa, Ellsi, Bettie, Bettey, Bett, Betsie, Betsi, Betsey, Bet, Bessy, Bessie, Bessi, Lisa, Liza, Liz, Libby, Leeza, Ellie, Eliza, Buffy, Betty, Bette, Betsy, Beth, Bess, Lilibet, Lizbeth, Alizabeth, Elisabeth, Elisabith, Elisheba, Elisheva, Elizabe, Ellisa, Ellspet, Ellyse, Ellyssa, Ellyza, Elsee, Elsy, Elysabeth, Elyse, Elyssa, Elyza, Elyzza, Lilabet, Lilbeth, Lisabet, Lisabeth, Lisanne, Lisbeth, Lizabeth, Lizanne, Lizbet, Lyzanne, Lyzbet, Lyzbeth, Lyzbette, and Lyzette. Therefore, the only errors possible are omission errors.
Last Name: The assignment does not state a minimum name length requirement. Many east Asian names are two characters long e.g. WU, NG, and HU. Minimal length testing would be a valid test for a field length of 1.As with the first name, you cannot assume that the last name is misspelled. Omission errors are possible.
Name:The name field is derived from the concatenation of the last name, a comma and space, and the first name. This is an example of a field where you need to check its relationship to other fields to see if an error exists.You would need to replicate the concatenation and the two fields and compare the result with the name field.Using the TRIM function removes unseen leading and trailing black spaces.
Address 1: There areNOAddress 1 fields errors except omission or insufficient length.I have seen valid addresses that are six characters long (1 Q St).
Address 2: Because Address 2 is only used as a continuation of Address 1, no validation errors can be determined.Per the instructions, the field may be blank, so no omission errors exist. Insufficient length can not be tested without knowing Address 1.
City: Omission errors are possible.
State: Omission errors are possible.
Postal Code:This field was shown above as an example. No restriction in the field format. Both five-digit zip codes, and zip+4 zip codes are allowed. Length of a valid postal code entry can be either 5 or 10 character in length. Omission errors are also possible.
County: Omission errors are possible.
Birthday:The minimum age of an employee is determined by law. Minors aged 15 and older may work with proper documentation.Many elderly people may wish to volunteer to provide themselves with a social outlet or to have a sense of purpose. However, a check of maximum age for employees can be done.The oldest living human is about 113. Ages greater than this can be considered errors.Omission errors are also possible.
Hire Date: Error tests must be made is conjunction with birthday.Errors include employee hired before the age of fifteen.Omission errors are also possible.
Phone: The phone field, if entered, must be a specific length.It must also be in a specific format.The LEN and MID functions mentioned for the postal code may be also used for the phone field to test for appropriate length and format.No omission errors are possible since the instructions state that the field may be blank.
In the case of the Phone field, all the phone numbers are fictious as you can easily identify by the 555 exchange.If this were a real database, you could test the validity of the numbers if you had a reference file of authorized area codes and exchanges.You cannot assume duplicate phone numbers are in fact errors since there is not enough contextual information provided.If the phone was indicated to be a home phone, you could consider the duplicates as errors if the Address were not same and the Address field was indicated to be the home address.In this database, duplicate numbers could in fact be the central number for a particular medical department that all the employees of that department are using as their contact number. Since you cannot assume, the duplicates cannot be considered errors.
You were also provided a reference file (zipcode.xlsx) that provides relationships between city, state, zip, and county.
State: Since there is a one to one relationship between postal code and state, you can use VLOOKUP to validate the postal code / state relationship.The VLOOKUP does not let you know whether the postal code or the state is the field in error just that the combination is incorrect. Remember that the zip code field in the reference contains the five-digit zip code whereas the database accepts either the five-digit zip codes and zip+4 zip codes.
County:Similarly, you could validate the county field with a VLOOKUP using the five-digit zip code since only one US zip code (in Arizona (85142)) crosses county lines.
City: A postal code may have more than one authorized city or city spelling.Therefore, you need to use the MATCH and INDEX functions together to validate that the city / postal code pair are correct. The combination of the INDEX and MATCH functions is one of Excel's most powerful features. Their use increases your versatility in doing reference lookups.
A record can appear in multiple error tests for example, many of the omission errors are for the same record.Remember, you receive only one point per record regardless of the number of errors contained within that record.
Finally, do not assume anything. For example, the assignment does not state that the data must be uppercase..
Step by Step Solution
There are 3 Steps involved in it
Step: 1
Get Instant Access to Expert-Tailored Solutions
See step-by-step solutions with expert insights and AI powered tools for academic success
Step: 2
Step: 3
Ace Your Homework with AI
Get the answers you need in no time with our AI-driven, step-by-step assistance
Get Started