A researcher is analyzing data about students in a school district to determine whether there is a relationship between grade point average and number of absences. The researcher plans on compiling data from several sources to create a record for each student.

The researcher has access to a database with the following information about each student.

  • Last name
  • First name
  • Grade level (9, 10, 11, or 12)
  • Grade point average (on a 0.0 to 4.0 scale)

The researcher also has access to another database with the following information about each student.

  • First name
  • Last name
  • Number of absences from school
  • Number of late arrivals to school

Upon compiling the data, the researcher identifies a problem due to the fact that neither data source uses a unique ID number for each student. Which of the following best describes the problem caused by the lack of unique ID numbers?

Correct answer: A
Reflection: A unique identifier would be required in order to distinguish between two students with the same first and last names.
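
A minimal Python sketch of the ambiguity, assuming made-up sample rows (the names, values, and the helper `group_by_name` are hypothetical, not part of the question). When two students share a first and last name, a name-based match cannot tell which grade point average belongs with which absence count.

```python
from collections import defaultdict

# Hypothetical rows from the two databases; all values are invented.
gpa_rows = [
    {"last": "Lee", "first": "Sam", "grade": 9, "gpa": 3.8},
    {"last": "Lee", "first": "Sam", "grade": 11, "gpa": 2.1},
    {"last": "Ruiz", "first": "Ana", "grade": 10, "gpa": 3.4},
]
absence_rows = [
    {"first": "Sam", "last": "Lee", "absences": 2, "late": 0},
    {"first": "Sam", "last": "Lee", "absences": 14, "late": 5},
    {"first": "Ana", "last": "Ruiz", "absences": 3, "late": 1},
]

def group_by_name(rows):
    """Group rows by (first, last), the only key both sources share."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row["first"], row["last"])].append(row)
    return groups

gpa_by_name = group_by_name(gpa_rows)
absences_by_name = group_by_name(absence_rows)

# A name is ambiguous when more than one row on each side carries it:
# there is no way to decide which GPA row pairs with which absence row.
ambiguous = [
    name
    for name, rows in gpa_by_name.items()
    if len(rows) > 1 and len(absences_by_name.get(name, [])) > 1
]
print(ambiguous)
```

With a unique student ID in both sources, the grouping key would identify exactly one row per student and the list would be empty.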


A team of researchers wants to create a program to analyze the amount of pollution reported in roughly 3,000 counties across the United States. The program is intended to combine county data sets and then process the data. Which of the following is most likely to be a challenge in creating the program?

Correct answer: B
Reflection: It will be a challenge to clean the data from the different counties to make the data uniform. The way pollution data is captured and organized may vary significantly from county to county.
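
As an illustration of what "cleaning the data to make it uniform" might involve, here is a small Python sketch. It assumes two hypothetical reporting styles (the field names `pm25_ug_m3` and `pm25_mg_m3`, the county names, and the values are invented for this example); real county data could differ in many other ways.

```python
# Hypothetical raw records: counties use different field names, units,
# and name formatting. None of this comes from a real data set.
raw_records = [
    {"county": "  adams ", "pm25_ug_m3": "12.5"},
    {"county": "BAKER", "pm25_mg_m3": "0.008"},
]

def normalize(record):
    """Return a record with a tidy county name and PM2.5 in micrograms per cubic meter."""
    county = record["county"].strip().title()
    if "pm25_ug_m3" in record:
        value = float(record["pm25_ug_m3"])
    elif "pm25_mg_m3" in record:
        # 1 milligram = 1000 micrograms
        value = float(record["pm25_mg_m3"]) * 1000.0
    else:
        raise ValueError(f"no recognized pollution field for {county}")
    return {"county": county, "pm25_ug_m3": value}

clean_records = [normalize(r) for r in raw_records]
print(clean_records)
```

Only after every record has the same fields and units can the combined data set be processed consistently.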


A student is creating a Web site that is intended to display information about a city based on a city name that a user enters in a text field. Which of the following are likely to be challenges associated with processing city names that users might provide as input?

Select two answers.

Correct answer: B
Reflection: Different users may abbreviate city names differently. This may require the student to clean the data to make it uniform before it can be processed.
Correct answer: C
Reflection: Misspelled city names will not be an exact match to information stored by the Web site. This may require the student to clean the data to make it uniform before it can be processed.
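
One possible way to handle both challenges is sketched below in Python. The city list, the abbreviation table, and the function `clean_city` are assumptions made for this example; the fuzzy match uses `difflib.get_close_matches` from the standard library, and the 0.8 cutoff is an arbitrary choice.

```python
import difflib

# Hypothetical data the site might store; not taken from the question.
KNOWN_CITIES = ["new york", "los angeles", "chicago"]
ABBREVIATIONS = {"nyc": "new york", "la": "los angeles", "chi": "chicago"}

def clean_city(raw):
    """Map user input to a known city name, or None if nothing is close."""
    # Lowercase, drop periods, and collapse extra whitespace.
    text = " ".join(raw.lower().replace(".", "").split())
    # Expand known abbreviations.
    text = ABBREVIATIONS.get(text, text)
    if text in KNOWN_CITIES:
        return text
    # Fall back to an approximate match to catch misspellings.
    matches = difflib.get_close_matches(text, KNOWN_CITIES, n=1, cutoff=0.8)
    return matches[0] if matches else None

for user_input in ["L.A.", "  Chicago ", "Los Angelos", "xyzzy"]:
    print(repr(user_input), "->", clean_city(user_input))
```

An abbreviation table only covers the abbreviations someone thought to list, and fuzzy matching can pick the wrong city when two names are similar, so neither step fully removes the challenge.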


Each student at a school has a unique student ID number. A teacher has the following spreadsheets available.

  • Spreadsheet I contains information on all students at the school. For each entry in this spreadsheet, the student name, the student ID, and the student’s grade point average are included.
  • Spreadsheet II contains information on only students who play at least one sport. For each entry in this spreadsheet, the student ID and the names of the sports the student plays are included.
  • Spreadsheet III contains information on only students whose grade point average is greater than 3.5. For each entry in this spreadsheet, the student name and the student ID are included.
  • Spreadsheet IV contains information on only students who play more than one sport. For each entry in this spreadsheet, the student name and the student ID are included.

The teacher wants to determine whether students who play a sport are more or less likely to have higher grade point averages than students who do not play any sports. Which of the following pairs of spreadsheets can be combined and analyzed to determine the desired information?

Correct answer: A
Reflection: The desired information can be determined by using the student IDs in spreadsheet II to identify the students who play a sport. Once the students who play a sport are identified, the grade point averages of students who play sports in spreadsheet I can be compared to the grade point averages of all other students in spreadsheet I.
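
The combination described in the reflection can be sketched in Python as follows. The sample entries are invented, and representing each spreadsheet as a dictionary keyed by student ID is an assumption made for the example.

```python
from statistics import mean

# Hypothetical contents of spreadsheet I: student ID -> (name, GPA).
spreadsheet_1 = {
    101: ("Ana", 3.9),
    102: ("Ben", 3.0),
    103: ("Cy", 2.5),
    104: ("Di", 3.6),
}
# Hypothetical contents of spreadsheet II: student ID -> sports played.
spreadsheet_2 = {
    101: ["soccer"],
    103: ["tennis", "track"],
}

# The student ID is the shared key: anyone listed in spreadsheet II
# plays a sport, and everyone else in spreadsheet I does not.
athlete_gpas = [gpa for sid, (_name, gpa) in spreadsheet_1.items() if sid in spreadsheet_2]
other_gpas = [gpa for sid, (_name, gpa) in spreadsheet_1.items() if sid not in spreadsheet_2]

print("athletes:", mean(athlete_gpas))
print("others:  ", mean(other_gpas))
```

Spreadsheets III and IV each cover only a subset of students, so neither can supply the grade point averages of every student the way spreadsheet I does.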


A database of information about shows at a concert venue contains the following information.

  • Name of artist performing at the show
  • Date of show
  • Total dollar amount of all tickets sold

Which of the following additional pieces of information would be most useful in determining the artist with the greatest attendance during a particular month?

Correct answer: A
Reflection: The attendance for a particular show can be calculated by dividing the total dollar amount of all tickets sold by the average ticket price.
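
A short Python sketch of that calculation, using invented shows and prices (the artists, dates, amounts, and the function `attendance_by_artist` are hypothetical). It estimates attendance per show as total sales divided by average ticket price, then sums per artist for one month.

```python
from collections import defaultdict
from datetime import date

# Hypothetical rows: (artist, show date, total ticket sales, average ticket price).
shows = [
    ("Artist A", date(2024, 5, 3), 50000.0, 50.0),
    ("Artist B", date(2024, 5, 10), 60000.0, 100.0),
    ("Artist B", date(2024, 5, 24), 30000.0, 60.0),
    ("Artist A", date(2024, 6, 1), 90000.0, 45.0),
]

def attendance_by_artist(shows, year, month):
    """Estimate total attendance per artist for the given month."""
    totals = defaultdict(float)
    for artist, show_date, total_sales, avg_price in shows:
        if show_date.year == year and show_date.month == month:
            # attendance = total sales / average ticket price
            totals[artist] += total_sales / avg_price
    return dict(totals)

may = attendance_by_artist(shows, 2024, 5)
top_artist = max(may, key=may.get)
print(may, top_artist)
```

Without the average ticket price, the total dollar amount alone cannot rank artists by attendance, because a show with higher sales may simply have had more expensive tickets.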


Delivery trucks enter and leave a depot through a controlled gate. At the depot, each truck is loaded with packages, which will then be delivered to one or more customers. As each truck enters and leaves the depot, the following information is recorded and uploaded to a database.

  • The truck’s identification number
  • The truck’s weight
  • The date and time the truck passes through the gate
  • Whether the truck is entering or leaving the depot

Using only the information in the database, which of the following questions CANNOT be answered?

Correct answer: B
Reflection: The data captured each time a truck enters or leaves the depot do not include any information about the number of customers or deliveries associated with the truck.
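
To make the limits of the recorded fields concrete, here is a hedged Python sketch with invented gate events. It shows the kind of quantity the four fields do support (time spent in the depot and weight change per visit); it is an illustration only and does not reproduce the question's answer choices. The function `depot_visits` and the sample values are hypothetical.

```python
from datetime import datetime

# Hypothetical gate events: (truck ID, weight in kg, timestamp, direction).
events = [
    ("T1", 5000, datetime(2024, 3, 1, 8, 0), "enter"),
    ("T2", 5200, datetime(2024, 3, 1, 8, 30), "enter"),
    ("T1", 7500, datetime(2024, 3, 1, 9, 15), "leave"),
    ("T2", 6100, datetime(2024, 3, 1, 9, 0), "leave"),
]

def depot_visits(events):
    """Pair each 'enter' event with the same truck's next 'leave' event."""
    open_visits = {}
    visits = []
    for truck_id, weight, when, direction in sorted(events, key=lambda e: e[2]):
        if direction == "enter":
            open_visits[truck_id] = (weight, when)
        elif truck_id in open_visits:
            weight_in, time_in = open_visits.pop(truck_id)
            visits.append({
                "truck": truck_id,
                "minutes": (when - time_in).total_seconds() / 60,
                "weight_change": weight - weight_in,
            })
    return visits

visits = depot_visits(events)
print(visits)
```

Nothing in these records says how many customers or deliveries a truck's load corresponds to, so no amount of processing can recover that number.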


A camera mounted on the dashboard of a car captures an image of the view from the driver’s seat every second. Each image is stored as data. Along with each image, the camera also captures and stores the car’s speed, the date and time, and the car’s GPS location as metadata. Which of the following can best be determined using only the data and none of the metadata?

Correct answer: D
Reflection: Determining the number of bicycles the car encountered would require the use of image recognition software to examine the images collected by the camera. The images are the data collected and no metadata would be required.


A teacher sends students an anonymous survey in order to learn more about the students’ work habits. The survey contains the following questions.

  • On average, how long does homework take you each night (in minutes)?
  • On average, how long do you study for each test (in minutes)?
  • Do you enjoy the subject material of this class (yes or no)?

Which of the following questions about the students who responded to the survey can the teacher answer by analyzing the survey results?

  I. Do students who enjoy the subject material tend to spend more time on homework each night than the other students do?
  II. Do students who spend more time on homework each night tend to spend less time studying for tests than the other students do?
  III. Do students who spend more time studying for tests tend to earn higher grades in the class than the other students do?

Correct answer: C
Reflection: Question I can be answered because the teacher can detect a correlation between responses to questions 1 and 3 on the survey. Question II can be answered because the teacher can detect a correlation between responses to questions 1 and 2 on the survey. Question III cannot be answered because the survey is anonymous and the teacher cannot compare student grades with the responses to the survey questions.
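
The two answerable comparisons can be sketched in Python with invented responses (the data and the helper `pearson` are hypothetical). The first compares average homework time between students who answered yes and no to survey question 3; the second computes a Pearson correlation between the answers to survey questions 1 and 2.

```python
from math import sqrt
from statistics import mean

# Hypothetical anonymous responses: (homework minutes, study minutes, enjoys subject).
responses = [
    (30, 60, True),
    (45, 50, True),
    (20, 90, False),
    (25, 80, False),
]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))

# First comparison: homework time, split by the yes/no answer.
homework_yes = mean(hw for hw, _study, enjoys in responses if enjoys)
homework_no = mean(hw for hw, _study, enjoys in responses if not enjoys)

# Second comparison: homework time versus study time.
r = pearson([hw for hw, _s, _e in responses], [s for _hw, s, _e in responses])

print(homework_yes, homework_no, r)
# The third comparison would need each student's grade, which the
# anonymous survey does not collect, so there is nothing to compute.
```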


A city maintains a database of all traffic tickets that were issued over the past ten years. The tickets are divided into the following two categories.

  • Moving violations
  • Nonmoving violations

The data recorded for each ticket include only the following information.

  • The month and year in which the ticket was issued
  • The category of the ticket

Which of the following questions CANNOT be answered using only the information in the database?

Correct answer: B
Reflection: The database only tracks the month and year that each ticket was issued. There is no information about whether the tickets were issued on weekends or weekdays.
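
A brief Python sketch of what the recorded fields allow, using invented tickets (the rows and the function `tickets_in_year` are hypothetical). Counts can be grouped by year, month, and category, but no day is stored, so a weekday cannot be derived.

```python
from collections import Counter

# Hypothetical ticket records: (year, month, category). No day is recorded.
tickets = [
    (2023, 1, "moving"),
    (2023, 1, "nonmoving"),
    (2023, 2, "moving"),
    (2024, 1, "moving"),
]

# Tally tickets by (year, month, category).
counts = Counter(tickets)

def tickets_in_year(counts, year, category):
    """Total tickets of one category issued in the given year."""
    return sum(n for (y, _month, c), n in counts.items() if y == year and c == category)

print(tickets_in_year(counts, 2023, "moving"))
print(counts[(2023, 1, "nonmoving")])
```

Questions about months, years, or categories reduce to sums over these counts; a question about weekends versus weekdays has no field to draw on.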


The owner of a clothing store records the following information for each transaction made at the store during a 7-day period.

  • The date of the transaction
  • The method of payment used in the transaction
  • The number of items purchased in the transaction
  • The total amount of the transaction, in dollars

Customers can pay for purchases using cash, check, a debit card, or a credit card. Using only the data collected during the 7-day period, which of the following statements is true?