CHAPTER 14: BIG DATA ANALYTICS AND NOSQL

Description

CIS 3365 Flashcards on CHAPTER 14: BIG DATA ANALYTICS AND NOSQL, created by Miguel Lucero on 21/03/2017.

Resource summary

Question Answer
1. Much ambiguity exists in defining Big Data. a. True b. False ANSWER: True
2. For a data set to be considered Big Data, it must display all the “3 Vs” – volume, velocity and variety. a. True b. False ANSWER: False
3. Scaling out is keeping the same number of systems, but migrating each system to a larger one. a. True b. False ANSWER: False
4. In many ways, the issues associated with volume and velocity are the same. a. True b. False ANSWER: True
5. The analysis of data to produce actionable results is feedback loop processing. a. True b. False ANSWER: True
6. Relational databases rely on unstructured data. a. True b. False ANSWER: False
7. One tenet of Big Data is that all data that is capable of being captured should be. a. True b. False ANSWER: False
8. The ability to graphically present data in a way that makes it understandable is the concept of value. a. True b. False ANSWER: False
9. Characteristics that are important in working with data in the relational database model also apply to Big Data. a. True b. False ANSWER: True
10. Hadoop is a database that has become the de facto standard for most Big Data storage and processing. a. True b. False ANSWER: False
11. Under the HDFS system, using a write-once, read-many model simplifies concurrency issues. a. True b. False ANSWER: True
12. A block report is used to let the name node know that the data node is still available. a. True b. False ANSWER: False
13. A reduce function takes a collection of key-value pairs with the same key value and summarizes them into a single result. a. True b. False ANSWER: True
14. Hive is a good choice for jobs that require a small subset of data to be returned very quickly. a. True b. False ANSWER: False
15. Hadoop is a high-level tool that requires little effort to create, manage and use. a. True b. False ANSWER: False
16. Flume is a tool for converting data back and forth between a relational database and the HDFS. a. True b. False ANSWER: False
17. Most NoSQL products run only in a Linux or Unix environment. a. True b. False ANSWER: True
18. Key-value and document databases are structurally similar. a. True b. False ANSWER: True
19. A column-family database is a NoSQL database model that organizes data in key-value pairs with keys mapped to a set of columns in the value component. a. True b. False ANSWER: True
20. Interest in graph databases can be tied to the area of social networks. a. True b. False ANSWER: True
21. Explanatory analytics uses predictive analytics as a stepping stone to create explanatory models. a. True b. False ANSWER: False
22. Data mining focuses on the discovery and explanation stages of knowledge acquisition. a. True b. False ANSWER: True
23. _____ is NOT one of the “3 Vs” of Big Data. a. Volume b. Velocity c. Validation d. Variety ANSWER: c. Validation
24. _____ is keeping the same number of systems, but migrating each system to a larger system. a. Clustering b. Scaling up c. Streaming d. Scaling out ANSWER: b. Scaling up
25. _____ focuses on filtering data as it enters the system to determine which data to keep and which to discard. a. Scaling up b. Feedback loop processing c. Stream processing d. Scaling out ANSWER: c. Stream processing
26. A(n) _____ is a process or set of operations in a calculation. a. algorithm b. feedback loop c. stream d. structure ANSWER: a. algorithm
27. Big Data: a. relies on the use of structured data b. captures data in whatever format it naturally exists c. relies on the use of unstructured data d. imposes a structure on data when it is captured ANSWER: b. captures data in whatever format it naturally exists
28. In the context of Big Data, _____ relates to differences in meaning. a. variety b. variability c. veracity d. viability ANSWER: b. variability
29. In the context of Big Data, _____ refers to the trustworthiness of a set of data. a. value b. variability c. veracity d. viability ANSWER: c. veracity
30. By default, Hadoop uses a replication factor of: a. one b. two c. three d. four ANSWER: c. three
31. Which of the following is NOT a key assumption of the Hadoop Distributed File System? a. High volume b. Write-many, read-once c. Streaming access d. Fault-tolerance ANSWER: b. Write-many, read-once
32. When using HDFS, the _____ node creates new files by communicating with the _____ node. a. client, name b. name, client c. client, data d. data, client ANSWER: a. client, name
33. When using HDFS, a heartbeat is sent every _____ to notify the name node that the data node is still available. a. 3 hours b. 3 seconds c. 6 hours d. 6 seconds ANSWER: b. 3 seconds
34. When using MapReduce, a _____ function takes a collection of data and sorts and filters it into a set of key-value pairs. a. reduce b. map c. data d. block ANSWER: b. map
35. When using MapReduce, best practices suggest that the number of mappers on a given node should be: a. 100 or more b. 100 or less c. 50 or less d. at least 300 ANSWER: b. 100 or less
36. _____ processing occurs when a program runs from beginning to end without any user interaction. a. Hadoop b. Block c. Hive d. Batch ANSWER: d. Batch
37. Two of the most popular applications to simplify the process of creating MapReduce jobs are Hive and _____. a. Flume b. Pig c. Sqoop d. Impala ANSWER: b. Pig
38. _____ is a tool for converting data back and forth between a relational database and the HDFS. a. Flume b. Pig c. Sqoop d. Impala ANSWER: c. Sqoop
39. _____ was the first SQL-on-Hadoop application. a. Flume b. Pig c. Sqoop d. Impala ANSWER: d. Impala
40. Which of the following is NOT one of the standard NoSQL categories? a. document databases b. column-oriented databases c. graph databases d. chart databases ANSWER: d. chart databases
41. To query the value component of the pair when using a key-value database, use get or: a. store b. fetch c. retrieve d. gather ANSWER: b. fetch
42. Document databases group documents into logical groups called: a. buckets b. sets c. collections d. blocks ANSWER: c. collections
43. _____ minimizes the number of disk reads necessary to retrieve a row of data. a. Column-oriented database b. Row-centric storage c. Column-family database d. Column-centric storage ANSWER: b. Row-centric storage
44. Modeling and storing data about relationships is the focus of: a. key-value databases b. column-oriented databases c. document databases d. graph databases ANSWER: d. graph databases
45. _____ uses statistical analysis to answer questions about the how and why of relationships. a. Explanatory analytics b. Data mining c. Predictive analytics d. Knowledge acquisition ANSWER: a. Explanatory analytics
46. _____ uses statistical tools to answer questions about future data occurrences. a. Explanatory analytics b. Data mining c. Predictive analytics d. Knowledge acquisition ANSWER: c. Predictive analytics
47. The goal of the _____ phase of data mining is to identify common data characteristics or patterns. a. data preparation b. data analysis and classification c. knowledge acquisition d. prognosis ANSWER: b. data analysis and classification
48. The end user decides what techniques to apply to the data when using the _____ mode of data mining. a. guided b. prognosis c. directed d. automated ANSWER: a. guided
49. Most BI vendors are dropping the term “data mining” and replacing it with the term: a. explanatory analytics b. data analytics c. predictive analytics d. knowledge acquisition ANSWER: c. predictive analytics
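
Study note for cards 13 and 34: the map and reduce functions described there can be illustrated with a small in-memory word count. This is only a sketch in plain Python, not Hadoop code; the function names (map_words, shuffle, reduce_counts, word_count) are made up for illustration, and the grouping step stands in for work that a real MapReduce framework would do across nodes.

```python
from collections import defaultdict


def map_words(line):
    """Map step: turn one line of text into (word, 1) key-value pairs."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key so each reduce call sees one key's values.

    Assumption: in a real cluster the framework performs this grouping.
    """
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce step: summarize all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, group by key, then reduce, all in one process."""
    pairs = [pair for line in lines for pair in map_words(line)]
    return dict(reduce_counts(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "Big Data analytics"]))
```

Here map_words plays the role of the map function in card 34 (data in, key-value pairs out), and reduce_counts plays the role of the reduce function in card 13 (pairs sharing a key summarized into one result).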