Module 2 - Big Data Analysis & Technology Concepts

Question 1 of 200

1

Big Data Analysis

Choose one of the following:

differs from traditional data analysis primarily because of the volume, velocity and variety characteristics of the data it processes
When two variables are considered to be _____________ they are considered to be aligned based on a linear relationship
This means that when one variable changes, the other variable also changes proportionally and constantly
this helps maintain data provenance throughout the big data analysis lifecycle, which helps establish and preserve data accuracy and quality

Explanation

Question 2 of 200

1

step-by-step process

Choose one of the following:

is needed to organize the task involved with retrieving, processing, producing and repurposing data
therefore, it is advisable to store a verbatim copy of the original dataset before proceeding with the filtering. To save on required storage space, the verbatim copy is compressed before storage
Although the data format may be the same, the data model may be different
depending on the business scope of the analysis project and nature of the business problems being addressed, the required datasets and their sources can be internal and/or external to the enterprise

Explanation

Question 3 of 200

1

Big Data Analysis Lifecycle

Choose one or more of the following:

Business Case Evaluation
Data Identification
A/B Testing
An ________ is employed when the comparatively simple data manipulation functions of a query engine are insufficient

Explanation

Question 4 of 200

1

Big Data Analysis Lifecycle

Choose one or more of the following:

Data Acquisition & Filtering
Data Extraction
suggestions commonly pertain to recommending items, such as movies, books, web pages, people, etc.
the processing engine mechanism will often use the ___________ to coordinate data processing across a large number of servers. This way, the processing engine does not require its own coordination logic

Explanation

Question 5 of 200

1

Big Data Analysis Lifecycle

Choose one or more of the following:

Data Validation & Cleansing
Data Aggregation & Representation
The ______ itself is a visual, color-coded representation of data values
the processing engine mechanism will often use the ___________ to coordinate data processing across a large number of servers. This way, the processing engine does not require its own coordination logic

Explanation

Question 6 of 200

1

Big Data Analysis Lifecycle

Choose one or more of the following:

Data Analysis
Data Visualization
A _______ is generally expressed using a line chart, with time plotted on the x-axis and recorded data value plotted on the y-axis
Utilization of Analysis Results

Explanation

Question 7 of 200

1

Business Case Evaluation

Choose one or more of the following:

requires that a business case be created, assessed and approved prior to proceeding with the actual hands-on analysis task
helps decision-makers understand the business resources that will need to be utilized and which business challenges the analysis will tackle
Unstructured text is generally much more difficult to analyze and search, compared to structured text
is an example of the application of the law of large numbers

Explanation

Question 8 of 200

1

Business Case Evaluation

Choose one or more of the following:

the further identification of KPIs during this stage helps determine how closely the data analysis outcome needs to meet the identified goals and objectives
based on the business requirements documented, it can be determined whether the business problems being addressed are really Big Data problems
Applications for ___________ include fraud detection, medical diagnosis, network data analysis and sensor data analysis
can be applied to the categorization of unknown documents, and to personalized marketing campaigns by grouping together customers with similar behavior

Explanation

Question 9 of 200

1

Business Case Evaluation

Choose one or more of the following:

in order to qualify as a Big Data Problem, a business problem needs to be directly related to one or more of the Big Data characteristics of volume, velocity or variety
Note also that another outcome of this stage is the determination of the underlying budget required to carry out the analysis project
The processing engine enables data to be queried and manipulated in other ways, but to implement this type of functionality requires custom programming
Workflow Engine

Explanation

Question 10 of 200

1

Business Case Evaluation

Choose one or more of the following:

any required purchases of tools, hardware, training, etc. need to be understood in advance, so that the anticipated investment can be weighed against the expected benefits of achieving the goals
initial iterations of the big data analysis lifecycle will require more up-front investment in Big Data technologies, products and training compared to later iterations, where these earlier investments can be repeatedly leveraged
In the context of traditional data analysis, the ______ states that, starting with a reasonably large sample size, the value obtained from the analysis of additional data decreases as more data is successively added to the original sample
can be applied to the categorization of unknown documents, and to personalized marketing campaigns by grouping together customers with similar behavior

Explanation

Question 11 of 200

1

Data Identification

Choose one or more of the following:

is dedicated to identifying the datasets (and their sources) required for the analysis project
identifying a wider variety of data sources may increase the probability of finding hidden patterns and correlations
Classification, outlier detection, filtering, natural language processing, text analytics and sentiment analysis can utilize ___________
Clustering

Explanation

Question 12 of 200

1

Data Identification

Choose one or more of the following:

it can be beneficial to identify as many types of related data sources and insights as possible, especially when we don't know exactly what we're looking for
depending on the business scope of the analysis project and nature of the business problems being addressed, the required datasets and their sources can be internal and/or external to the enterprise
examples of appended metadata can include dataset size and structure, source information, date and time of creation or collection, language-specific information, etc.
is generally used in data mining to get an understanding of the properties of a given dataset. After developing this understanding, classification can be used to make better predictions about similar, but new or unseen data

Explanation

Question 13 of 200

1

Internal Datasets

Choose one of the following:

a list of available datasets from sources, such as data marts and operational systems, are typically compiled and matched against a predefined dataset specification
A distributed Big Data solution that needs to run on multiple servers relies on the coordination engine mechanism to ensure operational consistency across all of the participating servers
Subsequent to __________ being made available to business users to support business decision-making (such as via dashboards), there may be further opportunities to utilize the __________
In other areas such as the scientific domains, the objective may simply be to observe which version works better in order to improve a process or product

Explanation

Question 14 of 200

1

External Dataset

Choose one of the following:

a list of possible third-party data providers (data markets and publicly available datasets) is generally compiled. Some forms of external data may be embedded within blogs or other types of content-based websites, in which case they may need to be harvested via automated tools
can be applied to the categorization of unknown documents, and to personalized marketing campaigns by grouping together customers with similar behavior
Reconciling these differences can require complex logic that is executed automatically without the need for human intervention
A big data _________ utilizes a distributed parallel programming framework that enables it to process very large amounts of data distributed across multiple nodes

Explanation

Question 15 of 200

1

Data acquisition & filtering

Choose one or more of the following:

the data is gathered from all of the data sources that were identified during the previous stage and is then subjected to the automated filtering of corrupt data or data that has been deemed to have no value to the analysis objectives
depending on the type of data source, data may come as a dump of files (such as data purchased from a third-party data provider), or may require API integration (such as with Twitter)
A ________ engine enables data to be moved into or out of Big Data solution storage devices
A _________ comprises grouped read/writes, with a larger data footprint consisting of complex joins and high-latency responses

Explanation

Question 16 of 200

1

Data acquisition & filtering

Choose one or more of the following:

In many cases, especially where external, unstructured data is concerned, some or most of the acquired data may be irrelevant (noise) and can be discarded as part of the filtering process
data classified as "corrupt" can include records with missing or nonsensical values or invalid data types
it involves plotting entities as nodes and connections as edges between nodes
OLTP and operational systems (write-intensive) as well as operational BI and analytics (read-intensive), both fall within this category

Explanation
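
As a minimal sketch of the filtering described in this question, the Python snippet below discards records with missing values, invalid data types or nonsensical values. The record layout (`sensor_id`, `temperature`) and the plausible temperature range are assumptions made purely for illustration; they do not come from the course material.

```python
def is_corrupt(record):
    """Return True if a record has a missing, mistyped or nonsensical value."""
    sensor_id = record.get("sensor_id")
    temperature = record.get("temperature")
    if sensor_id is None or temperature is None:
        return True  # missing value
    if isinstance(temperature, bool) or not isinstance(temperature, (int, float)):
        return True  # invalid data type
    if not -90.0 <= temperature <= 60.0:
        return True  # nonsensical value (assumed plausible range)
    return False


def filter_records(records):
    """Split records into (kept, discarded) lists."""
    kept = [r for r in records if not is_corrupt(r)]
    discarded = [r for r in records if is_corrupt(r)]
    return kept, discarded


# Hypothetical sample data.
raw = [
    {"sensor_id": "a1", "temperature": 21.5},
    {"sensor_id": "a2", "temperature": None},
    {"sensor_id": "a3", "temperature": "warm"},
    {"sensor_id": "a4", "temperature": 999.0},
    {"sensor_id": None, "temperature": 18.0},
]
kept, discarded = filter_records(raw)
```

The discarded records are returned rather than dropped silently, because data filtered out for one analysis may still be valuable for another.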

Question 17 of 200

1

Data acquisition & filtering

Choose one or more of the following:

data that is filtered out for one analysis may possibly be valuable for a different type of analysis
therefore, it is advisable to store a verbatim copy of the original dataset before proceeding with the filtering. To save on required storage space, the verbatim copy is compressed before storage
is an example of the application of the law of large numbers
Coordination Engine

Explanation
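
The advice to keep a compressed verbatim copy before filtering can be sketched as follows. This is an illustration using Python's standard `gzip` module; the function name `archive_verbatim_copy`, the file name and the directory layout are hypothetical.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def archive_verbatim_copy(source: Path, archive_dir: Path) -> Path:
    """Store a gzip-compressed, byte-for-byte copy of the original dataset."""
    archive_dir.mkdir(parents=True, exist_ok=True)
    target = archive_dir / (source.name + ".gz")
    with source.open("rb") as src, gzip.open(target, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return target


# Hypothetical dataset written to a temporary directory for the demo.
workdir = Path(tempfile.mkdtemp())
original = workdir / "tweets.csv"
original.write_bytes(b"id,text\n1,hello\n2,\n")

# Archive first, filter afterwards.
archived = archive_verbatim_copy(original, workdir / "archive")
```

Decompressing the archive yields the original bytes unchanged, so a later analysis can start again from the unfiltered data.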

Question 18 of 200

1

Data acquisition & filtering

Choose one or more of the following:

both internal and external data needs to be persisted once it gets generated or enters the enterprise boundary
for batch analytics, this data is persisted to disk prior to analysis
extracting text for text analytics, which requires scans of whole documents, will not be necessary if the underlying Big Data solution can already read the document in its native format directly
is dedicated to determining how and where processed analysis data can be further leveraged

Explanation

Question 19 of 200

1

Data acquisition & filtering

Choose one or more of the following:

in the case of realtime analytics, the data is analyzed first and then persisted to disk
metadata can be added via automation to data from both internal and external data sources to improve the classification and querying
Also known as offline processing, ________ processing involves processing data in batches and usually imposes delays (resulting in high-latency responses)
is generally applied via the following two approaches: collaborative ____________ and content-based ____________

Explanation

Question 20 of 200

1

Data acquisition & filtering

Choose one or more of the following:

examples of appended metadata can include dataset size and structure, source information, date and time of creation or collection, language-specific information, etc.
it is vital that metadata be machine-readable and passed forward along subsequent analysis stages
The ability to analyze massive amounts of data and find useful insights carries little value if the only ones that can interpret the results are the analysts
both versions are subjected to an experiment simultaneously, the observations are recorded to determine which version is more successful

Explanation

Question 21 of 200

1

Data acquisition & filtering

Choose one or more of the following:

this helps maintain data provenance throughout the big data analysis lifecycle, which helps establish and preserve data accuracy and quality
metadata is added through an automated mechanism to data received from both internal and external data sources
any required purchases of tools, hardware, training, etc. need to be understood in advance, so that the anticipated investment can be weighed against the expected benefits of achieving the goals
helps decision-makers understand the business resources that will need to be utilized and which business challenges the analysis will tackle

Explanation

Question 22 of 200

1

Data Extraction

Choose one or more of the following:

Some of the data identified as input for the analysis may arrive in a format incompatible with the big data solution
the need to address disparate types of data is more likely with data from external sources
make it possible to develop highly reliable, highly available distributed big data solutions that can be deployed in a cluster
Data needs to be imported before it can be processed by the big data solution

Explanation

Question 23 of 200

1

Data Extraction

Choose one or more of the following:

is dedicated to extracting disparate data and transforming it into a format that the underlying big data solution can use for the purpose of the data analysis
the extent of extraction and transformation required depends on the types of analytics and capabilities of the big data solution
provenance can play an important role in determining the accuracy and quality of questionable data
is closely related to parallel data processing in how the same principle of "divide-and-conquer" is applied

Explanation

Question 24 of 200

1

Data Extraction

Choose one or more of the following:

extracting text for text analytics, which requires scans of whole documents, will not be necessary if the underlying Big Data solution can already read the document in its native format directly
further transformation is needed in order to separate the data into two separate fields as required by the big data solution
it can also be used to make predictions about the values of the dependent variable while it is still unknown
However, ___________ is always achieved through physically separate machines that are networked together as a cluster

Explanation

Question 25 of 200

1

Data Validation & Cleansing

Choose one or more of the following:

Invalid data can skew and falsify analysis results
Unlike traditional enterprise data where the data structure is pre-defined and data is pre-validated, data input into big data analyses can be unstructured without any indication of validity
More than one independent variable can be tested at the same time
The _______ essentially acts as a resource arbitrator that manages and allocates available resources

Explanation

Question 26 of 200

1

Data Validation & Cleansing

Choose one or more of the following:

its complexity can further make it difficult to arrive at a set of suitable validation constraints
is dedicated to establishing (often complex) validation rules and removing any known invalid data
is closely related to the concepts of classification and clustering, although its algorithms focus on finding abnormal values
for batch analytics, ______________ can be achieved via an offline ETL operation

Explanation

Question 27 of 200

1

Data Validation & Cleansing

Choose one or more of the following:

Big data solutions often receive redundant data across different datasets
this redundancy can be exploited to explore interconnected datasets in order to assemble validation parameters and fill in missing valid data
A ___ represents a geographic measure by which different regions are color-coded according to a certain theme
A _________ is a file system that can store large files spread across a cluster

Explanation
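
One way to picture how redundancy across datasets can be used to fill in missing valid data is the sketch below. The helper `fill_from_redundant`, the field names and the sample values are hypothetical and only illustrate the idea; a real solution would also need rules for deciding which dataset to trust.

```python
def fill_from_redundant(primary, secondary, key, field):
    """Fill missing `field` values in primary from matching records in secondary."""
    lookup = {
        row[key]: row[field]
        for row in secondary
        if row.get(field) is not None
    }
    filled = []
    for row in primary:
        row = dict(row)  # copy so the input dataset is not modified
        if row.get(field) is None and row[key] in lookup:
            row[field] = lookup[row[key]]
        filled.append(row)
    return filled


# Hypothetical overlapping datasets that share the customer_id field.
dataset_a = [
    {"customer_id": 1, "postcode": None},
    {"customer_id": 2, "postcode": "10115"},
]
dataset_b = [
    {"customer_id": 1, "postcode": "80331"},
    {"customer_id": 3, "postcode": "20095"},
]
filled = fill_from_redundant(dataset_a, dataset_b, "customer_id", "postcode")
```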

Question 28 of 200

1

Data Validation & Cleansing

Choose one or more of the following:

for batch analytics, ______________ can be achieved via an offline ETL operation
The presence of invalid data is resulting in spikes. Although the data appears abnormal, it may be indicative of a new pattern
A _______ is generally expressed using a line chart, with time plotted on the x-axis and recorded data value plotted on the y-axis
for realtime analytics, a more complex in-memory system is required to validate and cleanse the data at the source

Explanation

Question 29 of 200

1

Data Validation & Cleansing

Choose one or more of the following:

provenance can play an important role in determining the accuracy and quality of questionable data
data that appears to be invalid may still be valuable in that it may possess hidden patterns and trends
No hypothesis or predetermined assumptions are generated
A _______ database is a non-relational database that is highly scalable, fault-tolerant and specifically designed to house unstructured data

Explanation

Question 30 of 200

1

Data Aggregation & Representation

Choose one or more of the following:

Data may be spread across multiple datasets, requiring that datasets be joined together via common fields. In other cases, the same data fields may appear in multiple datasets
Either way, a method of data reconciliation is required or the dataset representing the correct value needs to be determined
Law of Diminishing Marginal Utility
A ______ can be in the form of a chart or a map

Explanation
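
The two situations named in this question, joining datasets via a common field and reconciling a field that appears in both, can be sketched in Python as follows. The function `join_and_reconcile`, the "trusted side wins" rule and the sample records are assumptions chosen for illustration, not a prescribed method.

```python
def join_and_reconcile(left, right, key, trusted="left"):
    """Inner-join two lists of dicts on `key`; on conflicts keep the trusted side."""
    right_by_key = {row[key]: row for row in right}
    joined, conflicts = [], []
    for l_row in left:
        r_row = right_by_key.get(l_row[key])
        if r_row is None:
            continue  # no match in the other dataset
        # Start from the untrusted row, then let the trusted row overwrite it.
        merged = dict(r_row) if trusted == "left" else dict(l_row)
        merged.update(l_row if trusted == "left" else r_row)
        # Record every shared field whose values disagree.
        for field in l_row.keys() & r_row.keys():
            if field != key and l_row[field] != r_row[field]:
                conflicts.append((l_row[key], field))
        joined.append(merged)
    return joined, conflicts


# Hypothetical datasets sharing the "id" field; "country" appears in both.
orders = [
    {"id": 1, "amount": 10, "country": "DE"},
    {"id": 2, "amount": 5, "country": "FR"},
]
crm = [{"id": 1, "name": "Ada", "country": "AT"}]
joined, conflicts = join_and_reconcile(orders, crm, "id")
```

Returning the list of conflicts alongside the joined records keeps the reconciliation decision visible, which matters when it is unclear which dataset holds the correct value.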

Question 31 of 200

1

Data Aggregation & Representation

Choose one or more of the following:

is dedicated to integrating multiple datasets together to arrive at a unified view
future data analysis requirements need to be considered during this stage to help foster data reusability
The ________ mechanism can also be used to support distributed locks, support distributed queues, establish a highly available registry for obtaining configuration information, and provide reliable asynchronous communication between processes that are running on different servers
essentially provides the ability to discover text rather than just search it

Explanation

Question 32 of 200

1

Data structure and Semantics

Choose one or more of the following:

performing the stage of data aggregation & representation can become complicated because of differences in this
Reconciling these differences can require complex logic that is executed automatically without the need for human intervention
Within Big Data ________ can first be applied to discover if a relationship exists
both versions are subjected to an experiment simultaneously, the observations are recorded to determine which version is more successful

Explanation

Question 33 of 200

1

Data structure

Choose one of the following:

Although the data format may be the same, the data model may be different
are an effective visual analysis technique for expressing patterns, data compositions via part-whole relations and geographic distributions of data
Data Acquisition & Filtering
Hadoop's batch-based data processing fully lends itself to the pay-per-use model of __________, which can reduce operational costs since a typical Hadoop cluster size can range from a few to a few thousand nodes

Explanation

Question 34 of 200

1

Semantics

Choose one of the following:

A value that is labelled differently in two different datasets may mean the same thing
Instead of hard-coding the required learning rules, either supervised or unsupervised machine learning is applied to develop the computer's understanding of the __________
Network Analysis
In other areas such as the scientific domains, the objective may simply be to observe which version works better in order to improve a process or product

Explanation

Question 35 of 200

1

Data Aggregation

Choose one or more of the following:

The large volumes processed by Big Data solutions can make ____________ a time- and effort-intensive operation
Whether _____________ is required or not, it is important to understand that the same data can be stored in many different forms. One form may be better suited for a particular type of analysis than another
require processing resources that they request from the resource manager
the data is then analyzed to prove or disprove the hypothesis and provide definitive answers to specific questions

Explanation

Question 36 of 200

1

Data structure standardized

Choose one of the following:

can act as a common denominator that can be used for a range of analysis techniques and projects. This can require establishing a central, standard analysis repository, such as a NoSQL database
A big data _________ utilizes a distributed parallel programming framework that enables it to process very large amounts of data distributed across multiple nodes
The _____ essentially acts as a resource arbitrator that manages and allocates available resources
comprise random read/writes that involve fewer joins and require low-latency responses, with a smaller data footprint

Explanation

Question 37 of 200

1

Data Analysis

Choose one or more of the following:

is dedicated to carrying out the actual analysis task, which typically involves one or more types of analysis
this stage can be iterative in nature, especially if the _________________ is exploratory so that analysis is repeated until the appropriate pattern or correlation is uncovered
A _______ may internally use a processing engine to process multiple large datasets in parallel
the accuracy and applicability of the patterns and relationships that are found in a large dataset will be higher than that of a smaller dataset

Explanation

Question 38 of 200

1

Data Analysis

Choose one or more of the following:

the exploratory analysis approach is explained shortly, along with confirmatory analysis
depending on the type of analytics required, this stage can be as simple as querying a dataset to compute an aggregation for comparison
make it possible to develop highly reliable, highly available distributed big data solutions that can be deployed in a cluster
Correlation, regression, time series analysis, classification, clustering, outlier detection, filtering, natural language processing, text analytics and sentiment analysis are considered forms of ________

Explanation

Question 39 of 200

1

Data Analysis

Choose one or more of the following:

it can be as challenging as combining data mining and complex statistical analysis techniques to discover patterns and anomalies, or to generate a statistical or mathematical model to depict relationships between variables
The approach taken when carrying out this stage can be classified as confirmatory analysis or exploratory analysis (the latter is linked to data mining)
The results of completing the _______________ stage provide users with the ability to perform visual analysis, allowing for the discovery of answers to questions that users have not yet even formulated
A given ______ may support either data ingress or egress functions

Explanation

Question 40 of 200

1

Confirmatory Data Analysis

Choose one or more of the following:

_____________ is a deductive approach where the cause of the phenomenon being investigated is proposed beforehand
the data is then analyzed to prove or disprove the hypothesis and provide definitive answers to specific questions
can be used to determine the number of entities that fall within a certain radius of another entity
can act as a common denominator that can be used for a range of analysis techniques and projects. This can require establishing a central, standard analysis repository, such as a NoSQL database

Explanation

Question 41 of 200

1

hypothesis

Choose one of the following:

The proposed cause or assumption is called a ____________
is an item filtering technique based on the collaboration (merging) of users' past behavior
This type of environment is provided by a platform that is comprised of a set of distributed storage and processing technologies
As the amount of digitized documents, e-mails, social media posts and log files increases, businesses have an increasing need to leverage any value that can be extracted from these forms of semi-structured and unstructured data

Explanation

Question 42 of 200

1

Confirmatory Data Analysis

Choose one or more of the following:

the data is then analyzed to prove or disprove the hypothesis and provide definitive answers to specific questions
Data samples are typically used
this information can then be integrated into the decision-making process
Unexpected findings or anomalies are usually ignored since a predetermined cause was assumed

Explanation

Question 43 of 200

1

Exploratory Data Analysis

Choose one or more of the following:

_____________ is an inductive approach that is closely associated to data mining
No hypothesis or predetermined assumptions are generated
can be carried out via the use of correlation, heat maps, time series analysis, network analysis, spatial data analysis, clustering, outlier detection, natural language processing and text analytics
is an item filtering technique based on the collaboration (merging) of users' past behavior

Explanation

Question 44 of 200

1

Exploratory Data Analysis

Choose one or more of the following:

Instead, the data is explored through analysis to develop an understanding of the cause of the phenomenon
Although it may not provide definitive answers, this method provides a general direction that can facilitate the discovery of patterns or anomalies
represents a constant rate of change
Large amounts of data and visual analysis are typically used

Explanation

Question 45 of 200

1

Data visualization

Choose one or more of the following:

The ability to analyze massive amounts of data and find useful insights carries little value if the only ones that can interpret the results are the analysts
is dedicated to using __________________ techniques and tools to graphically communicate the analysis results for effective interpretation by business users
is the process of finding data that is significantly different from or inconsistent with the rest of the data within a given dataset
for batch analytics, ______________ can be achieved via an offline ETL operation

Explanation

Question 46 of 200

1

Data visualization

Choose one or more of the following:

Business users need to be able to understand the results in order to obtain value from the analysis and subsequently have the ability to provide feedback
The results of completing the _______________ stage provide users with the ability to perform visual analysis, allowing for the discovery of answers to questions that users have not yet even formulated
Large amounts of data and visual analysis are typically used
is an item filtering technique focused on the similarity between users and items

Explanation

Question 47 of 200

1

Data visualization

Choose one or more of the following:

The same results may be presented in a number of different ways, which can influence the interpretation of the results
Consequently, it is important to use the most suitable visualization technique by keeping the business domain in context
Another aspect to keep in mind is that providing a method of drilling down to comparatively simple statistics were generated
The objective is to use graphic representations to develop a deeper understanding of the data being analyzed. Specifically, it helps identify and highlight hidden patterns, correlations and anomalies

Explanation

Question 48 of 200

1

Analysis results

Choose one of the following:

Subsequent to __________ being made available to business users to support business decision-making (such as via dashboards), there may be further opportunities to utilize the __________
Natural Language Processing
A ___________ provides the ability to design and process a complex sequence of operations that can be triggered either at set time intervals or when data becomes available
includes both text and speech recognition

Explanation

Question 49 of 200

1

Utilization of Analysis Results

Choose one or more of the following:

is dedicated to determining how and where processed analysis data can be further leveraged
Depending on the nature of the analysis problems being addressed, it is possible for the analysis results to produce "models" that encapsulate new insights and understandings about the nature of the patterns and relationships that exist within the data that was just analyzed
Data Transfer Engine
is generally applied via the following two approaches: collaborative ____________ and content-based ____________

Explanation

Question 50 of 200

1

Utilization of Analysis Results

Choose one or more of the following:

A model may look like a mathematical equation or a set of rules
Models can be used to improve business process logic, application system logic and can form the basis of a new system or software program
A distributed Big Data solution that needs to run on multiple servers relies on the coordination engine mechanism to ensure operational consistency across all of the participating servers
new models may be used to improve the programming logic within existing enterprise systems or may form the basis of new systems

Explanation

Question 51 of 200

1

Input for enterprise systems

Choose one or more of the following:

Filtering
An ________ is employed when the comparatively simple data manipulation functions of a query engine are insufficient
the data analysis results may be automatically (or manually) fed directly into enterprise systems to enhance and optimize their behavior and performance
new models may be used to improve the programming logic within existing enterprise systems or may form the basis of new systems

Explanation

Question 52 of 200

1

Business Process Optimization

Choose one or more of the following:

The identified patterns, correlations and anomalies discovered during the data analysis are used to refine business processes
models may also lead to opportunities to improve business process logic
is a computer's ability to comprehend human speech and text as naturally understood by humans
When two variables are considered to be _____________ they are considered to be aligned based on a linear relationship

Explanation

Question 53 of 200

1

Alerts

Choose one of the following:

Data analysis results can be used as input for existing _______ or may form the basis of new _______
this helps maintain data provenance throughout the big data analysis lifecycle, which helps establish and preserve data accuracy and quality
Text Analytics
Recommender systems may also be based on a hybrid of both collaborative _______ and content-based _______ to fine-tune the accuracy and effectiveness of generated suggestions

Explanation

Question 54 of 200

1

Data Analysis Techniques

Choose one or more of the following:

Statistical Analysis
Visual Analysis
it can be as challenging as combining data mining and complex statistical analysis techniques to discover patterns and anomalies, or to generate a statistical or mathematical model to depict relationships between variables
Note that distributed file systems and databases are both on disk _________ mechanisms

Explanation

Question 55 of 200

1

Data Analysis Techniques

Choose one or more of the following:

Machine Learning
Semantic Analysis
A _______ database is a non-relational database that is highly scalable, fault-tolerant and specifically designed to house unstructured data
Each node in the _____ has its own dedicated resources such as memory and hard drive and runs its own operating system just like a desktop computer

Explanation

Question 56 of 200

1

Statistical Analysis

Choose one or more of the following:

A/B Testing
Correlation
Unstructured text is generally much more difficult to analyze and search, compared to structured text
Regression

Explanation

Question 57 of 200

1

Machine Learning

Choose one or more of the following:

Classification
Clustering
The use of ________ can reduce development time and enables the manipulation of large datasets without the need to write complex programming logic
it is vital that metadata be machine-readable and passed forward along subsequent analysis stages

Explanation

Question 58 of 200

1

Machine Learning

Choose one or more of the following:

Outlier Detection
Filtering
Some proprietary ________ also provide specialized data analysis features, such as text analytics and machine log analysis processing
Hadoop's batch-based data processing fully lends itself to the pay-per-use model of __________, which can reduce operational costs since a typical Hadoop cluster size can range from a few to a few thousand nodes

Explanation

Question 59 of 200

1

Visual Analysis

Choose one or more of the following:

Heat Maps
Time series analysis
examples of appended metadata can include dataset size and structure, source information, date and time of creation or collection, language-specific information, etc.
can act as a common denominator that can be used for a range of analysis techniques and projects. This can require establishing a central, standard analysis repository, such as a NoSQL database

Explanation

Question 60 of 200

1

Visual Analysis

Choose one or more of the following:

Network Analysis
Spatial Data Analysis
it can be based on either supervised or unsupervised learning
Invalid data can skew and falsify analysis results

Explanation

Question 61 of 200

1

Semantic Analysis

Choose one or more of the following:

suggests that there is no relationship at all between the two variables
Applications of __________ include operations and logistic optimization, environmental sciences and infrastructure planning
Natural Language Processing
Text Analytics

Explanation

Question 62 of 200

1

Semantic Analysis

Choose one of the following:

Sentiment Analysis
suggests that there is no relationship at all between the two variables
Applications for ___________ include fraud detection, medical diagnosis, network data analysis and sensor data analysis
Unexpected findings or anomalies are usually ignored since a predetermined cause was assumed

Explanation

Question 63 of 200

1

Statistical Analysis

Choose one or more of the following:

uses statistical methods based on mathematical formulas as a means for analyzing data
this type of analysis is commonly used to describe datasets via summarization, such as providing the mean, median or mode of statistics associated with the dataset
Spatial or geospatial data is commonly used to identify the geographic location of individual entities
The _____ essentially acts as a resource arbitrator that manages and allocates available resources

Explanation
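
As a minimal sketch of describing a dataset via summarization, the snippet below computes the mean, median and mode with Python's standard `statistics` module. The `orders` values are made up purely for illustration.

```python
import statistics

# Hypothetical sample: daily order counts (illustrative values only).
orders = [12, 15, 15, 18, 21, 15, 30]

# Summarize the dataset with three common descriptive statistics.
summary = {
    "mean": statistics.mean(orders),
    "median": statistics.median(orders),
    "mode": statistics.mode(orders),
}

print(summary)
```

The three numbers describe the dataset without inferring anything about relationships between variables, which is where correlation and regression come in.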

Question 64 of 200

1

Statistical Analysis

Choose one of the following:

it can also be used to infer patterns and relationships within the dataset, such as regression and correlation
is generally applied via the following two approaches: collaborative ____________ and content-based ____________
is a supervised learning technique by which data is classified into relevant, previously learned categories
We may be further interested in discovering how closely Variables A and B are related, which means we may also want to analyze the extent to which Variable B increases in relation to Variable A's increase

Explanation

Question 65 of 200

1

A/B Testing

Choose one or more of the following:

also known as split or bucket testing compares two versions of an element to determine which version is superior based on a pre-defined metric
the element can be a range of things
is expressed as a decimal number between -1 and +1, which is known as the _____________ coefficient
Classification, outlier detection, filtering, natural language processing, text analytics and sentiment analysis can utilize ___________

Explanation

Question 66 of 200

1

A/B Testing

Choose one or more of the following:

the current version of the element is called the control version, whereas the modified version is called the treatment
both versions are subjected to an experiment simultaneously, and the observations are recorded to determine which version is more successful
However, ___________ is always achieved through physically separate machines that are networked together as a cluster
Instead of hard-coding the required learning rules, either supervised or unsupervised machine learning is applied to develop the computer's understanding of the __________

Explanation
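
The control/treatment comparison can be sketched as follows. This is only an illustration: the pre-defined metric is assumed to be a conversion rate, the visitor and conversion counts are invented, and the helper names `conversion_rate` and `compare_versions` are hypothetical. A real experiment would also check whether the difference is statistically significant, which this sketch does not do.

```python
def conversion_rate(conversions, visitors):
    """Pre-defined metric (assumed here): share of visitors who converted."""
    return conversions / visitors


def compare_versions(control, treatment):
    """Compare two (conversions, visitors) tuples recorded over the same period."""
    rate_control = conversion_rate(*control)
    rate_treatment = conversion_rate(*treatment)
    if rate_treatment > rate_control:
        winner = "treatment"
    elif rate_control > rate_treatment:
        winner = "control"
    else:
        winner = "tie"
    return rate_control, rate_treatment, winner


# Hypothetical observations: both versions ran simultaneously.
result = compare_versions(control=(120, 2400), treatment=(156, 2400))
print(result)
```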

Question 67 of 200

1

A/B Testing

Choose one or more of the following:

Although __________ can be implemented in almost any domain, it is most often used in marketing
Generally, the objective is to gauge human behavior with the goal of increasing sales
This is a traditional data analysis principle that claims that data held in a reasonably sized dataset provides the maximum value
Either way, a method of data reconciliation is required or the dataset representing the correct value needs to be determined

Explanation

Question 68 of 200

1

A/B Testing

Choose one of the following:

In other areas such as the scientific domains, the objective may simply be to observe which version works better in order to improve a process or product
A ______ can be in the form of a chart or a map
for batch analytics, ______________ can be achieved via an offline ETL operation
Correlation, regression, time series analysis, classification, clustering, outlier detection, filtering, natural language processing, text analytics and sentiment analysis are considered forms of ________

Explanation

Question 69 of 200

1

Correlation

Choose one or more of the following:

is an analysis technique used to determine whether two variables are related to each other
if they are found to be related, the next step is to determine what their relationship is
Query Engine
In general, the more learning data the computer has, the more correctly it can decipher human text and speech

Explanation

Question 70 of 200

1

Correlation

Choose one or more of the following:

We may be further interested in discovering how closely Variables A and B are related, which means we may also want to analyze the extent to which Variable B increases in relation to Variable A's increase
The use of ________ helps to develop an understanding of a dataset and find relationships that can assist in explaining a phenomenon
Network Analysis
Data may be spread across multiple datasets, requiring that datasets be joined together via common fields. In other cases, the same data fields may appear in multiple datasets

Explanation

Question 71 of 200

1

Correlation

Choose one or more of the following:

Is therefore commonly used for data mining where the identification of relationships between variables in a dataset leads to the discovery of patterns and anomalies
This can reveal the nature of the dataset or the cause of a phenomenon
Migrating to the cloud is logical for enterprises planning to run analytics on datasets that are available via data markets, as most data markets store their data in the cloud such as Amazon S3
Big Data solutions require a distributed processing environment that can accommodate large-scale data volumes, velocity and variety

Explanation

Question 72 of 200

1

Correlated

Choose one or more of the following:

When two variables are considered to be _____________ they are considered to be aligned based on a linear relationship
This means that when one variable changes, the other variable also changes proportionally and constantly
items can be ______ either based on a user's own behavior or by matching the behavior of multiple users
Note that a workflow engine may provide integration with a _______ to enable the automated import and export of data

Explanation

Question 73 of 200

1

Correlation

Choose one or more of the following:

is expressed as a decimal number between -1 and +1, which is known as the _____________ coefficient
The degree of relationship changes from being strong to weak when moving from -1 to 0 or +1 to 0
For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device
essentially provides the ability to discover text rather than just search it

Explanation
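
A coefficient in the range -1 to +1 can be illustrated with a small pure-Python sketch. The quiz does not say which coefficient is meant; the Pearson coefficient is assumed here, and the three sample lists are invented.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric lists.

    Raises ZeroDivisionError if either list has no variation.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


variable_a = [1, 2, 3, 4, 5]
rising = [2, 4, 6, 8, 10]    # moves with variable_a
falling = [10, 8, 6, 4, 2]   # moves against variable_a

print(pearson(variable_a, rising))   # strong positive relationship
print(pearson(variable_a, falling))  # strong negative relationship
```

Values near 0 would indicate a weak relationship, matching the strong-to-weak scale described in the answer options.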

Question 74 of 200

1

+1

Choose one or more of the following:

suggests that there is a strong positive relationship between the two variables
When one variable increases, the other also increases and vice versa
typically involves large quantities of data with sequential read/writes, and comprises a group of read or write queries
Data Extraction

Explanation

Question 75 of 200

1

0

Choose one or more of the following:

suggests that there is no relationship at all between the two variables
when one increases, the other may stay the same, or increase or decrease arbitrarily
it can also be used to make predictions about the values of the dependent variable while it is still unknown
Generally, the objective is to gauge human behavior with the goal of increasing sales

Explanation

Question 76 of 200

1

-1

Choose one or more of the following:

suggests that there is a strong negative relationship between the two variables
when one variable increases, the other decreases and vice versa
can be carried out via the use of correlation, heat maps, time series analysis, network analysis, spatial data analysis, clustering, outlier detection, natural language processing and text analytics
Therefore, the value of each additional batch does not diminish; rather, each batch provides more value

Explanation

Question 77 of 200

1

Regression

Choose one or more of the following:

The analysis technique of _________ explores how a dependent variable is related to an independent variable within a dataset
As a sample scenario, __________ could help determine the type of relationship that exists between temperature (independent variable) and crop yield (dependent variable)
In the context of traditional data analysis, the ______ states that, starting with a reasonably large sample size, the value obtained from the analysis of additional data decreases as more data is successively added to the original sample
Data Analysis

Explanation

Question 78 of 200

1

Regression

Choose one or more of the following:

Applying this technique helps determine how the value of the dependent variable changes in relation to change in the value of the independent variable
When the independent variable increases, for example, does the dependent variable also increase? If yes, is the increase in a linear or non-linear proportion?
Business Case Evaluation
A _________ comprises grouped read/writes, with a larger data footprint consisting of complex joins and high-latency responses

Explanation

Question 79 of 200

1

Regression

Choose one or more of the following:

More than one independent variable can be tested at the same time
However, in such cases only one independent variable may change. The others are kept constant
The results of completing the _______________ stage provide users with the ability to perform visual analysis, allowing for the discovery of answers to questions that users have not yet even formulated
a list of possible third-party data providers (data markets and publicly available datasets) is generally compiled. Some forms of external data may be embedded within blogs or other types of content-based Websites, in which case they may need to be harvested via automated tools

Explanation

Question 80 of 200

1

Regression

Choose one or more of the following:

can help enable a better understanding of what a phenomenon is, and why it occurred
it can also be used to make predictions about the values of the dependent variable while it is still unknown
Users of Big Data solutions can make numerous data processing requests, each of which can have different processing workload requirements
The _____ essentially acts as a resource arbitrator that manages and allocates available resources

Explanation
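
Predicting a dependent variable from an independent one can be sketched with a simple least-squares line. The example borrows the temperature and crop-yield scenario used elsewhere in this quiz, but the numbers are invented and `fit_line` is a hypothetical helper. It also assumes a linear relationship, which real data may not follow.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical observations: temperature (independent), crop yield (dependent).
temperature = [10, 15, 20, 25, 30]
crop_yield = [2.0, 3.0, 4.0, 5.0, 6.0]

slope, intercept = fit_line(temperature, crop_yield)

# Predict the still-unknown yield for a temperature that was not observed.
predicted_yield = slope * 22 + intercept
print(slope, intercept, predicted_yield)
```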

Question 81 of 200

1

Linear regression

Choose one of the following:

represents a constant rate of change
it can be as challenging as combining data mining and complex statistical analysis techniques to discover patterns and anomalies, or to generate a statistical or mathematical model to depict the relationship between variables
the ______ states that the confidence with which predictions can be made increases as the size of the data that is being analyzed increases
Data samples are typically used

Explanation

Question 82 of 200

1

Non-linear regression

Choose one of the following:

This type of environment is provided by a platform that is comprised of a set of distributed storage and processing technologies
Hadoop's batch-based data processing fully lends itself to the pay-per-use model of __________, which can reduce operational costs since a typical Hadoop cluster size can range from a few to a few thousand nodes
A ________ is a method of storing and organizing data on a storage medium, such as hard drives, DVDs, and flash drives
represents the variable rate of change

Explanation

Question 83 of 200

1

Correlation

Choose one or more of the following:

does not imply causation. The change in the value of one variable may not be responsible for the change in the value of the second variable, although both may change at the same rate
assumes that both variables are independent
Within Big Data ________ can first be applied to discover if a relationship exists
However, ___________ is always achieved through physically separate machines that are networked together as a cluster

Explanation

Question 84 of 200

1

Regression

Choose one or more of the following:

deals with already identified dependent and independent variables
implies that there is a degree of causation between the dependent and independent variables that may be direct or indirect
can then be applied to further explore the relationship and predict the values of the dependent variable, based on the known values of the independent variables
is an example of the application of the law of large numbers

Explanation

Question 85 of 200

1

Visual Analysis

Choose one or more of the following:

is a form of data analysis that involves the graphic representation of data to enable or enhance its visual perception
based on the premise that humans can understand and draw conclusions from graphics more quickly than from text, _______ act as a discovery tool in the field of Big Data
As the amount of digitized documents, e-mails, social media posts and log files increases, businesses have an increasing need to leverage any value that can be extracted from these forms of semi-structured and unstructured data
Is therefore commonly used for data mining where the identification of relationships between variables in a dataset leads to the discovery of patterns and anomalies

Explanation

Question 86 of 200

1

Visual Analysis

Choose one or more of the following:

The objective is to use graphic representations to develop a deeper understanding of the data being analyzed. Specifically, it helps identify and highlight hidden patterns, correlations and anomalies
is also directly related to exploratory data analysis, as it encourages the formulation of questions from different angles
Workflow Engine
require processing resources that they request from the resource manager

Explanation

Question 87 of 200

1

Heat Maps

Choose one or more of the following:

are an effective visual analysis technique for expressing patterns, data compositions via part-whole relations and geographic distributions of data
they also facilitate the identification of areas of interest and the discovery of extreme (high/low) values within a dataset
Migrating to the cloud is logical for enterprises planning to run analytics on datasets that are available via data markets, as most data markets store their data in the cloud such as Amazon S3
Data analysis results can be used as input for existing _______ or may form the basis of new _______

Explanation

Question 88 of 200

1

Heat Maps

Choose one or more of the following:

The ______ itself is a visual, color-coded representation of data values
Each value is given a color according to its type, or the range that it falls under
Solely analyzing operational (structured) data may cause businesses to miss out on cost-saving or business expansion opportunities, especially those that are customer-focused
Applications of __________ include operations and logistic optimization, environmental sciences and infrastructure planning

Explanation

Question 89 of 200

1

Heat Maps

Choose one or more of the following:

A ______ can be in the form of a chart or a map
Instead of coloring the whole region, the map may be superimposed by a layer made up of collections of colored shapes representing various regions
suggestions commonly pertain to recommending items, such as movies, books, web pages, people, etc.
Sentiment Analysis

Explanation

Question 90 of 200

1

chart

Choose one of the following:

A _____ represents a matrix of values in which each cell is color-coded according to the value
in order to qualify as a Big Data Problem, a business problem needs to be directly related to one or more of the Big Data characteristics of volume, velocity or variety
items can be ______ either based on a user's own behavior or by matching the behavior of multiple users
Big data solutions often receive redundant data across different datasets

Explanation
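
A matrix whose cells are color-coded by value range can be sketched without any plotting library. The three color buckets, the equal-width ranges and the `sales` matrix below are all assumptions chosen for illustration; real heat maps typically use continuous color scales.

```python
def color_for(value, low, high):
    """Assign one of three hypothetical color buckets by equal-width range."""
    span = (high - low) / 3
    if value < low + span:
        return "blue"      # low range
    if value < low + 2 * span:
        return "yellow"    # middle range
    return "red"           # high range


def heat_map(matrix):
    """Return a matrix of color names with the same shape as the input."""
    flat = [value for row in matrix for value in row]
    low, high = min(flat), max(flat)
    return [[color_for(value, low, high) for value in row] for row in matrix]


# Hypothetical values, e.g. sales per region (rows) and quarter (columns).
sales = [
    [10, 40, 90],
    [25, 55, 70],
    [5, 60, 100],
]

for row in heat_map(sales):
    print(row)
```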

Question 91 of 200

1

map

Choose one of the following:

A ___ represents a geographic measure by which different regions are color-coded according to a certain theme
Although __________ can be implemented in almost any domain, it is most often used in marketing
NLP, text analytics and sentiment analysis can be used in support of __________
As a sample scenario, __________ could help determine the type of relationship that exists between temperature (independent variable) and crop yield (dependent variable)

Explanation

Question 92 of 200

1

Heat Maps

Choose one of the following:

Instead of coloring the whole region, the map may be superimposed by a layer made up of collections of colored shapes representing various regions
Data needs to be imported before it can be processed by the big data solution
The data collected for _______ is always time-dependent
Named Entities (person, group, place, company), Pattern-Based Entities (social insurance number, zip code), Concepts (an abstract representation of an entity), Facts (relationship between entities)

Explanation

Question 93 of 200

1

Time Series Analysis

Choose one or more of the following:

is the analysis of data that is recorded over periodic intervals of time
this type of analysis makes use of _________, which is a time-ordered collection of values recorded over regular time intervals
in order to qualify as a Big Data Problem, a business problem needs to be directly related to one or more of the Big Data characteristics of volume, velocity or variety
data that appears to be invalid may still be valuable in that it may possess hidden patterns and trends

Explanation

Question 94 of 200

1

Time Series Analysis

Choose one or more of the following:

helps to uncover patterns within data that are time-dependent. Once identified, the pattern can be extrapolated for future predictions.
are usually used for forecasting by identifying long-term trends, seasonal periodic patterns and irregular short-term variations in the dataset
For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device
Big Data solutions can be partially or fully deployed in clouds in order to leverage the storage and computing resources that are available from the cloud provider

Explanation
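
One simple way to surface a long-term trend in time-ordered values is a moving average, sketched below. The technique is chosen here for illustration only (the quiz does not name a specific method), and the `monthly` values are invented.

```python
def moving_average(series, window):
    """Average each run of `window` consecutive values to smooth short-term variation."""
    return [
        sum(series[i:i + window]) / window
        for i in range(len(series) - window + 1)
    ]


# Hypothetical values recorded at regular (monthly) intervals.
monthly = [100, 120, 90, 130, 150, 110, 160, 180, 140]

trend = moving_average(monthly, window=3)
print(trend)
```

The raw values rise and fall from month to month, while the smoothed series increases steadily, which is the kind of pattern that could then be extrapolated for forecasting.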

Question 95 of 200

1

Time Series Analysis

Choose one or more of the following:

Unlike other types of analyses, _________ always includes time as a comparison variable
The data collected for _______ is always time-dependent
Data samples are typically used
is solely based on the similarity between users' behavior, and requires a large amount of user behavior data in order to accurately filter items

Explanation

Question 96 of 200

1

Time Series Analysis

Choose one of the following:

A _______ is generally expressed using a line chart, with time plotted on the x-axis and recorded data value plotted on the y-axis
is the specialized analysis of text through the application of data mining, machine learning and natural language processing techniques to extract value out of unstructured text
new models may be used to improve the programming logic within existing enterprise systems or may form the basis of new systems
depending on the type of data source, data may come as a dump of files (such as data purchased from a third-party data provider), or may require API integration (such as with Twitter)

Explanation

Question 97 of 200

1

Network

Choose one of the following:

Within the context of visual analysis, a _______ is an interconnected collection of entities
new models may be used to improve the programming logic within existing enterprise systems or may form the basis of new systems
this type of analysis makes use of _________, which is a time-ordered collection of values recorded over regular time intervals
Consequently, it is important to use the most suitable visualization technique by keeping the business domain in context

Explanation

Question 98 of 200

1

Entity

Choose one or more of the following:

An ____ can be a person, a group or some other business domain object such as a product
may be connected with another directly or indirectly
is a form of data analysis that involves the graphic representation of data to enable or enhance its visual perception
Also known as online processing, ____________ processing follows an approach whereby data is processed interactively, without delay (resulting in low-latency responses)

Explanation

Question 99 of 200

1

Network Analysis

Choose one or more of the following:

Some connections may only be one-way, so that traversal in the reverse direction is not possible
is a technique that focuses on analyzing relationships between entities within the network
For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device
provides analysis features more sophisticated than those of heat maps

Explanation

Question 100 of 200

1

Network Analysis

Choose one or more of the following:

it involves plotting entities as nodes and connections as edges between nodes
Specialized variations of __________ include route optimization, social network analysis and spread prediction
Unlike other types of analyses, _________ always includes time as a comparison variable
are based on predictive analytics techniques and therefore are associated with the same analysis techniques as predictive analytics. Additionally, _____ may utilize heat maps, network analysis and spatial data analysis to graphically show various outcomes

Explanation
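
Entities as nodes and connections as edges can be sketched with a plain dictionary. The names and edges below are invented, the edges are treated as one-way, and breadth-first search is used as one possible way to measure how many connections separate two entities.

```python
from collections import deque

# Hypothetical one-way connections (source -> destination).
edges = [("Ana", "Ben"), ("Ben", "Cara"), ("Cara", "Dev"), ("Ana", "Eli")]


def build_graph(edge_list):
    """Build an adjacency list; every entity becomes a node."""
    graph = {}
    for source, destination in edge_list:
        graph.setdefault(source, []).append(destination)
        graph.setdefault(destination, [])
    return graph


def hops(graph, start, goal):
    """Breadth-first search: fewest edges from start to goal, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, distance = queue.popleft()
        if node == goal:
            return distance
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, distance + 1))
    return None


graph = build_graph(edges)
print(hops(graph, "Ana", "Dev"))  # follows the direction of the edges
print(hops(graph, "Dev", "Ana"))  # reverse traversal is not possible
```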

Question 101 of 200

1

Spatial Data Analysis

Choose one or more of the following:

focuses on analyzing location-based data in order to find different geographic relationships and patterns between entities
Spatial or geospatial data is commonly used to identify the geographic location of individual entities
in order to qualify as a Big Data Problem, a business problem needs to be directly related to one or more of the Big Data characteristics of volume, velocity or variety
The ________ mechanism can also be used to support distributed locks, support distributed queues, establish a highly available registry for obtaining configuration information, and provide reliable asynchronous communication between processes that are running on different servers

Explanation

Question 102 of 200

1

Spatial Data Analysis

Choose one or more of the following:

is manipulated through a geographical information system (GIS) that plots spatial data on a map generally using its longitude and latitude coordinates
With the ever-increasing availability of location-based data, _________ can be analyzed to gain location insights
is dedicated to establishing (often complex) validation rules and removing any known invalid data
Correlation

Explanation

Question 103 of 200

1

Spatial Data Analysis

Choose one or more of the following:

Applications of __________ include operations and logistic optimization, environmental sciences and infrastructure planning
Data used as input for _________ can either contain exact locations (longitude, latitude) or the information required to calculate locations (such as zip codes or IP addresses)
the accuracy and applicability of the patterns and relationships that are found in a large dataset will be higher than that of a smaller dataset
helps decision-makers understand the business resources that will need to be utilized and which business challenges the analysis will tackle

Explanation

Question 104 of 200

1

Spatial Data Analysis

Choose one or more of the following:

provides analysis features more sophisticated than those of heat maps
can be used to determine the number of entities that fall within a certain radius of another entity
the need to address disparate types of data is more likely with data from external sources
is dedicated to establishing (often complex) validation rules and removing any known invalid data

Explanation
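
Counting the entities that fall within a certain radius of another entity can be sketched with the haversine great-circle distance. The store names and coordinates are illustrative, `within_radius` is a hypothetical helper, and the Earth is approximated as a sphere; a real GIS would handle this far more precisely.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (spherical approximation)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    delta_phi = math.radians(lat2 - lat1)
    delta_lambda = math.radians(lon2 - lon1)
    a = (
        math.sin(delta_phi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(delta_lambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(center, entities, radius_km):
    """Names of entities whose distance from center is at most radius_km."""
    lat, lon = center
    return [
        name
        for name, (entity_lat, entity_lon) in entities.items()
        if haversine_km(lat, lon, entity_lat, entity_lon) <= radius_km
    ]


# Hypothetical entities as (latitude, longitude) pairs.
stores = {
    "north": (52.60, 13.40),
    "south": (52.45, 13.40),
    "far": (48.14, 11.58),
}

nearby = within_radius((52.52, 13.40), stores, radius_km=15)
print(len(nearby), nearby)
```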

Question 105 of 200

1

Machine Learning

Choose one or more of the following:

Law of large numbers
Law of Diminishing Marginal Utility
A target user's past behavior (likes, rating, purchase history, etc.) is collaborated with the behavior of similar users
the initial iteration of the big data analysis lifecycle will require more up-front investment in Big Data technologies, products and training compared to later iterations, where these earlier investments can be repeatedly leveraged

Explanation

Question 106 of 200

1

Law of large numbers

Choose one or more of the following:

the ______ states that the confidence with which predictions can be made increases as the size of the data that is being analyzed increases
the accuracy and applicability of the patterns and relationships that are found in a large dataset will be higher than that of a smaller dataset
Data Extraction
is an analysis technique used to determine whether two variables are related to each other

Explanation
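
The idea that estimates improve as more data is analyzed can be illustrated with a small simulation. This sketch uses the classic statistical reading of the law (a sample mean approaching the expected value), which may be narrower than the wording used in this quiz. The die-rolling setup and the seed are arbitrary choices.

```python
import random

EXPECTED_MEAN = 3.5  # expected value of one roll of a fair six-sided die


def mean_of_rolls(count, rng):
    """Average of `count` simulated die rolls."""
    return sum(rng.randint(1, 6) for _ in range(count)) / count


rng = random.Random(42)  # fixed seed so the run is repeatable

# Estimate the mean from increasingly large samples.
estimates = {count: mean_of_rolls(count, rng) for count in (10, 1_000, 100_000)}

for count, estimate in estimates.items():
    print(count, estimate, abs(estimate - EXPECTED_MEAN))
```

With the largest sample the estimate should sit very close to 3.5, while the smallest sample can be noticeably off.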

Question 107 of 200

1

Law of large numbers

Choose one of the following:

this means that the greater the amount of data available for analysis, the better we become at making correct decisions
Within computing, a ______ is a tightly coupled collection of servers, or nodes. These servers usually have the same hardware specifications and are connected together via network to work as a single unit
Unlike traditional enterprise data where the data structure is pre-defined and data is pre-validated, data input into big data analyses can be unstructured without any indication of validity
Classification

Explanation

Question 108 of 200

1

Law of Diminishing Marginal Utility

Choose one or more of the following:

In the context of traditional data analysis, the ______ states that, starting with a reasonably large sample size, the value obtained from the analysis of additional data decreases as more data is successively added to the original sample
This is a traditional data analysis principle that claims that data held in a reasonably sized dataset provides the maximum value
A _______ provides a logical view of the data stored on the storage medium as a tree structure of files and directories
this redundancy can be exploited to explore interconnected datasets in order to assemble validation parameters and fill in missing valid data

Explanation

Question 109 of 200

1

Law of Diminishing Marginal Utility

Choose one or more of the following:

The ____ does not apply to Big Data
The greater volume and variety of data that Big Data solutions can process allow each additional batch of data to carry greater potential of unearthing new patterns and anomalies
for speech recognition, the system attempts to comprehend the speech and then performs an action, such as transcribing text
Applying this technique helps determine how the value of the dependent variable changes in relation to change in the value of the independent variable

Explanation

Question 110 of 200

1

Law of Diminishing Marginal Utility

Choose one of the following:

Therefore, the value of each additional batch does not diminish; rather, each batch provides more value
Classification
This means that when one variable changes, the other variable also changes proportionally and constantly
is an example of the application of the law of large numbers

Explanation

Question 111 of 200

1

Classification

Choose one or more of the following:

is a supervised learning technique by which data is classified into relevant, previously learned categories
the system is fed data that is already categorized or labeled, so that it can develop an understanding of the different categories
therefore, it is advisable to store a verbatim copy of the original dataset before proceeding with the filtering. To save on required storage space, the verbatim copy is compressed before storage
Therefore, the value of each additional batch does not diminish; rather, each batch provides more value

Explanation

Question 112 of 200

1

Classification

Choose one or more of the following:

the system is fed unknown (but similar) data for classification, based on the understanding it developed
a common application of this technique is for the filtering of e-mail spam. Note that ___________ can be performed for two or more categories
can help enable a better understanding of what a phenomenon is, and why it occurred
it involves plotting entities as nodes and connections as edges between nodes

Explanation

Question 113 of 200

1

Classification

Choose one of the following:

in a simplified _____ process, the machine is fed labeled data during training that builds its understanding of the _______. The machine is then fed unlabeled data, which it classifies itself
also known as split or bucket testing compares two versions of an element to determine which version is superior based on a pre-defined metric
A file is an atomic unit of storage used by the _________ to store data. Files are organized inside a directory
The objective is to use graphic representations to develop a deeper understanding of the data being analyzed. Specifically, it helps identify and highlight hidden patterns, correlations and anomalies

Explanation
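
The train-on-labeled, classify-unlabeled flow can be sketched with a nearest-centroid classifier. This is just one simple supervised technique picked for illustration. The two features (say, number of links and number of exclamation marks in an e-mail), the labels and the data points are all hypothetical. `math.dist` requires Python 3.8 or later.

```python
import math


def train(labeled_data):
    """Learn one centroid (average feature vector) per label from labeled examples."""
    sums, counts = {}, {}
    for features, label in labeled_data:
        totals = sums.setdefault(label, [0.0] * len(features))
        for index, value in enumerate(features):
            totals[index] += value
        counts[label] = counts.get(label, 0) + 1
    return {
        label: [total / counts[label] for total in totals]
        for label, totals in sums.items()
    }


def classify(model, features):
    """Assign the label whose centroid is closest to the unlabeled features."""
    return min(model, key=lambda label: math.dist(model[label], features))


# Hypothetical labeled training data: (features, label).
training = [
    ((8, 5), "spam"),
    ((7, 6), "spam"),
    ((1, 0), "ham"),
    ((0, 1), "ham"),
]

model = train(training)

# Unlabeled, previously unseen data is classified using what was learned.
print(classify(model, (6, 4)))
print(classify(model, (1, 1)))
```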

Question 114 of 200

1

Clustering

Choose one or more of the following:

is an unsupervised learning technique by which data is divided into different groups so that the data in each group has similar properties
There is no prior learning of categories required; instead, categories are implicitly generated based on the data groupings
Big Data solutions can be partially or fully deployed in clouds in order to leverage the storage and computing resources that are available from the cloud provider
Applications include document classification and search, as well as building a 360-degree view of a customer by extracting information from a CRM system

Explanation

Question 115 of 200

1

Clustering

Choose one or more of the following:

How the data is grouped depends on the type of algorithm used. Each algorithm uses a different technique to identify ______
is generally used in data mining to get an understanding of the properties of a given dataset. After developing this understanding, classification can be used to make better predictions about similar, but new or unseen data
Is solely dedicated to individual user preferences and does not require data about other users
A ______ can be in the form of a chart or a map

Explanation
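
Grouping data without predefined categories can be sketched with a one-dimensional k-means loop. K-means is only one of many clustering algorithms and is chosen here as an assumption. The `spend` values and the starting centers are invented, and the number of groups (two) is fixed in advance.

```python
def k_means_1d(values, centers, iterations=10):
    """Repeatedly assign values to the nearest center, then move each center to its group's mean."""
    centers = list(centers)
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for value in values:
            nearest = min(range(len(centers)), key=lambda i: abs(value - centers[i]))
            groups[nearest].append(value)
        # Keep the old center if a group ends up empty.
        centers = [
            sum(group) / len(group) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers, groups


# Hypothetical customer spend values with no labels attached.
spend = [12, 15, 14, 80, 85, 90, 13, 88]

centers, groups = k_means_1d(spend, centers=[0, 100])
print(centers)
print(groups)
```

No labels were supplied; the low-spend and high-spend groups emerge from the data itself.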

Question 116 of 200

1

Clustering

Choose one of the following:

can be applied to the categorization of unknown documents, and to personalized marketing campaigns by grouping together customers with similar behavior
Within computing, a ______ is a tightly coupled collection of servers, or nodes. These servers usually have the same hardware specifications and are connected together via network to work as a single unit
items can be ______ either based on a user's own behavior or by matching the behavior of multiple users
the ______ states that the confidence with which predictions can be made increases as the size of the data that is being analyzed increases

Explanation

Question 117 of 200

1

Outlier Detection

Choose one or more of the following:

is the process of finding data that is significantly different from or inconsistent with the rest of the data within a given dataset
The machine learning technique is used to identify anomalies, abnormalities and deviations that can be advantageous (such as opportunities) or disadvantageous (such as risks)
The data collected for _______ is always time-dependent
for realtime analytics, a more complex in-memory system is required to validate and cleanse the data at the source

Explanation
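
As a rough illustration of finding data that is significantly different from the rest of a dataset, the sketch below flags values by their z-score. The z-score rule, the threshold of 2.0, the function name `find_outliers` and the sensor readings are all assumptions chosen for the example; real outlier detection may rely on supervised or unsupervised learning instead.

```python
from statistics import mean, stdev

def find_outliers(values, threshold=2.0):
    """Return the values lying more than `threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = stdev(values)
    if sigma == 0:
        # All values are identical, so nothing stands out.
        return []
    return [v for v in values if abs(v - mu) / sigma > threshold]

# Hypothetical sensor readings with one abnormal spike.
readings = [20, 21, 19, 20, 22, 21, 20, 95]
outliers = find_outliers(readings)
print(outliers)
```

Whether the flagged value is a risk, an opportunity or simply invalid data still has to be judged in context.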

Question 118 of 200

1

Outlier Detection

Choose one or more of the following:

is closely related to the concept of classification and clustering, although its algorithms focus on finding abnormal values
it can be based on either supervised or unsupervised learning
involves the simultaneous execution of multiple sub-tasks that collectively comprise a larger task
it can also be used to infer patterns and relationships within the dataset, such as regression and correlation

Explanation

Question 119 of 200

1

Outlier Detection

Choose one of the following:

Applications for ___________ include fraud detection, medical diagnosis, network data analysis and sensor data analysis
suggest that there is a strong positive relationship between the two variables
To the client, a file appears local and can be accessed via multiple locations
Heat Maps

Explanation

Question 120 of 200

1

Filtering

Choose one or more of the following:

is the automated process of finding relevant items from a pool of items
items can be ______ either based on a user's own behavior or by matching the behavior of multiple users
The ____ does not apply to Big Data
Although __________ can be implemented in almost any domain, it is most often used in marketing

Explanation

Question 121 of 200

1

Filtering

Choose one or more of the following:

is generally applied via the following two approaches: collaborative ____________ and content-based ____________
A common medium by which ________ is implemented is via the use of a recommender system
A given ______ may support either data ingress or egress functions
A ________ generally provides only one of the listed functions

Explanation

Question 122 of 200

1

Collaborative Filtering

Choose one or more of the following:

is an item filtering technique based on the collaboration (merging) of users' past behavior
A target user's past behavior (likes, ratings, purchase history, etc.) is collaborated with the behavior of similar users
We may be further interested in discovering how closely Variables A and B are related, which means we may also want to analyze the extent to which Variable B increases in relation to Variable A's increase
Specialized variations of __________ include route optimization, social network analysis and spread prediction

Explanation
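
The idea of merging a target user's past behavior with that of similar users can be sketched as follows. This is only an illustration: the Jaccard similarity measure, the function names and the sample likes are assumptions, not the method prescribed by the course material.

```python
def jaccard(a, b):
    """Similarity of two sets of liked items: shared items / all items."""
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend(target, likes, top_n=2):
    """Suggest items liked by users whose behavior resembles the target user's."""
    scores = {}
    for user, items in likes.items():
        if user == target:
            continue
        sim = jaccard(likes[target], items)
        # Items the target has not seen inherit the similarity of the user who liked them.
        for item in items - likes[target]:
            scores[item] = scores.get(item, 0.0) + sim
    ranked = sorted(scores, key=lambda item: (-scores[item], item))
    return [item for item in ranked if scores[item] > 0][:top_n]

# Hypothetical behavior data: which movies each user liked.
likes = {
    "ana": {"Alien", "Blade Runner", "Dune"},
    "ben": {"Alien", "Blade Runner", "Arrival"},
    "cat": {"Notting Hill", "Amelie"},
}
suggestions = recommend("ana", likes)
print(suggestions)
```

Because the suggestions depend entirely on other users' behavior, the approach needs a large amount of behavior data before it filters accurately.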

Question 123 of 200

1

Collaborative Filtering

Choose one or more of the following:

Based on the similarity of the users' behavior, items are filtered for the target user
is solely based on the similarity between users' behavior, and requires a large amount of user behavior data in order to accurately filter items
A _____ represents a matrix of values in which each cell is color-coded according to the value
The presence of invalid data is resulting in spikes. Although the data appears abnormal, it may be indicative of a new pattern

Explanation

Question 124 of 200

1

Collaborative Filtering

Choose one of the following:

is an example of the application of the law of large numbers
A ______ can be in the form of a chart or a map
the system is fed unknown (but similar) data for classification, based on the understanding it developed
_____________ is an inductive approach that is closely associated to data mining

Explanation

Question 125 of 200

1

Content-based Filtering

Choose one or more of the following:

is an item filtering technique focused on the similarity between users and items
A user profile is created based on the user's past behavior (likes, ratings, purchase history, etc.)
Analytics Engine
can be used to determine the number of entities that fall within a certain radius of another entity

Explanation

Question 126 of 200

1

Content-based Filtering

Choose one or more of the following:

The similarities identified between the user profile and the attributes of various items lead to items being filtered for the user
Is solely dedicated to individual user preferences and does not require data about other users
Applying this technique helps determine how the value of the dependent variable changes in relation to change in the value of the independent variable
However, in such cases only one independent variable may change. The others are kept constant

Explanation
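
A minimal sketch of matching a user profile against item attributes is shown below. The attribute tags, the overlap-count scoring and the function names are hypothetical choices made for illustration; no data about other users is involved.

```python
from collections import Counter

def build_profile(liked_items, attributes):
    """Count how often each attribute appears among the user's liked items."""
    profile = Counter()
    for item in liked_items:
        profile.update(attributes[item])
    return profile

def recommend_by_content(liked_items, attributes, top_n=1):
    """Rank unseen items by how strongly their attributes match the user profile."""
    profile = build_profile(liked_items, attributes)
    candidates = [item for item in attributes if item not in liked_items]

    def score(item):
        return sum(profile[attr] for attr in attributes[item])

    return sorted(candidates, key=lambda item: (-score(item), item))[:top_n]

# Hypothetical item attributes.
attributes = {
    "Alien": {"sci-fi", "horror"},
    "Dune": {"sci-fi", "epic"},
    "Arrival": {"sci-fi", "drama"},
    "Amelie": {"romance", "comedy"},
}
picks = recommend_by_content({"Alien", "Dune"}, attributes)
print(picks)
```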

Question 127 of 200

1

Filtering

Choose one or more of the following:

A recommender system predicts user preferences and generates suggestions for the user accordingly
suggestions commonly pertain to recommending items, such as movies, books, web pages, people, etc.
represents a constant rate of change
can be supported by correlation, heat maps, time series analysis, network analysis, spatial data analysis, clustering, outlier detection, natural language processing and text analytics

Explanation

Question 128 of 200

1

Filtering

Choose one or more of the following:

A recommender system typically uses either collaborative _____ or content-based _________ to generate suggestions
Recommender systems may also be based on a hybrid of both collaborative _______ and content-based _______ to fine-tune the accuracy and effectiveness of generated suggestions
Classification, outlier detection, filtering, natural language processing, text analytics and sentiment analysis can utilize ___________
Data analysis results can be used as input for existing _______ or may form the basis of new _______

Explanation

Question 129 of 200

1

Semantic Analysis

Choose one of the following:

In order for the machines to extract valuable information, text and speech data needs to be understood by the machines in the same way as humans do. _____ represents practices for extracting meaningful information from textual and speech data
require processing resources that they request from the resource manager
The same results may be presented in a number of different ways, which can influence the interpretation of the results
Instead, the data is explored through analysis to develop an understanding of the cause of the phenomenon

Explanation

Question 130 of 200

1

Natural Language Processing

Choose one or more of the following:

is a computer's ability to comprehend human speech and text as naturally understood by humans
this allows computers to perform a variety of useful tasks, such as full-text searches
A _______ database generally provides an API-based query interface, rather than the SQL Interface
Each node in the _____ has its own dedicated resources such as memory and hard drive and runs its own operating system just like a desktop computer

Explanation

Question 131 of 200

1

Natural Language Processing

Choose one or more of the following:

Instead of hard-coding the required learning rules, either supervised or unsupervised machine learning is applied to develop the computer's understanding of the __________
In general, the more learning data the computer has, the more correctly it can decipher human text and speech
A distributed Big Data solution that needs to run on multiple servers relies on the coordination engine mechanism to ensure operational consistency across all of the participating servers
functionality can be further grouped into the following categories: event, file, relational

Explanation

Question 132 of 200

1

Natural Language Processing

Choose one or more of the following:

includes both text and speech recognition
for speech recognition, the system attempts to comprehend the speech and then performs an action, such as transcribing text
A user profile is created based on the user's past behavior (likes, ratings, purchase history, etc.)
The processing engine enables data to be queried and manipulated in other ways, but to implement this type of functionality requires custom programming

Explanation

Question 133 of 200

1

Text Analytics

Choose one or more of the following:

Unstructured text is generally much more difficult to analyze and search, compared to structured text
is the specialized analysis of text through the application of data mining, machine learning and natural language processing techniques to extract value out of unstructured text
As the amount of digitized documents, e-mails, social media posts and log files increases, businesses have an increasing need to leverage any value that can be extracted from these forms of semi-structured and unstructured data
Useful insights from text-based data can be gained by helping businesses develop an understanding of the information that is contained within a large body of text

Explanation

Question 134 of 200

1

Text Analytics

Choose one or more of the following:

essentially provides the ability to discover text rather than just search it
The basic tenet of ___________ is to turn unstructured text into data that can be searched and analyzed
Analysts working with big data solutions are not expected to know how to program processing engines
comprise random read/writes that involve fewer joins and require low-latency responses, with a smaller data footprint

Explanation

Question 135 of 200

1

Text Analytics

Choose one or more of the following:

Solely analyzing operational (structured) data may cause businesses to miss out on cost-saving or business expansion opportunities, especially those that are customer-focused
Applications include document classification and search, as well as building a 360-degree view of a customer by extracting information from a CRM system
However, in such cases only one independent variable may change. The others are kept constant
is a form of data analysis that involves the graphic representation of data to enable or enhance its visual perception

Explanation

Question 136 of 200

1

Text Analytics

Choose one or more of the following:

generally involves two steps: parsing text within documents to extract entities and facts, and categorizing documents using these extracted entities and facts
the extracted information can be used to perform a context-specific search on entities, based on the type of relationship that exists between the entities
identifying a wider variety of data sources may increase the probability of finding hidden patterns and correlations
Similarly, processed data may need to be exported to other systems before it can be used outside of the big data solution

Explanation

Question 137 of 200

1

Parsing text within documents to extract:

Choose one of the following:

Named Entities (person, group, place, company), Pattern-Based Entities (social insurance number, zip code), Concepts (an abstract representation of an entity), Facts (relationship between entities)
Data Extraction
is generally used in data mining to get an understanding of the properties of a given dataset. After developing this understanding, classification can be used to make better predictions about similar, but new or unseen data
The ________ mechanism is responsible for processing data (usually retrieved from storage devices) based on pre-defined logic, in order to produce a result

Explanation
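
Pattern-based entities are the easiest of the listed categories to illustrate, because they follow a fixed format. The sketch below assumes a US-style five-digit zip code and a 3-2-4 digit identification number; both regular expressions, the entity names and the sample sentence are hypothetical, and named entities, concepts and facts would need more than pattern matching.

```python
import re

# Assumed formats for illustration only.
PATTERNS = {
    "zip_code": re.compile(r"\b\d{5}(?:-\d{4})?\b"),
    "id_number": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def extract_pattern_entities(text):
    """Return every match of each known pattern found in the text."""
    return {name: pattern.findall(text) for name, pattern in PATTERNS.items()}

sample = "Invoice for Jane Roe, ID 123-45-6789, shipped to Springfield 62704."
entities = extract_pattern_entities(sample)
print(entities)
```

Once extracted, such entities turn unstructured text into data that can be searched and used to categorize documents.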

Question 138 of 200

1

Sentiment Analysis

Choose one or more of the following:

is a specialized form of text analysis that focuses on determining the bias or emotions of individuals
this form of analysis determines the attitude of the author (of the text) by analyzing the text within the context of the natural language
The proposed cause or assumption is called a ____________
In other areas such as the scientific domains, the objective may simply be to observe which version works better in order to improve a process or product

Explanation

Question 139 of 200

1

Sentiment Analysis

Choose one or more of the following:

not only provides information about how individuals feel, but also the intensity of their feeling
this information can then be integrated into the decision-making process
Machine Learning
Instead, the data is explored through analysis to develop an understanding of the cause of the phenomenon

Explanation
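
To show how an analysis can report both how individuals feel and how intensely, here is a deliberately simple lexicon-based sketch. The word weights in `LEXICON` and the function name `sentiment` are invented for illustration; practical sentiment analysis relies on natural language processing and machine learning rather than a fixed word list.

```python
# Hypothetical word weights: the sign gives the attitude, the magnitude the intensity.
LEXICON = {"good": 1, "great": 2, "love": 3, "bad": -1, "poor": -2, "hate": -3}

def sentiment(text):
    """Return (label, intensity) for a piece of text using the weights above."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    score = sum(LEXICON.get(w, 0) for w in words)
    label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
    return label, abs(score)

print(sentiment("I love this phone, great battery!"))
print(sentiment("Poor service."))
```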

Question 140 of 200

1

Sentiment Analysis

Choose one of the following:

Common applications for __________ include early identification of customer satisfaction or dissatisfaction, gauging product success or failure and spotting new trends
Utilization of Analysis Results
Generally, the objective is to gauge human behavior with the goal of increasing sales
are usually divided into two types: Batch and Transactional

Explanation

Question 141 of 200

1

Quantitative Analysis

Choose one of the following:

Correlation and regression are examples of ________. A/B testing can make use of ____________ techniques for results comparison.
Unstructured text is generally much more difficult to analyze and search, compared to structured text
Storage Device
Clustering

Explanation
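
Correlation is a typical case of analyzing data with statistical formulas. As a sketch, the Pearson correlation coefficient can be computed as below; the choice of Pearson's coefficient, the function name and the sample figures are assumptions made for illustration.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long numeric sequences.

    Assumes neither sequence is constant (otherwise the denominator is zero).
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical figures in which sales change proportionally with ad spend.
ad_spend = [1, 2, 3, 4]
sales = [2, 4, 6, 8]
r = pearson(ad_spend, sales)
print(r)
```

A coefficient close to +1 suggests a strong positive linear relationship, and a value close to -1 a strong negative one.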

Question 142 of 200

1

Qualitative Analysis

Choose one of the following:

NLP, text analytics and sentiment analysis can be used in support of __________
Machine Learning
To the client, a file appears local and can be accessed via multiple locations
For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device

Explanation

Question 143 of 200

1

Data Mining

Choose one of the following:

can be supported by correlation, heat maps, time series analysis, network analysis, spatial data analysis, clustering, outlier detection, natural language processing and text analytics
this stage can be iterative in nature, especially if the _________________ is exploratory so that analysis is repeated until the appropriate pattern or correlation is uncovered
An ________ is employed when the comparatively simple data manipulation functions of a query engine are insufficient
metadata is added through an automated mechanism to data received from both internal and external data sources

Explanation

Question 144 of 200

1

Descriptive Analytics

Choose one of the following:

A/B testing, heat maps and spatial data analysis are considered forms of ____________
Specialized variations of __________ include route optimization, social network analysis and spread prediction
The output of one workflow can become the input of another workflow
Strategic BI and analytics fall in this category, since they are highly read-intensive tasks involving large volumes of data

Explanation

Question 145 of 200

1

Diagnostic Analytics

Choose one of the following:

Correlation, regression, time series analysis, network analysis and spatial data analysis are considered forms of _________
The workflow logic processed by a _____________ mechanism can involve the participation of other big data mechanisms
The ________ mechanism is responsible for processing data (usually retrieved from storage devices) based on pre-defined logic, in order to produce a result
in order to qualify as a Big Data Problem, a business problem needs to be directly related to one or more of the Big Data characteristics of volume, velocity or variety

Explanation

Question 146 of 200

1

Predictive Analysis

Choose one of the following:

Correlation, regression, time series analysis, classification, clustering, outlier detection, filtering, natural language processing, text analytics and sentiment analysis are considered forms of ________
is an unsupervised learning technique by which data is divided into different groups so that the data in each group has similar properties
Named Entities (person, group, place, company), Pattern-Based Entities (social insurance number, zip code), Concepts (an abstract representation of an entity), Facts (relationship between entities)
provenance can play an important role in determining the accuracy and quality of questionable data

Explanation

Question 147 of 200

1

Prescriptive Analytics

Choose one of the following:

are based on predictive analytics techniques and therefore are associated with the same analysis techniques as predictive analytics. Additionally, _____ may utilize heat maps, network analysis and spatial data analysis to graphically show various outcomes
Applying this technique helps determine how the value of the dependent variable changes in relation to change in the value of the independent variable
This can reveal the nature of the dataset or the cause of a phenomenon
Time series analysis

Explanation

Question 148 of 200

1

Supervised Learning

Choose one of the following:

Classification, outlier detection, filtering, natural language processing, text analytics and sentiment analysis can utilize ___________
A big data _________ utilizes a distributed parallel programming framework that enables it to process very large amounts of data distributed across multiple nodes
uses statistical methods based on mathematical formulas as a means for analyzing data
A user profile is created based on the user's past behavior (likes, ratings, purchase history, etc.)

Explanation

Question 149 of 200

1

Unsupervised Learning

Choose one of the following:

Clustering, outlier detection, filtering, natural language processing, text analytics and sentiment analysis can utilize ___________
Data analysis results can be used as input for existing _______ or may form the basis of new _______
A _____ represents a matrix of values in which each cell is color-coded according to the value
Models can be used to improve business process logic, application system logic and can form the basis of a new system or software program

Explanation

Question 150 of 200

1

Cluster

Choose one or more of the following:

Within computing, a ______ is a tightly coupled collection of servers, or nodes. These servers usually have the same hardware specifications and are connected together via network to work as a single unit
Each node in the _____ has its own dedicated resources such as memory and hard drive and runs its own operating system just like a desktop computer
These engines may provide the agent-based processing of inflight data, which enables various data cleansing and transformation activities to be performed in realtime
Unexpected findings or anomalies are usually ignored since a predetermined cause was assumed

Explanation

Question 151 of 200

1

Cluster

Choose one of the following:

In the diagram, a _____ is used to execute a task based on distributed / parallel data processing frameworks
A/B Testing
Big Data solutions require a distributed processing environment that can accommodate large-scale data volumes, velocity and variety
Migrating to the cloud is logical for enterprises planning to run analytics on datasets that are available via data markets, as most data markets store their data in the cloud such as Amazon S3

Explanation

Question 152 of 200

1

File System

Choose one or more of the following:

A ________ is a method of storing and organizing data on a storage medium, such as hard drives, DVDs, and flash drives
A file is an atomic unit of storage used by the _________ to store data. Files are organized inside a directory
This can reveal the nature of the dataset or the cause of a phenomenon
The proposed cause or assumption is called a ____________

Explanation

Question 153 of 200

1

File System

Choose one or more of the following:

A _______ provides a logical view of the data stored on the storage medium as a tree structure of files and directories
Operating systems employ ______ for data storage. Each operating system provides support for one or more ________, like NTFS for Windows and ext for Linux
this form of analysis determines the attitude of the author (of the text) by analyzing the text within the context of the natural language
Within Big Data ________ can first be applied to discover if a relationship exists

Explanation

Question 154 of 200

1

Distributed File System

Choose one or more of the following:

A _________ is a file system that can store large files spread across a cluster
To the client, a file appears local and can be accessed via multiple locations
is the process of finding data that is significantly different from or inconsistent with the rest of the data within a given dataset
Natural Language Processing

Explanation

Question 155 of 200

1

Distributed File System

Choose one of the following:

Examples include the Google File System and Hadoop ________
requires that a business case be created, assessed and approved prior to proceeding with the actual hands-on analysis task
Machine Learning
Data Aggregation & Representation

Explanation

Question 156 of 200

1

NoSQL

Choose one or more of the following:

A _______ database is a non-relational database that is highly scalable, fault-tolerant and specifically designed to house unstructured data
A _______ database generally provides an API-based query interface, rather than the SQL Interface
The use of ________ helps to develop an understanding of a dataset and find relationships that can assist in explaining a phenomenon
when one increases, the other may stay the same, or increase or decrease arbitrarily

Explanation

Question 157 of 200

1

NoSQL

Choose one of the following:

However, some _______ databases may also provide a SQL-like query interface
this allows computers to perform a variety of useful tasks, such as full-text searches
depending on the type of analytics required, this stage can be as simple as querying a dataset to compute an aggregation for comparison
processing engine, storage device, resource manager

Explanation

Question 158 of 200

1

Parallel Data Processing

Choose one or more of the following:

involves the simultaneous execution of multiple sub-tasks that collectively comprise a larger task
the premise is to reduce the execution time by dividing a single larger task into multiple smaller tasks
A _________ in Big Data is defined as the amount and nature of data that is processed within a certain amount of time
For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device

Explanation
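
The divide-and-conquer premise can be sketched by splitting one large summation into smaller sub-tasks that run concurrently. The function names and the worker count are hypothetical, and a thread pool is used only to show the structure; in CPython, CPU-bound work would normally need separate processes or machines to gain actual speed.

```python
from concurrent.futures import ThreadPoolExecutor

def split(data, parts):
    """Divide a list into at most `parts` chunks of nearly equal size."""
    size = -(-len(data) // parts)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]

def parallel_sum(data, workers=4):
    """Sum a list by summing its chunks as separate sub-tasks and combining the results."""
    if not data:
        return 0
    chunks = split(data, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Each chunk is an independent sub-task of the larger task.
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)

total = parallel_sum(list(range(1, 101)))
print(total)
```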

Question 159 of 200

1

Parallel Data Processing

Choose one of the following:

Although __________ can be achieved through multiple networked machines, it is more typically achieved within the confines of a single machine (multiple processors or cores)
the system is fed data that is already categorized or labeled, so that it can develop an understanding of the different categories
Law of Diminishing Marginal Utility
provides analysis features more sophisticated than those of heat maps

Explanation

Question 160 of 200

1

Distributed Data Processing

Choose one or more of the following:

is closely related to parallel data processing in how the same principle of "divide-and-conquer" is applied
However, ___________ is always achieved through physically separate machines that are networked together as a cluster
essentially provides the ability to discover text rather than just search it
This allows large amounts of data to be imported or exported within a short period of time

Explanation

Question 161 of 200

1

Processing workloads

Choose one or more of the following:

A _________ in Big Data is defined as the amount and nature of data that is processed within a certain amount of time
are usually divided into two types: Batch and Transactional
includes both text and speech recognition
A distributed Big Data solution that needs to run on multiple servers relies on the coordination engine mechanism to ensure operational consistency across all of the participating servers

Explanation

Question 162 of 200

1

Batch Workload

Choose one or more of the following:

Also known as offline processing, ________ processing involves processing data in batches and usually imposes delays (resulting in high-latency responses)
typically involves large quantities of data with sequential read/writes, and comprises a group of read or write queries
also known as split or bucket testing, compares two versions of an element to determine which version is superior based on a pre-defined metric
An ____ can be a person, a group or some other business domain object such as a product

Explanation

Question 163 of 200

1

Batch Workload

Choose one or more of the following:

Queries can be complex and involve multiple joins
Strategic BI and analytics fall in this category, since they are highly read-intensive tasks involving large volumes of data
Sentiment Analysis
NLP, text analytics and sentiment analysis can be used in support of __________

Explanation

Question 164 of 200

1

Batch Workload

Choose one of the following:

A _________ comprises grouped read/writes, with a larger data footprint consisting of complex joins and high-latency responses
Spatial Data Analysis
Correlation, regression, time series analysis, network analysis and spatial data analysis are considered forms of _________
Users of Big Data solutions can make numerous data processing requests, each of which can have different processing workload requirements

Explanation

Question 165 of 200

1

Transactional workload

Choose one or more of the following:

Also known as online processing, ____________ processing follows an approach whereby data is processed interactively, without delay (resulting in low-latency responses)
involves small amounts of data with random read/writes
Data Validation & Cleansing
A/B Testing

Explanation

Question 166 of 200

1

Transactional workload

Choose one or more of the following:

OLTP and operational systems (write-intensive) as well as operational BI and analytics (read-intensive) both fall within this category
Although these workloads contain a mix of read/write queries, they are generally more write-intensive than read-intensive
can act as a common denominator that can be used for a range of analysis techniques and projects. This can require establishing a central, standard analysis repository, such as a NoSQL database
comprise random read/writes that involve fewer joins and require low-latency responses, with a smaller data footprint

Explanation

Question 167 of 200

1

Cloud Computing

Choose one or more of the following:

is a specialized form of distributed computing that introduces utilization models for remotely provisioning scalable and measured IT resources
Big Data solutions can be partially or fully deployed in clouds in order to leverage the storage and computing resources that are available from the cloud provider
Data samples are typically used
It can also represent hierarchical values by using color-coded nested rectangles

Explanation

Question 168 of 200

1

Cloud Computing

Choose one or more of the following:

the clustered processing resources required by Big Data solutions can benefit from the highly scalable and elastic IT resources available on cloud-based environments
Hadoop's batch-based data processing fully lends itself to the pay-per-use model of __________, which can reduce operational costs since a typical Hadoop cluster size can range from a few to a few thousand nodes
is generally applied via the following two approaches: collaborative ____________ and content-based ____________
In short, _______ provides the three ingredients required for a big data solution: input data, computing and storage

Explanation

Question 169 of 200

1

It makes sense for enterprises already using cloud computing to reuse the cloud for their Big Data initiatives, because:

Choose one or more of the following:

IT already possesses the required cloud computing skills
the input data already exists in the cloud
Correlation, regression, time series analysis, network analysis and spatial data analysis are considered forms of _________
In the context of traditional data analysis, the ______ states that, starting with a reasonably large sample size, the value obtained from the analysis of additional data decreases as more data is successively added to the original sample

Explanation

Question 170 of 200

1

Cloud Computing

Choose one of the following:

Migrating to the cloud is logical for enterprises planning to run analytics on datasets that are available via data markets, as most data markets store their data in the cloud such as Amazon S3
Workflow Engine
not only provides information about how individuals feel, but also the intensity of their feeling
This can reveal the nature of the dataset or the cause of a phenomenon

Explanation

Question 171 of 200

1

Big Data Mechanisms

Choose one or more of the following:

Big Data solutions require a distributed processing environment that can accommodate large-scale data volumes, velocity and variety
This type of environment is provided by a platform that is comprised of a set of distributed storage and processing technologies
it can be beneficial to identify as many types of related data sources and insights as possible, especially when we don't know exactly what we're looking for
Is solely dedicated to individual user preferences and does not require data about other users

Explanation

Question 172 of 200

1

Big Data Mechanisms

Choose one or more of the following:

represents the primary, common components of big data solutions, regardless of the open source or vendor products used for implementation
Storage Device
Instead of coloring the whole region, the map may be superimposed by a layer made up of collections of colored shapes representing various regions
Query Engine

Explanation

Question 173 of 200

1

Big Data Mechanisms

Choose one or more of the following:

Processing Engine
Resource Manager
Applications of __________ include operations and logistic optimization, environmental sciences and infrastructure planning
is generally used in data mining to get an understanding of the properties of a given dataset. After developing this understanding, classification can be used to make better predictions about similar, but new or unseen data

Explanation

Question 174 of 200

1

Big Data Mechanisms

Choose one or more of the following:

Data Transfer Engine
Analytics Engine
is closely related to parallel data processing in how the same principle of "divide-and-conquer" is applied
The data collected for _______ is always time-dependent

Explanation

Question 175 of 200

1

Big Data Mechanisms

Choose one or more of the following:

Workflow Engine
Coordination Engine
Recommender systems may also be based on a hybrid of both collaborative _______ and content-based _______ to fine-tune the accuracy and effectiveness of generated suggestions
There is no prior learning of categories required; instead, categories are implicitly generated based on the data groupings

Explanation

Question 176 of 200

1

At minimum, any given big data solution needs to contain the _____, _____ and _____ mechanisms in order to effectively process large datasets in support of the big data analysis lifecycle

Choose one of the following:

processing engine, storage device, resource manager
storage device, analytics engine, coordination engine
processing engine, query engine, data transfer engine
resource manager, analytics engine, workflow engine

Explanation

Question 177 of 200

1

Storage Device

Choose one or more of the following:

___________ mechanisms provide the underlying data storage environment for persisting the datasets that are processed by big data solutions
A ________ is a method of storing and organizing data on a storage medium, such as hard drives, DVDs, and flash drives
A _______ can exist as a distributed file system or a database
The ability to analyze massive amounts of data and find useful insights carries little value if the only ones that can interpret the results are the analysts

Explanation

Question 178 of 200

1

Storage Device

Select one or more of the following:

Distributed file systems can be used for persisting immutable data that is intended for streaming access or batch processing
Databases, such as NoSQL repositories, can be used for structured and unstructured storage and read/write data access
Note that distributed file systems and databases are both on-disk _________ mechanisms
Natural Language Processing

Explanation

Question 179 of 200

1

Processing Engine

Select one or more of the following:

The ________ mechanism is responsible for processing data (usually retrieved from storage devices) based on pre-defined logic, in order to produce a result
Any data processing that is requested by the big data solution is fulfilled by the __________
Whether _____________ is required or not, it is important to understand that the same data can be stored in many different forms. One form may be better suited for a particular type of analysis than another
Hadoop's batch-based data processing fully lends itself to the pay-per-use model of __________, which can reduce operational costs since a typical Hadoop cluster size can range from a few to a few thousand nodes

Explanation

Question 180 of 200

1

Processing Engine

Select one or more of the following:

A big data _________ utilizes a distributed parallel programming framework that enables it to process very large amounts of data distributed across multiple nodes
require processing resources that they request from the resource manager
Classification
are usually used for forecasting by identifying long-term trends, seasonal periodic patterns and irregular short-term variations in the dataset

Explanation

Question 181 of 200

1

Batch Processing Engine

Select one of the following:

Provides support for batch data where processing tasks can take anywhere from minutes to hours to complete. This type of processing engine is considered to have high latency
The identified patterns, correlations and anomalies discovered during the data analysis are used to refine business processes
When one variable increases, the other also increases and vice versa
Operating systems employ ______ for data storage. Each operating system provides support for one or more ________, like NTFS for Windows and ext for Linux

Explanation

Question 182 of 200

1

Realtime Processing Engine

Select one of the following:

Provides support for realtime data with sub-second response times. This type of processing engine is considered to have low latency
To the client, a file appears local and can be accessed via multiple locations
Migrating to the cloud is logical for enterprises planning to run analytics on datasets that are available via data markets, as most data markets store their data in the cloud such as Amazon S3
depending on the type of data source, data may come as a dump of files (such as data purchased from a third-party data provider), or may require API integration (such as with Twitter)

Explanation

Question 183 of 200

1

Resource Manager

Select one or more of the following:

Users of Big Data solutions can make numerous data processing requests, each of which can have different processing workload requirements
Data that is held in storage can be processed in a variety of ways by a given Big Data solution, and all data processing requests require the allocation of processing resources
it involves plotting entities as nodes and connections as edges between nodes
the system is fed data that is already categorized or labeled, so that it can develop an understanding of the different categories

Explanation

Question 184 of 200

1

Resource Manager

Select one or more of the following:

A _______ acts as a scheduler that schedules and prioritizes processing requests according to individual processing workload requirements
The _____ essentially acts as a resource arbitrator that manages and allocates available resources
A value that is labelled differently in two different datasets may mean the same thing
The proposed cause or assumption is called a ____________

Explanation
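The scheduling and arbitration role described in this question can be illustrated with a minimal sketch. It is not how any particular resource manager works: the `Request` class, the `schedule` function, the "lower number is served first" rule and the core counts are all assumptions made for illustration.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class Request:
    """A hypothetical processing request; only `priority` is compared."""
    priority: int                      # assumption: lower number is served first
    name: str = field(compare=False)
    cores: int = field(compare=False)  # processing resources requested


def schedule(requests, total_cores):
    """Grant cores in priority order; requests that do not fit must wait."""
    heap = list(requests)
    heapq.heapify(heap)
    free = total_cores
    granted, waiting = [], []
    while heap:
        req = heapq.heappop(heap)
        if req.cores <= free:
            free -= req.cores
            granted.append(req.name)
        else:
            waiting.append(req.name)
    return granted, waiting


requests = [
    Request(2, "nightly-batch", 6),
    Request(1, "realtime-dashboard", 4),
    Request(3, "ad-hoc-query", 2),
]
granted, waiting = schedule(requests, total_cores=8)
print(granted, waiting)
```

With 8 cores available, the highest-priority request is served first, the 6-core batch job has to wait, and the small ad hoc query still fits into the remaining capacity.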

Question 185 of 200

1

Data Transfer Engine

Select one or more of the following:

Data needs to be imported before it can be processed by the big data solution
Similarly, processed data may need to be exported to other systems before it can be used outside of the big data solution
this form of analysis determines the attitude of the author (of the text) by analyzing the text within the context of the natural language
Text Analytics

Explanation

Question 186 of 200

1

Data Transfer Engine

Select one or more of the following:

A ________ engine enables data to be moved in or out of big data solution storage devices
Unlike other data processing systems where input data conforms to a schema and is mostly structured, data sources for a big data solution tend to include a mix of structured and unstructured data
is dedicated to determining how and where processed analysis data can be further leveraged
A given ______ may support either data ingress or egress functions

Explanation

Question 187 of 200

1

Data Transfer Ingress and Egress

Select one of the following:

functionality can be further grouped into the following categories: event, file, relational
the processing engine mechanism will often use the ___________ to coordinate data processing across a large number of servers. This way, the processing engine does not require its own coordination logic
can help enable a better understanding of what a phenomenon is, and why it occurred
therefore, it is advisable to store a verbatim copy of the original dataset before proceeding with the filtering. To save on required storage space, the verbatim copy is compressed before storage

Explanation

Question 188 of 200

1

Data Transfer Engine

Select one or more of the following:

A ________ generally provides only one of the listed functions
It is common for multiple different ________ to be part of a big data solution to facilitate a range of import and export requirements for different types of data
A _______ provides a logical view of the data stored on the storage medium as a tree structure of files and directories
are based on predictive analytics techniques and therefore are associated with the same analysis techniques as predictive analytics. Additionally, _____ may utilize heat maps, network analysis and spatial data analysis to graphically show various outcomes

Explanation

Question 189 of 200

1

Data Transfer Ingress Engine

Select one of the following:

Event-based __________ generally use a publish-subscribe model based on the use of a queue to ensure high reliability and availability
A file is an atomic unit of storage used by the _________ to store data. Files are organized inside of a directory
it can be based on either supervised or unsupervised learning
The data collected for _______ is always time-dependent

Explanation
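The queue-based publish-subscribe model mentioned in this question can be sketched in a few lines. This is an in-process toy, assuming a hypothetical `Broker` class and a made-up "clicks" topic; a real event-based ingress engine would use a durable, distributed queue to obtain the reliability and availability the question refers to.

```python
import queue


class Broker:
    """Hypothetical in-memory broker: each subscriber gets its own queue."""

    def __init__(self):
        self.topics = {}  # topic name -> list of subscriber queues

    def subscribe(self, topic):
        q = queue.Queue()
        self.topics.setdefault(topic, []).append(q)
        return q

    def publish(self, topic, event):
        # Every subscriber of the topic receives its own copy of the event.
        for q in self.topics.get(topic, []):
            q.put(event)


broker = Broker()
ingest_q = broker.subscribe("clicks")  # e.g. the ingress path into storage
audit_q = broker.subscribe("clicks")   # a second, independent consumer

broker.publish("clicks", {"user": 1, "page": "/home"})
broker.publish("clicks", {"user": 2, "page": "/cart"})

# The ingress consumer drains its queue at its own pace.
ingested = []
while not ingest_q.empty():
    ingested.append(ingest_q.get())
print(ingested)
```

Because the publisher only talks to the broker, consumers can be added or can fall behind without affecting the event source; the queue buffers events until they are read.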

Question 190 of 200

1

Data Transfer Engine

Select one or more of the following:

These engines may provide the agent-based processing of in-flight data, which enables various data cleansing and transformation activities to be performed in realtime
enable the substitution of data that is distributed across a range of sources residing in multiple systems outside of the big data solution
Heat Maps
is manipulated through a geographical information system (GIS) that plots spatial data on a map generally using its longitude and latitude coordinates

Explanation

Question 191 of 200

1

Data Transfer Engine

Select one or more of the following:

A _______ may internally use a processing engine to process multiple large datasets in parallel
This allows large amounts of data to be imported or exported within a short period of time
Note that a workflow engine may provide integration with a _______ to enable the automated import and export of data
is a computer's ability to comprehend human speech and text as naturally understood by humans

Explanation

Question 192 of 200

1

Query Engine

Select one or more of the following:

The processing engine enables data to be queried and manipulated in other ways, but implementing this type of functionality requires custom programming
Analysts working with big data solutions are not expected to know how to program processing engines
this form of analysis determines the attitude of the author (of the text) by analyzing the text within the context of the natural language
the extracted information can be used to perform a context-specific search on entities, based on the type of relationship that exists between the entities

Explanation

Question 193 of 200

1

Query Engine

Select one or more of the following:

The _______ mechanism abstracts the processing engine from end-users by providing a front-end user-interface that can be used to query underlying data, along with features for creating query execution plans
Languages that are more familiar and easier to work with (such as SQL) can be used by non-technical users to perform ETL tasks and run ad hoc queries for data analysis activities
this helps maintain data provenance throughout the big data analysis lifecycle, which helps establish and preserve data accuracy and quality
Either way, a method of data reconciliation is required or the dataset representing the correct value needs to be determined

Explanation

Question 194 of 200

1

Query Engine

Select one or more of the following:

Common processing functions performed by a ______ include sum, average, group by, join and sort
Under the hood, the ________ seamlessly transforms user queries into the relevant low-level code that can be used by the processing engine
The use of ________ can reduce development time and enables the manipulation of large datasets without the need to write complex programming logic
based on the business requirements documented, it can be determined whether the business problems being addressed are really Big Data problems

Explanation
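The processing functions named in this question (join, group by, sum, average and sort) can be spelled out by hand to show the kind of low-level logic a query engine saves the user from writing. The sketch below is only an illustration in plain Python: the `orders` and `customers` data and the `query` function are invented, and a real query engine would generate distributed processing-engine code rather than run in a single process.

```python
from collections import defaultdict

# Hypothetical sample data, invented for this sketch.
orders = [
    {"customer": "alice", "amount": 30.0},
    {"customer": "bob", "amount": 20.0},
    {"customer": "alice", "amount": 50.0},
    {"customer": "carol", "amount": 10.0},
]
customers = {"alice": "north", "bob": "south", "carol": "south"}


def query(orders, customers):
    """Roughly: SELECT region, SUM(amount), AVG(amount) ... GROUP BY region ORDER BY total DESC."""
    # Join: attach each customer's region to their orders.
    joined = [{**o, "region": customers[o["customer"]]} for o in orders]

    # Group by: collect the order amounts per region.
    groups = defaultdict(list)
    for row in joined:
        groups[row["region"]].append(row["amount"])

    # Sum and average per group.
    result = [
        {"region": region, "total": sum(amounts), "average": sum(amounts) / len(amounts)}
        for region, amounts in groups.items()
    ]

    # Sort: largest total first.
    return sorted(result, key=lambda r: r["total"], reverse=True)


report = query(orders, customers)
print(report)
```

Expressed as a single SQL-like statement, the same request is one line; the roughly twenty lines above are what would otherwise have to be programmed against the processing engine.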

Question 195 of 200

1

Analytics Engine

Select one or more of the following:

The ________ mechanism is able to process advanced statistical and machine learning algorithms in support of analytics processing requirements, including the identification of patterns and correlations
It generally uses the processing engine mechanism to run algorithms on large datasets.
A _______ database generally provides an API-based query interface, rather than the SQL Interface
A ________ generally provides only one of the listed functions

Explanation

Question 196 of 200

1

Analytics Engine

Select one or more of the following:

An ________ is employed when the comparatively simple data manipulation functions of a query engine are insufficient
Some proprietary ________ also provide specialized data analysis features, such as text analytics and machine log analysis processing
How the data is grouped depends on the type of algorithm used. Each algorithm uses a different technique to identify ______
This is a traditional data analysis principle that claims that data held in a reasonably sized dataset provides the maximum value

Explanation

Question 197 of 200

1

Workflow Engine

Select one or more of the following:

A ___________ provides the ability to design and process a complex sequence of operations that can be triggered either at set time intervals or when data becomes available
The workflow logic processed by a _____________ mechanism can involve the participation of other big data mechanisms
Strategic BI and analytics fall in this category, since they are highly read-intensive tasks involving large volumes of data
Law of large numbers

Explanation

Question 198 of 200

1

Workflow Engine

Select one or more of the following:

For example, a __________ can execute logic that collects relational data from multiple databases at regular intervals via the data transfer engine mechanism, applies a set of ETL operations via the processing engine mechanism and finally persists the results to a NoSQL storage device
The defined workflows are analogous to a flowchart with control logic (such as decisions, forks, joins) and generally rely on a batch-style processing engine for execution
The output of one workflow can become the input of another workflow
it can be based on either supervised or unsupervised learning

Explanation
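The example workflow in this question (collect relational data, apply ETL operations, persist the results) and the chaining of workflows can be sketched as a sequence of steps. Everything here is a stand-in invented for illustration: `collect` plays the role of the data transfer engine, `transform` the processing engine, `persist` the NoSQL storage device, and `run_workflow` a very reduced workflow engine with no triggers, forks or joins.

```python
def collect(sources):
    """Stand-in for the data transfer engine: gather rows from several databases."""
    return [row for source in sources for row in source]


def transform(rows):
    """Stand-in for ETL on the processing engine: drop empty names, normalize the rest."""
    return [
        {"id": r["id"], "name": r["name"].strip().title()}
        for r in rows
        if r["name"].strip()
    ]


def persist(rows, store):
    """Stand-in for a NoSQL storage device: a dict keyed by record id."""
    for r in rows:
        store[r["id"]] = r
    return store


def run_workflow(steps, data):
    """Run the steps in order; each step's output is the next step's input."""
    for step in steps:
        data = step(data)
    return data


# Hypothetical source databases.
db_a = [{"id": 1, "name": " ada "}, {"id": 2, "name": ""}]
db_b = [{"id": 3, "name": "grace"}]

store = {}
load = [collect, transform, lambda rows: persist(rows, store)]
loaded = run_workflow(load, [db_a, db_b])

# The output of the first workflow becomes the input of a second one.
report = run_workflow([lambda s: sorted(r["name"] for r in s.values())], loaded)
print(report)
```

A real workflow engine would add what this sketch leaves out: triggering at set intervals or on data arrival, and control logic such as decisions, forks and joins.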

Question 199 of 200

1

Coordination Engine

Select one or more of the following:

A distributed Big Data solution that needs to run on multiple servers relies on the coordination engine mechanism to ensure operational consistency across all of the participating servers
make it possible to develop highly reliable, highly available distributed big data solutions that can be deployed in a cluster
A model looks like a mathematical equation or a set of rules
data that appears to be invalid may still be valuable in that it may possess hidden patterns and trends

Explanation

Question 200 of 200

1

Coordination Engine

Select one or more of the following:

the processing engine mechanism will often use the ___________ to coordinate data processing across a large number of servers. This way, the processing engine does not require its own coordination logic
The ________ mechanism can also be used to support distributed locks, support distributed queues, establish a highly available registry for obtaining configuration information, and provide reliable asynchronous communication between processes that are running on different servers
in the case of realtime analytics, the data is analyzed first and then persisted to disk
Big Data solutions require a distributed processing environment that can accommodate large-scale data volumes, velocity and variety

	Created by Juan Taborda more than 7 years ago