Module 11: Advanced Big Data Architecture

Question 1 of 114

Operational Data Store (ODS)

Select one of the following possible answers:

As an EDW contains large amounts of data, it is of particular interest when designing an architecture for a Big Data platform. It not only serves as a data source but also as the default interface through which various BI and analysis activities are carried out.
Although a single EDW can house multiple ODSs, because their primary role is to facilitate near-realtime reporting, their use is optional.
On the other hand, Big Data is mostly comprised of unstructured data that has no defined structure. Unless analyzed, the data may not have any value. Big Data analysis requires data to be stored in its raw form without being modeled first. Once collected, the exploratory phase separates signal (valuable data) from noise.
EDWs contain high value data that has gone through rigorous validation and quality control checks

Question 2 of 114

Enterprise Data Warehouse & Big Data

Select one or more of the following possible answers:

staging area
operational data store (ODS)
data mart
analytical database

Question 3 of 114

Staging Area

Select one or more of the following possible answers:

Although a single EDW can house multiple ODSs, because their primary role is to facilitate near-realtime reporting, their use is optional.
It may not be possible to extract data from all systems at the same time because of various technical or business-related issues. Because of this, a storage buffer is required to hold data extracted from different systems at varying times and with differing frequencies
It is generally an insert/read-only database utilizing either shared-nothing MPP architecture or shared-everything architecture. Data is fed from the data warehouse into the analytical database on regular intervals
It usually includes an ETL process that ferries data from source systems into a temporary storage area. This process also contains data cleansing, validation and model transformation operations

Question 4 of 114

Data Warehouse

Select one or more of the following possible answers:

generally contains recent data. However, the degree of “data freshness” depends upon the reporting requirements. As a result, the range of data stored may span from hours to months
a relational database that acts as the single version of truth for the enterprise by storing standardized data from across the enterprise in a denormalized form that is fit for reporting and data analysis
stores data related to various business entities, such as products or customers. Unlike an OLTP system, data is either inserted or retrieved but not updated in a data warehouse
the queries are generally more complex, involving multiple tables spanning a longer range of data.

Question 5 of 114

Data Mart

Select one of the following possible answers:

Although the historical data can go back up to several years, the freshness of the current data depends on an enterprise’s reporting and analysis requirements
Some basic level of data model transformation and denormalization may also be performed in support of efficient reporting
provides a particular view of the data held in the data warehouse. Although it makes data analysis and reporting easier and faster because the stored data is highly customized according to the specific requirements, it does result in data redundancy.
contains large amounts of data, it is of particular interest when designing an architecture for a Big Data platform

Question 6 of 114

Analytical Database

Select one or more of the following possible answers:

It is generally an insert/read-only database utilizing either shared-nothing MPP architecture or shared-everything architecture
Data is highly standardized because it has gone through data cleansing, validation, quality and de-duplication processes, further suggesting that the data is of high value
Some basic level of data model transformation and denormalization may also be performed in support of efficient reporting
These are generally expensive and may come bundled with the required hardware and software in the form of an appliance

Question 7 of 114

EDW & Big Data Comparison

Select one or more of the following possible answers:

Contain high value data that has gone through rigorous validation and quality control checks
On the other hand, Big Data datasets must be stored in their raw unstructured forms, and their values are unknown
Big Data requires a repository that acts as a sink for a variety of data sources where data is stored as is
Stores data related to various business entities, such as products or customers

Question 8 of 114

EDW & Big Data Integration

Select one or more of the following possible answers:

Big Data requires a distributed and highly scalable storage and processing architecture with scale-out support
Most implementations of the Big Data appliance enable realtime and near-realtime analytics without the need for integrating multiple disparate technologies
A batch processing engine, such as MapReduce, can be used to convert semi- and unstructured data into meaningful structured data
The next-generation data warehouse consists of heterogeneous technologies providing support for structured as well as semi- and unstructured data storage and analysis

Question 9 of 114

Series Approach

Select one or more of the following possible answers:

The introduction of the Big Data platform in this configuration is comparatively less disruptive because the Big Data platform is essentially an add-on module for processing semi- and unstructured data
Provides a highly scalable data storage and processing environment
BI tools and other analytical applications are unable to make use of the Big Data platform directly
The implementation and maintenance of the interconnect can become complex if it incorporates complicated data processing, such as translation between different data types

Question 10 of 114

Big Data Appliance Approach

Select one or more of the following possible answers:

relational and non-relational storage
configuration, management and application development environments
an interconnect (between data storage and processing resources)
is analogous to the parallel approach and is also known as the logical data warehouse

Question 11 of 114

Data Virtualization Approach

Select one or more of the following possible answers:

It requires complex initial configuration, which usually results in consultation costs
It is generally implemented as Data-as-a-Service (DaaS) by applying service-orientation principles.
This approach makes non-relational data (Big Data datasets) more accessible through the use of standardized interfaces
Is generally implemented through complex software that can be expensive to acquire

Question 12 of 114

To reduce storage cost and speed up operational reporting, an online transaction processing (OLTP) system can be replaced with an operational data store (ODS).

Select one of the following:

TRUE
FALSE

Question 13 of 114

In a data warehouse, data is kept in a fully normalized form for easier reporting.

Select one of the following:

TRUE
FALSE

Question 14 of 114

When compared with an ODS, a data warehouse’s queries are generally more complex, involving multiple tables spanning a longer range of data. However, data import is less frequent because a data warehouse is not used for operational reporting.

Select one of the following:

TRUE
FALSE

Question 15 of 114

An analytical database can be based on either a columnar database or in-memory solutions for fast data access.

Select one of the following:

TRUE
FALSE

Question 16 of 114

To obtain the benefits linked with the adoption of Big Data, an EDW needs to be replaced with Big Data-specific technologies since the EDW cannot store unstructured data.

Select one of the following:

TRUE
FALSE

Question 17 of 114

The next-generation data warehouse consists of Big Data storage technologies that can store large amounts of structured as well as unstructured data.

Select one of the following:

TRUE
FALSE

Question 18 of 114

In a Big Data environment, the query workloads are generally unknown because of the ad hoc nature of analytical queries.

Select one of the following:

TRUE
FALSE

Question 19 of 114

In the series approach of EDW and Big Data integration, semi-structured and unstructured data is ingested by the Big Data platform, and only structured data is ingested by the EDW.

Select one of the following:

TRUE
FALSE

Question 20 of 114

One disadvantage of the series approach is that the Big Data platform cannot be directly accessed for performing analysis on large amounts of raw data.

Select one of the following:

TRUE
FALSE

Question 21 of 114

In the parallel approach of EDW and Big Data integration, the interconnect is a one-way connector between the EDW and the Big Data platform.

Select one of the following:

TRUE
FALSE

Question 22 of 114

One of the disadvantages of the Big Data appliance is that it does not provide horizontal scalability since it is a boxed solution.

Select one of the following:

TRUE
FALSE

Question 23 of 114

The Big Data appliance approach makes ongoing system maintenance easier because this approach combines the EDW and the Big Data platform into a single preconfigured system.

Select one of the following:

TRUE
FALSE

Question 24 of 114

The data virtualization approach is also known as the logical data warehouse.

Select one of the following:

TRUE
FALSE

Question 25 of 114

The data virtualization approach uses an interconnect to provide a unified view of data across multiple data sources.

Select one of the following:

TRUE
FALSE

Question 26 of 114

One of the disadvantages of the virtualization approach is that data from all data sources still needs to be copied over into a central repository in order to create the required services.

Select one of the following:

TRUE
FALSE

Question 27 of 114

Big Data & Cloud Computing

Select one or more of the following possible answers:

Can be utilized as a technology-enabler for Big Data under such circumstances
Ingested data is stored in a distributed file system
A single dataset may be of interest to multiple clients developed using different technologies that require data to be available in a specific format
Specialized form of distributed computing that introduces utilization models for remotely provisioning scalable and measured IT resources

Question 28 of 114

Big Data and Cloud Computing

Select one or more of the following possible answers:

Processing and storage technologies that use cluster-based processing and storage resources
The on-demand and elastic nature provides the ability for a much quicker setup of a Big Data platform
Has the potential to provide the basic components for a Big Data solution environment, including data, storage and processing resources
Whether processing data in batch or realtime mode, the pay-per-use model can be fully utilized to build a cluster whose size can be regulated based on the volume and velocity characteristics of Big Data

Question 29 of 114

Cloud Delivery Models

Select one or more of the following possible answers:

Infrastructure-as-a-Service (IaaS)
Platform-as-a-Service (PaaS)
Software-as-a-Service (SaaS)
Component-as-a-Service (CaaS)

Question 30 of 114

Cloud Deployment Model

Select one or more of the following possible answers:

Heterogeneous Cloud
Private Cloud
Managed Cloud
Hybrid Cloud

Question 31 of 114

Public Cloud

Select one or more of the following possible answers:

Is ideal for enterprises that initially built up Big Data analytics in-house but now want to scale out.
Can be used when input datasets are already stored in the cloud
Is generally less secure but more scalable due to larger pooling of storage and processing resources
It is also ideal when datasets reside within an enterprise’s firewall.

Question 32 of 114

Private Cloud

Select one or more of the following possible answers:

It is also ideal when workloads vary
Is generally less secure but more scalable due to larger pooling of storage and processing resources
It is also ideal when datasets reside within an enterprise’s firewall
Can help develop low latency data analysis capabilities

Question 33 of 114

Hybrid Cloud

Select one or more of the following possible answers:

It is also ideal when workloads vary
Can be used when input datasets are already stored in the cloud
Is a suitable choice when starting a Big Data project
Is a suitable choice when using a combination of sensitive data and public datasets

Question 34 of 114

Big Data and Cloud Computing Issues

Select one or more of the following possible answers:

data privacy
regulatory compliance
network connectivity
data virtualization

Question 35 of 114

Cloud-Related Big Data Patterns

Select one or more of the following possible answers:

Cloud-based Big Data Analysis
Cloud-based Big Data Visualization
Cloud-based Big Data Storage
Cloud-based Big Data Processing

Question 36 of 114

Cloud-based Big Data Storage

Select one or more of the following possible answers:

This pattern can also be employed when the data sources, such as the CRM system, reside in the same cloud (faster data transfer) or a proof-of-concept is being developed
This ability to store raw data spanning over longer periods of time increases the overall potential of finding valuable insights
Represents a solution environment comprised of inexpensive NoSQL storage
Is associated with the storage device (distributed file system/NoSQL) and data transfer engine mechanisms

Question 37 of 114

Data Transformation Compound Pattern

Select one or more of the following possible answers:

This generally involves the use of NoSQL databases such that the downstream applications can communicate directly with these databases using RESTful APIs
The underlying idea is to be able to ingest large amounts of raw data and pre-process it in order to make it suitable for traditional enterprise systems
Keeping multiple copies of the same dataset in different formats is not only inefficient but also adds operational and storage overheads
The involved operations can include data cleansing, validation, model transformation and format transformation, as well as the joining of disparate datasets

Question 38 of 114

Data Transformation Compound Pattern

Select one or more of the following possible answers:

Poly Source
Large-Scale Batch Processing
High Volume Tabular Storage
Large-Scale Graph Processing

Question 39 of 114

Application Enhancement Compound Pattern

Select one or more of the following possible answers:

Ingesting large amounts of data in order to calculate certain statistics or execute a machine learning algorithm and then feed the results to enterprise systems
This generally involves the use of NoSQL databases such that the downstream applications can communicate directly with these databases using RESTful APIs
The underlying idea is to be able to ingest large amounts of raw data and pre-process it in order to make it suitable for traditional enterprise systems
A dedicated storage layer helps store, pre-process and further integrate data with structured data without impacting the current storage infrastructure

Question 40 of 114

Application Enhancement Compound Pattern

Select one or more of the following possible answers:

High Volume Tabular Storage
Large-Scale Graph Processing
Canonical Data Format
Data Size Reduction

Question 41 of 114

Canonical Data Format Pattern

Select one of the following possible answers:

Warrants the use of a memory-based storage device with random read and write capability.
Keeping multiple copies of the same dataset in different formats is not only inefficient but also adds operational and storage overheads
A separate connector is used to connect to a particular query engine or the storage device
The ingested data is stored to the distributed file system, where it is enriched via batch processing and then stored on a NoSQL database

Question 42 of 114

Realtime Access Storage Pattern

Select one of the following possible answers:

Ingested data is stored to the distributed file system, where it is enriched via batch processing and then stored on a NoSQL database
Exporting the data in the form of a file, importing it into a database and then connecting the analytics tool to the database is not a viable option
Is associated with the serialization engine, data transfer engine, storage device and processing engine mechanisms
The use of disk-based storage devices can severely impact the processing time of data

Question 43 of 114

Direct Data Access Pattern

Select one of the following possible answers:

Greatly helps in speeding up data analysis and reduces dependence on IT personnel for data analysis tasks
Incurs increased cost because memory-based storage devices are expensive when compared with disk-based storage devices
Keeping multiple copies of the same dataset in different formats is not only inefficient but also adds operational and storage overheads
Is generally employed by enterprises that have just embarked on a Big Data journey

Question 44 of 114

Analytical Sandbox Compound Pattern

Select one or more of the following possible answers:

The results are fed directly to various downstream applications, such as an e-commerce application
Is generally employed by enterprises that have just embarked on a Big Data journey
Represents a standalone solution environment
Offloads existing databases from having to perform complex and long-running data transformation jobs on large datasets

Question 45 of 114

Analytical Sandbox Compound Pattern

Select one or more of the following possible answers:

Poly Storage
Poly Source
Poly Sink
Confidential Data Storage

Question 46 of 114

Confidential Data Storage Pattern

Select one of the following possible answers:

In the case of a clustering algorithm applied to a customer dataset for finding customer cohorts
Is generally opted for by enterprises that want to move towards predictive and prescriptive analytics by creating richer statistical and machine learning models
Can be applied in such a case to ensure that even if malicious users get access to sensitive data, they are unable to read and make use of it
This approach provides a better alternative in terms of uploading data to the cloud as well as data security and privacy issues

Question 47 of 114

Large-Scale Graph Processing Pattern

Select one of the following possible answers:

A dedicated storage layer helps store, pre-process and further integrate data with structured data without impacting the current storage infrastructure
It involves traversing through a large number of nodes (entities) via their defined edges (links).
This approach provides a better alternative in terms of uploading data to the cloud as well as data security and privacy issues
Storing and analyzing very large amounts of structured, unstructured and semi-structured Big Data datasets

Question 48 of 114

Unstructured Data Store Compound Pattern

Select one or more of the following possible answers:

The analytical operations performed in support of BI, data mining and creating statistical and machine learning models do not affect the performance
This configuration is generally opted for by enterprises that want to move towards predictive and prescriptive analytics by creating richer statistical and machine learning models
Capable of ingesting and storing large amounts of semi-structured and unstructured data to develop high-fidelity statistical and machine learning models for performing predictive and prescriptive analytics
Although analogous to the use of a cloud, this approach provides a better alternative in terms of uploading data to the cloud as well as data security and privacy issues

Question 49 of 114

Unstructured Data Store Compound Pattern

Select one or more of the following possible answers:

Random Access Storage
Automated Dataset Execution
File-based Sink
Big Data Processing Environment.

Question 50 of 114

Batch Data Processing Compound Pattern

Select one or more of the following possible answers:

The ingested data is stored to the distributed file system, where it is enriched via batch processing and then stored on a NoSQL database for performing analytical queries
Their current storage infrastructure does not allow them to store semi-structured and unstructured data
A solution environment where the sole purpose of using the Big Data platform is to offload processing of large amounts of structured data
This approach provides a better alternative in terms of uploading data to the cloud as well as data security and privacy issues

Question 51 of 114

Batch Data Processing Compound Pattern

Select one or more of the following possible answers:

Canonical Data Format
Relational Sink
Automatic Data Replication and Reconstruction
Automatic Data Sharding

Question 52 of 114

Dataset Denormalization Pattern

Select one of the following possible answers:

Requires exporting data via a relational data transfer engine to the data warehouse
Can be applied in such a case to ensure that even if malicious users get access to sensitive data
Is a solution environment comprised of inexpensive storage used to store large amounts of data from both internal and external data sources in an online fashion ready for consumption by any enterprise system
Enable the processing of datasets, which requires the use of a batch processing engine

Question 53 of 114

Online Data Repository Compound Pattern

Select one or more of the following possible answers:

Storing and analyzing very large amounts of structured, unstructured and semi-structured Big Data datasets
A solution environment comprised of inexpensive storage used to store large amounts of data from both internal and external data sources in an online fashion ready for consumption by any enterprise system
Large data volumes are available and the data itself has not lost its value because it is kept unprocessed in its raw form
The sole purpose of using the Big Data platform is to offload processing of large amounts of structured data

Question 54 of 114

Online Data Repository Compound Pattern

Select one or more of the following possible answers:

Automated Dataset Execution
Streaming Access Storage
Random Access Storage
Canonical Data Format

Question 55 of 114

Big Data Warehouse Compound Pattern

Select one or more of the following possible answers:

This configuration is generally opted for by enterprises that want to move towards predictive and prescriptive analytics by creating richer statistical and machine learning models
Large data volumes are available and the data itself has not lost its value because it is kept unprocessed in its raw form
Storing and analyzing very large amounts of structured, unstructured and semi-structured Big Data datasets
Data from structured sources and from unstructured sources can first be stored on a distributed file system

Question 56 of 114

Big Data Warehouse Compound Pattern

Select one or more of the following possible answers:

Automatic Data Sharding
Canonical Data Format
Random Access Storage
Confidential Data Storage

Question 57 of 114

Operational Data Store Compound Pattern

Select one or more of the following possible answers:

a solution environment comprised of inexpensive NoSQL storage that is utilized as ___________ where large amounts of transactional data from operational systems across the enterprise are collected for operational BI and reporting
Data from structured sources and from unstructured sources can first be stored on a distributed file system
Large data volumes are available and the data itself has not lost its value because it is kept unprocessed in its raw form
Larger amounts of data that spread over longer time periods can be stored, thereby providing the opportunity to enrich operational BI

Question 58 of 114

Operational Data Store Compound Pattern

Select one or more of the following possible answers:

High Volume Tabular Storage
Relational Sink
Indirect Data Access
Automated Dataset Execution

Question 59 of 114

Indirect Data Access Pattern

Select one of the following possible answers:

The data can be imported into fit-for-purpose NoSQL databases, where it can be easily accessed in support of BI, reporting and other analytical use cases
Enable access to pre-processed data or analysis results stored in a Big Data solution environment via existing BI tools
A solution environment comprised of inexpensive NoSQL storage
Enable the processing of such datasets, which requires the use of a batch processing engine

Question 60 of 114

Realtime Data Processing Compound Pattern

Select one or more of the following possible answers:

The data can be imported into fit-for-purpose NoSQL databases, where it can be easily accessed in support of BI, reporting and other analytical use cases
A solution environment capable of processing streams of data in realtime or near-realtime, such as performing analytics on machine-generated or social media data
The streaming data can be stored in disk-based storage, such as the distributed file system, for further analysis
Enable the processing of such datasets, which requires the use of a batch processing engine

Question 61 of 114

Realtime Data Processing Compound Pattern

Select one or more of the following possible answers:

Large-Scale Batch Processing
Streaming Source
Automatic Data Replication and Reconstruction
Data Size Reduction

Question 62 of 114

High Velocity Realtime Processing Pattern

Select one of the following possible answers:

Enable the immediate export of results
Scenarios where the data needs processing as it arrives to obtain immediate results
A solution environment capable of processing streams of data in realtime or near-realtime
Enable access to pre-processed data or analysis results stored in a Big Data solution environment via existing BI tools

Question 63 of 114

Streaming Egress Pattern

Select one of the following possible answers:

Storing high-volume and high-variety data in order to perform various analytics in isolation from other enterprise systems
Data needs processing as it arrives to obtain immediate results
Provide integration with the enterprise identity and access management systems (IAMs)
Enable the immediate export of results

Question 64 of 114

Additional Big Data Patterns

Select one or more of the following possible answers:

Centralized Dataset Governance
Fan-in Ingress
Centralized Dataset Management
Streaming Egress Pattern

Question 65 of 114

Centralized Access Management Pattern

Select one of the following possible answers:

Provides a means for performing a range of data governance tasks from a central location
Provide integration with the enterprise identity and access management systems (IAMs)
Maintain data lineage and details about operations performed on the data across multiple processing stages
Enable policy-based access to resources within the Big Data platform via a central interface

Question 66 of 114

Integrated Access Pattern

Select one of the following possible answers:

Provides a means for performing a range of data governance tasks from a central location
Enable policy-based access to resources within the Big Data platform via a central interface
Can be used to provide integration with the enterprise identity and access management systems (IAMs)
Is associated with the processing engine, storage device, query engine and productivity portal mechanisms

Question 67 of 114

Centralized Dataset Governance Pattern

Select one of the following possible answers:

A security engine is used to enable single sign-on (SSO) functionality that generally works on the basis of trusting the IAM system for user authentication via the use of tokens
Provides a means for performing a range of data governance tasks from a central location
In order to have maximum confidence in the processing results, there needs to be a way to retrace the processing steps that were taken
Data merging may be required when the data is too fine-grained or arrives out of order, whether because of network latency or because of factors that are beyond the control of the enterprise

Question 68 of 114

Automated Processing Metadata Insertion Pattern

Select one of the following possible answers:

Data merging may be required when the data is too fine-grained or arrives out of order, whether because of network latency or because of factors that are beyond the control of the enterprise
Can be applied to maintain data lineage and details about operations performed on the data across multiple processing stages
Intermediate output from each stage is persisted temporarily to a storage device until the final result is computed and validated
If the final results are incorrect, the entire series of steps needs to be executed from scratch even if the results halfway were correct

Question 69 of 114

Intermediate Results Storage Pattern

Select one of the following possible answers:

Intermediate output from each stage is persisted temporarily to a storage device until the final result is computed and validated
Can be applied to maintain data lineage and details about operations performed on the data across multiple processing stages
In order to have maximum confidence in the processing results, there needs to be a way to retrace the processing steps that were taken
Data needs to be simultaneously processed using different sub-systems

Question 70 of 114

Fan-in Ingress Pattern

Select one of the following possible answers:

The application of this design pattern requires the automated addition of metadata, based on a machine-readable standardized structure, during each stage of data processing
Provides scalability in the context of being able to add more data consumers via a simple configuration
Is applied when data needs to be simultaneously processed using different sub-systems
Can be applied to implement logic that merges data originating from multiple sources and generally applies to situations where data is acquired in realtime

Question 71 of 114

Fan-out Ingress Pattern

Select one of the following possible answers:

Intermediate output from each stage is persisted temporarily to a storage device until the final result is computed and validated
Is applied when data needs to be simultaneously processed using different sub-systems
Maintain data lineage and details about operations performed on the data across multiple processing stages
Data is copied from the source location, stored in the queue and then forwarded to the interested subscribers

Explanation
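
The queue-and-subscribers mechanism mentioned in the options (data copied from the source, held in a queue, then forwarded to interested subscribers) might look like the following in-process sketch. The class name `FanOutQueue` and the two subscribers are hypothetical; a real platform would typically use a distributed message queue instead.

```python
from collections import deque

class FanOutQueue:
    """Hold ingested records in a queue and forward a copy to every subscriber."""

    def __init__(self):
        self._queue = deque()
        self._subscribers = []

    def subscribe(self, handler):
        # Adding a consumer is only a registration step.
        self._subscribers.append(handler)

    def ingest(self, record):
        # Store a copy so later changes at the source do not affect the queue.
        self._queue.append(dict(record))

    def dispatch(self):
        while self._queue:
            record = self._queue.popleft()
            for handler in self._subscribers:
                handler(dict(record))  # each subscriber gets its own copy

# Two hypothetical sub-systems receiving the same data.
batch_store, realtime_alerts = [], []
hub = FanOutQueue()
hub.subscribe(batch_store.append)
hub.subscribe(realtime_alerts.append)
hub.ingest({"sensor": "s1", "value": 42})
hub.dispatch()
```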

Question 72 of 114

1

John wants to perform predictive analytics using a variety of textual log files. However, the current data storage infrastructure consists of relational database technologies. John accomplishes his goal by storing and pre-processing the log files without affecting current storage. Which compound pattern did John apply?

Select one of the following possible answers:

Online Data Repository
Unstructured Data Store
Big Data Warehouse
Operational Data Store

Explanation

Question 73 of 114

1

Each day ABC’s head office receives a large number of reports from each of its branches across the world. Performance data is extracted from these reports and then imported into the enterprise data warehouse, from where it is used for various reporting tasks. The reports are in XML format and are currently coerced into a relational database and then a utility is run to perform data cleansing and extraction of the required data. The entire process of ingesting and loading into the data warehouse takes a long time, and with the reports getting more detailed, it is anticipated that timely processing of reports may not be possible. Which compound pattern can be applied to address the processing of the XML reports without requiring a staging database?

Select one of the following possible answers:

Analytical Sandbox
Unstructured Data Store
Data Transformation
Big Data Warehouse

Explanation

Question 74 of 114

1

XYZ is enhancing its analytical capabilities by capturing large amounts of structured and unstructured data across the enterprise and enabling its data scientists to perform advanced analytics. However, the Big Data architects have been advised that doing so should not impact the current operations of the enterprise data warehouse and that any required technology infrastructure should be kept separate with respect to the current IT environment. Which compound pattern should the Big Data architects apply for setting up the required Big Data platform?

Select one of the following possible answers:

Batch Data Processing
Operational Data Store
Analytical Sandbox
Online Data Repository

Explanation

Question 75 of 114

1

A large online bookstore currently recommends a random array of books on its website to its potential customers. However, it is planning to display personalized recommendations to its customers based on a profile match and the kinds of books they have bought in the past. This process involves ingesting a large amount of customer profile data from the CRM system, joining it with the customer’s shopping history and then applying a machine learning algorithm. The generated results are then embedded on the webpage that the customer is browsing. Which compound pattern can be applied to implement the required solution?

Select one of the following possible answers:

Big Data Warehouse
Online Data Repository
Application Enhancement
Realtime Data Processing

Explanation

Question 76 of 114

1

A large cellular company is improving its monthly billing process by introducing itemized billing. However, with more than 5 million customers, it takes a long time to complete the simple process. The company anticipates that the new feature will take twice the current time. Davon, a Big Data architect, proposes a Big Data technologies-based solution that accomplishes the new itemized billing process quickly. Which compound pattern will Davon apply to complete the task?

Select one of the following possible answers:

Application Enhancement
Online Data Repository
Batch Data Processing
Realtime Data Processing

Explanation

Question 77 of 114

1

A renowned car manufacturer, XYZ, has modernized its manufacturing facility by adding a number of sensors across the assembly line. Each sensor provides a reading every 5 seconds. XYZ needs to monitor the readings transmitted by each of the sensors as soon as they are transmitted. The monitoring process involves a comparison of related groups of sensor readings to make sure that the readings fall within a predetermined range. Which compound pattern can be applied to achieve the desired result?

Select one of the following possible answers:

Realtime Data Processing
Batch Data Processing
Data Transformation
Analytical Sandbox

Explanation

Question 78 of 114

1

The data scientists at ABC often require access to historical data, going back as far as ten years, in its raw form for various data analyses. Jackie, the Big Data architect, needs to provide the required data in such a way that the data can be retrieved without any delays. In which configuration should Jackie deploy the Big Data platform?

Select one of the following possible answers:

Data Transformation
Application Enhancement
Big Data Warehouse
Online Data Repository

Explanation

Question 79 of 114

1

The business intelligence team at a large retail store has been asked to integrate weekly sales figures into a dashboard that currently displays daily sales figures. The team notices that the current operational data store used for generating the daily sales figures is already operating at its maximum storage capacity. In which configuration can the team implement a solution when using a Big Data platform?

Select one of the following possible answers:

Big Data Warehouse
Batch Data Processing
Operational Data Store
Analytical Sandbox

Explanation

Question 80 of 114

1

A small toy manufacturer, ABC, has seen steady growth over the past 5 years. ABC’s current IT landscape consists of an ERP and a CRM system. Both systems are open source-based, as ABC can only spare a limited budget for IT. Sales are monitored through month-end reports generated by executing queries against the ERP and the CRM. However, these reports only go back as far as 6 months, as older data is archived to a tape drive. Which compound pattern can be applied that enables ABC to keep a large amount of transactional data online, from which detailed sales reports can be generated more frequently?

Select one of the following possible answers:

Online Data Repository
Operational Data Store
Big Data Warehouse
Unstructured Data Store

Explanation

Question 81 of 114

1

Lambda Architecture

Select one or more of the following possible answers:

The sole purpose of using this kind of platform is to offload processing of large amounts of structured data
Type of Big Data solution architecture that is comprised of multiple layers and forms the basis for developing highly scalable, available, eventually consistent, fault tolerant and low latency realtime Big Data solutions
Uses a combination of both realtime and batch components that operate in parallel to process data without any delay
Additional processing is generally required to put the data in the correct structure

Explanation

Question 82 of 114

1

Lambda Architecture Terminology

Select one or more of the following possible answers:

View
Model
Indexed View
Indexing

Explanation

Question 83 of 114

1

Select the correct option from the drop-down menu to complete the text.

( Normalization, Denormalization, Polyglot Persistence, CAP ) is the process of storing data in a form that removes data duplication and ensures consistency

Explanation

Question 84 of 114

1

Select the correct option from the drop-down menu to complete the text.

( Polyglot Persistence, CAP, SCV, Denormalization ) is the process of storing data in a form that introduces redundancy for faster querying

Explanation
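
The contrast between storing data without duplication and storing it with redundancy for faster querying can be sketched with plain Python structures. The tables, field names and values below are hypothetical; the sketch only illustrates the trade-off and does not represent any particular database.

```python
# Normalized form: each customer attribute is stored exactly once
# and orders refer to it through customer_id.
customers = {1: {"name": "Ana", "city": "Lima"}}
orders = [
    {"order_id": 100, "customer_id": 1, "total": 25.0},
    {"order_id": 101, "customer_id": 1, "total": 40.0},
]

def denormalize(orders, customers):
    """Copy customer attributes into every order row (introduces redundancy)."""
    return [{**order, **customers[order["customer_id"]]} for order in orders]

wide_orders = denormalize(orders, customers)

# A query over the wide rows needs no join, at the cost of repeating
# the name and city in every row.
lima_total = sum(row["total"] for row in wide_orders if row["city"] == "Lima")
print(lima_total)
```

If the customer moves to another city, the normalized form needs one update while the wide rows need one update per order, which is where the consistency risk of redundancy comes from.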

Question 85 of 114

1

Select the correct option from the drop-down menu to complete the text.

( CAP, Polyglot persistence, SCV, Recomputation Algorithm ) is the practice of using more than one fit-for-purpose storage device for persisting data

Explanation

Question 86 of 114

1

Select the correct option from the drop-down menu to complete the text.

( Recomputation Algorithm, SCV, CAP, Incremental/Approximate Algorithm ) is a theorem that states a distributed storage system is only able to support two of the following constraints at any point in time: consistency, availability and partition-tolerance

Explanation

Question 87 of 114

1

Select the correct option from the drop-down menu to complete the text.

( SCV, Recomputation Algorithm, Incremental/Approximate Algorithm, Sharding ) is a principle that states that a processing system is only capable of supporting two of the following: speed, consistency and volume at any point in time

Explanation

Question 88 of 114

1

The _____ is an algorithm that processes the complete dataset to generate the result

Drag and drop to complete the text.

Incremental/Approximate Algorithm

Sharding

Replication

recomputation algorithm

Explanation

Question 89 of 114

1

The _____ is an algorithm that only processes new data. It may use probability-based techniques and may generate results that are not fully reliable/accurate

Drag and drop to complete the text.

Replication

Sharding

Recomputation Algorithm

incremental algorithm

Explanation
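
The difference between processing the complete dataset and processing only new data can be shown with a mean calculation. This is a minimal sketch with hypothetical values; the running mean used here happens to be exact, whereas the probability-based incremental techniques mentioned in the question would trade some accuracy for speed.

```python
def recompute_mean(dataset):
    """Recomputation: scan the complete dataset every time a result is needed."""
    return sum(dataset) / len(dataset)

class IncrementalMean:
    """Incremental: keep running state and process only newly arrived values."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def update(self, new_values):
        for value in new_values:
            self.count += 1
            self.total += value
        return self.total / self.count

history = [10.0, 20.0, 30.0]  # data already seen
fresh = [40.0]                # newly arrived data

inc = IncrementalMean()
inc.update(history)
incremental_result = inc.update(fresh)         # touches only the new value
full_result = recompute_mean(history + fresh)  # touches all four values
```

The recomputation cost grows with the size of the full dataset, while the incremental update cost depends only on the amount of new data.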

Question 90 of 114

1

Select the correct option from the drop-down menu to complete the text.

( Replication, Denormalization, Sharding, Normalization ) is a method of achieving scalability by horizontally partitioning a large dataset across multiple nodes

Explanation
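
Horizontal partitioning of a dataset across nodes can be sketched as follows. The sketch assumes hash-based routing on the record key, which is only one possible strategy (range-based routing is another); the shard count and the customer records are hypothetical.

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a record key to a shard index using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

def partition(records: dict, num_shards: int) -> list:
    """Split one large dataset into num_shards smaller, disjoint datasets."""
    shards = [dict() for _ in range(num_shards)]
    for key, value in records.items():
        shards[shard_for(key, num_shards)][key] = value
    return shards

records = {f"customer-{i}": {"orders": i} for i in range(10)}
shards = partition(records, num_shards=3)

# A lookup only needs to contact the single shard that owns the key.
owner = shard_for("customer-7", 3)
print(shards[owner]["customer-7"])
```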

Question 91 of 114

1

Select the correct option from the drop-down menu to complete the text.

( Sharding, SCV, Replication, Denormalization ) is a method of achieving fault-tolerance by storing multiple copies of a dataset across multiple nodes

Explanation
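
Storing multiple copies of a dataset across nodes so that reads survive node failures can be sketched as below. The placement rule (consecutive nodes starting from a key-derived offset), the replication factor of 3 and the node count are assumptions made for the illustration, not a description of any specific storage product.

```python
def place_replicas(key: str, num_nodes: int, replication_factor: int) -> list:
    """Pick replication_factor distinct nodes for a key (assumed placement rule)."""
    start = sum(key.encode("utf-8")) % num_nodes
    return [(start + i) % num_nodes for i in range(replication_factor)]

def write(nodes, key, value, replication_factor=3):
    """Store a copy of the value on every replica node."""
    for idx in place_replicas(key, len(nodes), replication_factor):
        nodes[idx][key] = value

def read(nodes, key, failed=frozenset(), replication_factor=3):
    """Return the value from the first replica that is still reachable."""
    for idx in place_replicas(key, len(nodes), replication_factor):
        if idx not in failed:
            return nodes[idx][key]
    raise RuntimeError("all replicas for this key are unavailable")

nodes = [dict() for _ in range(5)]  # five hypothetical storage nodes
write(nodes, "order-42", {"total": 99})

replicas = place_replicas("order-42", len(nodes), 3)
# Simulate the loss of two of the three replica nodes.
value = read(nodes, "order-42", failed=frozenset(replicas[:2]))
```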

Question 92 of 114

1

Purpose of the Lambda Architecture

Select one or more of the following possible answers:

This not only helps process voluminous data faster but also helps cater to infrequent or ad-hoc data processing requests that require above-average storage and processing resources
Data architectures are becoming difficult to design and maintain due to the ever-increasing volume, velocity and variety of data.
Efficient data storage and efficient querying have incompatible requirements that require following different strategies
Data is either stored in a disk-based NoSQL or a memory-based storage device, which can be a NoSQL or some other cluster-based storage technology, that enables low latency data access to perform realtime or near-realtime analytics

Explanation

Question 93 of 114

1

Lambda Architecture Characteristics

Select one or more of the following possible answers:

Processes raw data by employing both realtime and batch data processing techniques in parallel
Maintain data lineage and details about operations performed on the data across multiple processing stages
The results generated by realtime processing are based on incremental algorithms that may not be consistent/accurate
Batch data processing eliminates the complexity of maintaining data consistency across nodes by storing only immutable data

Explanation

Question 94 of 114

1

Lambda Architecture Layers

Select one or more of the following possible answers:

Batch
Serving
Speed
Query

Explanation

Question 95 of 114

1

Batch Layer

Select one or more of the following possible answers:

Processing of raw data
Storage of raw data
Ad hoc reporting
Calculation of views

Explanation

Question 96 of 114

1

Lambda Architecture Batch Layer

Select one or more of the following possible answers:

Uses incremental algorithms and processes comparatively smaller amounts of data to provide low latency results
Consists of a storage device (distributed file system), batch processing engine and a workflow engine
Uses a recomputation algorithm to provide consistent accurate views and further provides fault-tolerance when compared with an incremental algorithm
Comprises an enhanced version of the query engine with logic that can automatically and intelligently combine serving and speed views based on the query criteria

Explanation

Question 97 of 114

1

Lambda Architecture Serving Layer

Select one or more of the following possible answers:

Although raw data is stored, for achieving consistency, some structure needs to be applied to the data before storage
The storage device used in this layer only needs to support batch write (no random write) with random read capabilities
As the layer follows the mutable storage model and the processing results are generated more frequently, the storage device that stores the views needs to support random writes with random reads
For keeping the complexity to a minimum and providing faster reads, normally a simple key-value NoSQL database is used

Explanation

Question 98 of 114

1

Lambda Architecture Speed Layer

Select one or more of the following possible answers:

The use of an append-only and streaming data storage device keeps complexity to a minimum
The views created by the batch layer are not amenable to random querying, as these are generally stored in the distributed file system
A memory-based storage device for the storage of raw data and a memory or disk-based NoSQL storage device for the storage of views is generally used
Event data is captured using the event data transfer engine and is processed in memory via the realtime processing engine to create indexed views that are generally stored inside a NoSQL database

Explanation

Question 99 of 114

1

Lambda Architecture Query Layer

Select one or more of the following possible answers:

For easier integration, the speed and serving views should be constructed in a modular manner
Merging the results from views residing in the speed and serving layers for successfully executing a query
Once the latest batch view is available via the serving layer, the corresponding results in the realtime views can be ignored or flushed
Is a high latency layer such that there is a time lag before the latest version of the views, based on fresher data, is available

Explanation
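
Merging results from the serving and speed views, and flushing the realtime results once a newer batch view is available, can be sketched with two dictionaries. The page-view counters and function names are hypothetical, and the sketch assumes the new batch view already covers every event counted in the realtime view.

```python
def query(key, batch_view, realtime_view):
    """Combine the precomputed batch count with the count from fresh events."""
    return batch_view.get(key, 0) + realtime_view.get(key, 0)

def on_new_batch_view(new_batch_view, realtime_view):
    """Swap in the latest batch view and flush the now-redundant realtime results."""
    realtime_view.clear()
    return new_batch_view

batch_view = {"page/home": 1000}  # served by the serving layer
realtime_view = {"page/home": 12}  # maintained by the speed layer

before = query("page/home", batch_view, realtime_view)

# The next batch run has absorbed the 12 fresh events.
batch_view = on_new_batch_view({"page/home": 1012}, realtime_view)
after = query("page/home", batch_view, realtime_view)
```

Both queries return the same total; only the share contributed by each view changes once the batch layer catches up.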

Question 100 of 114

1

Lambda Architecture Layers in Action

Select one or more of the following possible answers:

Raw data is fed simultaneously to the batch and speed layers, generally using the same event data transfer engine
The batch layer can be further used for deep analytics, as it contains complete datasets
The limitations of the SCV principle are also relaxed
Although the speed layer is responsible for processing the entire set of fresh data while the corresponding batch view is not ready, it does not process the entire set as a single job because doing so adds to the latency and results in excessive resource usage

Explanation

Question 101 of 114

1

Lambda Architecture Benefits

Select one or more of the following possible answers:

Algorithms for the speed layer can be complex or might need some time to understand, as they use incremental or approximation (probability)-based techniques that the batch equivalent may not be using
The complexity of the architecture is restricted to the speed layer, as that is where the incremental algorithms and read/write database are used
The immutable nature of the batch layer helps re-process data as a result of a data processing logic change that may occur due to new business requirements or a bug fix
Realtime data processing capability is required with consistent results

Explanation

Question 102 of 114

1

Lambda Architecture Applicability

Select one or more of the following possible answers:

Realtime data processing capability is required with consistent results
Fault-tolerance and accuracy need to be added to the existing realtime system
Loss of data is not acceptable
Polyglot persistence by employing fit-for-purpose storage devices at each layer

Explanation

Question 103 of 114

1

Lambda Architecture Limitations/Challenges

Select one or more of the following possible answers:

Configuring the batch layer to process data in small batches reduces load on the speed layer
Raw data is fed simultaneously to the batch and speed layers, generally using the same event data transfer engine, and each layer can be implemented via a different set of technologies
Complexity is greatly increased, as two separate layers need building and maintaining while ensuring that each provides the same functionality
Requires schema adherence in the batch layer, which adds complexity, adds another step before data can actually be persisted and requires prior knowledge about the structure of the incoming data

Explanation

Question 104 of 114

1

Lambda Architecture Recommendations

Select one or more of the following possible answers:

Employing the same processing engine for both the speed and batch layer, such as Spark, helps keep system complexity to a minimum
The key-value storage model employed in the serving layer may not be sufficient for all types of query requirements
The immutable nature of the batch layer helps re-process data as a result of a data processing logic change that may occur due to new business requirements or a bug fix
A balance is required based on the processing requirements, as the throughput obtained from employing small batches may be less than from larger batches and will further require frequent updates to the serving layer

Explanation

Question 105 of 114

1

In Lambda architecture, which layer(s) is/are responsible for creating indexed views?

Select one or more of the following possible answers:

batch layer
serving layer
speed layer
query layer

Explanation

Question 106 of 114

1

In the context of the Lambda architecture, CAP is a theorem that applies to

Select one of the following possible answers:

Cloud computing
Distributed storage system
Processing system
EDW

Explanation

Question 107 of 114

1

In the context of the Lambda architecture, SCV is a principle that applies to

Select one of the following possible answers:

Storage devices
Distributed Storage System
Cloud computing
Processing System

Explanation

Question 108 of 114

1

Data Transformation Compound Pattern

Select one or more of the following possible answers:

A dedicated storage layer helps store, pre-process and further integrate data with structured data without impacting the current storage infrastructure
The underlying idea is to be able to ingest large amounts of raw data and pre-process it in order to make it suitable for traditional enterprise systems
Is ideal for enriching the EDW with unstructured data
This generally involves the use of NoSQL databases such that the downstream applications can communicate directly with these databases using RESTful APIs

Explanation

Question 109 of 114

1

Application Enhancement Compound Pattern

Select one or more of the following possible answers:

A dedicated storage layer helps store, pre-process and further integrate data with structured data without impacting the current storage infrastructure
Certain statistics are calculated by processing large amounts of data, or a statistical/machine learning model is run
Solution environment capable of storing high-volume and high-variety data in order to perform various analytics in isolation from other enterprise systems
Examples of functionality enhancement include personalized recommendations and discounts as well as targeted advertisements

Explanation

Question 110 of 114

1

Analytical Sandbox Compound Pattern

Select one or more of the following possible answers:

Although analogous to the use of a cloud, this approach provides a better alternative in terms of uploading data to the cloud as well as data security and privacy issues
The underlying idea is to be able to ingest large amounts of raw data and pre-process it in order to make it suitable for traditional enterprise systems
Is not integrated with the EDW and is instead used directly to explore data and perform analytics
Keep the Big Data initiative separate from existing IT operations and systems

Explanation

Question 111 of 114

1

Unstructured Data Store Compound Pattern

Select one or more of the following possible answers:

This configuration is generally opted for by enterprises that want to move towards predictive and prescriptive analytics
Although analogous to the use of a cloud, this approach provides a better alternative in terms of uploading data to the cloud as well as data security and privacy issues
A dedicated storage layer helps store, pre-process and further integrate data with structured data without impacting the current storage infrastructure
Generally, the ingested data is stored to the distributed file system, where it is enriched via batch processing and then stored on a NoSQL database for performing analytical queries

Explanation

Question 112 of 114

1

Batch Data Processing Compound Pattern

Select one or more of the following possible answers:

Such a solution is generally employed by enterprises that have just embarked on a Big Data journey
Once processed, the streaming data can be stored in disk-based storage, such as the distributed file system, for further analysis
This not only helps process voluminous data faster but also helps cater to infrequent or ad-hoc data processing requests
Although analogous to the use of a cloud, this approach provides a better alternative in terms of uploading data to the cloud as well as data security and privacy issues

Explanation

Question 113 of 114

1

Operational Data Store Compound Pattern

Select one or more of the following possible answers:

The data can be imported into fit-for-purpose NoSQL databases, where it can be easily accessed in support of BI
Large data volumes are available and the data itself has not lost its value because it is kept unprocessed in its raw form
Based on the data storage requirements, a distributed file system or a NoSQL database can be used for data storage
Large amounts of transactional data from operational systems across the enterprise are collected

Explanation

Question 114 of 114

1

Cloud-based Big Data Processing

Select one or more of the following possible answers:

This not only helps process voluminous data faster but also helps cater to infrequent or ad-hoc data processing requests that require above-average storage and processing resources
Setting up a cluster in-house may result in under-utilization of processing resources, as it would not be utilized at all times
Is associated with the processing engine, storage device, resource manager and coordination engine mechanisms
Enable the processing of such datasets, which requires the use of a batch processing engine

Created by Alveiro Garcia more than 8 years ago
