BIG DATA

Description

Mind Map on BIG DATA, created by kalaiyarasi on 01/01/2014.

Resource summary

BIG DATA

Annotations:

  • Big data refers to data sets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process with low latency.
  • Big data comes from sensors, devices, video/audio, networks, log files, transactional applications, the web, and social media. Much of it is generated in real time and at a very large scale.
  1. Technologies

    Annotations:

    • Big data requires specialized technologies to process huge volumes of data efficiently and within an acceptable span of time.
    1. Apache Hadoop

      Annotations:

      • Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware.
      1. Pig

        Annotations:

        • Pig, a programming tool originally developed at Yahoo!, is a high-level platform for creating MapReduce programs used with Hadoop.
        • The Pig language provides operations for loading, storing, filtering, grouping, de-duplicating, ordering, sorting, aggregating, and joining data.
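To make the operations listed above concrete, here is a minimal sketch in plain Python (not Pig Latin, and not run on Hadoop) of a de-duplicate, filter, group, aggregate, and order pipeline. The sample records, field layout, and the `summarize` function are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records: (user, page, seconds_on_page)
records = [
    ("alice", "home", 12),
    ("bob", "home", 7),
    ("alice", "search", 30),
    ("alice", "home", 12),   # exact duplicate of the first record
    ("carol", "search", 3),
]

def summarize(rows, min_seconds=5):
    """Return (page, total_seconds) pairs, largest total first."""
    distinct = set(rows)                                  # de-duplication
    kept = [r for r in distinct if r[2] >= min_seconds]   # filtering
    groups = defaultdict(list)                            # grouping by page
    for _user, page, seconds in kept:
        groups[page].append(seconds)
    totals = {page: sum(vals) for page, vals in groups.items()}  # aggregation
    # ordering by total time, descending
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

print(summarize(records))
```

In Pig each of these steps would be a separate statement in a script, and the platform would turn the script into MapReduce jobs; the sketch only mirrors the shape of the data flow.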
        1. Modules

          Annotations:

          • Hadoop Common - contains libraries and utilities needed by other Hadoop modules
          • Hadoop Distributed File System (HDFS) - a distributed file system that stores data on commodity machines
          • Hadoop YARN - a resource-management platform responsible for managing compute resources in clusters
          • Hadoop MapReduce - a programming model for large-scale data processing
        2. MapReduce

          Annotations:

          • Pioneered by Google, MapReduce is a framework for processing problems across huge data sets with a parallel, distributed algorithm. The work is spread over a large number of computers (nodes), collectively referred to as a cluster.
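The MapReduce idea described above can be sketched on a single machine as three steps: map each input to key/value pairs, group the pairs by key (the shuffle), and reduce each group to one result. The following word count is a minimal illustration that assumes whitespace-separated words; the function names are hypothetical and this is not the Hadoop API.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document):
    """Emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)

def word_count(documents):
    mapped = chain.from_iterable(map_phase(d) for d in documents)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

print(word_count(["big data", "Big cluster"]))
```

On a real cluster the map calls run on many nodes in parallel and the shuffle moves data across the network; here everything runs in one process so the logic stays visible.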
        3. Characteristics

          Annotations:

          • The McKinsey Global Institute estimates that data volume is growing 40% per year and will grow 44x between 2009 and 2020.
          • Four characteristics define big data: (1) volume, (2) velocity, (3) variety, and (4) value.
          • To make the most of big data, enterprises must evolve their IT infrastructures to handle these new high-volume, high-velocity, high-variety sources of data and integrate them with the pre-existing enterprise data to be analyzed.
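As a quick check on the growth figures quoted above, the following sketch compounds a 40% annual rate over the 11 years from 2009 to 2020 and, going the other way, derives the annual rate that a 44x increase would imply. Treating the period as 11 equal annual steps is an assumption; the source does not say how the two figures were derived.

```python
def compound_growth(rate, years):
    """Total growth factor after compounding `rate` once per year."""
    return (1 + rate) ** years

def implied_annual_rate(factor, years):
    """Annual rate that produces `factor` total growth over `years`."""
    return factor ** (1 / years) - 1

years = 2020 - 2009  # assumed: 11 annual compounding steps

print(f"40% per year for {years} years: {compound_growth(0.40, years):.1f}x")
print(f"44x over {years} years implies: {implied_annual_rate(44, years):.1%} per year")
```

Under that assumption the two figures are roughly consistent: 40% per year compounds to about 40x, and 44x corresponds to a little over 41% per year.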
          1. Volume

            Annotations:

            • Machine-generated data is produced in much larger quantities than non-traditional data. For instance, a single jet engine can generate 10 TB of data in 30 minutes.
            1. Velocity

              Annotations:

              • Social media data streams are not as massive as machine-generated data, but they arrive at high velocity. For example, even at 140 characters per tweet, the high velocity (or frequency) of Twitter data ensures large volumes (over 8 TB per day).
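The volume and velocity figures above are easier to compare as sustained data rates. The sketch below converts the jet engine figure (10 TB in 30 minutes) and the Twitter figure (8 TB per day) to bytes per second. It assumes decimal terabytes (10^12 bytes) and a perfectly even rate, neither of which the source states.

```python
TB = 10**12  # assumption: decimal terabyte

def bytes_per_second(total_bytes, seconds):
    """Average data rate for a volume produced over a time span."""
    return total_bytes / seconds

jet_engine = bytes_per_second(10 * TB, 30 * 60)        # 10 TB in 30 minutes
twitter = bytes_per_second(8 * TB, 24 * 60 * 60)       # 8 TB in one day

print(f"Jet engine: {jet_engine / 1e9:.2f} GB/s")
print(f"Twitter:    {twitter / 1e6:.1f} MB/s")
```

Under these assumptions the single jet engine works out to roughly 5.56 GB/s, while the Twitter stream averages roughly 92.6 MB/s.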
              1. Variety

                Annotations:

                • Traditional data formats tend to be relatively well defined by a data schema and change slowly. In contrast, non-traditional data formats exhibit a dizzying rate of change.
                • As new services are added, new sensors deployed, or new marketing campaigns executed, new data types are needed to capture the resultant information.
                1. Veracity

                  Annotations:

                  • Veracity refers to the uncertainty of data. Poor data quality is estimated to cost the US economy 3.1 trillion dollars a year.
                2. Architecture & Patterns

                  Annotations:

                  • The "Big data architecture and patterns" series presents a structured, pattern-based approach that simplifies the task of defining an overall big data architecture.
                  1. Classify big data

                    Annotations:

                    • Business problems can be categorized into types of big data problems. Example: the business problem "Utilities: predict power consumption" maps to the big data type "machine-generated data".
                    1. Defining logical architecture

                      Annotations:

                      • The logical layers help to define and categorize the various components required for a big data solution: (1) big data sources, (2) data massaging and store layer, (3) analysis layer, (4) consumption layer.
                      1. Understanding patterns

                        Annotations:

                        • Patterns address the most common and recurring big data problems and solutions. They help to define a high-level solution for a big data problem.
                        1. Atomic Patterns

                          Annotations:

                          • The atomic patterns describe the typical approaches for consuming, processing, accessing, and storing big data.
                          1. Composite patterns

                            Annotations:

                            • Composite patterns are composed of atomic patterns and are used to solve big data problems.
                          2. Choosing Solution Patterns

                            Annotations:

                            • A specific solution pattern (made up of atomic and composite patterns) is applied to the business scenario. Solution patterns are used to architect a big data solution.
                            1. Determining the viability of a business problem

                              Annotations:

                              • Before making the decision to invest in a big data solution, evaluate the data available for analysis. Asking the right questions is a good place to start. Examples: Does my big data problem require a big data solution? What insights are possible with big data technologies?
                              1. Selecting the right product for big data solution

                                Annotations:

                                • This step covers the products and technologies that form the backbone of a big data solution.
                              2. Big data analytics

                                Annotations:

                                • Without analytics, big data is just noise. Big data analytics is the use of advanced analytic techniques against very large, diverse data sets.
                                1. Importance

                                  Annotations:

                                  • Analyzing big data allows analysts, researchers, and business users to gain new insights, resulting in significantly better and faster decisions.
                                2. Database systems
                                  1. Massively Parallel Processing (MPP)

                                    Annotations:

                                    • A system that parallelizes the query execution of a DBMS: it splits queries and allocates them to multiple DBMS nodes in order to process massive amounts of data concurrently.
                                    • The nodes communicate with each other through a messaging interface.
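The split-and-combine idea behind MPP can be sketched as follows: the data is partitioned across nodes, each node computes a partial aggregate over its own partition, and a coordinator merges the partial results. In this illustration threads stand in for DBMS nodes, and the function names and the round-robin partitioning are assumptions, not the behavior of any particular MPP product.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, node_count):
    """Distribute rows round-robin across the (simulated) nodes."""
    return [rows[i::node_count] for i in range(node_count)]

def local_aggregate(rows):
    """Work done on one node: a partial sum and a row count."""
    return sum(rows), len(rows)

def parallel_average(rows, node_count=4):
    """Coordinator: fan the query out, then merge the partial results."""
    parts = partition(rows, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(local_aggregate, parts))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count  # assumes at least one row overall

print(parallel_average(list(range(1, 101))))
```

An average is computed from partial sums and counts rather than from partial averages, because partitions may differ in size.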
                                    1. Stream processing

                                      Annotations:

                                      • A system that processes a continuous stream of data (or events), or a concept in which the content of a database is continuously changing over time.
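A small sketch of the stream-processing idea: instead of loading a complete data set and querying it, the program keeps only a bounded window of recent events and emits an updated result as each event arrives. The `moving_average` function and the window size are hypothetical choices for illustration.

```python
from collections import deque

def moving_average(events, window=3):
    """Yield the average of the most recent `window` values after each event."""
    recent = deque(maxlen=window)  # older values fall out automatically
    for value in events:
        recent.append(value)
        yield sum(recent) / len(recent)

# `events` can be any iterable, including an unbounded generator.
for average in moving_average([10, 20, 30, 40], window=3):
    print(average)
```

Because the function is a generator and the window is bounded, memory use stays constant no matter how long the stream runs.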
                                      1. Column oriented database

                                        Annotations:

                                        • A database management system (DBMS) that stores data tables as sections of columns of data rather than as rows of data.
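The difference between row and column storage can be shown with a small in-memory sketch. The table, its column names, and the `to_columns` helper are hypothetical; a real column store adds compression and on-disk layout that are not modeled here.

```python
# Row-oriented layout: one record per entry, all fields together.
rows = [
    {"id": 1, "region": "north", "kwh": 120},
    {"id": 2, "region": "south", "kwh": 95},
    {"id": 3, "region": "north", "kwh": 143},
]

def to_columns(records):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

columns = to_columns(rows)

# An aggregate over one column reads only that column's values.
print(columns["kwh"], sum(columns["kwh"]))
```

With the column layout, a query such as a total over `kwh` touches a single contiguous list instead of every field of every row, which is the usual motivation for column stores in analytic workloads.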
                                        1. Key value storage

                                          Annotations:

                                          • Every single item in the database is stored as an attribute name (or "key"), together with its value.
                                          1. Distributed Database

                                            Annotations:

                                            • Distributed databases store data across multiple computers to improve performance, allowing transactions to be processed on many machines instead of being limited to one.
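One common way to spread data across machines is hash partitioning: a hash of the key decides which node holds the item. The sketch below combines this with key-value storage, using in-memory dictionaries to stand in for machines. The `ShardedStore` class is hypothetical, and the CRC32-modulo placement is an assumption chosen for simplicity, not a description of any particular product.

```python
import zlib

class ShardedStore:
    """Toy key-value store whose items are spread over several nodes."""

    def __init__(self, node_count):
        # Each dict stands in for the storage of one machine.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # A stable hash of the key picks the node, so reads find what writes stored.
        index = zlib.crc32(key.encode("utf-8")) % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key, default=None):
        return self._node_for(key).get(key, default)

store = ShardedStore(node_count=3)
for meter in ("meter-1", "meter-2", "meter-3", "meter-4"):
    store.put(meter, {"kwh": 100})

print(store.get("meter-2"))
print([len(node) for node in store.nodes])
```

Simple modulo placement reshuffles most keys when the node count changes; production systems typically use techniques such as consistent hashing to limit that movement.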