Big Data Analytics

Description

Mind Map on Big Data Analytics, created by chandrikasweety9 on 02/01/2014.

Resource summary

Big Data Analytics

Annotations:

  • Examining large amounts of data of various types to uncover hidden patterns, unknown correlations, and other useful information.
  1. Big data

    Annotations:

    • A general term used to describe unstructured and semi-structured data. Data volumes at this scale are typically specified in petabytes and exabytes.
    • A petabyte (PB) is a measure of memory or storage capacity equal to 2 to the 50th power bytes; in decimal terms, it is approximately a thousand terabytes.
    • An exabyte (EB) is a large unit of computer data storage equal to 2 to the 60th power bytes, or approximately one quintillion bytes. In decimal terms, an exabyte is a billion gigabytes.
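The unit sizes above can be checked with a few lines of arithmetic. This is a minimal sketch; the constant names are chosen here for illustration and do not come from the mind map.

```python
# Binary (power-of-two) sizes quoted in the annotations.
PETABYTE_BINARY = 2 ** 50
EXABYTE_BINARY = 2 ** 60

# Decimal (power-of-ten) sizes used for the "approximately" comparisons.
GIGABYTE_DECIMAL = 10 ** 9
TERABYTE_DECIMAL = 10 ** 12
PETABYTE_DECIMAL = 10 ** 15
EXABYTE_DECIMAL = 10 ** 18  # one quintillion bytes

# A decimal petabyte is exactly a thousand decimal terabytes.
terabytes_per_petabyte = PETABYTE_DECIMAL // TERABYTE_DECIMAL

# A decimal exabyte is exactly a billion decimal gigabytes.
gigabytes_per_exabyte = EXABYTE_DECIMAL // GIGABYTE_DECIMAL

# The binary values are somewhat larger than the decimal ones.
petabyte_excess = PETABYTE_BINARY / PETABYTE_DECIMAL
exabyte_excess = EXABYTE_BINARY / EXABYTE_DECIMAL

print(PETABYTE_BINARY)         # 1125899906842624
print(EXABYTE_BINARY)          # 1152921504606846976
print(terabytes_per_petabyte)  # 1000
print(gigabytes_per_exabyte)   # 1000000000
```

The binary petabyte is roughly 12.6% larger than the decimal one, and the binary exabyte roughly 15.3% larger, which is why the decimal figures are described as approximations.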
    1. Unstructured data

      Annotations:

      • A general label for any corporate information that is not stored in a database. There are two types: textual and non-textual.
      • Textual unstructured data is generated in media like email messages, PowerPoint presentations, Word documents, collaboration software and instant messages.
      • Non-textual unstructured data is generated in media like JPEG images, MP3 audio files and Flash video files.
      1. Primary goal

        Annotations:

        • To discover repeatable business patterns.
      2. Primary goal

        Annotations:

        • To help companies make better business decisions by enabling data scientists and other users to analyze huge volumes of transaction data as well as other data sources that may be left untapped by conventional business intelligence (BI) programs.
        • A data scientist is a job title for an employee or business intelligence (BI) consultant who excels at analyzing data, particularly large amounts of data, to help a business gain a competitive edge.  
        • A data scientist possesses a combination of analytic, machine learning, data mining and statistical skills as well as experience with algorithms and coding. They have the ability to explain the significance of data in a way that can be easily understood by others. 
        1. Technologies
          1. NoSQL

            Annotations:

            • A NoSQL database, also called Not Only SQL, is an approach to data management and database design that's useful for very large sets of distributed data.
            • NoSQL is especially useful when an enterprise needs to access and analyze massive amounts of unstructured data or data that's stored remotely on multiple virtual servers in the cloud.
            • One of the most popular NoSQL databases is Apache Cassandra. Cassandra, which was once Facebook’s proprietary database, was released as open source in 2008. Other NoSQL implementations include SimpleDB, Google BigTable, MemcacheDB, and Voldemort. Apache Hadoop and MapReduce are often listed alongside them, although they are data-processing frameworks rather than databases. Companies that use NoSQL include Netflix, LinkedIn and Twitter.
            1. Hadoop

              Annotations:

              • Hadoop was created by Doug Cutting and Mike Cafarella. It is a free, Java-based programming framework that supports the processing of large data sets in a distributed computing environment.
              • It is part of the Apache project sponsored by the Apache Software Foundation.
              1. MapReduce

                Annotations:

                • MapReduce is a software framework that allows developers to write programs that process massive amounts of unstructured data in parallel across a distributed cluster of processors or stand-alone computers.  It was developed at Google for indexing Web pages and replaced their original indexing algorithms and heuristics in 2004.
                • The framework is divided into two parts:
                  1. Map, a function that parcels out work to different nodes in the distributed cluster.
                  2. Reduce, a function that collates the work and resolves the results into a single value.
                • The MapReduce framework is fault-tolerant because each node in the cluster is expected to report back periodically with completed work and status updates. If a node remains silent for longer than the expected interval, a master node makes note and re-assigns the work to other nodes.
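The map step described above, together with a reduce step that combines its results, can be illustrated with a word count. This is a minimal single-process sketch in plain Python, not Hadoop's or Google's actual API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, and a real cluster would run the map calls on different nodes in parallel.

```python
from collections import defaultdict


def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce sequentially over the input splits."""
    mapped = []
    for document in documents:  # on a cluster, each split goes to a different node
        mapped.extend(map_phase(document))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data", "big clusters process data"])
print(counts)  # {'big': 2, 'data': 2, 'clusters': 1, 'process': 1}
```

Because each `map_phase` call depends only on its own input split, a failed or silent node's split can simply be handed to another node and mapped again, which is what makes the re-assignment approach workable.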