Resource summary
- Definition:- 'Big Data' is similar to 'small data', but bigger. Because the data is
bigger, it requires different approaches (techniques, tools and architecture),
with the aim of solving new problems, or old problems in a better way.
- 4 V's of Big Data:- i) Volume --> data quantity ii) Velocity --> data speed
iii) Variety --> data types iv) Veracity --> messiness
- Growth:- i) Increase in storage capacities ii) Increase in processing power
iii) Availability of data
- Applications:- i) Homeland Security ii) Finance iii) Smarter Healthcare
iv) Multi-channel sales v) Telecom vi) Manufacturing vii) Traffic Control
viii) Trading Analytics
- Hadoop Technology:- A scalable, fault-tolerant grid operating system for data storage and processing.
i) Commodity hardware ii) HDFS: fault-tolerant, high-bandwidth clustered storage
iii) MapReduce: distributed data processing iv) Works with structured and unstructured data
v) Open source, Apache license
- Components:- i) Pig --> data flow ii) Hive --> batch SQL iii) Sqoop --> data import
iv) ZooKeeper --> coordination v) Chukwa --> display & monitoring vi) MapReduce --> job scheduling
vii) HBase --> real-time query viii) HDFS --> Hadoop Distributed File System ix) Avro --> serialization
- Benefits:- i) Hadoop is designed to run on cheap commodity hardware ii) It automatically handles
data replication and node failure iii) It handles large volumes of unstructured data easily
iv) Last but not least, it's free (open source)
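
The MapReduce model listed under Hadoop Technology can be illustrated with a small word count. The following is only a minimal single-process sketch in plain Python, not Hadoop's actual API: the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical and chosen for illustration, and a real cluster would run the map and reduce calls on different nodes over data stored in HDFS.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    # Assumption: everything runs in one process. In Hadoop, each map call
    # could run on a different node; here the calls simply run in a loop.
    pairs = []
    for doc in documents:
        pairs.extend(map_phase(doc))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data is big", "data is everywhere"])
print(counts)  # {'big': 2, 'data': 2, 'is': 2, 'everywhere': 1}
```

Because each map call looks at only one record and each reduce call at only one key, the work can in principle be split across many machines, which is the property that lets this model scale on commodity hardware.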