BIG DATA C0728681_OA_17jan Big data is a complex set of huge data that cannot beprocessed by using the application software existed before. Big data contains structured,semi-structured and unstructured data. Unstructured data files contain text andmultimedia content.
They can be videos, photos, emails, presentations, webpagesand many other document types. These are considered unstructured because itdoesn’t fit in a database. More than 80 percent of the data in an organizationis unstructured.
Semi-structured data doesn’t fit in a relational database but theyhave some properties which helps to analyze them easily. The internet of thingsdevices gathers a lot of relevant and irrelevant data which makes the growth ofdata sets to double every 40 months since 1980’s. Big data size has been movedto petabytes of data over these years. The main concern about big data is howto effectively use the collected data to reveal meaningful information fromthat. For that we need advanced tools i.e., analytics and algorithms. The four main dimensions of big data are Volume, Variety,Veracity and Velocity.
Volume is the quantity of data that is collected, processed,analysed and stored. The volume determines whether it can be considered bigdata or not. Variety refers to the type and nature of data, which means thedata can be in different format from different sources. This helps us toanalyse the data and to use them effectively. From the context of velocity, itmeans how fast the data is coming to us. The speed of data generation andprocessing is crucial to meet the high demands and challenges in the field ofgrowth and development.
Veracity refers to the quality of the data that iscaptured to process. The more quality the data has the more value it has. Wecan increase the quality of data while processing or analysing the data.To analyse these big data set we need statistical,mathematical and computational methods. Hadoop is an open-source, java basedprogramming framework.
Hadoop supports the processing and storage of large datasets in distributed computing environment. Hadoop facilitates rapid datatransfer among nodes. Even in case of node failure, system can continue to operate.Certainly big data is the right data, inthe right form, handled in the right manner.