Data that happens to be big. How big? Gigabytes, terabytes, perhaps even petabytes or zettabytes, and growing every second. To elaborate: nearly 72 hours of video are added to YouTube each hour, more than 30 billion pieces of content are added on Facebook each month, nearly 1.8 trillion digital interactions take place each month, and around 92% of the world's data has been created in the last two years. At the current rate of Internet growth, today's data volumes may well double within a year.
The question that lies ahead is: where will this growth stop? How and where do we store such big data in this growing computing world? What is the solution?
Thankfully, there is a solution, and it is called "BIG DATA".
Bigdata(R) is a horizontally scaled, general-purpose storage platform designed to operate on either a single server or a cluster of servers. We know we have just thrown some technical terms at you, but don't worry, we will explain each in detail. Scaling refers to growing system resources, such as RAM, to handle larger sets of data. Bigdata supports horizontal scaling alongside vertical scaling: a cluster can be built from multiple nodes, where each node holds a large amount of data and is vertically scaled with the RAM and hard disk it requires. Say we have 50 petabytes of data, with each node storing around 10 petabytes. The traditional approach keeps all 50 petabytes on one server, vertically scaled with high RAM and a large hard disk; every query is then resolved by that single server, resulting in slower data rates. In the Big Data approach, multiple nodes share the data, so each query is redirected to the particular node that holds the answer, giving faster access to data along with a structured data orientation.
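The query-redirection idea above can be sketched in a few lines. This is a toy illustration only, not Bigdata's actual implementation; the hash-based partitioning scheme and the `ShardedStore` class are assumptions made for the example.

```python
import hashlib

class ShardedStore:
    """Toy horizontally scaled store: data is partitioned across nodes,
    and each query is routed directly to the node that owns the key."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def _node_for(self, key):
        # Stable hash, so the same key always maps to the same node.
        digest = hashlib.md5(key.encode()).hexdigest()
        return int(digest, 16) % len(self.nodes)

    def put(self, key, value):
        self.nodes[self._node_for(key)][key] = value

    def get(self, key):
        # Only one node is consulted, instead of scanning a single
        # monolithic server that holds all the data.
        return self.nodes[self._node_for(key)].get(key)

store = ShardedStore(num_nodes=5)
store.put("user:42", {"name": "Asha"})
print(store.get("user:42"))
```

The key point is in `get`: the client computes which node owns the key and talks only to that node, which is why adding nodes spreads both the data and the query load.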
Big Data Application Segments –
The Big Data market comprises four distinct application segments, each with varying needs for performance and scalability: 1) Design (architecture), 2) Discover (node requirements, core simulation), 3) Decide (analytics) and finally 4) Deposit/Deploy (Web 2.0 and data warehousing).
1) Design – The Big Data design segment creates value by using data to drive product innovation. It covers design optimization, improving time-to-design and process flow, and engineering collaboration, whereby groups of engineers work from a consistent data set in order to reduce the possibility of error.
2) Discover – The Big Data discover segment creates value by performing core scientific research, discovering new horizons and replacing costly physical simulation with innovative computer simulation.
Input data sets can grow to tens or hundreds of terabytes, which would choke a traditional NAS system. The file system used (e.g. HDFS, used by Hadoop) allows all nodes on a compute cluster parallel, direct access to the storage pool, enabling simulations to scale with the work.
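Parallel access to a shared storage pool can be illustrated with a small sketch. This is not the HDFS API; the in-memory `blocks` mapping and `read_block` function are hypothetical stand-ins for blocks spread across cluster nodes, fetched concurrently and reassembled in order.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layout: each "node" holds one block of a large file.
blocks = {0: b"big ", 1: b"data ", 2: b"scales"}

def read_block(block_id):
    # In a real cluster this would be a network read from the node
    # storing the block; here it is a local dictionary lookup.
    return blocks[block_id]

# All blocks are fetched in parallel, then joined in block order.
with ThreadPoolExecutor(max_workers=3) as pool:
    data = b"".join(pool.map(read_block, sorted(blocks)))

print(data.decode())
```

Because every block can be read at the same time from a different node, total read time scales with the slowest single block rather than with the whole file.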
3) Decide & Deploy – The Big Data decide and deploy segment includes predictive modeling and turning design considerations and decisions into a real setup. It also includes analyzing behavior through performance evaluation, configuring RAID (where RAID levels are assigned on a per-file basis and all files are stored as objects) and deploying the nodes in the form of a high-performance cluster.
Optimizing Big Data – this includes:
1) File system – How much performance a Big Data deployment achieves depends heavily on the file system used to store the data. Examples include HDFS (the Hadoop Distributed File System) and parallel file systems (from Panasas and IBM) that support –
- High-speed, parallel access to a single file system via DirectFlow.
- High scalability – scales linearly to over 8 PB and 150 GB/s.
- One unified point of management.
- SSD technology and Object RAID, which delivers the fastest RAID rebuilds available.
2) Solid-state drives (SSDs) – SSD technology is now broadly used in enterprise storage. SSDs are extremely fast and deliver high performance. Storage devices leverage SSDs to provide highly optimized, high-performance storage for mixed workloads: large-file throughput and small-file IOPS. The SSD layer stores both metadata and file data, eliminating file conflicts and delivering faster access to data.
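The idea of keeping metadata and small files on a fast tier can be sketched as follows. This is a minimal illustration, not any vendor's design; the `TieredStorage` class and the 64 KB size threshold are assumptions chosen for the example.

```python
class TieredStorage:
    """Sketch of a hybrid layout: metadata and small files live on the
    fast SSD tier, large file bodies on the bulk HDD tier."""

    SMALL = 64 * 1024  # illustrative cutoff, not a real product setting

    def __init__(self):
        self.ssd = {}  # fast tier: metadata + small files
        self.hdd = {}  # capacity tier: large file bodies

    def write(self, name, data):
        # Metadata always goes to the SSD tier for fast lookups.
        self.ssd[name + ".meta"] = {"size": len(data)}
        tier = self.ssd if len(data) <= self.SMALL else self.hdd
        tier[name] = data

    def read(self, name):
        # Check the fast tier first, then fall back to the bulk tier.
        return self.ssd.get(name) or self.hdd.get(name)

fs = TieredStorage()
fs.write("small.txt", b"x" * 10)        # lands on the SSD tier
fs.write("big.bin", b"x" * 200_000)     # body lands on the HDD tier
```

Routing by size means small-file IOPS hit only the fast tier, while large sequential reads stream from the capacity tier, which is the mixed-workload split described above.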
3) Network protocols – faster data flow depends on the network protocols used. For example –
- DirectFlow: high-performance, parallel protocol access for Linux clients.
- NFS v3: for Linux and Unix clients.
- CIFS: for Windows clients.
WIPL is a web hosting company in India providing database solutions such as MS SQL web hosting, Windows web hosting, MySQL database hosting and Big Data database hosting.