Monday 30 June 2014

Analyzing Big Data

Let us take a quick look at some of the ways in which big data is processed, maintained and analyzed to provide valuable insights.

A good volume of data has been available for over a decade, but it was used far less than it is today.  We have started exploring both the data that was already available and the new data that keeps arriving every millisecond (streaming data) to make valuable decisions.

Types of Data
Data can be categorized as structured, unstructured and semi-structured.
  1. Structured data is data that is in a pre-defined format.  Data stored in databases and spreadsheets are examples of structured data.  Structured data can be analyzed easily.
  2. Unstructured data refers to data that does not have a pre-defined format.  Sentences, texts, stories and pictures are all examples of unstructured data.  Text mining tools have to be used to uncover information held in unstructured text.
  3. Data in XML files (and other markup languages) is semi-structured: some parts of the data are structured.

Data Cleansing
Analysis performed on inaccurate, erroneous, or duplicate data yields less valuable results.  Hence the available data must be cleansed before analysis.

For example, suppose a person in India gives a 5-digit PIN code in an address.  Indian PIN codes have six digits, so the system has to immediately flag the data as inaccurate.
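
A check of this kind can be sketched in a few lines of Python.  This is only an illustration: the record layout and the function names are made up for this post, and the rule assumed here is simply that a PIN code is six digits and does not start with 0.

```python
import re

# Assumption: a well-formed Indian PIN code is six digits, first digit 1-9.
PIN_PATTERN = re.compile(r"[1-9][0-9]{5}")


def is_valid_pin(pin: str) -> bool:
    """Return True if the text looks like a well-formed PIN code."""
    return PIN_PATTERN.fullmatch(pin.strip()) is not None


def find_invalid_pins(records):
    """Return the records whose 'pin' field fails the format check."""
    return [r for r in records if not is_valid_pin(r.get("pin", ""))]


# Hypothetical address records, invented for the example.
records = [
    {"name": "Customer A", "pin": "600001"},
    {"name": "Customer B", "pin": "60001"},  # only five digits
]

for bad in find_invalid_pins(records):
    print("Inaccurate PIN code:", bad["name"], bad["pin"])
```

A format check like this only catches malformed values.  Confirming that a six digit code really exists would need a lookup against a reference list.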

Maintaining a Golden Record for every entity
For every entity, 'a single version of true data' has to be stored.  Details of the operations that are performed on the data have to be stored, and a copy of the data before modification also needs to be maintained.

For example, a customer of a Bank has a Savings account and a Fixed Deposit.  It is good to maintain one record of the customer, containing all his details (like name, date of birth, gender, address), rather than having two copies of this data.
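
The idea can be sketched as below.  The class name, field names and sample values are hypothetical and only show the principle: one customer record, referenced by both products, that keeps a copy of the data before every change.

```python
import copy
from datetime import datetime, timezone


class GoldenRecord:
    """One record per entity, with a history of every change made to it."""

    def __init__(self, entity_id, details):
        self.entity_id = entity_id
        self.details = dict(details)
        self.history = []  # one entry per operation performed on the data

    def update(self, field, value):
        # Store the operation and a copy of the data before modification.
        self.history.append({
            "at": datetime.now(timezone.utc),
            "operation": f"set {field}",
            "before": copy.deepcopy(self.details),
        })
        self.details[field] = value


# Hypothetical customer with a Savings account and a Fixed Deposit.
customer = GoldenRecord("CUST-1", {"name": "Customer A", "address": "Chennai"})
savings_account = {"type": "Savings", "customer_id": customer.entity_id}
fixed_deposit = {"type": "Fixed Deposit", "customer_id": customer.entity_id}

# The address is changed once, and both products see the new value.
customer.update("address", "Bengaluru")
print(customer.details["address"], "| previous:", customer.history[0]["before"]["address"])
```

Because both products point to the same customer record, there is no second copy that can go out of date.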

Analyzing Streaming Data
Data that is transferred continuously at high speed is known as streaming data.  An example that we might have noticed is heart beat monitors attached to patients.  Other examples include network signals and transactions over the internet.  In some cases, monitoring streaming data becomes very important.  Products that can quickly ingest and analyze data help when critical decisions have to be made using streaming data.
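
As a small sketch of analyzing data while it arrives, the Python code below keeps a moving average over the latest heart beat readings and raises an alert when the average leaves a range.  The readings, the window size and the thresholds are all assumptions chosen for illustration and are not medical guidance.

```python
from collections import deque


def monitor(readings, window=5, low=50, high=120):
    """Yield (index, average) whenever the moving average leaves [low, high]."""
    recent = deque(maxlen=window)  # holds only the latest `window` readings
    for i, bpm in enumerate(readings):
        recent.append(bpm)
        if len(recent) == window:
            avg = sum(recent) / window
            if avg < low or avg > high:
                yield i, avg


# Hypothetical stream of beats-per-minute readings.
stream = iter([72, 75, 74, 130, 135, 140, 145, 150])
alerts = list(monitor(stream, window=3))

for index, avg in alerts:
    print(f"Alert at reading {index}: average {avg:.1f} bpm")
```

The function reads one value at a time and never stores the whole stream, which is the property that matters when the data does not stop coming.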

Data Integration and Governance
Software that can integrate data from multiple systems and provide a complete 360 degree view of each entity involved is required.  Also, managing data quality is important at each stage of the processing.

For example, let us again consider a bank customer who has a home loan of Rs. 50,00,000/-, a credit card balance of Rs. 60,000/- and a savings account with a balance of less than Rs. 1,000/-.  If the bank can get a complete view of this customer and drill into his history, it is easy for the bank manager to decide when the customer applies for a car loan of Rs. 10,00,000/-.
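
A minimal sketch of such a combined view is given below.  The three lists stand in for three separate systems, the customer id and the field names are hypothetical, and the savings balance of Rs. 900 is an assumed value below Rs. 1,000.

```python
# Each list stands in for a separate system inside the bank.
loans = [{"customer_id": "C1", "product": "Home Loan", "outstanding": 5_000_000}]
cards = [{"customer_id": "C1", "outstanding": 60_000}]
savings = [{"customer_id": "C1", "balance": 900}]  # assumed value under Rs. 1,000


def customer_view(customer_id):
    """Combine what every system knows about one customer."""
    owed = sum(l["outstanding"] for l in loans if l["customer_id"] == customer_id)
    owed += sum(c["outstanding"] for c in cards if c["customer_id"] == customer_id)
    saved = sum(s["balance"] for s in savings if s["customer_id"] == customer_id)
    return {"customer_id": customer_id, "total_owed": owed, "total_savings": saved}


view = customer_view("C1")
print(view)
```

The sketch only gathers the figures in one place.  The decision on the car loan still rests with the bank manager and the bank's own rules.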

Data Exploration
Sometimes, data outside the organization (e.g., the number of Likes in Facebook, or analysis done by government or third party agencies) also becomes crucial during analysis.  Software that helps to uncover value from data in internal and external sources is a key component of big data analysis.

For example, a popular brand wants to compare its performance in various cities in a country.  It also wants to compare itself with its competitors in those cities, based on the number of Likes in Facebook.

Predictive Analysis
Predictive Analysis makes use of historical data and current data to make predictions about the future.  This is one of the most frequently used analysis techniques.

For example, based on the number of enterprises that have started using big data and analysis, it is possible to predict the number of data scientists required after five years.
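
One simple way to make such a prediction is to fit a straight line through past values and extend it.  The sketch below does this with ordinary least squares in plain Python.  The yearly figures are invented purely to show the mechanics and do not describe any real count of enterprises or data scientists.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up historical data: year -> number of enterprises using big data.
years = [2010, 2011, 2012, 2013, 2014]
enterprises = [100, 150, 200, 250, 300]

slope, intercept = fit_line(years, enterprises)
forecast_2019 = slope * 2019 + intercept  # five years after the last data point
print(f"Forecast for 2019: {forecast_2019:.0f} enterprises")
```

A straight line is the simplest possible model.  Real predictive analysis would test several models and check them against data that was held back.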
 
Some of the products that provide these capabilities are given below.
IBM InfoSphere Information Server
IBM InfoSphere Master Data Management
IBM Information Integration and Governance
IBM InfoSphere BigInsights
IBM Stream Computing
IBM InfoSphere Data Explorer
IBM SPSS Software


Sunday 29 June 2014

Big Data, Analytics and Insights

What is Big Data
Big data is one of the most important topics today.  So what is "big data"?  Does it mean "data" has become big?

From my point of view, the amount of data that can be stored in digital form has increased.  Information has always been there, from stone inscriptions to writing in books, but the readers were limited, owing to knowledge, access, language, distance and the time period up to which it could be preserved.

Today the quantity of data that can be stored and accessed has increased.  All of us here have email ids, send emails, have mobile devices and send SMSes; many of us have installed WhatsApp, tweet and use Facebook.  We use these to exchange information - data.  All this data is stored somewhere, hence we are able to search for a mail which we received four years ago.  We are able to see a photograph of our friend's child.  We get an SMS from Railways giving the PNR number and requesting us not to waste paper, an instant message from a Bank intimating that the EMI for a Housing Loan has been deducted, and a message in Facebook that a movie that we wanted to see is not so good or that a shop we regularly visit has opened a store in our neighbourhood.  You wish your friend on her Birthday, and thank Facebook for the timely reminder.  You too tweet, write a blog and post your favourite photo in Facebook.  Newspapers and magazines are available over the Internet, so it is not necessary to spend time with them in 'paper' form early in the morning.  From all this, we get to know that the 'Volume' of data has increased.

By the time I complete this post, many others will also have added data by posting photographs, marking 'Like' for a garment brand, tweeting comments on a product, congratulating a friend on passing an examination, booking railway tickets, transferring money from a bank account using NEFT etc.  The data that already exists keeps changing very fast (Velocity).

To store all these Varieties of data - written text, photographs, number of Likes for a garment brand, railway ticket booking, managing bank balance and transfer of money securely, the systems should be appropriately equipped.

In some of the above examples, the Veracity (how true the data is) plays a very important role.  Incorrectly tagging a person in Facebook is a common example of false data.

Hence Big Data is a huge and ever-increasing Volume of data of different Varieties, changing at a significant Velocity.  The Veracity of this data has to be ensured.  We realize its importance based on the Value it provides.

Why Big Data?
Consciously or subconsciously, we have been using analyzed Big Data.  Google displays the search results using maximum access as the criterion.  Facebook and LinkedIn show us our "Could be" Friends.

Big Data is not there just because the extent of storage capability has increased.  It is there because it can be analyzed to provide us valuable Insights. 

For example, after I publish a post, Blogger allows me to view valuable statistics.  It provides details on the number of times each post has been read, the read count by country, browser and operating system, and the link through which the reader reached the post, which gives me some insights.  As soon as I publish, the read count starts increasing slowly; it increases sharply once I give the link in Facebook, and it decreases after the post is a week old or so.

When an individual is so interested in knowing the number of readers, it is obvious that Governments, Banks, Insurers and Retailers would want Insights which help them improve their services and / or maximize their profits.

Actionable Insights
1. Some of the actions that Banks take based on the Insights might have already been experienced or observed.  A Bank in which you have your salary account (or a good amount as balance) is ready to give you a Personal Loan without provision of documents.  The organization also treats you as a Privileged customer, thereby ensuring that you are satisfied with their service and will remain their customer.

2. When you are regularly consulting a Diabetes hospital, the hospital staff call you up when your next Consultation is due.

3. A reputed food chain has started offering Vegetarian food after getting to know closely that most of the potential customers are Vegetarians.

Given below are other examples.
1. On knowing that a store is running short of a particular drug, the Pharma company can immediately replenish the stock there.  If many such stores run short of the same drug, the company may need to decide whether the production of the drug has to be increased.

2. A particular garment brand compared itself with its competitors using details from a social website and found that its performance had dropped dramatically.  It then started stocking each store based on its geography, so that the clothes suit the style of the locals, and reducing prices to make them affordable.  It also started advertising through the social website and adding new designs.

3. Currently, organizations have started recognizing the importance of big data and analysis and have predicted that the industry would require a number of Data Scientists over the years to come. (Predictive analysis).

With Big Data and Analysis playing a key role in almost all domains, we need to understand the components that are used to obtain the actionable insights and the opportunities that they provide us.

The links below will help you get a good understanding of Big Data and Insights.
What is Big Data Analysis
Big Data - What it is and how it matters
IBM Big Data Use Cases
IBM Big Data in Action


Monday 23 June 2014

IBM InfoSphere Master Data Management on Cloud

Cloud computing refers to the delivery of computing resources over the Internet on a pay per use basis.

Cloud computing services are available in three ways.
1. Software As A Service (SaaS) where the software is installed on servers (in the cloud) and the user connects to it through the Internet. (Example: IBM Sterling Supply Chain Visibility, Google, Facebook)

2. Platform As A Service (PaaS) where a cloud based environment is provided using which the application can be built and deployed. (Example: IBM BlueMix, IBM SmartCloud Application Services)

3. Infrastructure As A Service (IaaS) where servers, storage space and networking are provided on a pay per use basis. (Example: IBM SoftLayer)

Managing Big Data and analysis is becoming critical to businesses, and they are slowly starting to realize the importance of having a Master Data Management solution to meet their objectives.

IBM InfoSphere Master Data Management (from version 11.0.0) can now be deployed on the cloud, thereby providing customers the following advantages.
1. Accelerated deployment
2. Pay per use model
3. Maintenance of the MDM solution on cloud is addressed by IBM

These facilities enable the businesses to obtain value quickly, without initial capital expenditure.  Hence small and medium sized enterprises can also experience the benefits of using IBM Master Data Management.

Please refer to the link below for further details.
IBM Master Data Management

Sunday 22 June 2014

IBM InfoSphere Master Data Management

Although Big Data and Analysis form the basis for successful enterprises, we still find many organizations where data is in silos.

For example, consider an organization which runs Banking, Insurance and other businesses.  A customer has a Savings Account and three insurance policies with this organization.  For each insurance policy, the organization has issued a separate Customer Id.  So this customer has three Customer Ids and one Savings Bank Account Number.  To change any single detail, the update has to be done four times: once for each Customer Id and once for the Savings Account.  If the customer updates only two of them, the organization is left with inconsistent data.

From the organization's perspective, if it keeps holding a separate Customer Id for each insurance policy, the amount of data it has to maintain keeps increasing.  The extent of inconsistent data is unknown.  It is difficult to come up with a single view of a customer.

Having a single view of every customer would help organizations improve their revenue.  If this organization can work out that customer A, who has an Insurance policy, also has a Savings Account, it can suggest the option of a direct debit from the account to the Customer.  It would also be easy for the organization to get a complete view of the Customer.

IBM InfoSphere Master Data Management (MDM) is a solution that helps organizations to obtain a single view of each customer.  Master Data refers to data that is shared across the organization.  Hence in our example, master data refers to data about the customer. MDM provides capabilities to match and merge records from multiple sources. This master data can then be used across the organization.
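
To make the idea of match and merge concrete, here is a very small Python sketch.  It is not how IBM InfoSphere MDM works internally.  It assumes that two records belong to the same customer when the name (ignoring case and extra spaces) and the date of birth are equal, and all ids and field names are hypothetical.

```python
def match_key(record):
    """Normalise the fields used to decide that two records match."""
    return (record["name"].strip().lower(), record["date_of_birth"])


def match_and_merge(records):
    """Group matching records and merge each group into one master record."""
    masters = {}
    for rec in records:
        master = masters.setdefault(match_key(rec), {"source_ids": []})
        master["source_ids"].append(rec["source_id"])
        for field, value in rec.items():
            # Keep the first non-empty value seen for every field.
            if field != "source_id" and value and not master.get(field):
                master[field] = value
    return list(masters.values())


# One Savings Account and three insurance Customer Ids for the same person.
records = [
    {"source_id": "SB-1001", "name": "Anita Rao", "date_of_birth": "1980-05-14", "address": "12 Lake Road"},
    {"source_id": "INS-1", "name": "anita rao ", "date_of_birth": "1980-05-14", "address": ""},
    {"source_id": "INS-2", "name": "Anita Rao", "date_of_birth": "1980-05-14", "address": ""},
    {"source_id": "INS-3", "name": "ANITA RAO", "date_of_birth": "1980-05-14", "address": ""},
]

masters = match_and_merge(records)
print(len(masters), "master record:", masters[0])
```

Exact matching on two fields is the simplest rule one can pick.  It would miss spelling variations and could wrongly merge two different people who share a name and a date of birth, which is why matching in a real solution needs far more care.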

For further details on the features that this product provides, please refer to IBM InfoSphere Master Data Management