Friday, 22 August 2014

Will the market price of this share increase in the next two months? A peek into IBM Predictive Analysis

Short term Analysis
Before buying shares in a company, you want to make sure that the company is doing well in the market and that the market price of its shares will increase.  You will consider the brand value, discuss the company's performance with your friends, refer to websites that show the recent trend of the share's market price, compare it with its competitors and ensure that the company has good reviews from the general public.  Based on all these factors, you will be able to predict whether the market value of the share will hold, fall or rise.

So, in order to have its shares sold and its profits increase, the company has to develop and maintain a good brand name, earn good feedback from the public - thereby leading to good reviews on websites - and keep comparing itself with its competitors.

Long term Planning
Most of us maintain a pension-related policy so that we can remain financially independent and enjoy our retired life.  We also invest money in property so that it becomes useful to us after a few years.  This is long term planning.

Similar long term planning is required for companies too.  Some of the companies we see have been around for generations.  This is because of the goodwill that the company has earned and retained.

A company has to regularly monitor its progress, and if it finds its position slipping, immediately come up with remedial measures to retain its place in the market.

Predictive Analysis and Strategies
Analysis of existing and streaming data in structured and unstructured formats, using data mining, text mining and other analytical techniques, helps organizations make decisions and plan for the future.  The IBM Predictive Analysis set of products is user friendly and helps customers find patterns in performance, thereby enabling them to plan and come up with strategies.  The products available help customers capture customers' views accurately, perform statistical analysis on data, find patterns and trends, and arrive at optimized decisions.  Some of these products are also available on the cloud.

Real World Scenarios and IBM Predictive Analysis Products
Let us have a quick look at where most of these products fit in.


How did you get to know me? - SPSS Data Collection
Has this happened to you?  When you visit a garment store, the representative there hands you a form for providing feedback.  In one of my experiences, the feedback form had a column asking how I got to know about the shop.  It could have been from social websites on the Internet, from a friend, from an advertisement on television or in a newspaper, or just a casual visit after noticing the store.  Now this detail, which may seem trivial to the customer, is very critical to the company.  Based on the percentage of customers that have come from the different sources, it can decide the best way to advertise itself.  Based on the ratings given by customers, it can plan to provide the best service.

Refer SPSS Data Collection Professional

Customer Service At Stores and Product Quality - Predictive Maintenance and Quality
At each branch of a particular store, there is a salesperson for each brand.  All of them are so capable that they explain how a particular consumer product will help you and persuade you to buy it.  Suppose one of the ten brands in that store does not sell well at all, possibly because the product is not good or because that salesperson does not take adequate time to explain its features; the company has to analyze the actual reason and take remedial measures.  Similar analysis has to be done for products that have crossed their expiry date and are still lying on the shelves.  So feedback that comes directly from the stores also has to be taken into consideration.


Feedback from the Internet sites - IBM Social Media Analytics
To decide on a particular brand of refrigerator, we browsed the Internet to see the reviews it had obtained.  Text analysis and analysis of the ratings obtained form a major source for feedback analysis.  It is also important to find the number of Likes that a brand has obtained and to compare it with the Likes its competitors have obtained.

Retaining Customers - IBM SPSS Data Collection
After purchasing the refrigerator, the customer faces some problem with it.  He immediately calls the company's contact number, and the company promises to send a person for servicing.  The politeness that the customer representative shows over the phone, and the speed and quality of service provided by the representative doing the repair, determine the feedback this customer may post on social networks and share with friends and relatives.  For a product like a refrigerator, this mainly affects the brand value, but for an Internet Service Provider it determines how long a customer remains your customer.  Hence, analysis of the quality of conversation between the customer and the representative is an important source of customer feedback.

Customer Purchase Pattern - IBM Predictive Customer Intelligence
Assuming I have a customer id with a garment store, and for the last couple of years I have visited the store and purchased dresses for a few thousand rupees, the store could predict that I would visit again when a discount is offered this year and could send me an SMS informing me about the discount.  Analyzing each regular customer's purchase pattern can be of immense value to a company.

Moving with the Current Trend - IBM SPSS Statistics
In all fields, it is important to move with the current trend.  Without stock of a newly released mobile from a highly reputed brand, a mobile store would not do well.  If noodles with oats are not available in a grocery shop, I would obviously move to the next shop.  Hence, noticing current trends and newly introduced products, and planning to obtain and stock those products, is important.

Sunday, 17 August 2014

IBM 'Big Match'

IBM Big Match provides customers a way to obtain master data from the Big Data stored in IBM BigInsights, using the IBM MDM Probabilistic Matching Engine.

The Probabilistic Matching Engine makes use of standardization, comparison, scoring and linking techniques to decide whether two records map to the same entity.  The PME can be used with BigInsights to provide the Big Match functionality.

Using the IBM MDM Workbench, the Probabilistic Matching Engine can be configured and exported for use with IBM InfoSphere BigInsights.

Big Match is massively scalable and is capable of fast real-time matching, which helps the customer quickly obtain a 360 degree view of entities.

Links
Technical Overview of Big Data Matching
Harness big data and use actionable insights to provide data confidence
IBM Big Match For Hadoop


Example of Probabilistic Matching
A simple example of MDM Probabilistic Matching is provided below.

Lakshmi and Sevaal go to a particular showroom, which has dress materials of a specific brand and purchase a couple of dress materials.  Both of them are given membership ids.  Hence their details including their customer id, mobile number and address are stored in the database the store maintains.

Assuming,
Customer Id: 2342
Name: Lakshmi Srinivasan   
DoB: Not provided
PAN: SSCCI222M
Address: 18, Silver Street, Oak Rd, Chennai - 59
Mobile: 091-4423423482

Customer Id: 2343
Name: Sevaal V
Age: 09-07-78
PAN: SSDPG2433V
Address: 19, Silver Street, Oak Road Chennai 600059
Mobile: 9234223143

Both of them like different types of dress materials, mark a Like on Facebook for their particular type, and order dress materials of that brand through online shopping sites too.

Sevaal changes her mobile number and misplaces her membership card.  After a few months, she goes to a different branch of the same showroom and ends up getting a new membership card with a different customer id.

New Details given:

Customer Id: 2545
Name: Sevaal Vasudevan
Age: 09-07-1978
PAN: SSDPG2433V
Address: 19, Silver Rd., Oak Road Chennai 600059
Mobile: 08242479824

Lakshmi migrates to Bangalore, and the showroom gives her a different customer id and membership id.

Customer Id: 3454

Name: Lakshmi S   
DoB: 30-01-1983
PAN: SSCCI222M
Address: 88, Jaya Nagar, Blr 560089
Mobile: 9283425839
 
What is given here are just the details of two customers.  The store has thousands of such customers, many of them having only one customer id, but some of them having two.  Most of these customers are present on Facebook and have provided their feedback there.  Some of them have filled in feedback at the store.  Some of them have preferred online shopping as well.

The store decides to provide a discount of 5% to its regular customers.  With thousands of customer records (volume), lakhs of purchase records (volume), thousands of feedback records from the Internet that keep coming in every day (velocity), and a few thousand customers having more than one customer id (veracity), this is a challenge for the store.  The variety component of the Big Data here is the mix of customer details, feedback, and details coming in from the Internet.  So the store has Big Data.  It now needs to match records to find the customers and their details.  Customers having two ids have to be identified and their records merged into one.


The MDM Probabilistic Matching Engine can now be used with BigInsights for Big Match.  The Probabilistic Matching Engine does standardization, matching, scoring and linking to get the individual records.

Standardization
We sometimes write Rd. as the short form of Road.  Only in a few situations do we add the country code before a phone number.

Standardization ensures that the data follows certain standards.  It follows certain rules and expands Rd. to Road.  It can also be customized to remove the country code and any hyphens in a telephone number.

So the records for Sevaal and Lakshmi will be stored as follows.

Customer Id: 2342
Name: Lakshmi Srinivasan   
DoB:
PAN: SSCCI222M
Address: 18, Silver Street, Oak Road, Chennai 600059
Mobile: 4423423482

Customer Id: 3454
Name: Lakshmi S   
DoB: 30-01-1983
PAN: SSCCI222M
Address: 88, Jaya Nagar, Bangalore 560089
Mobile: 9283425839

Customer Id: 2343
Name: Sevaal V
DoB: 09-07-1978
PAN: SSDPG2433V
Address: 19, Silver Street, Oak Road Chennai 600059
Mobile: 9234223143

Customer Id: 2545
Name: Sevaal Vasudevan
DoB: 09-07-1978
PAN: SSDPG2433V
Address: 19, Silver Street, Oak Road Chennai 600059
Mobile: 8242479824

Thus standardization makes the comparison process easy.
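These rules can be sketched in a few lines of code.  This is a simplified illustration with assumed rules and abbreviation tables; the actual Probabilistic Matching Engine uses configurable standardization rule sets, not this code.

```python
import re

# Assumed abbreviation table for this example (the real PME rule sets
# are configured per deployment and are far richer than this).
ABBREVIATIONS = {"Rd.": "Road", "Rd": "Road", "St.": "Street", "Blr": "Bangalore"}

def standardize_address(address):
    # Expand common abbreviations word by word.
    words = [ABBREVIATIONS.get(w, w) for w in address.split()]
    return " ".join(words)

def standardize_mobile(mobile):
    # Drop hyphens, spaces and any other non-digits, then strip a
    # leading country/trunk prefix like "091" by keeping the last ten digits.
    digits = re.sub(r"\D", "", mobile)
    if len(digits) > 10:
        digits = digits[-10:]
    return digits

print(standardize_mobile("091-4423423482"))            # -> 4423423482
print(standardize_address("88, Jaya Nagar, Blr 560089"))  # -> 88, Jaya Nagar, Bangalore 560089
```

With both records pushed through the same rules, a simple string comparison is enough to detect that the mobile numbers or addresses agree.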

Matching
When the system compares the name of customer 2343 with that of customer 2545 using a simple equality check, the result is unequal, and the decision would be that they are different customers.

However, their PAN matches.  The PAN of every individual is unique.  Sevaal is not a common name in India.  Moreover, their dates of birth also match.  Hence there is a strong possibility that the two records refer to the same customer.

This way of matching is very similar to the Probabilistic Matching Engine service provided by IBM Master Data Management.  Attributes in the records (here the attributes are Name, DoB, PAN, Address and Mobile) are matched.  A positive or negative score is assigned for each attribute based on the match, and based on the total score a pair of records obtains, linking happens.

Scoring
Another important characteristic of matching is that the score is based not only on the degree of match, but also on the frequency of occurrence.  Sevaal is not a commonly found name; Lakshmi is.  Hence the score when the first name of two records is Sevaal is higher than when it is Lakshmi.

PAN is unique to each individual, hence the score given when the PANs of two records match can be the highest.

In the example we have considered, the record pair for Sevaal would obviously score higher than that for Lakshmi, since the name is uncommon and the PAN and Date of Birth match.

The record pair for Lakshmi would also have a good score, since the PAN matches.  However, since the Date of Birth is not provided in one record, the score is lower.
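A toy version of this attribute scoring can be sketched as follows.  The weights and name frequencies here are made up for illustration; the real PME derives its weights statistically from the data itself.

```python
# Assumed relative name frequencies and weights (hypothetical values).
NAME_FREQUENCY = {"lakshmi": 0.20, "sevaal": 0.001}

def score_pair(rec_a, rec_b):
    score = 0.0
    # PAN is unique per individual, so a PAN match carries the highest weight.
    if rec_a["pan"] and rec_a["pan"] == rec_b["pan"]:
        score += 10.0
    # Date of birth: positive if both are present and equal, negative if they differ.
    if rec_a["dob"] and rec_b["dob"]:
        score += 4.0 if rec_a["dob"] == rec_b["dob"] else -4.0
    # First name: a match on a rare name is worth more than on a common one.
    first_a = rec_a["name"].split()[0].lower()
    first_b = rec_b["name"].split()[0].lower()
    if first_a == first_b:
        rarity = 1.0 - NAME_FREQUENCY.get(first_a, 0.01)
        score += 3.0 * rarity
    return score

sevaal_score = score_pair(
    {"name": "Sevaal V", "dob": "09-07-1978", "pan": "SSDPG2433V"},
    {"name": "Sevaal Vasudevan", "dob": "09-07-1978", "pan": "SSDPG2433V"})
lakshmi_score = score_pair(
    {"name": "Lakshmi Srinivasan", "dob": "", "pan": "SSCCI222M"},
    {"name": "Lakshmi S", "dob": "30-01-1983", "pan": "SSCCI222M"})
print(sevaal_score, lakshmi_score)  # Sevaal's pair scores higher
```

Sevaal's pair collects the PAN, DoB and rare-name contributions, while Lakshmi's pair misses the DoB contribution and gets a smaller name bonus, mirroring the reasoning above.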

Linking
Now that the scores are obtained, linking has to take place - that is, deciding whether two records are the same and consolidating all details under one customer id.  In general, a threshold value is provided, and when matching yields a score higher than this threshold, the system automatically decides that the two records are the same.  There is another range of values in which the system cannot decide whether the records are the same (a potential match) and a data steward needs to decide (in our example, Lakshmi's records).  And there is a lower value, below which the system automatically decides that the two records are not the same.
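The two-threshold decision can be expressed in a few lines.  The threshold values and the scores below are hypothetical; in a real MDM deployment they are tuned per installation.

```python
# Hypothetical thresholds for this example.
AUTOLINK_THRESHOLD = 15.0   # at or above: automatically the same entity
CLERICAL_THRESHOLD = 10.0   # between the two: a data steward must review

def link_decision(score):
    if score >= AUTOLINK_THRESHOLD:
        return "auto-link"
    if score >= CLERICAL_THRESHOLD:
        return "clerical-review"   # potential match, routed to a data steward
    return "no-link"

print(link_decision(17.0))  # Sevaal's pair   -> auto-link
print(link_decision(12.4))  # Lakshmi's pair  -> clerical-review
print(link_decision(5.0))   # unrelated pair  -> no-link
```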

Hence after linking, Sevaal has only one customer id - 2545 - and all her details are added to this customer id.  The two records for Lakshmi - 2342 and 3454 - are shown to the data steward to decide whether they point to the same customer.

In the case of Big Match, the engine only performs the automatic linking and does not generate tasks to link records that are a potential match.


Sunday, 10 August 2014

IBM InfoSphere MDM Deployment on PureSystems

When a decision has been made to install software in an organization, the less time it takes for the software to be deployed and put into production, the greater the benefit for the organization.

And that is what IBM PureSystems does in the Cloud: it reduces the time taken for deployment.  The Pure Patterns simplify and automate tasks across the lifecycle of the application.

The InfoSphere MDM patterns make use of virtual system patterns, which facilitate the deployment of application software topologies, to deliver the benefits of IBM PureSystems.  Virtual system patterns enable the user to modify the topology of the pattern to ensure that it resembles the specific topology required for the MDM solution.

InfoSphere MDM supports the following patterns from the IBM PureApplication® System workload console or IBM Workload Deployer:
1. Basic InfoSphere MDM pattern for DB2 for Linux, UNIX, and Windows
2. InfoSphere MDM pattern for DB2 pureScale® or IBM PureData™ System for Transactions
3. InfoSphere MDM pattern for DB2 for z/OS

Deployment of MDM using the basic pattern requires the user to select the specific virtual system pattern from the IBM PureApplication System or IBM Workload Deployer, provide certain required parameters for the database (DB2), the application server (WAS) and MDM, and deploy the pattern.  PureSystems ensures that the required versions of DB2 and WAS are selected, thereby ensuring quicker deployment of MDM.

The automated deployment of MDM facilitated by PureSystems, which reduces start-up time, is a value add for the customer.

For further details, please refer to the links below:
Pure Systems Deployment
IBM InfoSphere Master Data Management

Thursday, 31 July 2014

From RDBMS to NoSQL to DBaaS

RDBMS
Relational Database Management Systems were a subject I paid special attention to when I was at college.  Having used DB2 and Oracle, the attention that NoSQL databases have been getting over the past two years or so made me think about why we need them, and whether RDBMSs will remain for the coming decades.

The ACID (Atomicity, Consistency, Isolation and Durability) properties and referential integrity that an RDBMS provides cannot be compromised in many systems.  That means that other systems had needs that led to the emergence and heavy usage of NoSQL (Not Only SQL) systems.

NoSQL
An obvious reason that convinced me of the need for NoSQL is the increasing volume of unstructured and semi-structured data, and the other documents used in the social networks most of us use.  Then the other reasons slowly came by.  One is the speed at which we want a post to be published and the number of reads and Likes to be displayed.  The concurrency with which social networks and other systems are used, by many of us around the world, is also on the rise.  Above all this, we want 24x7 availability of most sites, without which our satisfaction drops.

It is to satisfy the above requirements that the use of NoSQL is on the rise.  Let us see how this is made possible.

1. Use of distributed databases
Distributed databases can be located on servers in any geographic location.  This means they can be spread across servers on the Internet, or located on cloud infrastructure.  Distributed databases support replication and duplication, thereby enabling continuous availability.  Since data is available across many locations, concurrent usage is also made possible.

2. Horizontal scaling (sharding)
All users of Facebook want a quick login and quicker updates.  Given this, it is wiser to store the data of Indian users on servers in India and the data of Canadian users on servers in Canada than to store all data in one place.  This is an example of sharding.
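A minimal sketch of such geography-based shard routing is shown below.  The shard map and server names are hypothetical; a production system would use consistent hashing or a managed shard catalogue rather than a hard-coded dictionary.

```python
# Hypothetical shard map: route each user's reads and writes to the
# server nearest to them.
SHARD_MAP = {"IN": "db-server-mumbai", "CA": "db-server-toronto"}
DEFAULT_SHARD = "db-server-global"

def shard_for(user):
    # Fall back to a global shard for countries with no dedicated server.
    return SHARD_MAP.get(user["country"], DEFAULT_SHARD)

print(shard_for({"id": 1, "country": "IN"}))  # -> db-server-mumbai
print(shard_for({"id": 2, "country": "CA"}))  # -> db-server-toronto
print(shard_for({"id": 3, "country": "US"}))  # -> db-server-global
```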

3. Scalability
Many NoSQL databases are capable of storing very large quantities of data.  With Big Data, the volume of data generated every second is on the rise, so the ability to store it becomes important.

4. Schema-less databases
To enable storage of semi-structured and unstructured data, these databases do not store data in tables.  Data is stored as documents, columns, key-value pairs or graphs.

Let us have a look at a couple of ways in which data is stored.

a) Document
Documents that contain semi-structured data are stored.  The MongoDB database stores documents.  It is platform independent and holds JSON-like documents.
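For instance, a document-style record might look like this (the fields and ids are illustrative, not a specific MongoDB schema).  Because the store is schema-less, a second document in the same collection could have entirely different fields.

```python
import json

# An illustrative document: fields are free-form, and nested arrays
# and sub-documents are allowed.
student = {
    "_id": "stu-001",
    "student_name": "vishnu",
    "school_name": "sun shine",
    "hobbies": ["cricket", "chess"],       # nested array
    "address": {"city": "bangalore"},      # nested document
}
print(json.dumps(student, indent=2))
```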

b) Column
A column is a tuple of three parts (name, value, timestamp):
student_name: {name: "student_name", value: "vishnu", timestamp: 123456789}

A Column Family is a set of columns.  It is in some ways similar to a table, but the main difference is that the same set of columns need not be present in every Column Family object.  Please notice the difference between the column families given below.

{
    student_name: {name: "student_name", value: "aditya", timestamp: 123456789},
    school_name: {name: "school_name", value: "sun shine", timestamp: 123456789},
    city: {name: "city", value: "bangalore", timestamp: 123456789}
}
{
    student_name: {name: "student_name", value: "lily", timestamp: 123456789},
    school_name: {name: "school_name", value: "sun shine", timestamp: 123456789},
    standard: {name: "standard", value: "IV", timestamp: 123456789}
}

HBase, which is like BigTable for Hadoop, uses column-oriented storage.  It is an open-source database from the Apache Software Foundation.  Hadoop deployments use HBase to store critical data, whose size is much smaller compared with the Big Data that Hadoop can store.

Database As A Service
As the name indicates, the database is provided as a service by a cloud provider.  The cloud provider does the installation, upgrades and maintenance activities on the database, and the customers invoke services on it.  DBaaS reduces the time taken for installation and maintenance, and the manpower required for database management.  With the emergence of the Cloud, DBaaS is not a surprise.  IBM Cloudant is a DBaaS that stores JSON documents.

Saturday, 19 July 2014

IBM MDM in the Big Data - Interoperability between products in the Big Data platform

The goal of Big Data is to obtain valuable insights through analysis.  With the IBM InfoSphere Master Data Management system serving as the single repository of trusted data, let us discuss some of the key Big Data products with which IBM InfoSphere MDM can be integrated.


Customer data from a single source or from multiple sources is loaded into MDM.  There are also downstream systems that receive data from MDM.  Some of the MDM APIs are integrated with InfoSphere DataStage for Extract, Transform and Load (ETL) operations, forming the MDM Connector.  The MDM Connector can be used for ETL operations involving MDM.

It is important to determine the quality of data from a data source before the data is loaded into the MDM server.  IBM InfoSphere Information Analyzer can be used for assessing the quality and structure of the data before loading it into MDM.  In addition, MDM can be configured to leverage the standardization and matching features of IBM InfoSphere QualityStage.

The term Big Data encompasses structured and unstructured data.  IBM MDM provides a trusted single view of structured data.  IBM InfoSphere Data Explorer, the tool used to derive insights from Big Data, uses the MDM Connectors to access data from the MDM database and obtain a holistic view of entities.

InfoSphere MDM has a Probabilistic Matching Engine that can be used for matching parties to identify suspected duplicates.  This Probabilistic Matching Engine can be configured for use by InfoSphere BigInsights.  InfoSphere BigInsights is a product that supports storage of large volumes of unstructured, semi-structured and structured data and provides data analysis capabilities on such data.  InfoSphere Data Click can also be used with MDM to load master data into BigInsights and other analysis systems.

While MDM provides a single trusted view of data, business processes are required to ensure that the master data is accurate from the point of creation.  The IBM Business Process Management Process Center and Process Designer components can be used to create workflows that govern data-steward-oriented tasks.  Master Data Management along with Business Process Management enables organizations to take critical business decisions immediately.

Salesforce.com, a Customer Relationship Management (CRM) solution available in the Cloud (SaaS), can be integrated with IBM MDM, which enables it to obtain a 360 degree view of its customers.

MDM data can be exported and predictive analysis can be performed using the Cognos Business Intelligence reports.

Details on the given integrations, and on integrations with other products, can be obtained from the links below.
IBM InfoSphere Master Data Management v 11.3.0
Master Data Management, Business Process Management and Services Oriented Architecture 

Saturday, 12 July 2014

Master Data Management (MDM) in Big Data

"Maintaining a golden record of every entity" - this is precisely what a Master Data Management (MDM) system does.

MDM stores a cleansed, de-duplicated trusted view of structured data and plays a major role amidst big data flowing in from the social networks and streaming data.

I do see many organizations use master data to improve their performance.

A diabetes clinic calls a patient's mobile number when his or her consultation is due, so it does maintain master data about its patients.  When the system is good enough to also store all medical data about the patient - the tests undergone, the doctor's analysis reports and the medicines prescribed at each consultation - then the system becomes capable of providing a complete view of the patient's health.

A retailer sends a customer an SMS a month before the customer's birthday, with a greeting and a 5% discount on whatever the customer purchases during that month.  Here the master data of the customer is stored along with the mobile number and date of birth, and there is a system to send a message a month before the birthday.  By doing so, the retailer maintains a good relationship with its customers.  When this retailer also stores the list of items the customer purchases, the total cost paid and the mode of payment (cash or card) along with the date of purchase, it will be able to predict when the customer may visit again.

An insurance firm informs its customers through SMS or email that an insurance payment is due in a month.  This is again a system in which the customer's details are stored along with the insurance payment date.  Hence the company makes sure that it does not lose its customers.  The company has to ensure that such a message is sent each time an installment has to be paid.

Some banks are able to classify their customers as Classic, Premium and so on, based on the balance they maintain in their accounts over a period of time.  This indicates the extent to which the banks maintain big data.

One of the examples we consider for Big Data is Facebook.  This social website also holds master data about its users.  It asks each user for name, city, employment and date of birth.  Family relationships, the close friends list, friends and the Likes of a user, along with the other primary details, contribute to the master data.  Alerts on friends' birthdays, the list of probable friends of a user, the groups a user may like to join and the personalities a user may like can all be derived from the master data.


A good Master Data Management system will ensure that the data is cleansed, duplicate data is not present and the data is trusted.

This master data plays a major role in analysis.  With a complete view of a patient's current health and history, a consulting doctor can easily make out the drugs to which the patient is allergic and the medicines that would not suit the patient given the medicines being (or previously) taken, and prescribe treatment accordingly.  A banker would be able to suggest a recurring deposit (or some other plan) to an account holder, based on the balance in the account or the monthly salary being deposited into it.

If these are the benefits of analysis to a user, the benefits to a customer of MDM would be much greater.  A retailer can find the lean seasons and offer discounts during those periods of the year.  During peak seasons, it can increase stock.  It can also find out which products sell well in a particular geography and increase the stock of those products there.  With fast networks, stocks can be replenished as required.

When hospital chains start having a Master Data Management system, it will make the life of a patient much easier.  This becomes all the more important for patients with critical illnesses.  Having such a system could also help medical research.


These are just examples I have noticed; further details can be obtained from the links below.
IBM Master Data Management for Big Data
IBM Think Big - Big Data & MDM
IBM Master Data Management: The key to leveraging Big Data
How MDM Fits with Big Data, Mobile and Cloud
IBM Master Data Management - Solutions for Healthcare