Chitra Ananthanarayanan: August 2014

Friday, 29 August 2014

IBM BlueMix - An Introduction And Hands On

IBM Bluemix is a cloud computing Platform as a Service (PaaS). Terms like Cloud and PaaS are familiar to us by now, and using Bluemix makes the concepts behind them clear. Beyond that, we can deliver innovative solutions using the Apps and Services available on Bluemix.

The simple steps below walk through developing a sample application on Bluemix. My first try is at http://chitra.mybluemix.net/

Pre-Requisites
1. Register for Bluemix at https://ace.ng.bluemix.net
2. Install Eclipse. It can be downloaded from www.eclipse.org (https://www.eclipse.org/downloads/packages/release/Luna/R)

Installing IBM Eclipse Tools for Bluemix
After launching Eclipse, select Eclipse MarketPlace from the Help menu and search for IBM Eclipse Tools for Bluemix. Select all the components under the IBM Eclipse Tools for Bluemix, accept the License, and Install the plugin. Follow the prompt and restart Eclipse.

Creating a new Server
Right click on the Server tab, select New Server and then IBM Bluemix Server, and click Next. Provide the email and password you use to log in to Bluemix and click Validate Account. Click Next and accept the default Organizations and Spaces provided. You will then see the Bluemix server added to the Server tab.

Creating a simple application
As a simple example, create a web application with the Target runtime set to IBM Bluemix Runtime, and add an HTML file in the WebContent folder.

Deploying the application on a Bluemix server
On the IBM Bluemix Server in the Server tab, select Add and Remove... and add the web project you created. Provide the application details, and in the Launch Deployment screen add a value for the Sub domain. The sub domain and the domain name together determine the Deployment URL. Since the application is simple, accept the default values and click Finish.

The deployment of the application will start.

When the application is deployed, you will be able to find your html page at the Deployment URL. Mine is http://chitra.mybluemix.net/

Friday, 22 August 2014

Will the market price of this share increase in the next two months? A peek into IBM Predictive Analysis

Short term Analysis
Before buying shares of a company, you want to make sure that the company is doing well in the market and that the market price of the share will increase. You will consider the brand value, discuss the performance of the company with your friends, refer to websites that show the recent trend of the share's market price, compare the company with its competitors and check that it has good reviews from the general public. Based on all these factors, you will be able to predict whether the market value of the share will hold, fall or rise.

So, in order to sell its shares and increase its profit, the company has to develop or maintain a good brand name, earn good feedback from the public (and thereby good reviews on websites), and compare itself with its competitors.

Long term Planning
Most of us maintain a pension policy so that we can remain financially independent and enjoy our retired life. We also invest money in property so that it is of use to us after a few years. This is long term planning.

Similar long term planning is required for companies too. Some of the companies we see have been around for generations. That is because of the goodwill the company has earned and retained.

A company has to monitor its progress regularly and, if it finds its performance declining, immediately come up with remedial measures to retain its position in the market.

Predictive Analysis and Strategies
Analysis of existing and streaming data in structured and unstructured formats, using data mining, text mining, and other analytical techniques, helps organizations make decisions and plan for the future. The IBM Predictive Analysis set of products is user friendly and helps customers find patterns in performance, thereby enabling them to plan and come up with strategies. The available products help customers capture their own customers' views accurately, perform statistical analysis on data, find patterns and trends, and arrive at optimized decisions. Some of these products are also available on the cloud.

Real World Scenarios and IBM Predictive Analysis Products
Let us have a quick look at where most of these products fit in.

How did you get to know me? - SPSS Data Collection
Has this happened to you? When you visit a garment store, has the representative there given you a form for providing feedback? In one of my experiences, the feedback form had a column asking how I got to know about the shop. It could have been from social websites on the Internet, from a friend, from an advertisement on television or in a newspaper, or just a casual visit after noticing the store. This detail, which may seem trivial to the customer, is very critical to the company. Based on the percentage of customers that have come from the different sources, the company can decide which source is best for advertising. Based on the ratings given by customers, it can plan to provide the best service.
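As a rough illustration of the percentage calculation described above, here is a minimal Python sketch. The list of answers and the function name `source_percentages` are made up for this example; they do not come from SPSS Data Collection or any other product.

```python
from collections import Counter

# Hypothetical answers to "How did you get to know about the shop?"
responses = [
    "friend", "television", "social media", "friend",
    "newspaper", "walk-in", "social media", "friend",
]

def source_percentages(answers):
    """Return each source's share of all answers, as a percentage."""
    counts = Counter(answers)
    total = len(answers)
    return {source: 100.0 * n / total for source, n in counts.items()}

shares = source_percentages(responses)
# The source with the largest share is the best candidate for advertising spend.
best_source = max(shares, key=shares.get)
print(best_source, shares[best_source])
```

With real feedback forms the answers would first have to be cleaned into a fixed set of categories before counting.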

Refer to SPSS Data Collection Professional

Customer Service At Stores and Product Quality - Predictive Maintenance and Quality
At each branch of a particular store, there is a sales person for each brand. All of them are so capable that they explain how a particular consumer product will help you and persuade you to buy it. Suppose one of the ten brands in that store does not sell well at all, possibly because the product is not good or because the sales person is not taking adequate time to explain its features. The company has to find the actual reason and take remedial measures. Similar analysis has to be done for products that have crossed their expiry date and are still lying on the shelves of stores. So feedback that comes directly from the stores also has to be taken into consideration.

Feedback from the Internet sites - IBM Social Media Analytics
To decide which brand of Refrigerator to buy, we browsed the Internet to see the Reviews each brand had obtained. Text analysis of reviews and analysis of the ratings form a major source for feedback analysis. It is also important to find the number of Likes that a brand has obtained and to compare it with the Likes that competitors have obtained.

Retaining Customers - IBM SPSS Data Collection
After purchasing the Refrigerator, the customer faces some problem with it. He immediately calls the company's contact number and the company promises to send a person for servicing. The politeness that the customer representative shows over the phone, and the speed and quality of service provided by the person doing the repair, determine the feedback this customer may post on social networks and share with friends and relatives. For a product like a refrigerator, this mainly affects the brand value, but for an Internet Service Provider it determines how long a customer will remain a customer. Hence analysis of the quality of conversation between the customer and the representative is an important source of customer feedback.

Customer Purchase Pattern - IBM Predictive Customer Intelligence
Assuming I have a Customer Id with a garment store, and for the last couple of years I have visited the store and purchased dresses for a few thousand Rupees, the garment store could predict that I would visit again when a discount is offered this year and could send me an SMS to inform me about the discount. Analyzing each regular customer's purchase pattern could be of immense value to a company.

Moving with the Current Trend - IBM SPSS Statistics
In all fields, it is important to move with the current trend. Without stock of a newly released mobile from a highly reputed brand, a mobile store would not do well. If noodles with oats are not available in a grocery shop, I would obviously move to the next shop. Hence it is important to notice current trends and newly introduced products, and to plan to obtain those products and stock them at stores.

Sunday, 17 August 2014

IBM 'Big Match'

IBM Big Match gives customers a way to obtain master data from the Big Data stored in IBM BigInsights, using the IBM MDM Probabilistic Matching Engine.

The Probabilistic Matching Engine (PME) uses standardization, comparison, scoring and linking techniques to decide whether two records map to the same entity. The PME can be used with BigInsights to provide the Big Match functionality.

Using the IBM MDM Workbench, the Probabilistic Matching Engine can be configured and exported for use with IBM InfoSphere BigInsights.

The Big Match is massively scalable and is capable of performing faster real time matching, which helps the customer obtain a 360 degree view of the entities quickly.

Links
Technical Overview of Big Data Matching
Harness big data and use actionable insights to provide data confidence
IBM Big Match For Hadoop

Example of Probabilistic Matching
A simple example of MDM Probabilistic Matching is provided below.

Lakshmi and Sevaal go to a particular showroom, which has dress materials of a specific brand and purchase a couple of dress materials. Both of them are given membership ids. Hence their details including their customer id, mobile number and address are stored in the database the store maintains.

Assuming,
Customer Id: 2342
Name: Lakshmi Srinivasan
DoB: Not provided
PAN: SSCCI222M
Address: 18, Silver Street, Oak Rd, Chennai - 59
Mobile: 091-4423423482

Customer Id: 2343
Name: Sevaal V
DoB: 09-07-78
PAN: SSDPG2433V
Address: 19, Silver Street, Oak Road Chennai 600059
Mobile: 9234223143

Both of them like different types of dress materials, mark a Like on Facebook for the particular type, and also order dress materials of that brand through online shopping sites.

Sevaal changes her mobile number and misplaces her membership card. After a few months, she goes to a different branch of the same showroom and ends up getting a new membership card, with a different customer id.

New Details given:

Customer Id: 2545
Name: Sevaal Vasudevan
DoB: 09-07-1978
PAN: SSDPG2433V
Address: 19, Silver Rd., Oak Road Chennai 600059
Mobile: 08242479824

Lakshmi moves to Bangalore, and the showroom there gives her a different customer id and membership id.

Customer Id: 3454
Name: Lakshmi S
DoB: 30-01-1983
PAN: SSCCI222M
Address: 88, Jaya Nagar, Blr 560089
Mobile: 9283425839

What is given here is just the details of two customers. The store has thousands of such customers, most of them having only one customer id, but some of them having two. Most of these customers are on Facebook and have provided their feedback there. Some of them have filled in feedback forms at the store. Some of them have preferred to shop online as well.

The store decides to provide a discount of 5% to its regular customers. With thousands of customer records and lakhs of purchase records (volume), thousands of feedback records that keep coming in from the Internet every day (velocity), and a few thousand customers having more than one customer id (veracity), this is a challenge for the store. Here the variety component of the Big Data is the mix of customer details, in-store feedback, and details that come from the Internet. The store now has Big Data. It needs to match records to find each customer and their details. Customers having two ids have to be identified and their records merged into one.

The MDM Probabilistic Matching Engine can now be used with BigInsights for Big Match. The Probabilistic Matching Engine does standardization, matching, scoring and linking to get the individual records.

Standardization
We sometimes tend to write Rd. as the short form of Road. In very few situations, we add our country code before our phone number.

Standardization ensures that the data follows certain standards. It follows certain rules and modifies Rd. to Road. It could also be customized to remove the country code and any hyphens in the telephone number.

So the records for Sevaal and Lakshmi will be stored as follows.

Customer Id: 2342
Name: Lakshmi Srinivasan
DoB:
PAN: SSCCI222M
Address: 18, Silver Street, Oak Road, Chennai 600059
Mobile: 4423423482

Customer Id: 3454
Name: Lakshmi S
DoB: 30-01-1983
PAN: SSCCI222M
Address: 88, Jaya Nagar, Bangalore 560089
Mobile: 9283425839

Customer Id: 2343
Name: Sevaal V
DoB: 09-07-1978
PAN: SSDPG2433V
Address: 19, Silver Street, Oak Road Chennai 600059
Mobile: 9234223143

Customer Id: 2545
Name: Sevaal Vasudevan
DoB: 09-07-1978
PAN: SSDPG2433V
Address: 19, Silver Street, Oak Road Chennai 600059
Mobile: 8242479824

Thus standardization makes the comparison process easy.
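The kind of rules described above can be sketched in a few lines of Python. This is only an illustration and not how the MDM Probabilistic Matching Engine implements standardization. The abbreviation table, the function names, and the rule of keeping the last ten digits of a mobile number are assumptions made for this example.

```python
import re

# Illustrative abbreviation table (an assumption); a real rule set would be far larger.
ABBREVIATIONS = {"Rd": "Road", "Blr": "Bangalore"}

def standardize_address(address):
    """Expand known abbreviations, for example 'Rd.' or 'Rd' to 'Road'."""
    pattern = r"\b(" + "|".join(map(re.escape, ABBREVIATIONS)) + r")\b\.?"
    return re.sub(pattern, lambda m: ABBREVIATIONS[m.group(1)], address)

def standardize_mobile(mobile):
    """Drop hyphens and any leading prefix, keeping the last 10 digits.

    Assumes 10-digit subscriber numbers, as in the sample records.
    """
    digits = re.sub(r"\D", "", mobile)
    return digits[-10:]

print(standardize_address("88, Jaya Nagar, Blr 560089"))
print(standardize_mobile("091-4423423482"))
```

Keeping the rules in a table makes it easy to add new abbreviations without touching the matching logic.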

Matching
When the system compares the name of customer 2343 with that of customer 2545 using a plain equals check, the result is unequal, and the decision would be that they are different customers.

However, their PANs match, and the PAN of every individual is unique. Sevaal is not a common name in India. What is more, their Dates of Birth also match. Hence there is a strong possibility that the two records refer to the same customer.

This line of reasoning is very similar to what the Probabilistic Matching Engine service of IBM Master Data Management does. Attributes in the records (here Name, DoB, PAN, Address and Mobile) are compared. A positive or a negative score is assigned to each attribute depending on whether it matches. Based on the total score a pair of records obtains, the linking happens.

Scoring
Another important characteristic of Matching is that the score is based not only on the degree of match but also on the frequency of occurrence. Sevaal is not a commonly found name, whereas Lakshmi is. Hence the score when the first name in two records is Sevaal is higher than when it is Lakshmi.

PAN is unique for each individual, hence the score that can be given when PAN for two records match, could be the highest.

In the example we have considered, the records for Sevaal would obviously have a higher score than those for Lakshmi, since the name is not common and both the PAN and the Date of Birth match.

The records for Lakshmi would also have a good score, since the PAN matches. However, since the Date of Birth is not provided in one record, the score would be lower.
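The scoring idea can be sketched as follows. This is a simplified illustration and not the actual algorithm of the MDM Probabilistic Matching Engine. The name frequencies, the per-attribute weights, and the choice to let a missing value contribute nothing are all assumptions made for this example. Address comparison is left out to keep the sketch short.

```python
import math

# Hypothetical share of customers carrying each first name.
FIRST_NAME_FREQUENCY = {"lakshmi": 0.05, "sevaal": 0.0005}

# Hypothetical (agreement, disagreement) weights per attribute.
WEIGHTS = {"pan": (10.0, -10.0), "dob": (5.0, -5.0), "mobile": (3.0, -1.0)}

def first_name_score(name_a, name_b):
    """Score first-name agreement; rarer names carry more evidence."""
    first_a, first_b = name_a.split()[0].lower(), name_b.split()[0].lower()
    if first_a != first_b:
        return -3.0
    return math.log2(1.0 / FIRST_NAME_FREQUENCY.get(first_a, 0.01))

def score_pair(rec_a, rec_b):
    """Sum the per-attribute scores for a pair of standardized records."""
    total = first_name_score(rec_a["name"], rec_b["name"])
    for attribute, (agree, disagree) in WEIGHTS.items():
        a, b = rec_a.get(attribute), rec_b.get(attribute)
        if not a or not b:
            continue  # missing value: no evidence either way
        total += agree if a == b else disagree
    return total

sevaal_2343 = {"name": "Sevaal V", "dob": "09-07-1978",
               "pan": "SSDPG2433V", "mobile": "9234223143"}
sevaal_2545 = {"name": "Sevaal Vasudevan", "dob": "09-07-1978",
               "pan": "SSDPG2433V", "mobile": "8242479824"}
lakshmi_2342 = {"name": "Lakshmi Srinivasan", "dob": None,
                "pan": "SSCCI222M", "mobile": "4423423482"}
lakshmi_3454 = {"name": "Lakshmi S", "dob": "30-01-1983",
                "pan": "SSCCI222M", "mobile": "9283425839"}

sevaal_score = score_pair(sevaal_2343, sevaal_2545)
lakshmi_score = score_pair(lakshmi_2342, lakshmi_3454)
print(sevaal_score > lakshmi_score)
```

Under these made-up weights the Sevaal pair scores higher than the Lakshmi pair, for the same two reasons given in the text: the rarer first name and the matching Date of Birth.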

Linking
Now that the scores are obtained, linking has to take place, that is, deciding whether the two records are the same and consolidating all details under one customer id. In general an upper threshold value is provided, and when the matching yields a score higher than this threshold, it is automatically decided that the two records are the same. There is a lower threshold as well, below which the system can automatically decide that the two records are not the same. For scores in the range between the two, the system cannot decide whether the records are the same (potential match) and a data steward needs to decide (in our example, Lakshmi's records).

Hence after linking, Sevaal has only one customer id - 2545 and all her details would be added to this customer id. The two records for Lakshmi - 2342 and 3454 would be shown to the data steward to decide on whether they point to the same customer.
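The three outcomes of linking can be sketched like this. The threshold values, the sample scores and the function name `link_decision` are hypothetical and chosen only to illustrate the decision. In a real deployment the thresholds are configured for the data at hand.

```python
AUTO_LINK_THRESHOLD = 20.0  # hypothetical upper threshold
NON_MATCH_THRESHOLD = 8.0   # hypothetical lower threshold

def link_decision(score):
    """Map a total match score to one of the three linking outcomes."""
    if score > AUTO_LINK_THRESHOLD:
        return "link"             # same entity, merged automatically
    if score < NON_MATCH_THRESHOLD:
        return "no-link"          # different entities
    return "potential-match"      # left for a data steward to review

# Hypothetical scores for the pairs in the example.
print(link_decision(25.0))  # e.g. the two Sevaal records
print(link_decision(13.3))  # e.g. the two Lakshmi records
print(link_decision(2.0))   # e.g. two unrelated customers
```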

In the case of Big Match, the engine only performs the automatic linking and does not generate tasks to link records that are a potential match.

Sunday, 10 August 2014

IBM InfoSphere MDM Deployment on PureSystems

When a decision has been made to install a piece of software in an organization, the less time it takes for the software to be deployed and put into production, the greater the benefit to the organization.

And that is what IBM PureSystems does: it reduces the time taken for deployment in the cloud. The Pure Patterns simplify and automate tasks across the lifecycle of the application.

The InfoSphere MDM patterns make use of virtual system patterns, which facilitate the deployment of application software topologies, to deliver the benefits of IBM PureSystems. Virtual system patterns enable the user to modify the topology of the pattern so that it resembles the specific topology required for the MDM solution.

The InfoSphere MDM supports the below patterns from IBM PureApplication® System workload console or IBM Workload Deployer.
1. Basic InfoSphere MDM pattern for DB2 for Linux, UNIX, and Windows
2. InfoSphere MDM pattern for DB2 pureScale® or IBM PureData™ System for Transactions
3. InfoSphere MDM pattern for DB2 on z/OS

Deploying MDM using the basic pattern requires the user to select the specific Virtual System Pattern from the IBM PureApplication System or IBM Workload Deployer, provide certain required parameters for the database (DB2), the application server (WAS) and MDM, and deploy the pattern. PureSystems ensures that the required versions of DB2 and WAS are selected, thereby ensuring quicker deployment of MDM.

The automated deployment of MDM that PureSystems facilitates reduces the time to start up, which is a value add to the customer.

For further details, please refer to the below links:
Pure Systems Deployment
IBM InfoSphere Master Data Management