Monday 30 June 2014

Using Data to Empower an eCommerce business

This article from +Practical Ecommerce goes through some ways to use data to empower an eCommerce business.

You could also analyse baskets - how many were updated, how many were converted into actual orders, whether any baskets are modified before being converted into orders and what those changes are (remove items, change quantities, etc.).
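As a rough illustration of the basket questions above, here is a minimal Python sketch. The `Basket` record and its fields (`updates`, `converted`) are hypothetical; a real shop platform will store this differently.

```python
from dataclasses import dataclass


@dataclass
class Basket:
    basket_id: str
    updates: int      # hypothetical: number of edits made after the basket was created
    converted: bool   # hypothetical: True if the basket became an actual order


def basket_summary(baskets):
    """Count updated, converted, and updated-then-converted baskets."""
    total = len(baskets)
    updated = sum(1 for b in baskets if b.updates > 0)
    converted = sum(1 for b in baskets if b.converted)
    both = sum(1 for b in baskets if b.updates > 0 and b.converted)
    return {
        "total": total,
        "updated": updated,
        "converted": converted,
        "modified_then_converted": both,
        "conversion_rate": converted / total if total else 0.0,
    }


sample = [
    Basket("b1", 0, True),
    Basket("b2", 2, True),
    Basket("b3", 1, False),
    Basket("b4", 0, False),
]
summary = basket_summary(sample)
print(summary)
```

To see what the changes actually were (items removed, quantities changed) you would need the individual basket events, not just a count of updates.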

You could analyse customers - is there more than one customer at the same physical address, is there more than one customer with the same contact telephone number, etc.?
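The customer checks above could be sketched like this in Python. The record layout and the two normalisation rules are assumptions made for illustration only; real address matching usually needs something much more robust.

```python
import re
from collections import defaultdict


def normalise_phone(phone):
    """Keep digits only, so spacing and punctuation do not hide a match."""
    return re.sub(r"\D", "", phone)


def normalise_address(address):
    """Lower-case, drop commas and collapse repeated spaces (a deliberately crude rule)."""
    return " ".join(address.lower().replace(",", " ").split())


def shared_values(customers, key, normalise):
    """Return {normalised value: [customer ids]} for values used by more than one customer."""
    groups = defaultdict(list)
    for customer in customers:
        groups[normalise(customer[key])].append(customer["id"])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


# Made-up sample data
customers = [
    {"id": 1, "address": "1 High Street, Leeds", "phone": "0113 496 0001"},
    {"id": 2, "address": "1 high street  Leeds", "phone": "0113 496 0002"},
    {"id": 3, "address": "22 Mill Lane, York", "phone": "(0113) 496-0001"},
]

same_address = shared_values(customers, "address", normalise_address)
same_phone = shared_values(customers, "phone", normalise_phone)
print(same_address)
print(same_phone)
```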

Do customers go to your competitors' web sites after visiting you?

Big Profits in Big Data

This article on +Energy and Capital discusses the large profits that are possible through Big Data.

It uses the MasterCard example of what is possible with Big Data in real time.

Sunday 29 June 2014

Unsolicited data scientists solving your problems without using your data

I love this post by +Vincent Granville on Data Science Central.  In the top part he looks at the problems experienced if you post to LinkedIn regularly, but he has links to other problems that need answers further down the page.

With reference to the discussion about estimating the number of bogus accounts on Facebook there is no mention of the various games played on FB.  I know that many people that play them have multiple accounts in order to help as many FB friends as they can.

17 White Papers on Data Science

In this post on +Big Data Made Simple they give you links to 17 White papers on Data Science.

I particularly like #13 - From Data Scientist to Data Artist. :-)

Saturday 28 June 2014

Data Governance for Big Data - Dataguise to launch a suite to do just that

In this item on +eWEEK.com they cover the announcement that +Dataguise, Inc. will launch a suite to cover Data Governance for Big Data.  We'll just have to wait and see how it's received and how well it does that job.

Fast Data - the next step after Big Data?

In this article from +InfoWorld, John Hugg of VoltDB talks about processing data as it arrives, which can be done with open source software such as Apache Storm (from Twitter) or Apache Kafka (from LinkedIn).

It definitely makes sense, for such large volumes of data, to process it as you add it to the database.
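To give a flavour of processing data as it arrives, here is a tiny Python sketch that keeps a running average up to date with each new value instead of re-reading everything afterwards. It does not use Storm or Kafka; the `RunningStats` class is purely illustrative.

```python
class RunningStats:
    """Keeps a count and total so the mean is always current."""

    def __init__(self):
        self.count = 0
        self.total = 0.0

    def add(self, value):
        """Fold one newly arrived value into the totals and return the current mean."""
        self.count += 1
        self.total += value
        return self.total / self.count


stats = RunningStats()
means = [stats.add(value) for value in [10.0, 20.0, 30.0]]
print(means)
```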


Friday 27 June 2014

SQL Server Updateable Column Store Indexes

There are two posts of note for Updateable Column Store Indexes which are available in SQL Server.

This blog entry from the Microsoft MVP Award Program Blog takes you through some of the concepts.

This blog from Rusanu Consulting is a handy Q&A on them.

Between the two of them I think they give a great grounding in them so you can use them better.

What you must read before any Data Science interview

This list of revision items was posted by +Vincent Granville  on Data Science Central.

I would suggest it's actually worth reading it every year or more as a form of revision to make sure you stay up to date.

Thursday 26 June 2014

10 LinkedIn groups every Data Scientist should join

This list is from +Big Data Made Simple and written by Baiju NT.

More groups for some of you to add to the list. :-)

Google launches Cloud Dataflow - says MapReduce is tired.

This article on +ZDNet talks about Google's announcement that it now uses Cloud Dataflow for pipelines rather than MapReduce, which works better on single flows.

Seems that the world is changing again.

Wasting resources with duplicate data records

This blog entry from +Experian Marketing Services  talks about the fact that resources can be wasted by having bad data quality leading to duplicate data.

I have seen this myself, where logic and the use of an indicator field can determine whether the row should be used or not.  I'm not a fan of deleting data - you need to have integrity between data sources.
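The indicator field idea could look something like this in Python. The `is_active` flag name and the rule for what counts as a duplicate (same e-mail, ignoring case and spaces) are assumptions for the example.

```python
def flag_duplicates(rows, key_fields):
    """Return copies of the rows with an is_active flag; later duplicates are
    flagged as inactive rather than deleted."""
    seen = set()
    flagged = []
    for row in rows:
        key = tuple(row[field].strip().lower() for field in key_fields)
        flagged.append({**row, "is_active": key not in seen})
        seen.add(key)
    return flagged


# Made-up sample data
rows = [
    {"id": 1, "email": "ann@example.com"},
    {"id": 2, "email": "Ann@Example.com "},
    {"id": 3, "email": "bob@example.com"},
]

flagged = flag_duplicates(rows, ["email"])
active_ids = [row["id"] for row in flagged if row["is_active"]]
print(active_ids)
```

Every row is still there, so links from other data sources keep working; reports simply filter on the flag.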

Wednesday 25 June 2014

Teradata takes a bigger approach to Big Data

This article  from +Information Management written by Mark Smith of +Ventana Research looks at the steps +Teradata  have taken to bring them into line more with Big Data.

I like the fact that Teradata has advanced its data warehouse appliance and database technologies to unify in-memory and distributed computing with Hadoop, other databases and NoSQL in one architecture - that places them perfectly for the Big Data Market.

Information security discipline needs next generation analytics capabilities to be successful in the age of Big Data

A report by Enterprise Management Associates looks at research which clearly indicates that the information security discipline needs next generation analytics capabilities to be successful in the age of Big Data.

Tuesday 24 June 2014

MasterCard anticipates big growth from Big Data insights

This article from +Reuters written by Emma Thomasson discusses the benefits MasterCard expects from using Big Data to provide real time data on trends and spending patterns.

Solving the Mystery of Hadoop reliability

This article  in +Forbes written by +Dan Woods   looks into the reasons Hadoop jobs are failing.

He's right - services that charge for your use don't care if it was successful or not and that could be a way to waste time and $.  He lists some great areas to look into for reasons why it either didn't finish or gave an error.  These tend to be good practice whether you are having problems or not.

Monday 23 June 2014

How companies can reduce the risk of data errors

In this article  on +Experian Data Quality written by Rachael Wheeler she points out that you need to follow a holistic approach to data quality in order to make it really work.

I agree - data quality is not about process or IT systems - it's about the whole package of process, people and IT systems working together to achieve good data quality.

Why marketers must join the social media conversation

In this article on +Search Engine Watch written by +Nathan Safran  he points out that social media is used by so many people now that it cannot be ignored by marketers.

I would add that, with this increase in the use of social media, you can't ignore it for data analysis and data mining either.  There is intelligence in all that information which could give you a competitive edge.

Is there a need for a CDO (Chief Data Officer)

In this +InformationWeek  article written by +Jeff Bertolucci looking forward to the MIT Big Data Symposium he looks at whether there is a need for a Chief Data Officer in organisations now.

I'm sure as technology moves this is a developing role, but there is definitely a role for this type of person.  I'd love to do this kind of role sometime in the future.

Sunday 22 June 2014

Data sources and their effect on Data Strategy

+Enrique Dans wrote this article in +Forbes discussing the many sources of data we now have and the need for a strategy to handle all of that data.

I agree with him that we should not just think of traditional sources of information as there are so many others now that could and should be used to provide intelligence.

Big Data Poster

From +Vincent Granville originally from +CTOvision.com this poster should tell you everything you need to know about Big Data in a single poster.

Saturday 21 June 2014

Top 10 Data Analysis tools for Business

This post from +KDnuggets lists them.

All are free.  Some I have used personally - some I have not.  A good list to have so you can pick a new one that may be better than what you currently use.

Revised guidelines and best practices for using SQL Server in Microsoft Azure Virtual Machine

This post in the blog for the Microsoft Customer Service and Support teams tells you where to find them so you can make any necessary changes.

This should help if you are experiencing I/O performance issues.

Friday 20 June 2014

Who are the Big Data players?

This blog post by +SQream Technologies looks at who really are the big players in the world of Big Data.

In it they look at the 5 areas that are making the most use of Big Data so far - and it is no surprise that they are all areas that have large amounts of data so can maximise on the benefits.

If you are ever at a Big Data event I suggest you go investigate their GPU as it sounds exciting.


Data integration essential for maximising business value and performance with Big Data

This article by +Attunity, Inc. discusses the fact that one of the key problems with all the data that organisations now have is that it needs to be integrated in order to provide some value.

I would also recommend looking at this page on the +Talend website as their open source data integration software can be used with Hadoop.

This Whitepaper from +TDWI  looks at it from the Agile Information and Integration Governance (IIG) angle.

What all these links tell us is that data needs to be correctly integrated to provide value and that you will be more successful the greater your level of IIG maturity.

Thursday 19 June 2014

IBM sponsored White Paper on optimising data use by running on DB2

This White Paper gives pointers on how to meet real-world challenges with performance, availability and unprecedented affordability from IBM's DB2.

I think the Scenarios are interesting and the summary in the Resources gives you insights that could be useful when choosing any DBMS.

The secret to using Big Data for Recruitment

In this blog post shared by +JobsTheWord they explain how they use Big Data and Data Profiling to provide a unique service to their customers.

Wednesday 18 June 2014

Oracle to buy Micros Systems

This article from +Bloomberg News talks about the deal.  It will add to Oracle's retail presence in the e-commerce and cloud side of things.

Successfully leveraging Big Data via the 4 V's

This +TDWI white paper sponsored by +IBM Big Data & Analytics looks at the “four Vs” that distinguish big data: variety, volume, velocity and veracity.

If you can ignore the push towards their product then it contains some useful concepts and ideas.

What works in Big Data

This +TDWI  whitepaper contains some great recommendations, tips, advice, best practices and case studies - a great place to start reading about it.

It also has links to some great white papers around the subjects it is talking about - a great starting point.

Tuesday 17 June 2014

Data Governance, Compliance and Security presentation

This is a presentation given by +Joe Caserta which takes the audience through some key concepts for Big Data and shows that Data Governance, Data Compliance and Data Security are still needed.  They just might be delivered in a way you are not familiar with, or using tools or terminology you are not familiar with.

Automated Data Warehouse Development

There are tools out there that enable quick Data Warehouse development.  One such tool is by  +WhereScape Data Warehousing which combines their 3D tool which helps with the planning and their RED tool which enables a supportable Data Warehouse to be built fast.  You can learn more about both products here.

+TDWI currently has a Q&A with +Mark Budzinski which talks about these products.

I think these are great but you need the knowledge of what the data is so this does not replace MDM which I think is still needed to document the data.

Monday 16 June 2014

Big Data Strategy

The following excerpt is from Think Bigger: Developing a Successful Big Data Strategy for Your Business, the new book by big data strategist +Mark van Rijmenam via +Data Informed.

I've certainly seen 4 from the excerpt for real in an organisation and I think it helped to give an edge to customers.


Google's Kubernetes is open source for cloud computing

As shown in this article in Wired and written by +Cade Metz Google has a new open source offering called Kubernetes which enables online software to be run across many machines.  This has the potential to make huge in-roads to the Cloud Computing world.

Sunday 15 June 2014

How Amazon uses Big Data and their AWS public datasets

This article by +Bernard Marr contains some useful insights into how they use big data to boost their performance.

He mentions Amazon Web Services - if you are interested in their public datasets you can access them via here.

I've accessed some of them myself using R and SAS and they are very useful.

Saturday 14 June 2014

Buried in Big Data from Bloomberg

In this area from Bloomberg there are three sections:


  • Big Data's 'Dumping Grounds': How Hoarding Hinders Startups to Spy Agencies
  • Retailers Use Big Data to Turn You Into a Big Spender
  • Big Data Is Really About Small Things
All contain good information and views worth reading.

Friday 13 June 2014

Retailers use big data to turn you into a Big Spender

In this news item from +Bloomberg News they look at several retailers and how their use of big data has provided opportunities for increased sales.

Model driven architectures and BI - a perfect pairing

In this article on +TDWI, Bob Potter explains how divorcing the business logic from the underlying platform enables accelerated BI use.

I have to agree with him - I've seen reporting where complicated logic is implemented in the building of data marts at the database level.  It should be easier with the increased sophistication of BI tools, to leave the data untouched and put any business logic in the BI tool.  This should mean that the data is left to IT and the business logic is left to the business people (where it should be).

Thursday 12 June 2014

50 selected papers and 10 free video tutorials on Data Mining

This list on Big Data Made Simple is a good place to start with reading about all things Data Mining.

They also have a list of free Data Mining tutorial videos here.

Wednesday 11 June 2014

HADOOP - 5 point roadmap for success

 +TDWI have published an article by +Jorge A Lopez containing a 5 point roadmap for HADOOP success. In it he points out that organisations need a longer term roadmap than they may be used to creating for ensuring that this is done successfully.

Jorge also produced an article for them on the business case for HADOOP which is worth reading if you haven't already done so.

He makes an interesting point when he goes through his #2 - Develop an active archive.  A Data Warehouse or any reporting database is not an archive, and it would be an expensive one if you used it as that.  HADOOP gives the possibility of flexibility to use cheaper methods to store and process large amounts of data for reporting.

Tuesday 10 June 2014

Teradata 15 and Query Grid

This +TDWI article by Stephen Swoyer goes through Teradata 15 and its Query Grid functionality.  In the article he points out that Query Grid is not going to be around until Q3 of this year.

It sounds great and definitely the way to go with big data efficiency is to push the processing down to where the data is.  I just wonder whether this is going to be a multi phase implementation as it's going to take time to put all this into place.

Reset on Big Data or miss the big change

This article on +Information Management  by Brian Hopkins goes through some of the ways Big Data is affecting business.

The interesting thing is that it really has the same problems that data had before.  It's not the quantity, it's what you have and how you use it.  There is little point in doing a large big data project if the end result is a large database that doesn't help the business and doesn't help answer the questions that the business should be asking.

Monday 9 June 2014

IDC: Unmet demand requires more than HADOOP

This article on Information Management by Bob Violino looks at the latest IDC report.

I agree that the data technology has to be decided before a firm tool and strategy for data management can be set.  That way wasted time and software purchasing is minimised.


40 Maps that explain the Internet

A walk through the history and development of the internet in maps by +Timothy Lee at VOX.COM.

Fascinating to think back and analyse what you were doing at those dates.  Wonder if we can all think back to when we first found the internet?

Sunday 8 June 2014

DATA Act project management impacts

Interesting article by +Dennis D. McDonald on the DATA Act project management impact.

I agree that how an organisation chooses to implement their obligations will have an impact on the time-scale and cost involved.

How to develop a Big Data Strategy

This article by Mohit Sharma on PromptCloud lists some steps to doing that.

I would suggest care be taken especially on Finding Data Sources and Involve the Business.  I have seen too many IT led projects that do not have the correct level of business involvement - that means it is bound to fail from the start.

Saturday 7 June 2014

Erwin supporting DaaS in the Cloud

In this White Paper in Information Management by Nuccio Piscopo he goes through what Erwin and Model Manager can together provide:

1. Ability to create models in heterogeneous database environments;
2. Ability to interface with business intelligence products;
3. Support of diverse cloud architectures (shared-nothing, shared-disk, hybrid);
4. Functioning irrespective of replication over large geographic distance or database parallelisation or time they are executed.

Having used both Erwin and ModelManager together myself, I find their strength is enabling shared models across different platforms, with clear check-in/check-out when multiple people are changing the model at different levels at any time.

Friday 6 June 2014

3 weird sources of free data

This article from Daniel Price in CloudTweaks lists 3 publicly available datasets that are a little unusual.

SQL on HADOOP?

In this ZDNet article by Toby Wolpe he discusses adding SQL to HADOOP via an interview with Actian (formerly Ingres).

It's not really pure big data but it would open it up for a larger mass market.

Thursday 5 June 2014

Google says half of email is sent unencrypted

Post today from +Naked Security

Worrying figure even though I'm sure most of my emails are not worth reading.

A random walk in Finance

In this article on the Information Management website by Steve Miller he goes through whether something is truly random.

I agree with him completely.  You need to separate the data into training and test groups, or do cross-validation, to be really sure it is a pattern and not random.
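As a small illustration of the cross-validation idea, here is a Python sketch using only the standard library. The simulated up/down steps, the naive "majority direction" rule and the helper names are all assumptions for the example, not anything taken from the article.

```python
import random


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds and yield (train, test) index lists."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size


def majority_sign(steps):
    """Naive 'pattern': predict whichever direction was more common in training."""
    return 1 if sum(steps) >= 0 else -1


def accuracy(prediction, steps):
    """Share of steps that match the predicted direction."""
    return sum(1 for step in steps if step == prediction) / len(steps)


rng = random.Random(42)  # fixed seed so the run is repeatable
steps = [rng.choice([-1, 1]) for _ in range(200)]  # simulated random up/down moves

scores = []
for train, test in k_fold_indices(len(steps), 5):
    rule = majority_sign([steps[i] for i in train])
    scores.append(accuracy(rule, [steps[i] for i in test]))

mean_score = sum(scores) / len(scores)
print(scores, mean_score)
```

If a rule learnt on the training folds scores no better than a coin toss on the held-back folds, the "pattern" was probably just noise. For real time series you would normally only test on data that comes after the training period.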

Wednesday 4 June 2014

As data explodes, so do old ways of doing business

In this article on ZDNet by Brian Hopkins he discusses how business needs to change in order to survive in the world of Big Data.

He is quite correct - in order to get a competitive advantage or even to reduce costs you need to use all the data you have and take knowledge or information from it in order to provide an advantage to your business.

Data Steward/Data Manager/Subject Matter Expert - what are they and what is the difference between them?

In this article by Malcolm Chisholm on Information Management he explains the difference between the three roles and why it is important to keep them separate.


Tuesday 3 June 2014

Big Data - why data governance and data management are so important with even more volume and sources of data

In this Aberdeen research brief from TDWI which is sponsored by IBM, they make the point that with the increase in the number of data sources and the increase in the size of those implementations more care needs to be taken with data governance to ensure that the data is correct and trustworthy.

In this white paper by EMA from Information Management they make the point that in this world of multiple data sources with more immediate data availability, Data Management is even more needed not less.  That then makes Data Modelling more complex in order to be inclusive.

It makes sense to have more rigour in data governance before you spend time and money on implementing a database, otherwise you will not get the ROI (return on investment) you expect.  You may also find that you make bad decisions based on incorrect data, which could cost you even more.  Add to that the multiple data sources for the same information, and it is crucial that the same definition is used for the same piece of information.  A customer should be a customer no matter where the data came from.  This means a data model should be accurate to those definitions at the conceptual and logical level too, with the physical level showing the real implementation.


Monday 2 June 2014

MapReduce - the concept behind Big Data. Links to resources and a summary of what it actually is.

Here is the link to the Google Research Publication on MapReduce.
Link to the Wikipedia page on MapReduce.
There is also an extensive set of documentation about MapReduce here.

Essentially the data has a MAP function (filter and sort) performed on it then a REDUCE (summary) function performed on the result of the MAP function.  It uses parallel processing to do these  (a bit like a multi level tree structure) controlled by a framework which makes it quick, increases fault tolerance and reduces redundancy.  This is very similar to the structure of queries in Teradata using AMPS where one AMP controls the entire query.
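To make the MAP and REDUCE steps a bit more concrete, here is the classic word count written as a minimal single-machine Python sketch. The function names are my own; a real framework such as Hadoop would run the map and reduce steps in parallel across many machines and handle the grouping in between.

```python
from collections import defaultdict


def map_phase(document):
    """MAP: emit a (word, 1) pair for every word in one document."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the emitted values by key, as the framework does between MAP and REDUCE."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """REDUCE: summarise all the values seen for one key."""
    return key, sum(values)


def map_reduce(documents):
    """Run the three steps in sequence in a single process."""
    pairs = [pair for document in documents for pair in map_phase(document)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = map_reduce(["big data big value", "data quality"])
print(counts)
```

Because each document can be mapped on its own, and each key can be reduced on its own, the work splits naturally across machines.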

The possibilities of this approach to me are exciting as it enables us to process large amounts of data in a short amount of time and give an end result that is worthwhile.



Sunday 1 June 2014

TDWI 7 Tips for Unified Master Data Management

TDWI has produced a Checklist containing 7 Tips for Unified Master Data Management here which is sponsored by SAS.

They perceive Unified Master Data Management (UDM) as coordinated Data Management across the whole organisation on both the technical and business level at the same time.

I completely agree with #5 - regularly apply data quality functions to reference and master data.  If reference or master data is not correct or of a good quality then any information linking to, or any report using it, runs the risk of being complete garbage.
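One simple data quality function of that kind is a check that every code used in your records actually exists in the reference data. A minimal Python sketch, with made-up field names and codes:

```python
def orphan_codes(records, field, reference_codes):
    """Return the distinct values of `field` that do not appear in the reference list
    (compared after trimming spaces and ignoring case)."""
    valid = {code.strip().upper() for code in reference_codes}
    return sorted({
        record[field]
        for record in records
        if record[field].strip().upper() not in valid
    })


# Made-up sample data
orders = [
    {"order": 1, "country": "GB"},
    {"order": 2, "country": "gb "},
    {"order": 3, "country": "ZZ"},
]
reference_countries = ["GB", "FR"]

problems = orphan_codes(orders, "country", reference_countries)
print(problems)
```

Run regularly, a check like this catches bad reference or master data before it feeds into reports.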