Saturday 31 May 2014

On-demand Analysis vs Real-time Data and when it is needed

On-demand analysis is increasingly needed as we strive to gain a competitive edge over rivals or adhere to auditing and regulatory requirements.

In this TDWI White Paper Cirro describes scenarios for providing on-demand analysis, and the potential solutions and costs involved.

Cirro's Data Hub enables you to write a single query that is then split into a query per data source, with the results knitted back together at the end.  A bit like Teradata, where a query is shared across all AMPs and then brought together by the controlling AMP at the end.
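To picture the split-and-knit idea, here is a toy Python sketch - not Cirro's actual implementation; the sources, rows and column names are all invented - of one logical query being fanned out as per-source queries and the partial results merged:

```python
# Hypothetical sketch of the split-and-knit pattern: one logical query is
# broken into per-source queries and the partial results merged at the end.

def query_source(rows, predicate):
    """Run the per-source part of the query against one data source."""
    return [row for row in rows if predicate(row)]

def federated_query(sources, predicate):
    """Fan the query out to every source, then knit the results together."""
    results = []
    for rows in sources.values():
        results.extend(query_source(rows, predicate))
    return sorted(results, key=lambda r: r["customer_id"])

sources = {
    "crm":       [{"customer_id": 2, "spend": 150}, {"customer_id": 1, "spend": 40}],
    "warehouse": [{"customer_id": 3, "spend": 900}],
}

big_spenders = federated_query(sources, lambda r: r["spend"] > 100)
# one answer set, knitted together from two sources
```

The real product would of course push the predicate down to each source rather than filtering locally, but the shape is the same.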

Many people confuse on-demand analysis with real-time data.

Real-time data can be very expensive to provide for analysis, and thought should be given to whether it is necessary.  In this TDWI Article David Stodder discusses when it is the right time to use real-time data.  I think his point 3 is very important (don't assume that application business rules can handle real time).

Friday 30 May 2014

T-SQL- how to find who owns a temp table

In this article on the SQL Performance website Aaron Bertrand explains, for several versions of SQL Server, how to work out the ownership of a temp table.

Thursday 29 May 2014

T-SQL - user defined functions - 10 questions you were too shy to ask

In this post on Simple Talk, Robert Sheldon goes through some simple solutions (with code examples) to those questions on user-defined functions.

I particularly like that a non-deterministic function can be used within a user-defined function, which I had never realised but which makes sense when you think about it.

Wednesday 28 May 2014

Informatica, Virtual Data Machines and Vibe

In this Information Management blog entry Bruce Guptill talks about Informatica and their view of MDM.

The IM blog mentions Vibe from Informatica.  This is what they describe as a Virtual Data Machine (VDM), where the instructions and specification (business logic) are stored separately from where they will be executed.  This enables definitions to be standardised and then implemented on any machine or platform.  Their standard MDM is described as Multidomain MDM, so I guess Vibe is just an extension of that idea.


Data protection for big data

These two white papers look at data protection from two different angles.  But the overall consensus is that it must be done, should be done properly, and is a business necessity given all the data protection legislation worldwide.  Big data should not make any difference to the necessity of doing something about data protection; only the method might differ slightly due to the size of the data.

In this TDWI white paper by IBM on data protection for big data they advocate putting data protection in place from the start, which is sensible considering that the cost of doing anything later is always higher.  Then in this Information Management article Maria Aspan discusses why banks still struggle with big data.  I have to agree that I suspect it is partially privacy concerns, but also in large part the disparate systems left behind as banks merged in the past.

Tuesday 27 May 2014

7 deadly sins of database design

This white paper in Information Management, sponsored by Embarcadero, goes through what they consider to be the 7 deadly sins of database design.

Whilst I roughly agree with their 7, I would add to or update the list:

5.  Data quality can be implemented using alternatives to a foreign key or check constraint on the database.  If you are following an object-oriented approach to the data, you can create a common method that ensures that quality without adding database objects that could slow down any update/insert into that table.

8.  Changes in company documentation or modelling standards over time often result in mismatched levels of standards across artefacts.  It would be great if time were allowed to update older documentation as standards change.  If that is impractical, then any project plan touching those items should include time to update the artefacts to the new standard.
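The alternative in my point 5 can be sketched in Python like this (a toy example - the Customer class and the country-code rule are invented, not from the white paper):

```python
# Enforcing a data quality rule in a shared method on the data class,
# rather than with a database check constraint or foreign key.

VALID_COUNTRIES = {"GB", "US", "FR"}   # invented reference set

class Customer:
    def __init__(self, name, country):
        self.name = name
        self.country = country
        self.validate()          # every insert/update path goes through here

    def validate(self):
        """Common quality check shared by all code that writes customers."""
        if self.country not in VALID_COUNTRIES:
            raise ValueError(f"unknown country code: {self.country}")

ok = Customer("Alice", "GB")     # passes validation
try:
    Customer("Bob", "ZZ")        # rejected before it ever reaches the database
    failed = False
except ValueError:
    failed = True
```

The trade-off is that the rule is only enforced for writes that go through this class - anything hitting the table directly bypasses it, which is exactly why some people still prefer the constraint.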

Monday 26 May 2014

Spurious Correlations website

The Spurious Correlations website by Tyler Vigen is a great site for showing why you must think carefully before you link data together and present a correlation, as you can do exactly that for completely unrelated data.

The one on the front of his website at the time I wrote this entry shows that US spending on science, space, and technology correlates with suicides by hanging, strangulation and suffocation.  Of course it doesn't, and shouldn't if you think about it, but the two series look as if they do.


Data Governance

In this White Paper on the Information Management website sponsored by Rand Secure Data it explains in clear language what Data Governance is.

Having had to ensure that data is available for SOX reporting in the past, it's an area that organisations must do properly.  My blog post on Data Archiving makes the point that a backup is NOT the same as an archive, and that is particularly important for Data Governance.

Wednesday 21 May 2014

Five models for Data Stewardship

In this White Paper on the Information Management website written by Jill Dyche & Analise Polsky for SAS they go through 5 different models for Data Stewardship.

I have experienced models 1, 2 and 4, which all work in their own way.  Whilst I agree it IS a business function, it has often needed IT help to provide the glue between those functions.

Data Capture for turning Documents into Data

In this White Paper file from Information Management, Nick Geddes from AIIM looks at data capture for turning documents into data.

I do agree that if paper documents are being used then capturing them and extracting the information would be a good move.  However, it would be better for data content and quality, speed, reduced costs and the environment if B2B solutions with XML messages were used instead.

Data Quality 101

This white paper is on the Information Management website and is written by Elliot King.

I think that he is absolutely right that data will have been correct once, but that as it ages it is likely to become outdated.  Data either needs to be updated or treated with increasing caution as it ages.  I also like his thinking that data profiling could be used as part of a data quality programme.


Tuesday 20 May 2014

Azure - Service Tiers and Performance Q&A from Technet

This very handy blog from Technet goes through the Service Tiers and Performance for Microsoft Azure.

I like Active Geo-Replication, which should make it easier to have a robust Disaster Recovery process, and the fact that you can monitor the percentage of available CPU, memory, and read and write IO being consumed over time.

D-WAVE quantum computers story on the BBC News website

This BBC News article by Paul Rincon, their Science Editor, discusses Canadian company D-WAVE and the computers they claim are quantum computers.  There are a lot of sceptical people out there who want to see it proven, but there have been some notable buyers of their computers, such as Google and NASA.

I think I'm with the sceptics, as I'd like to see a bit more proof that it works.

Predictive Analytics in the Cloud - Impact of Big Data

This White Paper on the Information Management website, found here, looks at the opportunities for organisations to adopt cloud-based predictive analytics solutions, based on the results of research and responses to a survey on the subject.

It concludes that it is a changing and evolving field that businesses need to explore as part of their implementation of big data.

I have to agree with this conclusion.  Just as the big data field is changing, so is the area of data analytics on that data and the tools and methods you can use to achieve it.


Monday 19 May 2014

TDWI free download - Hadoop for Dummies by Robert D. Schneider

The PDF file of the book can be downloaded from here. It is sponsored by IBM.

The explanation of MapReduce is very clear as are the options for implementing it within your organisation.  A great place to start.
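If you've not met MapReduce before, a toy Python word count shows the shape of the model the book explains - real Hadoop distributes each phase across many machines, whereas this sketch just runs them in one process:

```python
# Word count in the MapReduce style: map emits (word, 1) pairs,
# shuffle groups the pairs by key, reduce sums each group.
from collections import defaultdict

def map_phase(documents):
    for doc in documents:
        for word in doc.lower().split():
            yield word, 1

def shuffle(pairs):
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    return {word: sum(counts) for word, counts in groups.items()}

docs = ["big data needs big tools", "hadoop handles big data"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts["big"] == 3 and counts["data"] == 2
```

The attraction of the model is that map and reduce are both trivially parallel, so the same program scales from one laptop to a large cluster.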

Sunday 18 May 2014

BI Architecture needs an XDW architecture and a willingness to change with technology to support new demands from Big Data

This article by Colin White and Claudia Imhoff discusses the need for an XDW architecture to support BI in the future.  When you then add the insights from this article by Jack Vaughan, where he talks about how the rush to big data could potentially turn data management on its head, you can see that there are a lot of changes hitting the Data Management field at the same time.

It seems to me that we need to keep an open mind and be willing to adapt to whatever new technology hits us.  The trick is to work out what is here to stay and what is only transitory.

Saturday 17 May 2014

A checklist for Data Archiving

TDWI have published a handy checklist for Data Archiving for Big Data, Compliance and Analytics here.

It makes important points about planning for archiving as part of your documentation and classification of the data.  It also reinforces that a backup is NOT an archive.

Friday 16 May 2014

What is Cohort Analysis?

I've been reading about Cohort Analysis in a blog entry by CoolaData.

To my mind it is just grouping the data (customers, in the case of the blog entry) into cohorts and then reporting on measures broken down by those groupings.
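As a toy Python sketch of that idea (the customer records are invented for illustration), cohort analysis just keys a measure by the cohort grouping:

```python
# Group customers into cohorts by signup month, then report a measure
# (total orders) per cohort.
from collections import defaultdict

customers = [
    {"name": "Ann",  "signup_month": "2014-01", "orders": 5},
    {"name": "Ben",  "signup_month": "2014-01", "orders": 2},
    {"name": "Cara", "signup_month": "2014-02", "orders": 7},
]

def orders_per_cohort(customers):
    totals = defaultdict(int)
    for c in customers:
        totals[c["signup_month"]] += c["orders"]   # group by the cohort key
    return dict(totals)

report = orders_per_cohort(customers)
# report == {"2014-01": 7, "2014-02": 7}
```

The interesting part in practice is choosing the cohort key - signup month, acquisition channel, first product bought - so that the per-cohort numbers actually tell you something.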

Thursday 15 May 2014

Recent data articles and conclusions on them

If you look at two different recent articles about data (one from the NY Times and one from Information Management) they both make a similar point.  Data can be interesting and can be very useful for drawing conclusions, but you have to think carefully about what you are trying to show and whether it is reasonable to draw that conclusion.

New York Times article - How not to be misled by the jobs report here from The Upshot.

Information Management Blog - Is Fitness Data not fit for the purpose of use? here.

I can see from the NY Times article that many different conclusions can be drawn from the data; noise needs to be carefully removed from the data to come to a better line fit, as measured by R².  With the fitness data, care needs to be taken to ensure that there is enough data to come to any sound conclusion at all.
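To make the R² point concrete, here is a small Python sketch (with invented jobs-style figures, not the article's data) that fits a least-squares line and computes R²; the noisier the data, the further R² falls below 1:

```python
# Ordinary least-squares fit of y = slope * x + intercept, plus R²,
# the share of the variation in y that the fitted line explains.

def fit_and_r_squared(xs, ys):
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot

months = [1, 2, 3, 4, 5, 6]
jobs = [200, 215, 205, 230, 240, 250]      # noisy but trending upwards
slope, intercept, r2 = fit_and_r_squared(months, jobs)
# slope is positive and r2 sits below 1.0 because of the noise
```

With the noise smoothed out of the series, R² would move towards 1 and the trend line would be a much safer basis for a conclusion.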


Wednesday 14 May 2014

Interesting White Paper on using CA Erwin Data Modeller and Microsoft SQL Azure to move data into the Cloud

This white paper can be found in Information Management here and is written by Nuccio Piscopo who used to work for CA Technologies.

It uses a case study and identifies key actions, requirements and practices that can support activities to help create a plan for successfully moving data to the Cloud.  It also provides some useful insights into how to achieve a secure implementation.

Tuesday 13 May 2014

Data Integrity/Credibility - what's the difference and are they important?

Data integrity relates to the accuracy and availability of data.  However, it mostly relates to the fact that the data matches the source (i.e. has not been modified) and therefore has integrity.  A definition can be seen on the wiseGEEK website here.
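The "matches the source" sense of integrity can be illustrated with a quick Python sketch using a hash fingerprint (the record content is invented):

```python
# Record a hash of the data when it is captured; any later modification
# makes the stored and recomputed fingerprints disagree.
import hashlib

def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

source = b"customer=1042,balance=99.50"
stored_hash = fingerprint(source)          # taken when the data is loaded

untouched = fingerprint(source) == stored_hash          # True: still matches
tampered = b"customer=1042,balance=9.50"
intact = fingerprint(tampered) == stored_hash           # False: integrity lost
```

Storing such fingerprints in the metadata is one cheap way of making the integrity of warehouse data measurable rather than assumed.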

In high-level terms, data credibility relates to how accurate, correct or believable your data is.  This is a very important area to investigate if you want to use that data to generate something - for example a marketing campaign by post, email or social media.  There is an associated cost with doing these things, and if your data is not accurate you could be sending information to the wrong person or address.  Malcolm Chisholm discusses it in this Information Management article here.

I think it is very important to have both data integrity and data credibility rated and measured for any data in a system or data warehouse.  This should be recorded in the metadata along with any data mappings and other important information.

Monday 12 May 2014

Big Data 2.0

TDWI BI This Week has a very insightful article about Big Data 2.0 written by Allen Bonde, Vice President of Product Marketing and Innovation, Actuate Corporation.  You can read it here.

In it he points out that big data is very expensive and that some organisations may have spent a large amount of money on it without realising all of the promised benefits - or may have missed out on benefits they could have had if only they had realised they were possible.

I have to agree with him that it became very trendy very quickly, and many organisations rushed to get something from the big data area into their IT systems mix; but in that rush there are steps or opportunities that may well have been missed.


Saturday 10 May 2014

Information Management article on 10 BI trends/expectations for 2014

The article can be read here.

I'm pleased to see that they think there will be more Operational BI.  Doing some focussed BI on low-level data on a mirror of a production system can give some major wins, particularly on the customer experience side of things.  If that can be mixed with the use of text or unstructured content, I can see the possibility of some big wins.

It's sad that they still see failed BI projects as a challenge.  We need to learn to work out clear business requirements for any BI project and find the right technology to support those requirements, then implement it as a clear partnership between business and IT staff - after all, there is an increasingly large group of people who sit in the middle of those two groups as time goes on.

Thursday 8 May 2014

TDWI Checklist for using analytics for text and unstructured content

There is a great checklist from TDWI and SAS for using analytics for text and unstructured content here.

Traditionally we tend to focus on data fields from databases, such as order management systems or product files.  But there is a wealth of text-based data there for the taking and analysing, if only we have the business need and the ability to extract it.

Imagine what you could do if you could load comments left by customers on your company Facebook page or Twitter feed into your customer service system, and even allocate each one to a customer record.  It could enable you to give outstanding customer service without many real people monitoring everything and doing it all manually.
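As a toy Python sketch of that allocation step (real systems would match on much stronger keys than a name, and all the names here are invented):

```python
# Link an incoming social comment to a customer record by fuzzy-matching
# the poster's name against known customers.
import difflib

customers = {"Jane Smith": 1001, "John Appleby": 1002, "Ana Torres": 1003}

def match_comment(poster_name):
    """Return the matched customer id, or None if no plausible match."""
    hit = difflib.get_close_matches(poster_name, customers, n=1, cutoff=0.8)
    return customers[hit[0]] if hit else None

customer_id = match_comment("Jane Smyth")   # a small typo still matches
unknown = match_comment("Zed Nobody")       # no plausible match
```

The hard part in practice is exactly this matching: handles rarely equal customer names, so you would want email addresses, linked accounts or explicit opt-ins rather than string similarity.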

Wednesday 7 May 2014

Another problem for Dropbox with "secret" links

In Sophos's Naked Security blog they give details of the latest problem here.

I have to agree that it is wise to encrypt data BEFORE you send it to the cloud, so you know the encryption has been done properly and who holds the decryption key.

Something to think about before embarking on any software as a service (SaaS), platform as a service (PaaS) or infrastructure as a service (IaaS) implementation to utilise the benefits of cloud computing.

Tuesday 6 May 2014

TDWI White Paper - Hadoop as a Service

Today I have been reading the TDWI white paper about Hadoop as a service.

I think Hadoop as a Service (HaaS) is the way to go for a cost-effective IT solution.

Read the White Paper and see what you think.

Saturday 3 May 2014

Oracle's Customer 2 Cloud programme helps HCM and CRM customers move to full or part cloud solutions

With the recent announcement of the Oracle Customer 2 Cloud programme, any organisation using any of the Siebel, PeopleSoft, JD Edwards, and Oracle E-Business Suite product lines can move all or part of their seats from on-premises to cloud.  This should enable organisations to save money on their IT costs for running these systems.  Oracle does of course offer cloud computing, including software as a service (SaaS), platform as a service (PaaS), and infrastructure as a service (IaaS), although I'm not sure you should put all your eggs in one basket and be tied quite so completely to one company.

Thursday 1 May 2014

Surprise as smaller Embarcadero buys CA ERwin

It was a bit of a surprise to read that Embarcadero has bought CA ERwin, considering that they are the smaller company.  However, I think it fits in with their existing mix of tools.  The deal includes the staff, so I'm hoping there is a good level of continuity there.  I remember when ERwin was at Logic Works, then Platinum Technology and finally CA Technologies, so it will be interesting to see how the offering is going to be enhanced.

My guess is that ERwin will be integrated with ER/Studio and CONNECT to make a seamless link between the data model, metadata, information management and data governance.  That sounds pretty powerful to me in the area of Data Architecture (which fits in with Embarcadero's aim to be No. 1 in Data Architecture).

Teradata Database release 15 allows language choice

The latest release of Teradata allows a choice of languages for querying the database as well as standard SQL.  The ones I'm most excited about are:


  • Java
  • Perl
  • Ruby
  • Python
  • R

Here is a link to the guide for TeradataR - TeradataR 1.0.1 Guide

It also provides support for XML, JSON and weblog data.  This means that Java and most internet based code can use Teradata data seamlessly due to the object similarities.

Finally, support has been added for solid state drives.

Here is a link to Teradata Release 15 - Teradata 15