This is a blog containing data-related news and information that I find interesting or relevant. Links are given to original sites containing the source information, for which I can take no responsibility. Any opinion expressed is my own.
Showing posts with label DATA ACCURACY.
Tuesday, 26 December 2017
Are poor data governance and slow data prep really a problem? They are when they erode confidence in the quality of your data.
I agree with Lindsay - you need complete trust in your data in order to make business decisions based on it. She has some good suggestions for areas to concentrate on.
Saturday, 27 May 2017
The information management challenges of data monetisation by David M. Raab via @infomgmt
Organisations must adjust to blurred lines between known and anonymous customer identity information.
I agree with this article - nowadays the available data will contain records that can be connected to known customers alongside records that cannot, and every organisation needs a strategy for handling both. In the past the unknown data was often ignored or deleted; today that means throwing away a valuable source of information and potential customers. My advice would be to populate every field with a default value rather than leaving it empty, so that joins do not silently drop rows or have to be rewritten as outer joins. That way you can make use of as much of the data as possible.
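A minimal sketch of the join problem, assuming pandas and two made-up tables: an inner join silently drops the anonymous records, while a default value keeps them available.

```python
import pandas as pd

# Known customers and web events; some events have no customer id.
customers = pd.DataFrame({"cust_id": ["C1", "C2"], "name": ["Ann", "Bob"]})
events = pd.DataFrame({"cust_id": ["C1", None, None],
                       "page": ["home", "pricing", "signup"]})

# An inner join drops the two anonymous events entirely.
inner = events.merge(customers, on="cust_id", how="inner")  # 1 row

# Filling a default value keeps every event visible for analysis.
events["cust_id"] = events["cust_id"].fillna("UNKNOWN")
kept = events.merge(customers, on="cust_id", how="left")    # 3 rows

print(len(inner), len(kept))
```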
Thursday, 22 September 2016
Beware of the gaps in Big Data by Edd Gent via @TheIET
As we entrust ever more of our lives to ‘big data’, how can we protect against the gaps and mistaken assumptions used to handle the information?
Great article pointing out a lot of things that can go wrong in the loading and use of Big Data (or any data, really).
Monday, 18 July 2016
Balancing the Demands of Big Data With Those Of Accurate Data by Mike Azevedo via @infomgmt
But what happens if you need to handle millions of these types of transactions at once? To ask it another way, what happens when the universes of high-value transactions and Big Data collide?
This is more about new databases having more mainstream functionality around transactions and integrity.
Tuesday, 9 February 2016
The failure to replicate scientific findings by Kaiser Fung
The failure to replicate scientific findings by @junkchart (Kaiser Fung) - Scientific reproducibility has been much discussed of late. Quartz goes so far as to say "Nearly all of our medical research is wrong." Much of this lack of reproducibility comes from cherry-picking statistically significant results from a larger batch of experimental results, a practice that has become known as P-hacking. If you find yourself running a large batch of tests, make sure to apply a multiple-testing correction to avoid this trap.
I love Randall's description of the problem.
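As for the corrections: a minimal sketch, assuming statsmodels and a made-up batch of p-values, using the Benjamini-Hochberg procedure to adjust for the number of tests before anything is declared significant.

```python
from statsmodels.stats.multitest import multipletests

# Made-up p-values from a batch of eight experiments.
p_values = [0.003, 0.04, 0.045, 0.02, 0.30, 0.65, 0.011, 0.049]

# Benjamini-Hochberg controls the false discovery rate across the batch;
# several "significant" raw p-values no longer survive the correction.
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
for raw, adj, keep in zip(p_values, p_adjusted, reject):
    print(f"raw={raw:.3f} adjusted={adj:.3f} significant={keep}")
```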
Tuesday, 24 November 2015
Goodbye Big Data, Hello Thick Data via @GreenBook @scribbett
Big Data is here to stay, but it’s only half the job - Thick Data fills the gaps and enables truly people-shaped or human-centred development and visceral business.
Interesting blog by Stephen Cribbett on GreenBook. I can definitely see the need for data to be more party-focused - that's how to get the best value from it, for sure.
Thursday, 5 November 2015
WEBINAR: Best Practices for Delivering End-user Governed Data - 11 November 2015
Best Practices for Delivering End-user Governed Data
Gain a competitive advantage and improve data quality, all while managing risk, with Forrester Analyst Michele Goetz
It is no longer enough for companies to integrate, blend and optimize their data for analytics. Today, it is equally important to ensure that data is delivered to end-users in a highly governed way. But, many organizations are scrambling to just keep up with evolving business demands for data and are stretched for time to focus on quality and security.
Based on a Pentaho-commissioned study of 164 business and IT leaders, guest Forrester Analyst Michele Goetz will discuss what is required to ensure the quality, accuracy, consistency, and ultimately usability of data with the right mix of data governance, technology and process. Join our webinar on 11/11 at 8 am PST/ 11 am EST to hear about the four key factors that your business needs to consider to create a streamlined process for delivering data to your end-users, including:
- Managing the various data sources involved
- Maintaining data quality and properly securing data
- Keeping up with the needs of your business
- Collaborating with key business stakeholders cross-functionally
Guest Speaker: Michele Goetz, Forrester Analyst
Moderator: Chuck Yarbrough, Pentaho Director of Product Marketing
Date: Wednesday, November 11 at 8 am PST/ 11 am EST
Register here
Wednesday, 2 September 2015
A “bottom-up” approach to data unification via @radar
How Toyota used machine learning plus expert sourcing to unify customer data at scale. Great write-up from O'Reilly Radar.
I'm sure we have all struggled to reduce data from many sources down to just one record per customer. It's interesting to see how someone else has tried to solve the problem.
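The write-up covers Toyota's full pipeline; as a much smaller sketch of the matching idea (standard library only, made-up customer records), string similarity can flag candidate duplicates for review:

```python
from difflib import SequenceMatcher
from itertools import combinations

# Made-up customer records pulled from different source systems.
records = [
    ("src_a", "Jonathan Smith, 12 High St"),
    ("src_b", "Jon Smith, 12 High Street"),
    ("src_c", "Maria Garcia, 9 Elm Road"),
]

# Pairs scoring above the threshold become candidate merges;
# in a real pipeline a model or an expert would confirm them.
for (s1, r1), (s2, r2) in combinations(records, 2):
    score = SequenceMatcher(None, r1.lower(), r2.lower()).ratio()
    if score > 0.75:
        print(f"candidate match ({score:.2f}): {s1} <-> {s2}")
```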
Friday, 14 August 2015
How Data Management Best Practices can enhance the Quality of your Data? via @habiledata
Blog discussing how Data Management can affect your Data Quality.
I completely agree that a vast amount of money is wasted by making business decisions based on bad data.
Thursday, 23 July 2015
Consumers are ‘dirtying’ databases with false details via Call Week
People are deliberately giving brands false data about themselves to protect their privacy, and are ignoring brands’ efforts to empower them to take control of their data, according to a study of more than 2,400 UK consumers by research company Verve.
I have to say I have been one of those consumers: it was made so difficult to say I didn't want my data to be collected that giving false details was the only way I could see to stop it.
Friday, 12 September 2014
How to cope with the big data variety problem
This article on +TechRepublic discusses combining machine learning with advanced algorithms that assign confidence levels to data quality by cross-referencing and connecting data from a variety of sources.
Interesting thoughts on the complex web of data we now weave in our thirst for information.
Wednesday, 21 May 2014
Data Capture for turning Documents into Data
In this white paper from Information Management, Nick Geddes from AIIM makes the case for capturing paper documents and turning them into data.
I do agree that if paper documents are being used, then capturing them and extracting the information would be a good move. However, it would be better for data content and quality, speed, reduced costs and the environment if B2B solutions with XML messages were used instead.
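A minimal sketch of the B2B idea, using only the standard library and a made-up order message: a structured XML message carries the data fields directly, with nothing to scan or re-key.

```python
import xml.etree.ElementTree as ET

# A made-up B2B order message: the data arrives already structured,
# so there is no scanning or extraction step to introduce errors.
message = """
<order id="PO-1001">
  <buyer>Acme Ltd</buyer>
  <line sku="WIDGET-7" qty="25" unit_price="3.20"/>
</order>
"""

root = ET.fromstring(message)
for line in root.findall("line"):
    total = int(line.get("qty")) * float(line.get("unit_price"))
    print(root.get("id"), line.get("sku"), f"line total = {total:.2f}")
```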
Thursday, 15 May 2014
Recent data articles and conclusions on them
If you look at two different recent articles about data (one from the NY Times and one from Information Management), they both make a similar point. Data can be interesting and very useful for drawing some kind of conclusion, but you have to think carefully about what you are trying to show and whether it is reasonable to draw that conclusion.
New York Times article - How not to be misled by the jobs report here from The Upshot.
Information Management Blog - Is Fitness Data not fit for the purpose of use? here.
I can see in the NY Times article that many different conclusions can be drawn from the data; noise needs to be carefully removed before a better line fit, as measured by R², can be reached. With the Fitness Data, care needs to be taken to ensure that there is enough data to reach any sound conclusion at all.
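A minimal sketch of the fit measure, assuming NumPy and a made-up monthly series: R² compares the residuals of a fitted trend line against the variance of the raw data.

```python
import numpy as np

# Made-up noisy monthly series with an underlying upward trend.
months = np.arange(24)
values = 100 + 2.5 * months + np.random.default_rng(0).normal(0, 8, 24)

# Fit a straight line and compute R-squared from its residuals.
slope, intercept = np.polyfit(months, values, 1)
predicted = slope * months + intercept
ss_res = np.sum((values - predicted) ** 2)
ss_tot = np.sum((values - values.mean()) ** 2)
r_squared = 1 - ss_res / ss_tot
print(f"slope={slope:.2f}, R^2={r_squared:.3f}")
```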
Tuesday, 13 May 2014
Data Integrity/Credibility - what's the difference and are they important?
Data integrity relates to the accuracy and availability of data, but mostly to the fact that the data matches its source (i.e. has not been modified) and therefore has integrity. A definition can be seen on the wiseGEEK website here.
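A minimal sketch of one way to check this, using only the standard library and a made-up record: comparing a hash of the loaded data against a hash taken at the source detects any modification in between.

```python
import hashlib

def fingerprint(record: str) -> str:
    """Return a SHA-256 digest of a record for integrity comparison."""
    return hashlib.sha256(record.encode("utf-8")).hexdigest()

# Hash captured when the record left the source system.
source_hash = fingerprint("C1,Ann,12 High St")

# Hash of what actually arrived in the warehouse.
loaded_hash = fingerprint("C1,Ann,12 High St")

print("integrity ok" if source_hash == loaded_hash else "record was modified")
```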
In high-level terms, data credibility relates to how accurate, correct or believable your data is. This is a very important area to investigate if you want to use that data to generate something - for example a marketing campaign by post, email or social media. There is a cost associated with doing these things, and if your data is not accurate you could be sending information to the wrong person or address. This article by Malcolm Chisholm in Information Management discusses it here.
I think it is very important to have both data integrity and data credibility rated and measured for any data in a system or data warehouse, and recorded in the metadata along with any data mappings and other important information.
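A small sketch of what that metadata record might look like - the field names and scores here are made up, not any standard:

```python
from dataclasses import dataclass, field

@dataclass
class DatasetMetadata:
    """Made-up metadata record keeping quality scores beside the mappings."""
    name: str
    integrity_score: float      # e.g. share of records whose hash matches source
    credibility_score: float    # e.g. share of records passing validation rules
    mappings: dict = field(default_factory=dict)

meta = DatasetMetadata(
    name="customer_master",
    integrity_score=0.998,
    credibility_score=0.91,
    mappings={"cust_id": "crm.customers.id"},
)
print(meta)
```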