Showing posts with label DATA CLEANSING. Show all posts

Wednesday, 18 May 2022

Data Cleaning Toolbox by Olivia Tanuwidjaja via @TDataScience

Compiling the aspects to look out for before analyzing your data.

This is a great checklist, and I think it could help you avoid forgetting a step or doing something in the wrong order, either of which could produce wrong results.

Friday, 17 July 2020

Data Prep Still Dominates Data Scientists’ Time, Survey Finds by Alex Woodie via @datanami

Data scientists spend about 45% of their time on data preparation tasks, including loading and cleaning data, according to a survey of data scientists conducted by Anaconda. The company also analyzed the gap between what data scientists learn as students, and what the enterprises demand.

Yes, it does take time, but if you prepare your data right then the results will be good.

Saturday, 19 May 2018

Using Big Data Analytics To Improve Production by Rob Consoli via @MBTwebsite

Manufacturing remains a critically important part of the world’s economic engine, but the roles it plays in advanced and developing economies have shifted dramatically. In developing countries, manufacturing operations deliver unprecedented new employment opportunities that are transforming societies.

I definitely think that manufacturing is going to improve greatly as soon as IoT is used more widely and big data and analytics cover far more of the manufacturing process. Hopefully the efficiencies can be vastly improved. Anything that can be automated is good - I remember having to take snapshots of data from a source system and import them into a spreadsheet so I could use sheets, pivot tables, etc. to work out where the largest delay was in the whole passage of orders from input to delivery to the customer. It took hours, and the benefit was reduced just because of the time it took to produce and the timing.

Saturday, 25 November 2017

Dirty Data Is OK, How You Cleanse It Matters by Chirag Shivalker via @DZone

It has been an unsolved mystery for companies if they should get their data cleansed first to opt for data analytics or if they should opt for data analytics to conclude whether their data is dirty.

There are some really good points in this article.  I cannot emphasise enough the single source of truth point.  We must all have worked for organisations where department A's figures don't match department B's.  You cannot run an organisation if the numbers in your reporting don't match, and even worse you have no idea why they don't match. You need data management, agreed definitions for data, and just the one source of the truth across the entire company.

Thursday, 21 September 2017

My Neural Network isn't working! What should I do? by/via @anorangeduck

11 things you probably screwed up and how to fix them.

This is a vital list of things to check and you should bookmark it so you can refer to it in the future.

Tuesday, 12 September 2017

Building a data science team for the enterprise by Madison Moore via @sdtimes

Data scientists are no magicians, but they are in high demand.

Researchers and analysts in this space recognise the diversity and explosion of Big Data, but the only way enterprises are going to be able to prepare for the future of Big Data is with a data science team capable of working with dirty data, complex problems, and open-source languages, experts in the field say.

Nice look at this increasingly common problem.

Wednesday, 12 July 2017

5 Ways Businesses Can Cultivate a Data-Driven Culture by @Ronald_vanLoon via @LinkedIn

The pressure on organisations to make accurate and timely business decisions has turned data into an important strategic asset for businesses. In today’s dynamic marketplace, a business's ability to use data to identify challenges, spot opportunities, and adapt to change with agility is critical to its survival and long-term success.

Some interesting points on what to look out for.  As I often say, you need to make sure the data is clean, tidy and well understood.  If you can't guarantee that the data is up to date and clean, I see little point in collecting it, let alone using it.  You need to be able to guarantee it is clean in order to guarantee that the results of any analysis or reporting are reliable.

Thursday, 15 September 2016

Separating the Good from the Bad in the World of Big Data by Karen Peters via @infomgmt

As our world becomes more connected and the amount of data that is available increases, companies must make sure they are developing their own processes for collecting the best data.

Whilst this may read as cleaning data and only taking good data, I would like to suggest caution: you need integrity between your reporting and its source, so you cannot modify or delete data in your reporting or analytics, because it will then no longer agree with the source. You have to develop a strategy to handle incorrect data as well as doing more to make sure the data is correct in the first place.  This also reminds me of the problems with reporting from a Data Warehouse, and in this respect I don't believe that what we do with Big Data will be so different.
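As a rough illustration of keeping reporting in step with its source, here is a minimal Python sketch that flags suspect rows instead of deleting them. The field names (`order_id`, `quantity`) and the "negative quantity" rule are hypothetical, chosen only for the example; they do not come from the article.

```python
def flag_invalid(rows):
    """Return copies of the rows with an 'issues' list; nothing is dropped."""
    flagged = []
    for row in rows:
        issues = []
        if row["quantity"] < 0:  # hypothetical validation rule
            issues.append("negative quantity")
        flagged.append({**row, "issues": issues})
    return flagged


def summarise(rows):
    """Count all rows, so the total still matches the source system."""
    total = len(rows)
    suspect = sum(1 for row in rows if row["issues"])
    return {"total": total, "suspect": suspect, "clean": total - suspect}


# Hypothetical snapshot from a source system.
source_rows = [
    {"order_id": "A1", "quantity": 5},
    {"order_id": "A2", "quantity": -3},
    {"order_id": "A3", "quantity": 12},
]

report_rows = flag_invalid(source_rows)
summary = summarise(report_rows)
print(summary)
```

Because every source row is still present, the report total can be reconciled against the system of record, while the suspect rows stay visible for correction at source.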

Monday, 1 August 2016

IT Pros under Pressure to Accelerate Information-Driven Decision-Making by Bob Violino via @infomgmt

Data quality issues continue to plague the large majority of businesses, according to a new report from research firm TDWI.

I've seen for myself that if data quality isn't a key part of its point of input/creation you will fight a losing battle to ensure it.

Sunday, 24 July 2016

Free Alternatives to Excel for Data Cleaning by Lee Baker via @DataScienceCtrl

Pretty much every data rookie starts with Excel. It is a wonderful program for storing, cleaning and analysing (yes, you read that correctly) your data.

Very interesting.

Saturday, 16 April 2016

The Data Quality Tipping Point via @Datafloq

The Data Quality Tipping Point by Martin Doyle via +Datafloq - Whatever your business sector, data is your most valuable asset. Along with the machinery and stock you hold, data and insights hold the key to profit and growth. It can reveal problems in processes, drive productivity among your staff and ensure everyone is ‘singing from the same hymn sheet’. However, like any asset, you need to invest in maintenance and management. Data that is not prioritised and nurtured will cause more problems than it solves. But how much do you need to spend to achieve a healthy ROI?

I agree - you need to do what it takes to only use good quality data for any decisions.

Monday, 4 January 2016

Will 2016 be the Year you Clean up your Dirty Data? via @dqmartindoyle

Will 2016 be the Year you Clean up your Dirty Data? via @dqmartindoyle on +Datafloq - in this article Martin Doyle says that, for what feels like forever, we've been warning about the dangers of low quality data. Our warnings have been reinforced and echoed by some of the world’s biggest think tanks. However, despite this, some organisations still haven't acted to improve the quality of their data.  Will 2016 be the year that organisations will clean their dirty data?

I have to say the only place to fix this is at source.  Limit the allowed values in data fields, use drop-down lists to limit values, and use free-format text sparingly.  Bad data pollutes any reporting or intelligence you look to gain from data, and removing bad data from reporting stops it matching the system of record that it came from (which is almost as bad).
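To show what limiting allowed values at the point of input might look like, here is a small Python sketch using the standard library `enum` module. The `OrderStatus` values and the `parse_status` helper are hypothetical examples, not taken from the article.

```python
from enum import Enum


class OrderStatus(Enum):
    """Hypothetical closed list of values, like a drop-down list."""
    NEW = "new"
    SHIPPED = "shipped"
    DELIVERED = "delivered"


def parse_status(raw):
    """Accept only known values; reject free-format text at input time."""
    cleaned = raw.strip().lower()
    try:
        return OrderStatus(cleaned)
    except ValueError:
        allowed = ", ".join(status.value for status in OrderStatus)
        raise ValueError(
            f"unknown status {raw!r}; allowed values: {allowed}"
        ) from None


status = parse_status(" Shipped ")
print(status)
```

Rejecting an unknown value when it is entered means the bad record never reaches reporting, so nothing has to be removed later.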

Monday, 7 September 2015

The Importance of Data Cleansing and Data Maintenance via @Datafloq

There are two aspects to data quality improvement. Data cleansing is the one-off process of tackling the errors within the database, ensuring retrospective anomalies are automatically located and removed. Data maintenance describes ongoing correction and verification – the process of continual improvement and regular checks. But, which process is the most important?

Great article from Datafloq.  I completely agree - if you don't clean and maintain your data then you will get garbage results from it.

Thursday, 23 July 2015

Consumers are ‘dirtying’ databases with false details via Call Week

People are deliberately giving brands false data about themselves to protect their privacy, and are ignoring brands’ efforts to empower them to take control of their data, according to a study of more than 2,400 UK consumers by research company Verve.

I have to say I have been one of those consumers: it was made so difficult to say I didn't want my data to be collected that giving false details was the only way I could see to stop it.

Friday, 10 July 2015

Data Cleansing 101: Why It’s Important in Business

Keep your business database in perfect shape by employing efficient data cleansing processes. Interesting blog from Infinit Datum which points out some things that should be obvious but often are not.