Showing posts with label DATA CLEANING. Show all posts

Wednesday, 18 May 2022

Data Cleaning Toolbox by Olivia Tanuwidjaja via @TDataScience

Compiling the aspects to look out for before analyzing your data.

This is a great checklist. I think it could help you avoid forgetting a step or doing something in the wrong order, either of which could produce wrong results.

Wednesday, 15 December 2021

Mito: One of the Coolest Python Libraries You Have Ever Seen by Ismael Araujo via @TDataScience

Here is Ismael Araujo's take on this cool Python library and why you should give it a try.

It does look interesting and saves so much time. I certainly want to play with it more: I can already see how useful it is, but I'm sure I could achieve much more if I understood it better.

Friday, 17 July 2020

Data Prep Still Dominates Data Scientists’ Time, Survey Finds by Alex Woodie via @datanami

Data scientists spend about 45% of their time on data preparation tasks, including loading and cleaning data, according to a survey of data scientists conducted by Anaconda. The company also analyzed the gap between what data scientists learn as students, and what the enterprises demand.

Yes, it does take time, but if you prepare your data right then the results will be good.

Friday, 13 March 2020

The Ultimate Beginner’s Guide to Data Scraping, Cleaning, and Visualisation by @annebonnerdata via @TDataScience

How to take your model from unremarkable to amazing simply by cleaning and preprocessing your data.

Anne is right - if you get the underlying data right, the results and the process of getting there are so much easier. That is NOT to say you do something to skew the results, just that you make sure that it is not a case of garbage in/garbage out.

Friday, 12 October 2018

5 Data Science Projects That Will Get You Hired in 2018 by John Sullivan via @kdnuggets

A portfolio of real-world projects is the best way to break into data science. This article highlights the 5 types of projects that will help land you a job and improve your career.

As one of the comments on the article points out, these are skills that you need to be able to show. My suggestion is that you use Kaggle to provide a project, or at least the data for one, do the things in this article as part of that project, and store the code and results on GitHub so that they can easily be seen.

Sunday, 24 June 2018

SLIDESHOW: 7 top challenges to working with data by David Weldon via @infomgmt

Data pros are dealing with a skyrocketing amount of data, created and gathered by ever-more devices. Here are the top challenges this is creating, according to a new study by Nexla.

From my own perspective this is a good list of pain points in the use of data. I would add to this list:

1. Data Sources - do you know the best place to get your data from? There could be better alternatives.

2. System of Record - related to 1. Make sure you understand where your data really comes from, and whether the data is clean and pure or has been altered in some way.

3. Change control - I've used another system's data to feed in some of the data I was using, only for that system's owners to miss my feed in their change control, so that I've suddenly had different data, or no data, arrive.

4. Data Management - are fields with the same name really the same?
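Point 3, and in part point 4, can be guarded against with an automated check on each delivery. What follows is only a minimal sketch, assuming the feed arrives as a list of dictionaries and that you keep your own record of the fields and types you expect. The names `EXPECTED_SCHEMA` and `check_feed`, and the example fields, are hypothetical and not taken from the article.

```python
# Hypothetical expected schema for an incoming feed: field name -> Python type.
EXPECTED_SCHEMA = {"customer_id": int, "country": str, "balance": float}


def check_feed(rows, expected=EXPECTED_SCHEMA):
    """Return a list of human-readable problems found in the feed."""
    if not rows:
        return ["feed is empty"]
    problems = []
    for i, row in enumerate(rows):
        # Fields that have disappeared from, or been added to, the feed.
        missing = sorted(set(expected) - set(row))
        extra = sorted(set(row) - set(expected))
        if missing:
            problems.append(f"row {i}: missing fields {missing}")
        if extra:
            problems.append(f"row {i}: unexpected fields {extra}")
        # Fields that are present but no longer hold the expected type.
        for name, expected_type in expected.items():
            if name in row and not isinstance(row[name], expected_type):
                problems.append(
                    f"row {i}: {name} is {type(row[name]).__name__}, "
                    f"expected {expected_type.__name__}"
                )
    return problems


feed = [
    {"customer_id": 1, "country": "GB", "balance": 10.5},
    {"customer_id": "2", "country": "GB"},  # type changed, field dropped
]
for problem in check_feed(feed):
    print(problem)
```

A check like this will not tell you whether two fields with the same name mean the same thing, but it does flag the moment an upstream change alters what arrives.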

Sunday, 13 May 2018

Datasets for data cleaning practice by/via @rctatman

Here's a collection of datasets for data cleaning practice, including tips on what needs to be done or fixed in order for it to fit easily into a data analysis pipeline.

This is an incredibly useful resource and well worth using, as I think we could all do with the practice.

Wednesday, 12 July 2017

5 Ways Businesses Can Cultivate a Data-Driven Culture by @Ronald_vanLoon via @LinkedIn

The pressure on organisations to make accurate and timely business decisions has turned data into an important strategic asset for businesses. In today’s dynamic marketplace, a business's ability to use data to identify challenges, spot opportunities, and adapt to change with agility is critical to its survival and long-term success.

Some interesting points on what to look out for. As I often say, you need to make sure the data is clean, tidy and well understood. If you can't guarantee that the data is up to date and clean, I see little point in collecting it, let alone using it. You need to be able to guarantee it is clean in order to guarantee that the results of any analysis or reporting are reliable.

Wednesday, 14 September 2016

How the Bureau of Labor Statistics Analyzes Data by Brian McDonough via @infomgmt

In an era of big data, the bureau increasingly relies on the Internet, databases and analytics to produce the reports that government officials and business leaders use to gauge the health of the economy and labor force.

Really interesting, and good to understand the process they follow. Please note this is a two-page article.

Monday, 1 August 2016

IT Pros under Pressure to Accelerate Information-Driven Decision-Making by Bob Violino via @infomgmt

Data quality issues continue to plague the large majority of businesses, according to a new report from research firm TDWI.

I've seen for myself that if data quality isn't built in at the point of input/creation, you will fight a losing battle to ensure it.

Sunday, 24 July 2016

Free Alternatives to Excel for Data Cleaning by Lee Baker via @DataScienceCtrl

Pretty much every data rookie starts with Excel. It is a wonderful program for storing, cleaning and analysing (yes, you read that correctly) your data.

Very interesting.

Saturday, 16 April 2016

The Data Quality Tipping Point via @Datafloq

By Martin Doyle. Whatever your business sector, data is your most valuable asset. Along with the machinery and stock you hold, data and insights hold the key to profit and growth. It can reveal problems in processes, drive productivity among your staff and ensure everyone is ‘singing from the same hymn sheet’. However, like any asset, you need to invest in maintenance and management. Data that is not prioritised and nurtured will cause more problems than it solves. But how much do you need to spend to achieve a healthy ROI?

I agree - you need to do what it takes to only use good quality data for any decisions.

Monday, 4 January 2016

Will 2016 be the Year you Clean up your Dirty Data? via @dqmartindoyle

In this article on Datafloq, Martin Doyle says that it feels as though we've been warning about the dangers of low-quality data forever. Our warnings have been reinforced and echoed by some of the world’s biggest think tanks. However, despite this, some organisations still haven't acted to improve the quality of their data. Will 2016 be the year that organisations clean their dirty data?

I have to say the only place to fix this is at source. Limit the allowed values in data fields, use drop-down lists to limit values, and use free-format text sparingly. Bad data pollutes any reporting or intelligence you look to gain from data, and removing bad data from reporting stops it matching the system of record that it came from (which is almost as bad).
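To illustrate fixing it at source, here is a minimal sketch of limiting a field to a fixed list of values at the point of input, much as a drop-down list would. The field, the allowed values and the function name `validate_status` are all hypothetical, chosen only for the example.

```python
# Hypothetical set of allowed values, as a drop-down list would offer.
ALLOWED_STATUSES = {"open", "closed", "on hold"}


def validate_status(raw_value, allowed=ALLOWED_STATUSES):
    """Normalise a status entry and reject anything not on the allowed list."""
    value = raw_value.strip().lower()
    if value not in allowed:
        raise ValueError(
            f"{raw_value!r} is not an allowed status; choose one of {sorted(allowed)}"
        )
    return value


print(validate_status("  Open "))  # accepted and stored as "open"
try:
    validate_status("opne")  # a typo is rejected before it reaches the data
except ValueError as error:
    print(error)
```

Rejecting the value when it is entered means nothing has to be removed from reporting later, so the reports still match the system of record.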

Monday, 7 September 2015

The Importance of Data Cleansing and Data Maintenance via @Datafloq

There are two aspects to data quality improvement. Data cleansing is the one-off process of tackling the errors within the database, ensuring retrospective anomalies are automatically located and removed. Data maintenance describes ongoing correction and verification – the process of continual improvement and regular checks. But, which process is the most important?

Great article from Datafloq. I completely agree - if you don't clean and maintain your data then you will get garbage results from it.

Friday, 10 July 2015

Data Cleansing 101: Why It’s Important in Business

Keep your business database in perfect shape by employing efficient data cleansing processes. Interesting blog from Infinit Datum which points out some things that should be obvious but often are not.

Sunday, 14 June 2015

Data Cleansing 101: Why It’s Important in Business

Keep your business database in perfect shape by employing efficient data cleansing processes.

A reminder of how important data cleansing is, posted by Infinit Datum.

Wednesday, 18 February 2015

Free eBook - Practical Data Cleaning

19 Essential Tips to Scrub Your Dirty Data

Collecting and cleaning data can be very time consuming, but following a few simple rules can make the process much less painful.

Get this free e-book by Lee Baker here