Wednesday 30 September 2015

Behind American Express’ Machine Learning Effort via @infomgmt

American Express has plenty of data and analytics experience. But machine learning has allowed the company's scientists to harness the full power of data. Here are interviews with two members of their team discussing how.

Interesting insight from a real company's perspective in Information Management.

Turning Hadoop Into an Analytics Platform for the Enterprise via @infomgmt

How Hadoop can be used as a valuable business intelligence tool for enterprise organizations - with step-by-step considerations.

Interesting article from Information Management.

Tuesday 29 September 2015

Analytics and SaaS Fuel Enterprise Software Spending via @infomgmt

Worldwide spending on enterprise application software will grow to more than $201 billion by 2019, fuelled by SaaS solutions and analytics software, among other things, according to Gartner.

Interesting numbers in this article on Information Management.

All About “Power BI” Dashboards via @7wdata

Power BI dashboards present your latest data in one consolidated view, regardless of where the data lives.  Here are some tips for working with dashboards that you can put into action right now.

Great tips and ideas.

Monday 28 September 2015

Overview of Analytics Industry in India (some notes and views) via @AnalyticsVidhya

Interesting set of facts and notes from Kunal Jain on Analytics Vidhya.  This gives you some insight into how the industry is expanding and where it could be going.

This Is How You Build Products for the New Generation of 'Data Natives' via @firstround

They don't want fancy infographics, or even charts. Monica Rogati, VP of data at Jawbone, defines a data native as "someone who expects their world to not just be digital, but to be smart and to adjust immediately to their taste and habits."

Interesting article from First Round Review.  Contains a lot of salient points that we all need to consider.  All I ask is that we don't forget the older people who are not data natives.

Sunday 27 September 2015

Facebook ‘Likes’ Mean a Computer Knows You Better Than Your Mother via @wsjd

New research shows computers, using only 10 "likes," are better than co-workers at judging personalities. With 70 "likes" they can judge your personality better than your friends, and with 250 "likes" they can out-predict your spouse.

Very interesting blog from WSJ.D.  I recommend following the link in the article to the Proceedings of the National Academy of Sciences, where the researchers published their findings.

Understanding Analytics Maintenance via @infomgmt

Take a closer look at your software, and you'll understand the simultaneous needs of both maintenance and new application development.

A great blog from Information Management about a topic that is often overlooked.

Saturday 26 September 2015

Solving the Big Data “Abandonment” Problem via @infomgmt

Organizations often fail to see justifiable ROI from their big data investments because no clear blueprint exists for how to take a project from inception to completion with delivering value in mind. Here's how to overcome those challenges.

Great article from Information Management.

Where’s The Money in Data? (Part II) via @infomgmt

As we progress further into the age of digitization, many executives are asking “How do we use data to drive revenue?” or “Where’s the money in data?” Here's part two of the answer.

Read here on Information Management.

Friday 25 September 2015

Analytics of Republican Debate and network percolation via Wolfram Community

Wolfram Community forum discussion about Analytics of Republican Debate and network percolation.

Interesting to read and understand, whether you are a US resident or not.

“One mass shooting per day” tells an important story that’s still wrong via @heap

How a metric is defined can shape an entire narrative. In this week's post, we explore how inconsistent definitions of terms like "mass shooting" or "unemployment rates" can hugely affect how statistics are interpreted.

Great post on Heap by Jordana Cepelewicz

Thursday 24 September 2015

WEBINAR: 100 Years of Data Visualization – It’s Time to Stop Making the Same Mistakes - 29th September 2015

Overview
Title: 100 Years of Data Visualization – It’s Time to Stop Making the Same Mistakes
Date: Tuesday, September 29, 2015
Time: 09:00 AM Pacific Daylight Time
Duration: 1 hour
Summary
In 1914, New Yorker Willard Brinton wrote Graphic Methods for Presenting Facts, the first book on telling stories through data and communicating information visually. Today, the volume of data in the world is exponentially increasing, the tools to transform analysis into stories are evolving—and 100 years later, Brinton’s lessons still hold true.
In this next DSC webinar event, we will explore:
  • Visualization basics that withstand the test of time
  • The right charts for telling the right stories
  • Brinton’s checklist for communicating data
Speaker: Andy Cotgreave, Senior Technical Evangelist Manager -- Tableau Software
Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Register here

15 Books every Data Scientist Should Read via @DataScienceCtrl @BernardMarr

A list of 15 physical books that Bernard Marr thinks every Data Scientist should read.

Great blog from Bernard on Data Science Central.  Some more books to add to my Amazon wishlist for sure.

Wednesday 23 September 2015

WEBINAR: How LinkedIn Scales NoSQL for 300 Million+ Users - 28 September 2015


How LinkedIn Scales NoSQL for 300 Million+ Users
Complimentary Web Seminar
September 28, 2015
2 PM ET/11 AM PT
Brought to you by Information Management
LinkedIn has more than 300 million members around the world, generating massive amounts of user activity – all of which needs to be logged, monitored, and analyzed.
To successfully manage all of this data, their database needs to be fast and able to scale quickly on demand, and LinkedIn’s legacy storage systems proved unable to keep up with such demands. A strong caching technology was crucial for LinkedIn to provide the performance its users required.
This Information Management webcast, featuring Shane Johnson of Couchbase, will dive into LinkedIn’s use of a high-performing, scalable database that powers their metric visualization engine, ultimately delivering 400K operations/second on just four server nodes.
In this webinar, you’ll learn:
  • The six key requirements companies like LinkedIn look at when using a high-performance, low-cost caching technology
  • Advantages and disadvantages of common solutions, including Oracle Coherence and memcached
  • How to implement and deploy a caching technology within your existing environment

Register here

10 tools and platforms for data preparation via @DataScienceCtrl

10 tools and platforms for preparing and joining disparate data.

Great blog by Zygimantas Jacikevicius on Data Science Central

5 Stages of Big Data Maturity (And What They Mean) via @infomgmt

Regardless of where your business lands on the big data maturity model, the key is to maximize potential at each stage and build on these tangible milestones.

Great article from Information Management.

Tuesday 22 September 2015

Where’s The Money in Data? (Part I) via @infomgmt

As we progress further into the age of digitization, many executives are asking “How do we use data to drive revenue?” or “Where’s the money in data?” Here's part one of the answer.

Read it here on Information Management.

Top 20 Data Science MOOCs via @KDnuggets

Looking for your next data science MOOC? Check out this extensive list of MOOCs, covering all data science disciplines and offered by leading organizations.

I would also recommend this specialisation on Coursera.

Monday 21 September 2015

24 Ultimate Data Scientists To Follow in the World Today via @AnalyticsVidhya

Here's a league of ultimate Data Scientists to follow in the world today, from the team at Analytics Vidhya.  See it here.

Big Data-as-a-Service Solutions Will Revolutionize Big Data via @Datafloq

Big Data services offered in the cloud are nothing new. In the past few years we have seen many Big Data vendors create solutions that can be accessed via the web to crunch and analyse your data. Recently, however, we have seen the rise of a new type of offering: Big Data-as-a-Service solutions. These solutions offer a new perspective on Big Data and can disrupt the Big Data industry.

Interesting article on Datafloq

Sunday 20 September 2015

30 tweetable quotes about Data Science via @manujeevaan

To inspire you, Manu has chosen some of his favourite tweetable quotes about Data Science to share.

Nice collection published here on Big Data Made Simple.

When your data science activities can send you to prison... via @DataScienceCtrl

Great blog post by Laetitia Van Cauwenberge on Data Science Central.  I had no idea all these things were classified.

Algorithm Optimizes Big Data Clusters for Medical Breakthroughs via @infomgmt

Researchers at Rice University have developed a big data technique that could have a significant impact on healthcare through “clustering” and the ability to reveal information in complex sets of data like electronic health records.

Interesting article from Information Management.

Saturday 19 September 2015

How Big Data May Bring Some Sanity to the Holiday Shopping Rush via @infomgmt

Ahead of the holiday sales rush, big data startups have sprung up to help manage the complicated flow of data involved with the movement of goods.  Interesting article here.

8 Objectives for Your MDM, Data Governance Strategy via @infomgmt

As we look ahead to MDM & Data Governance Summit in New York, here are eight ways to wrap your arms around master data management (MDM).

Thursday 17 September 2015

WEBINAR: Email Compliance: How Analytics Helps Stave Off Violations - 23 September 2015

Sponsored News from Data Science Central
Teradata
Email Compliance: How Analytics Helps Stave Off Violations
Live Webinar
Wednesday, September 23, 2015
10:00 AM PDT/1:00 PM EDT
Duration: 1 hour
With over 100 billion business emails in circulation daily, the criticality of keeping your messages compliant can’t be overstated. To avoid litigation, steep financial penalties, and human resource issues – and to keep the integrity of your brand intact – it’s important for organizations to ensure their communications abide by established rules and protocols.

Join us for the Email Compliance-How Analytics Helps Stave Off Violations webinar to learn how the Teradata Email Compliance Application can help you:
  • Easily track your communications regularly, across all emails and attachments
  • Use all your email data to implement a rules engine that identifies the likelihood of compliance violations
  • Better understand and address the challenges of compliance monitoring
Discover how advanced analytics can ensure you stay email compliant. Register for Email Compliance-How Analytics Helps Stave Off Violations now!
SPEAKERS
David Gebala
Senior Manager, Teradata Aster Center of Excellence

Matt Mazzarell
Data Scientist, Teradata Field Applications

REGISTER NOW

WEBINAR: Building Modern Cross-Platform Web Apps in Java - 24 Sept 2015


Building Modern Cross-Platform Web Apps in Java

WEBINAR DATE: Thursday, September 24, 2015
TIME: 1:00-2:00PM ET

Developers are increasingly feeling the pressure to deliver great-looking, highly performant web applications that can run on multiple device types - faster. Despite advances in JavaScript frameworks, the robustness of Java continues to appeal to large teams and/or large applications. Sencha GXT builds on the open source GWT compiler to enable Java developers to build complex desktop-like user interfaces that run in the browser.
Please join us as David Chandler (Developer Advocate) and Gautam Agrawal (Senior Director of Product Management) discuss how you can leverage advancements in Sencha GXT to:
  • Build rich user interfaces with tree controls, filtering grids, charts, and more that run in all popular browsers.
  • Extend your Java web apps to tablets and leverage touch events, gestures and momentum scrolling.
  • Present complex data more effectively by leveraging data loaders, stores and charts.
  • Accelerate your overall application, design, delivery and deployment efforts.

FEATURED SPEAKERS:
David Chandler, Developer Advocate for GXT, Sencha
Gautam Agrawal, Senior Director of Product Management, Sencha

Register here

Use Data to Survive Service Disruptions and Retain Customers via @Data_Informed

A service outage can alienate customers and threaten your brand. Laks Srinivasan of Opera Solutions discusses how big data analytics can help organizations survive a service outage with minimal damage to the business.

Interesting article from Data Informed

Spark versus MapReduce: which way for enterprise IT? via @computerweekly

Interesting comparison between the two.  I can't disagree with the conclusion.

Wednesday 16 September 2015

WEBINAR: How to Combine BI with In-memory Computing for True Data Insights - 22 Sept 2015



How to Combine BI with In-memory Computing for True Data Insights

Complimentary Web Seminar
September 22, 2015
2 PM ET/11 AM PT

Brought to you by Information Management

As data volumes grow and become more complex, companies often struggle to gain a reliable, up-to-date view of what’s currently happening in the business. To avoid such setbacks and turn data into opportunity, savvy businesses are combining BI software with in-memory computing. Join us to learn how.
Three Things You Will Learn:
  • Independent Findings from Blue Hill Research Analyst James Haight: Why clients are using Cognos BI together with DB2 with BLU Acceleration, including key business benefits.
  • Key Pieces to the Puzzle: How Cognos BI -- a purpose-built, enterprise-class platform -- supports global deployments for all BI and performance management needs, while still delivering scalability and cost-effectiveness. Plus, how BLU Acceleration -- a next generation in-memory computing technology -- delivers results at breakthrough speeds through a series of advanced processing techniques.

  • The Total Solution: Matthew Mikell of IBM will describe how combining those co-optimized capabilities enables leaders to glean insights from big data in near real time and in ways that can be easily visualized and consumed.
Join us to learn how this solution can help you take confident action to realize more opportunity in your business.
Featured Presenters:
Speaker
James Haight
Research Analyst
Blue Hill Research
Speaker
Matthew Mikell
Portfolio Marketing Manager, Information Management and Business Intelligence on Cloud
IBM
Moderator
Jim Ericson 
Consultant
Editor Emeritus, Health Data Management

Register here

Time to Clean Up Your Master Data via @CFO

An old article but still as relevant today as it was when it was written.  Sorting out master data means any analysis of your data will generally be more accurate, and therefore you will have the right environment to make better decisions based upon it.

How Many Types of KPIs Are There? via @infomgmt

Finally, some guidance for your scorecards and dashboards. Here are five areas that potentially deserve your attention.

Interesting blog from Information Management. I think we can all recognise KPIs we are familiar with in these classifications.

Tuesday 15 September 2015

Live Q&A: Capture Real-Time Operational Intelligence from Big Data - 29 Sept 2015


Never before has so much data, from so many sources, been available to business. Java developers, DevOps, and IT Ops personnel need to acquire rapid insight into this data in order to keep the business running. This streaming data in motion can hold the key to valuable insights, but if you can’t act on those insights – what Forrester Research calls “perishable insights” – in the moment, they can become yesterday’s news in minutes.
Join us on Tuesday, September 29 at Noon EST / 9AM PST for a Live Q&A with Albert Mavashev, CTO at jKool.
Get answers to your specific questions on how to glean insight from your machine data using in-memory analytics. Albert will share tips that will enable you to quickly understand your customers’ needs, improve diagnostics, identify trends as they are happening, be predictive, and take action in real-time.
Join us and ask your specific questions, including:
  • How can I identify where my company is missing opportunities in improving operational intelligence?
  • How can I reduce the amount of time our developers spend diagnosing application problems and get them back to developing?
  • What are the financial and resource investments involved with gaining real-time insights into our data?
  • How is streaming data analytics different from Business Intelligence?
  • What kind of technical expertise does my staff need to take advantage of real-time data insight?
Bonus: Everyone who registers will be entered into a drawing for a new Samsung Galaxy Tab 4 tablet computer.

Register here

10 Indian Data Scientists you should know via @Mastufa

An interesting list of data scientists - the only issue I have with it is that there are no women on the list. I hope that is corrected in the next few years.

SlideShare Presentations on Data Science via AnalyticsVidhya

Great list of Slideshare presentations pulled together by the Analytics Vidhya team.  Recommended for a bookmark.

Monday 14 September 2015

How to train your mind for analytical thinking? via @AnalyticsVidhya

Still as useful an article today as it was when it was originally written.  Read it here on Analytics Vidhya.

Creating Line Charts and Bar Charts in GGPLOT2 via Maths user

Step-by-step instructions on creating bar charts and line charts in R using ggplot2, by Maths user.

Great tutorials that should definitely be bookmarked for future reference.
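The tutorial itself is in R with ggplot2, but if you work in Python, here is a rough equivalent of the same two chart types using matplotlib - my own sketch with made-up numbers, not the tutorial's code:

```python
# Rough Python/matplotlib equivalent of the chart types covered in the R/ggplot2
# tutorial above -- made-up sample data, not the tutorial's own code.
import matplotlib.pyplot as plt

months = ["Jan", "Feb", "Mar", "Apr", "May", "Jun"]
sales = [120, 135, 128, 150, 162, 158]          # hypothetical values

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

# Line chart: trend over time
ax1.plot(months, sales, marker="o")
ax1.set_title("Monthly sales (line)")
ax1.set_ylabel("Units")

# Bar chart: the same values as discrete bars
ax2.bar(months, sales)
ax2.set_title("Monthly sales (bar)")

plt.tight_layout()
plt.show()
```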

Sunday 13 September 2015

22 easy-to-fix worst mistakes for data scientists via @DataScienceCtrl

I think these apply to anyone, not just data scientists.  Great blog entry from Data Science Central.

Partitioning cluster analysis: Quick start guide - Unsupervised Machine Learning via STHDA

Clustering is an exploratory data technique used for discovering groups or patterns in a dataset. There are two standard clustering strategies: partitioning methods and hierarchical clustering.

Great tutorial via STHDA. Well worth a bookmark.
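To make the partitioning idea concrete, here's a minimal k-means sketch in Python with scikit-learn on synthetic data - my own illustration rather than anything from the tutorial:

```python
# Minimal sketch of a partitioning method (k-means) on synthetic data.
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans

# Synthetic data with three "true" groups
X, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)

# Partition the data into k=3 clusters
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)

print("Cluster sizes:", [int((labels == k).sum()) for k in range(3)])
print("Cluster centres:\n", kmeans.cluster_centers_)
```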

Saturday 12 September 2015

Cross validation done wrong via @mottalrd

Cross-validation is an essential tool in statistical learning for estimating the accuracy of your algorithm. Despite its great power, it also exposes some fundamental risks when done wrong, which may terribly bias your accuracy estimate.

Great blog post explaining this crucial part of predictive analytics.
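One common way it goes wrong is letting a preprocessing step that looks at the labels (feature selection, for example) see the whole dataset before the folds are split. Here's a sketch of the safe version in scikit-learn, keeping every step inside the folds with a Pipeline - my own illustration, not necessarily the post's exact example:

```python
# Keeping label-dependent preprocessing (here, univariate feature selection)
# inside a Pipeline means it is re-fitted on the training portion of each fold,
# so the cross-validation estimate is not optimistically biased.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import Pipeline

X, y = make_classification(n_samples=200, n_features=500, n_informative=5,
                           random_state=0)

pipe = Pipeline([
    ("select", SelectKBest(f_classif, k=10)),   # fitted per fold, no leakage
    ("clf", LogisticRegression(max_iter=1000)),
])

scores = cross_val_score(pipe, X, y, cv=5)
print("Cross-validated accuracy: %.3f" % scores.mean())
```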

Data Science with Python & R: Dimensionality Reduction and Clustering via @DataScienceCtrl

An important step in data analysis is data exploration and representation. This tutorial shows how combining Principal Component Analysis (PCA) with cluster analysis lets us represent data defined in a higher-dimensional space in two dimensions while, at the same time, grouping the data into similar clusters and finding hidden relationships within it.

Great tutorial originally written by Jose A Dianes, PhD and shared via a blog on Data Science Central - definitely one to bookmark and keep.
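The core of it fits in a few lines of Python with scikit-learn - project to two dimensions with PCA, then cluster - using the iris data as a stand-in rather than the tutorial's own dataset and code:

```python
# Project high-dimensional data to 2-D with PCA, then group it with k-means.
# The iris data is just a runnable stand-in for the tutorial's dataset.
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

X = load_iris().data                                 # 4 features per sample

X_scaled = StandardScaler().fit_transform(X)         # put features on one scale
X_2d = PCA(n_components=2).fit_transform(X_scaled)   # 2-D representation

clusters = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_2d)
print("Samples per cluster:", [int((clusters == k).sum()) for k in range(3)])
```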

Friday 11 September 2015

Round 1 of the Big Data Analytics World Championships 2015 (Business and Enterprise) - Saturday September 25, 2015

Thousands of the best Data Scientists, Engineers, Statisticians, Computer Scientists and Data Analysts compete in two Online Qualification Rounds (4 hours each). The top performers are flown to Austin, Texas, USA to compete in the Live World Finals. The focus is on Business, Mobility and Enterprise data skills, with real-world case studies, multiple-choice and short-answer questions.

Register here.

WEBINAR: 5 Things Your Organization Needs to Succeed in Data Science - 15 Sept 2015

Overview
Title: 5 Things Your Organization Needs to Succeed in Data Science
Date: Tuesday, September 15, 2015
Time: 09:00 AM Pacific Daylight Time
Duration: 1 hour
Summary

Please join us on September 15, 2015 at 9am PT for our latest Data Science Central Webinar Event: 5 Things Your Organization Needs to Succeed in Data Science sponsored by Teradata.
What does it take to succeed in the world of Data Science and Analytics?  It takes the right culture, people, process and governance, the ability to operationalize analytics, and special weapons and tactics. 
Join John Thuma in this latest DSC Webinar as he discusses his strategy to conquer the 5 challenges to succeed in data science.
  • Culture:  Is your organization a dinosaur looking at the pretty light in the sky, unknowing of what is to come?  In today’s world, you are either an innovator or slowly fading away.  Learn how organizations must embrace data science to survive and flourish as the market leader.
  • People:  Do you have the right people for advanced analytics?  Of course it takes statistics, programming and hard work, but it takes much more!  Does your team have the traits needed to succeed in advanced analytics?  Learn the traits for success: The Pioneer, The Cattle Herder, The Muscle, and The Story Teller.
  • Process and Governance:  It takes process and governance to succeed in data science and analytics.  John will share his 10-step decision-making process for advanced analytics.
  • Get Operational:  If you can’t change the business process and the people acting in it with analytics, then you have successfully built a science project - a bunch of technology no one uses.  Not good.  Start with a business problem, solve that business problem, and embed analytics into the process, scripting out what people are going to do with it.
  • Special Weapons and Tactics:  John will share his secret weapon to remove technology barriers and succeed in data science.  Come to the webinar and find out.
Speaker: John Thuma, Director, Aster Strategy and Analytics
Hosted by: Bill Vorhies, Editorial Director, Data Science Central

Register here

WEBINAR: Breaking Down the Barriers of Ineffective Data Governance - 17 Sept 2015


Breaking Down the Barriers of Ineffective Data Governance

Complimentary Web Seminar
September 17, 2015
1 pm ET/10 am PT

Brought to you by Information Management

Many Business Analysts today are often frustrated and perplexed by IT bureaucracy getting in the way of their analytic projects. Analysts want quick and easy access to data and to explore comprehensive datasets to achieve new levels of insights. On the other hand, IT’s job is to ensure that the organization’s data is accurate, complete and secure. With the explosive rate of data growth, analysts are excited to explore these new resources for potential competitive advantage. But this growth of data places burdens on limited IT resources to store, manage and protect this data. IT is required to maintain high standards for data while satisfying the needs of those trying to utilize this data.

Organizations that take a collaborative approach to data governance meet the dual demands of analysts and IT to successfully accomplish their analytic projects.

Please join us September 17th at 1 pm ET/10 am PT for a webcast to learn:
  • What are the three key criteria of collaborative data governance
  • What are the three benefits of collaborative data governance
  • What successful collaboration between analysts and IT looks like
  • How collaborative data governance can ensure the success of your analytical projects
Featured Presenters:
Moderator:
Jim Ericson
Consultant
Editor Emeritus
Information Management
Speaker:
Sreevani Abbaraju
Product Consultant
Information Management Group
Dell Software

Register here

Start with Good Science on Good Data, Then we’ll Talk ‘Big Data’ via @WorldOfDataSci

We are currently witnessing a land rush of investment in Big Data architectures promising companies that they can turn their data into gold using the latest in distributed computing and advanced analytical methods.

Great article from Big Data Made Simple by Sean McClure.

I agree - more data that is bad is just more bad data - no reason to conclude it will be any more useful than it was before.

Big Data File Transfers: Solving the Challenge via @infomgmt

Organizations need to move large unstructured data sets across the world quickly and easily for big data analytics using Hadoop. Classic methods like FTP and HTTP aren't designed for such use cases. Here's how to move forward.

Article from Information Management.

Thursday 10 September 2015

WEBINAR: Elegant, modern, lightweight integration using enterprise integration patterns - 16th Sept 2015


Elegant, modern, lightweight integration using enterprise integration patterns

WEBINAR DATE: Wednesday, September 16, 2015

TIME: 1-2 PM ET

Enterprise environments are now more complex—connecting APIs across Software-as-a-Service (SaaS) apps, cloud apps, partner apps, and on-premise apps. Business is also demanding more innovative and faster services, which requires a modern and lightweight integration platform that easily scales with business requirements and is flexible to adapt to different use cases.

Apache Camel is a powerful integration framework that provides a POJO-based implementation of the enterprise integration patterns (EIPs) using an extremely powerful domain specific language (DSL) to configure routing and mediation rules. Apache Camel facilitates simple, flexible, and straightforward integration of a wide array of technologies and stacks (expressed as URIs) using common, well-defined enterprise integration patterns.

In this webinar, you'll learn about the foundational building blocks of Apache Camel, including:
  • The CamelContext
  • Domain specific language (DSL)
  • Enterprise integration patterns (EIPs)
  • Routes, pipelines, and RouteBuilders
  • Components and endpoints
You'll also learn how all these components work together to deliver an easy-to-use and powerful integration framework in Red Hat JBoss Fuse, a lightweight integration platform. JBoss Fuse can be flexibly deployed and dynamically provisioned across an enterprise for a variety of use cases to integrate everything, everywhere. 
FEATURED SPEAKER:


Ashwin Karpe, Lead, Enterprise Integration Practice, Red Hat Consulting

Register here

R for Data Science with Hadley Wickham via @infomgmt

Like many, prolific developer Hadley Wickham acknowledges that R isn't the perfect language, but argues convincingly for its functional capabilities.

Great article from Information Management.

Wednesday 9 September 2015

BI Professionals Spend 50-90% of Their Time ‘Cleaning’ Raw Data for Analytics via @BDAnalyticsnews

Last year, the NYT shined a light on big data's “janitor” problem – that data scientists and business intelligence pros spend too much time cleaning, not evaluating data. But how big of an issue is it, really?

Great article from Big Data Analytics News.

I have to admit the cleaning and preparation of data does take time that could be spent doing more productive things.  However, until data sources are as rigorous as the data professionals who use them, we are stuck in this scenario: we have to clean the data before we can draw useful conclusions from it.
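For anyone who hasn't lived it, here is roughly what that 'janitor work' looks like in practice - a minimal pandas sketch on a made-up messy extract:

```python
# A minimal sketch of typical data cleaning steps -- the inline "extract" is
# deliberately messy and entirely made up, just to show the kind of work
# that eats up the time.
import pandas as pd

raw = pd.DataFrame({
    "Order Date": ["2015-09-01", "2015-09-01", "not a date", "2015-09-03"],
    "Amount":     ["120.50", "120.50", "80", "n/a"],
    "Region":     [" north ", " north ", "South", "south"],
})

df = raw.drop_duplicates().copy()                   # remove repeated rows
df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]

# Coerce types: bad values become NaN/NaT instead of breaking the load
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["amount"] = pd.to_numeric(df["amount"], errors="coerce")

# Standardise free text and deal with missing values explicitly
df["region"] = df["region"].str.strip().str.title()
df = df.dropna(subset=["order_date", "amount"])

print(df)
```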

Frequent backups of Laptop/Desktop are essential

Just done my monthly full backup of my laptop.  Just as a business backs up its data, you should do the same with your personal data.  With the low cost of storage, no one has an excuse not to back up frequently (and with the issues I have read about with upgrading to Windows 10, it is always wise to have a backup before you upgrade, in case of problems).

My recommendations for a cheap solution are:

Backup software:  Paragon Backup and Recovery does the job well and for personal use you can get their 14 edition free here

External Hard Disk:  External hard disks and storage are cheap these days, so there is no excuse not to have one to back up your data.  Currently I am using a Toshiba 3TB external hard drive, which can be purchased from Amazon for about £69.96 (details of the exact one I am using here)

So be safe and back up your data.

Don't forget your mobile phone and tablet too.  You often find your phone manufacturer has a facility to back them up.  Certainly Samsung do via their Kies software.

Big Data Fades to the Algorithm Economy via @forbes

Peter Sondergaard of Gartner recently wrote in Forbes, “Big data is the oil of the 21st century. But for all of its value, data is inherently dumb. It doesn't actually do anything unless you know how to use it.”

Read his great article on Forbes here.

Tuesday 8 September 2015

IU scientists use Instagram data to forecast top models at New York Fashion Week via @IndianaResearch @EurekAlertAAAS

Researchers at Indiana University have predicted the popularity of new faces in the world of fashion modelling with over 80 percent accuracy.  Interesting article here.

Apache Foundation promotes Ignite via @sdtimes

The Apache Foundation has promoted Apache Ignite to become a top-level project. This open-source effort to build an in-memory data fabric was primarily driven by GridGain Systems and WANdisco.

Exciting move for anyone interested in processing real time data.  Article in SD Times

Monday 7 September 2015

The Importance of Data Cleansing and Data Maintenance via @Datafloq

There are two aspects to data quality improvement. Data cleansing is the one-off process of tackling the errors within the database, ensuring retrospective anomalies are automatically located and removed. Data maintenance describes ongoing correction and verification – the process of continual improvement and regular checks. But, which process is the most important?

Great article from Datafloq.  I completely agree - if you don't clean and maintain your data then you will get garbage results from it.

How Apache Spark Is Transforming Big Data Processing, Development via @eWEEKNews

Apache Spark speeds up big data processing by a factor of 10 to 100 and simplifies app development to such a degree that developers call it a "game changer."

Great article from eWEEK.
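If you haven't tried Spark yet, a minimal PySpark sketch gives a feel for why developers say this - a group-and-aggregate that would need a hand-written mapper and reducer in classic MapReduce takes a few lines (the tiny inline dataset is my own, purely for illustration):

```python
# Minimal PySpark sketch: a group-and-aggregate in a few lines.
# The inline dataset is made up and stands in for a large event log.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("spark-flavour").getOrCreate()

events = spark.createDataFrame(
    [("2015-09-01", "u1"), ("2015-09-01", "u2"), ("2015-09-02", "u1")],
    ["event_date", "user_id"],
)

# Events and distinct users per day
daily = (events.groupBy("event_date")
               .agg(F.count("*").alias("events"),
                    F.countDistinct("user_id").alias("users")))

daily.orderBy("event_date").show()
spark.stop()
```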

Sunday 6 September 2015

4 Tricky R interview questions via @AnalyticsVidhya

Great set of 4 interview questions that will test your understanding of R.  Well worth a read before you bookmark and memorise them.  From AnalyticsVidhya

Cohort Analysis with Python via @gjreda

A cohort is a group of users who share something in common, such as a sign-up date, first purchase month, birth date, acquisition channel, etc. This tutorial provides a good foundation for tracking these groups over time, which helps you spot trends and understand repeat behaviours.

Great tutorial by Greg Reda - well worth a read and a bookmark.
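The core of the cohort idea fits in a few lines of pandas - assign each user to the month of their first order, then count active users per cohort per month. This is my own sketch on made-up data, not Greg's code:

```python
# Cohort analysis in miniature: cohort = month of first order, then count
# active users per cohort per month. Data below is made up.
import pandas as pd

orders = pd.DataFrame({
    "user_id":    ["a", "a", "b", "b", "c"],
    "order_date": pd.to_datetime(["2015-01-05", "2015-02-11", "2015-01-20",
                                  "2015-03-02", "2015-02-14"]),
})

orders["order_month"] = orders["order_date"].dt.to_period("M")
orders["cohort"] = orders.groupby("user_id")["order_month"].transform("min")

# Active users per cohort per month, then retention as a share of cohort size
cohorts = (orders.groupby(["cohort", "order_month"])["user_id"]
                 .nunique()
                 .unstack())
cohort_sizes = orders.groupby("cohort")["user_id"].nunique()
retention = cohorts.div(cohort_sizes, axis=0)
print(retention)
```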

Saturday 5 September 2015

How does a relational database work via Christophe Kalenzaga

An in-depth article that explains how a relational database handles an SQL query and the basic components inside a database.

Well worth a read and a bookmark even if you think you know everything there is to know about them.
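If you want to poke at one of those components yourself, most databases will show you what their query optimiser decides to do. A tiny example using Python's built-in sqlite3 - my own, not taken from the article:

```python
# Ask SQLite (which ships with Python) to show the plan its optimiser picks
# for a join-and-aggregate query, then add an index and compare the plans.
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, country TEXT)")
con.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")

query = """SELECT c.country, SUM(o.total)
           FROM orders o JOIN customers c ON c.id = o.customer_id
           GROUP BY c.country"""

# Plan before adding an index
for row in con.execute("EXPLAIN QUERY PLAN " + query):
    print(row)

con.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")

# Plan after adding an index -- see whether the optimiser's choice changes
for row in con.execute("EXPLAIN QUERY PLAN " + query):
    print(row)
```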

Analytics Startups Fill Healthcare Void via @infomgmt

Healthcare analytics are getting the call to support the industry through massive change, and new companies are hoping to fill gaps in technology and provide needed capabilities.

Interesting article from Information Management.

Friday 4 September 2015

How IoT and Analytics Reshape Vertical Markets via @infomgmt

When you roll the Internet of Things (IoT), big data analytics and cloud computing together, vertical markets like manufacturing start to look dramatically different, according to new research.

Great article from Information Management.

It’s hard to be a data-driven organization via Big Data Made Simple

Do you work for a data-driven organization, or one that claims to be a data-driven organization, or one that wants to be a data-driven organization?

Great article by Charlie Kufs on Big Data Made Simple.

Thursday 3 September 2015

Ultimate guide for Data Exploration in Python using NumPy, Matplotlib and Pandas via @AnalyticsVidhya

Exploring data sets and developing a deep understanding of the data is one of the most important skills every data scientist should possess. People estimate that time spent on these activities can be as high as 80% of the project time in some cases.

Great guide from the folks at Analytics Vidhya.  Well worth a bookmark.
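The first few commands of almost any exploration session look something like this - my own pandas sketch using the iris data as a stand-in; the guide itself goes much further:

```python
# The opening moves of a data exploration session in pandas.
# The iris data is just a runnable stand-in for a real project's dataset.
import pandas as pd
from sklearn.datasets import load_iris

df = load_iris(as_frame=True).frame          # features plus a 'target' column

print(df.shape)                              # rows x columns
print(df.dtypes)                             # variable types
print(df.head())                             # eyeball a few records
print(df.describe())                         # summary stats for numeric columns
print(df.isnull().sum())                     # missing values per column
print(df["target"].value_counts(normalize=True))   # class balance

df.hist(figsize=(10, 8))                     # quick histograms (uses matplotlib)
```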

Developing with Data: How to Save Time & Get Amazing Results - Pipl via @billyattar

You should know what you’re dealing with before developing with data. How you approach your data queries and the inputs you use can have a significant impact on your data output and match rates.

Great post by Billy Attar on the Pipl website.

Wednesday 2 September 2015

Finance giants partner on data company via @Reuters

J.P. Morgan, Goldman Sachs, and Morgan Stanley are working together to create a company that will pull together and clean data used to determine pricing and transaction costs. The Wall Street Journal reports that the project, dubbed "SPReD" (Securities Product Reference Data), will launch in 6 to 12 months.

A “bottom-up” approach to data unification via @radar

How Toyota used machine learning plus expert sourcing to unify customer data at scale.  Great write-up from O'Reilly Radar.

I'm sure we have all struggled to reduce data from many sources down to just one record per customer.  It's interesting to see how someone else has tried to solve the problem.
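For comparison, here is the naive, rule-based version of the problem in pandas - normalise the fields you match on, then collapse the duplicates. It is far simpler than the machine learning approach described in the article, and the data is made up:

```python
# A deliberately simple, rule-based sketch of the "one record per customer"
# problem. Far cruder than the machine-learning approach in the article;
# the records below are invented.
import pandas as pd

customers = pd.DataFrame({
    "name":  ["Jane Smith", "jane smith ", "J. Smith"],
    "email": ["jane@example.com", "JANE@EXAMPLE.COM", "jane@example.com"],
    "city":  ["Cardiff", "Cardiff", "cardiff"],
})

# Normalise the fields we intend to match on
customers["email_key"] = customers["email"].str.strip().str.lower()
customers["city"] = customers["city"].str.strip().str.title()

# Collapse to one record per email, keeping the first-seen value of each field
unified = customers.groupby("email_key", as_index=False).first()
print(unified)
```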

Tuesday 1 September 2015

Taboo Data via Ben Rothfield

There is a class of data that we can derive easily—but can only use very, very carefully. As Ben Rothfeld explains: "I heard you went to Victoria's Secret today" is OK, but "So, you like push-up bras" isn't. Complicating things: consumers aren't really all that comfortable with the idea that you know enough to target them, but when you do target them, you'd better get it right.

Great article from Ben Rothfield

Basics of SQL and RDBMS – must have skills for data science professionals via @AnalyticsVidhya

SQL is one of the most sought-after and must-know skills for data science professionals. Here's a simplified guide explaining the basics of SQL, focusing on RDBMS.

Interesting high-level guide to SQL from AnalyticsVidhya.
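The sort of core moves such a guide covers - SELECT, WHERE, JOIN, GROUP BY - can be tried anywhere using Python's built-in sqlite3. A small self-contained sketch (table and column names are my own, not the guide's):

```python
# SELECT, WHERE, JOIN, GROUP BY and ORDER BY in one self-contained example,
# using the SQLite engine that ships with Python. Tables and data are made up.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    CREATE TABLE orders    (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Asha', 'IN'), (2, 'Ben', 'UK');
    INSERT INTO orders    VALUES (1, 1, 120.0), (2, 1, 80.0), (3, 2, 50.0);
""")

rows = con.execute("""
    SELECT c.name, COUNT(o.id) AS orders, SUM(o.total) AS revenue
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    WHERE o.total > 20
    GROUP BY c.name
    ORDER BY revenue DESC
""").fetchall()

for name, n_orders, revenue in rows:
    print(name, n_orders, revenue)
```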