Data: 2015

Thursday, 31 December 2015

5 Ways Machine Learning Reinvents IT Root Cause Analysis via @Data_Informed

5 Ways Machine Learning Reinvents IT Root Cause Analysis via Data Informed - Rob Markovich of +Moogsoft Inc. discusses how machine learning can automate early detection of service failures and improve IT situational awareness.

A use I hadn't thought of before reading this article.

All I Want for Christmas is Improved Analytics via @Data_Informed

All I Want for Christmas is Improved Analytics via Data Informed - Ann Ponder of Teradata discusses how her holiday shopping experience has been – or, in some cases, should have been – improved by analytics.

I completely agree with her - intelligent integration of data could a) save money and b) give real results.

Wednesday, 30 December 2015

Data Visualization, Analytics, and Paralysis by Analysis via @Data_Informed

Data Visualization, Analytics, and Paralysis by Analysis via Data Informed - Collin Sebastian of YouEye discusses the role of visualization in the context of data analytics and the need to spend less time analysing data and more time making decisions.

Interesting thoughts and well worth a read.

Quick Introduction to Boosting Algorithms in Machine Learning via @AnalyticsVidhya

Quick Introduction to Boosting Algorithms in Machine Learning by Sunil Ray on +Analytics Vidhya - For anyone who is getting puzzled with Boosting algorithms in machine learning - a simple guide that explains what they are and how to use them.

Tuesday, 29 December 2015

A Complete Tutorial on SAS Macros For Faster Data Manipulation via @AnalyticsVidhya

A Complete Tutorial on SAS Macros For Faster Data Manipulation by Sunil Ray on +Analytics Vidhya - Macros in SAS provide an incredible way to automate a process. Read his blog for how you can use them to make your life so much easier.

A great blog from Sunil and well worth reading.

Monday, 28 December 2015

Our Berkeley Data Science Capstone Project: Rap Analysis via @DataScienceCtrl

Our Berkeley Data Science Capstone Project: Rap Analysis via @DataScienceCtrl - Great guest blog from Data Science Central containing a data science exploration of rap lyrics and what it takes to make it onto the Billboard charts.

Love it and it is a great example of machine learning and predictive analytics.

File Formats in Apache HIVE via @acadgild

File Formats in Apache HIVE via @acadgild - This blog from AcadGild discusses the different file formats available in Apache Hive. After reading this Blog you will get a clear understanding of the different file formats that are available in Hive and how and where to use them appropriately.

It contains great examples and should prove very useful to people.

Sunday, 27 December 2015

Top 20 Data Science Skills via @Dan81989

Top 20 Data Science Skills via @Dan81989 - Great article by Daniel Levine on Smart Data Collective going through a more reasoned list of skills for Data Science. - Data science is a mashup of skills ranging from computer science and statistics, to machine learning and strong communication.

It was great for me to look at the first chart and realise I actually knew some of the top 8. Yes, I know I need to improve them for sure (few people can't find an area where they need to improve their own skills), but it still gave me some cheer. Look at yourself against the charts and think about what you are missing or need to improve upon.

Beyond the Pill: Data Is the New Drug via @recode @medableinc

Beyond the Pill: Data Is the New Drug via @recode @medableinc - Great and slightly scary article by Michelle Longmire, MD. In the very near future, most drugs will have both a chemical and digital component, as every pill will have a companion mobile app that collects patient-specific data. As millions of people use these apps, there's going to be an incredible new data stream to mine.

I find this a mixture of fascinating, scary and something to be wary of all at the same time.

Saturday, 26 December 2015

Data Science for Losers, Part 7 - Using Azure ML via @brakmic

Data Science for Losers, Part 7 - Using Azure ML via @brakmic - In part 7, Harris takes us further into coding with Azure for machine learning. Make sure you have gone through part 6 first. It includes Python code, but I think that even if you don't know Python you can still understand what is happening.

Data serialization with avro in hive via @acadgild

Data serialization with avro in hive via @acadgild - This blog from AcadGild focuses on providing in-depth information on Avro in Hive. Here we have discussed the importance and necessity of Avro and how to implement it in Hive. Through this blog you will get a clear idea about Avro and its implementation in your Hadoop projects.

Very useful blog and gives great examples.

Friday, 25 December 2015

Merry Christmas

No blog/twitter posts today - remember those less fortunate than yourself during this time of year and enjoy your Christmas!!

Thursday, 24 December 2015

What happens when everyone can connect the dots via @radar

What happens when everyone can connect the dots via @radar - Everyone loves data, so it's no surprise that we've been innovating by orders of magnitude in data storage. But has analytics innovation kept up?

I think it was behind for a while but is trying its hardest to make up for any progress it missed.

Understanding Support Vector Machine algorithm from examples (along with code) via @AnalyticsVidhya

Understanding Support Vector Machine algorithm from examples (along with code) via +Analytics Vidhya - Did you know amongst all machine learning algorithms, Support Vector Machine (SVM) is the one that can deal with smaller datasets & create powerful models?

Great article with Python code.

Wednesday, 23 December 2015

Exploring the process of insight generation via @radar @esimoudis

Exploring the process of insight generation via @radar @esimoudis - Sometimes insight arrives as a brilliant idea in the middle of the night or a lightning-bolt aha! moment. But Evangelos Simoudis says that for true insight you're better off creating an insight-generating process - one that includes a measurable action plan and domain knowledge. Here's how to deliver insight as a service.

Analytics without actions - why bother? via @radar and Akmal Chaudhri

Analytics without actions - why bother? via @radar - Streaming analytics are only worthwhile if the data leads to action. Akmal Chaudhri describes how the architectural choices you make can help ensure your fast data streams can be tied to data analysis.

Tuesday, 22 December 2015

Predictive Analytics Requires a Customer-Obsessed Innovation Culture via @infomgmt

Predictive Analytics Requires a Customer-Obsessed Innovation Culture via +Information Management and Frederic Golin - There is a palpable excitement around predictive analytics these days, but I see a risk that, beyond the excitement of the demo and first implementations, a number of these advanced analytic tools remain shelfware.

5 Metaphors for Big Data and Why They Matter via @BernardMarr @DataInformed

5 Metaphors for Big Data and Why They Matter via +Bernard Marr +Ian Murphy (DataInformed) @DataInformed - Bernard Marr addresses some metaphors that are commonly used within the world of big data and whether they are an apt shorthand for the phenomena they describe.

We all need to be clear what we and others mean when all these metaphors are used - apple could mean a fruit but could also mean a computer/phone/tablet/watch manufacturer.

Monday, 21 December 2015

How Bad Data Management Kills Revenue via @infomgmt

How Bad Data Management Kills Revenue via +Information Management - Not one to normally publicly gripe on a vendor, but a recent customer experience with an online purchase is a great example of why organizations can't ignore data management investments.

Great article and makes a great point.

Apache Spark speeds up big data decision-making via @RT_LClark @computerweekly

Apache Spark speeds up big data decision-making via @RT_LClark and @computerweekly - Spark, the open-source cluster computing framework from Apache, promises to complement Hadoop batch processing

Sunday, 20 December 2015

Getting More Insights from Data: Nine Facts about the Practice of Data Science via @bobehayes

Getting More Insights from Data: Nine Facts about the Practice of Data Science via @bobehayes - The value of data is measured by what you do with it, and organizations are relying on data scientists to extract that value.

Interesting article by Bob Hayes on Business over Broadway.

The 37 Best Tools For Data Visualization via @7wData

The 37 Best Tools For Data Visualization via @7wData - Creating charts and infographics can be time-consuming. But these tools make it easier. It’s often said that data is the new world currency, and the web is the exchange bureau through which it’s traded. As consumers, we’re positively swimming in data; it’s everywhere from labels on food packaging design to World Health Organisation reports.

For number 31 on the list (R), there are several ways to plot: base graphics, lattice and ggplot2.

Saturday, 19 December 2015

R In Browser Coding Tutorials via @DataCamp

R In Browser Coding Tutorials via @DataCamp - three in-browser R coding tutorials from the DataCamp team:

Introduction to R programming
Intermediate R programming
A Hands-on Introduction to Statistics

Well worth a try.

Data Science for Losers, Part 6 - Azure ML via @brakmic

Data Science for Losers, Part 6 - Azure ML via @brakmic - In part 6, Harris takes us all through Azure Machine Learning. He points to a great course on edX too.

Friday, 18 December 2015

22 data experts share their predictions for 2016 via @importio

22 data experts share their predictions for 2016 via +import.io - Some very interesting predictions - not sure I can disagree with any of them.

5 Trends That Will Drive Big Data in 2016 via @infomgmt

5 Trends That Will Drive Big Data in 2016 via +Information Management - MapR CEO and Co-founder John Schroeder sees an acceleration in big data deployments, and has crystallized his view of market trends into these five major predictions for 2016.

I have to agree with him - it's definitely going to become less centralised.

Thursday, 17 December 2015

k-Fold Cross Validation made simple via @AnalyticsVidhya

k-Fold Cross Validation made simple via +Analytics Vidhya - This article introduces the science behind k-fold cross validation & its use in simple terms and explains its implementation in Python.

Nice tie-in with Kaggle competition entries.
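
For anyone who wants to see the idea before reading the article, here is a minimal sketch of k-fold cross validation in plain Python. It is not the article's code: the function names `k_fold_indices` and `cross_validate` are my own, the folds are taken in order without shuffling, and `fit` and `score` are whatever model-fitting and scoring functions you supply.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds.

    Hypothetical helper for illustration; folds are contiguous and unshuffled.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Average a caller-supplied score over k train/test splits."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        # Fit on the training fold only, then score on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model, [xs[i] for i in test], [ys[i] for i in test]))
    return sum(scores) / len(scores)
```

Each observation lands in the test set exactly once, so the averaged score uses all of the data without ever scoring a model on rows it was trained on. For real work you would normally shuffle the rows first.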

Cracking The Hadoop Developer Interview via @simplilearn

Cracking The Hadoop Developer Interview via @simplilearn - Prepping for a hadoop developer interview? This handy guide lists out the most common questions asked on Hadoop Developer interviews and model answers.

Wednesday, 16 December 2015

Machine Intelligence In The Real World via @techcrunch

Machine Intelligence In The Real World via @techcrunch - Shivon Zilis divides ML companies into 8 groups: panopticons, lasers, alchemists, gateways, magic wands, navigators, agents, and pioneers (and provides examples of each) in order to define the landscape of machine learning companies.

I can't wait to read the next article on this subject.

Beyond the Venn diagram via @radar

Beyond the Venn diagram via @radar - Daniel Tunkelang identifies the essential skills for data scientists.

I have to agree that you are not going to find someone perfect for everything that you want.

Tuesday, 15 December 2015

Recognising and rescuing a failing big data project via @radar

Radar has a very interesting article - Common pitfalls and best practices every manager should know.

I completely agree with this sentence near the end - Promptly implementing small pieces of a plan tends to lead to more successful implementations down the road, as those pieces can quickly prove business value.

Using Python and R together: 3 main approaches via @KDnuggets

Great article from KDnuggets - If data science and data scientists cannot decide which data to use to help them choose a language, here is an article on using BOTH. I also recommend reading the article linked at the end of this one - Integrating Python and R into a Data Analysis Pipeline, Part 1

Monday, 14 December 2015

Big Data: 20 Free Big Data Sources Everyone Should Know via @BernardMarr

From Bernard Marr via Big Data Collective a great list of free data sources with descriptions.

I think a few others should be added to that list:

KDNuggets data repository
Kaggle competition datasets
Data Science Central datasets - go look for them in the Apprenticeship area
Google Public Data Explorer (may be some crossover)
KDNuggets article on public datasets

Go find some that interest you and have fun playing with them.

Where Does Big Data Stop and Big Brother Start? via @Data_Informed and @BernardMarr

Bernard Marr discusses in Data Informed revelations of Russia’s data-enabled surveillance of its citizens and the potential for governments to use big data to erode civil rights.

He discusses something which could be quite frightening for us all - the rise of cars that are connected to the internet. It's kind of scary to think about the information that could be provided from accessing that data. Privacy is becoming less and less possible with our data and lives.

Sunday, 13 December 2015

50 Useful Machine Learning & Prediction APIs via @KDnuggets

KDnuggets have created a list of 50 APIs selected from areas like machine learning, prediction, text analytics & classification, face recognition, language translation etc.

I've only done some limited things with APIs - this list gives me the impetus to go and play with some of them more.

How to avoid Over-fitting using Regularization? via @AnalyticsVidhya

Great post from Analytics Vidhya on how you can ensure the model does not use more attributes than it needs, avoiding overfitting by using regularisation. A simplified explanation that gives a quick, basic framework.

I recommend you read it as it may well help.
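
To give a feel for what regularisation does, here is a small sketch of my own (not taken from the post). It assumes the simplest possible case, a single feature with no intercept, and uses an L2 (ridge) penalty; `ridge_slope` and `lam` are names I have made up for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Slope w for y ~ w * x that minimises sum((y - w*x)**2) + lam * w**2.

    Illustrative single-feature, no-intercept case only. Setting the
    derivative to zero gives w = sum(x*y) / (sum(x*x) + lam).
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    if sxx + lam == 0:
        raise ValueError("cannot fit: all x are zero and lam is zero")
    return sxy / (sxx + lam)


xs = [1, 2, 3]
ys = [2, 4, 6]
unpenalised = ridge_slope(xs, ys, 0)   # ordinary least squares slope
penalised = ridge_slope(xs, ys, 14)    # larger lam pulls the slope towards zero
```

The penalty term grows with the size of the coefficient, so a larger `lam` shrinks the fitted slope. With many attributes the same effect stops the model leaning too heavily on any one of them.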

Saturday, 12 December 2015

Big Data at NASA via @DataScienceCtrl and @BernardMarr

Big Data at NASA via @DataScienceCtrl and @BernardMarr - Great article looking at the quantities of data generated at NASA and the transfer rates. Reads like a major implementation of Big Data that could be used as a template for anyone.

SLIDESHOW: 25 Top Degree Programs for Data Analytics Professionals via @infomgmt

Via Information Management - Want to cash in on the current hiring craze for big data and data analytics talent? These top college and university degree programs will give you a great start. Split into 2 slideshows:

Slideshow 1

Slideshow 2

Friday, 11 December 2015

WEBINAR: Use Predictive Analytics and Hadoop to Turn the Promise of Big Data into Business Impact - 15 December 2015

Turn the Promise of Big Data into Business Impact

Use Predictive Analytics and Hadoop to Transform Insight into Action

December 15th, 11:00 am ET

As Hadoop continues to grow in popularity, organizations struggle to turn the promise of Big Data into business value, as shown in recent Gartner surveys. Predictive analytics is a natural platform to extract business value from Big Data. But there’s a major skills gap that inhibits adoption, because the Hadoop architecture requires specialized coding skills in addition to data science expertise.

In our program, experts from Gartner and RapidMiner will help you close that gap. They'll discuss:

Hadoop growth and trends
The challenges with production deployments
The pros and cons of four different methods of extracting predictive analytics value from Hadoop
And how to solve the skills gap issue by automatically translating predictive analytics processes into the many languages of Hadoop.

You’ll also find out how a code-free, drag-and-drop predictive analytics product makes creating and executing predictive analytics on Hadoop a fast and simplified process.

Whether you’re just beginning to work with predictive analytics or already have an experienced data science team, this webinar will show you it’s indeed possible to turn the promise of Big Data into business value today.

Analyzing the world’s news: Exploring the GDELT Project through Google BigQuery via @radar

From O'Reilly Radar - How to analyse, visualize, and even forecast human society using global news coverage - exploring the GDELT Project through Google BigQuery.

Very interesting article. I had no idea the inventory existed and that Google BigQuery was there to help these kinds of analyses.

What to look for in a data scientist via @radar

By Jerry Overton from O'Reilly "What's commonly expected from a data scientist is a combination of subject matter expertise, mathematics, and computer science. This is a tall order...[however] the skillset you need to be effective, in practice, tends to be more specific and much more attainable. This approach changes both what you look for from data science and what you look for in a data scientist."

A great summary of a difference between a Data Scientist and a Machine.

Thursday, 10 December 2015

WEBINAR: Make Beautiful Interactive Data Visualizations Without Writing JavaScript - 15 December 2015

Hassle Free Data Science Apps

Build Richly Interactive Visualizations on Streaming & Big Data with Open Source

Make Beautiful Interactive Data Visualizations Without Writing JavaScript
Date: December 15th, 2015
Time: 2 PM CT, 12 Noon PT, 3 PM ET

Data visualization is where your work comes to fruition - without communication, your insights don't turn into action, and your organization won't realize the value of your analytical work.

But creating and deploying data science apps is hard. You're a data scientist - not a web developer or designer. There has to be a better way.

That's why we created Bokeh, an interactive visualization framework for Python. Over the past 6 months, we've added a ton of powerful features and dramatically improved ease of use. On December 15th, Continuum Analytics CTO Peter Wang & Bokeh lead developer Bryan Van de Ven will present a webinar and show you how to create rich, interactive visualizations in the browser - without writing a line of JavaScript or HTML.

Register here

In the webinar, you'll learn to:

Use the Bokeh Visualization Framework to Easily Make Data Science Apps
Reproduce the Famous GapMinder Example - No JavaScript or HTML Required
Transform & Visualize Streaming Data with Scikit-Learn and Bokeh
Join us and learn to create beautiful, interactive visualizations, without the hassle.

Peter and Bryan will also conduct a live Q&A session after the presentation, so you can get answers to your toughest data visualization questions.

Presenters

PETER WANG

Peter Wang is the CTO and Co-founder of Continuum Analytics and the creator of Bokeh.

He has been developing commercial scientific computing and visualization software for over 15 years. He has software design and development experience across a broad variety of domains.

As a creator of the PyData conference, he devotes time and energy to growing the Python data community, and advocating and teaching Python at conferences worldwide.

Everyone who signs up will receive the recording, slides, and links to the notebooks on Anaconda Cloud.

BRYAN VAN DE VEN

Bryan Van de Ven is the lead developer on the Bokeh project.

He holds an undergraduate degree in Computer Science & Mathematics from UT Austin, and a Master's degree in Physics from UCLA.

Previously Bryan developed data exploration and visualization software for sonar feature detection, financial risk modeling, and fluid mixing simulation.

WEBINAR: Testing the waters of today’s data lake - 16 December 2015

Data Lake – Five Tips to Navigate the Dangerous Waters
Date: Wednesday, December 16 Time: 11 a.m. ET (60 min)

Data is inherently fast. It flies into your data warehouse in milliseconds, it’s altered in nanoseconds. And yet when it comes to transforming dark nebulous data into consumable and actionable insight you’re moving at the pace of days and hours.

And it’s not just the speed to insight. Analysis should be responsive, ready to shift at a moment’s notice not tied down by legacy infrastructure ill-equipped to handle the data of today let alone the future.

Data lakes are the newest method for storing and managing data. They offer improved speed, accessibility, and agility leading to improved insights. But without the proper approach, a data lake quickly becomes a data swamp. Join us for a live webinar designed to help you:

Learn about what a data lake is and what it isn’t
Optimize your data lake for speed and agility to insight
Ensure even those without programming skills can leverage the data lake
Understand the new approach to governance that the data lake is driving

Presenter:

Mark Marinelli
Chief Technology Officer, Lavastorm

WEBINAR: How Flextronics Uses Data Visualization and Analytics to Improve Customer Satisfaction - 15 December 2015

Overview

Title: How Flextronics Uses Data Visualization and Analytics to Improve Customer Satisfaction

Date: Tuesday, December 15, 2015

Time: 09:00 AM Pacific Standard Time

Duration: 1 hour

Summary
How Flextronics Uses Data Visualization and Analytics to Improve Customer Satisfaction

Flexibility in adapting to changing global markets and customer needs is necessary to stay competitive, and the Flextronics analytics team is tasked with making sure the Flex management team has accurate and up-to-date analytics to optimize performance, efficiency, and customer service.

In our latest DSC webinar series, Joel Woods from Flextronics’ Global Services and Solutions will share success stories around analytics on repairs and refurbishment of customer products utilizing analytics and data visualization from Tableau and Alteryx.

You will learn how to:

Use data analytics to improve cost savings
Resolve common data challenges such as blending disparate data sources
Deliver automated and on-demand reporting to clients
Provide visualizations that surface the analytics that matter to both internal teams and customers

About Flextronics:

Flextronics is an industry leading end-to-end supply chain solutions company with $26 billion in sales, generated from helping customers design, build, ship, and service their products through an unparalleled network of facilities in approximately 30 countries and across four continents.

Speakers:

Ross Perez, Sr Product Manager - Tableau
Joel Woods, Advanced Analytics Lead - Flex Inc.
Maimoona Block, Alliance Manager - Alteryx

Hosted by: Bill Vorhies, Editorial Director - Data Science Central

WEBINAR: Engaging the Business: Agile, Collaborative Approaches to Data Usability - 15 December 2015

Engaging the Business: Agile, Collaborative Approaches to Data Usability

TDWI Speaker: David Loshin, President of Knowledge Integrity

Date: Tuesday, December 15, 2015

Time: 9:00 a.m. PT, 12:00 p.m. ET

Webinar Abstract

The term “data-driven” has become an accepted principle for modern organizations, but to drive modern, agile businesses, each data consumer’s view of enterprise data must both align with individual data quality and usability criteria and remain consistent with other data users in the organization. While traditional data quality/data preparation tools were intended to ensure accuracy and trust, the conventional wisdom centred on a technical, IT-centric usage model.

However, two emerging realities are rapidly changing the way we think about business data usability: many data sources and exploding data volumes, and a growing sophistication regarding information and analytics among business users. Empowering business users to control and satisfy their own data usability needs implies changes to the requirements for data discovery, profiling, and quality tools. It means providing tools allowing users to eliminate the dependence on the IT department while letting them examine the data, define their own data expectations, and monitor the conformance to their unique sets of business rules.

We are beginning to see the segregation of duties as part of an agile, collaborative activity. Business owners are increasingly expected to be accountable for their data sets’ compliance with business rules, while the IT role has transitioned to infrastructure management to ensure that business rules can be executed. In this webinar we examine the changing face of data preparation and data quality tools.

Attendees will learn about:

Changing requirements for data preparation, data profiling, and data quality
Collaboration between business and IT staff
Shifts in methods of interaction for data usability tools
Pushing data stewardship to the business edge
Accountability for managing data risks

20 Big Data Companies Leading the Way via @Datamation

Article from Datamation - These Big Data companies are pushing the nascent analytics market toward greater adoption, with a wildly diverse array of tools and solutions. Please note the list is alphabetical and not an indication of anything else.

Mostly companies that I would have expected to be on this list.

To MDM Or Not To MDM? That Is The Question… via @infomgmt

Interesting article from Information Management - Most of us have argued before now that MDM is a must, but it turns out that there are exceptions.

I find it an interesting article but there can be a halfway house to MDM - you still need documentation and you still need as part of that documentation to understand the interfaces between the data - where does it come from and what is the system of record.

Wednesday, 9 December 2015

The Internet of Things Invigorates Operational Intelligence via @infomgmt

From Information Management - Operational intelligence is a set of event-centred information and analytic processes operating across an organization that enable people to use that event information to take effective actions and make optimal decisions.

Interesting article which looks at one possible use of IoT (which is very exciting as there are so many potential uses with lots of data to match).

Gawker Media’s Data Guru Presents the Case for Deleting Data via @ellisbooker @Data_Informed

Great article from Data Informed - Josh Laurito of Gawker Media Group explained Gawker’s decision to limit data collection and the implications this decision has had for the business.

A great idea if you can find anyone brave enough to do it. I completely agree that we have and collect too much data (which then is not always good data) but it's often a safety blanket so many wouldn't be without it.

Tuesday, 8 December 2015

7 Steps to Mastering Machine Learning With Python via @KDnuggets

From KDnuggets - There are many Python machine learning resources freely available online. Where to begin? How to proceed? Go from zero to Python machine learning hero in 7 steps!

I love this article. I'm more of an R person as my Python is very basic but this gives me the confidence to try to do more in Python.

Bridging The Big Data Skills Gap With Online Training via @InformationWeek

From Information Week - Big data technology vendors are helping to close the skills gap when it comes to training the next crop of data scientists. Here's what they are offering.

A great guide to all the free training out there.

Monday, 7 December 2015

50 enterprise startups to bet your career on in 2016 via @BI_Europe

Business Insider UK has a great list of the top 50 startups, which contains a surprisingly large number of company names many of us will be familiar with. Look down it and see just how many you use or recognise.

Skill-Based Approach to Improve the Practice of Data Science via @bobehayes

From Bob Hayes - One way to improve the practice of data science is to learn data skills that are essential for good analytic project outcomes.

Great and comprehensive analysis of the skills, and of proficiency in those skills, on Business over Broadway.

Sunday, 6 December 2015

Ten Top Business Intelligence Trends to Expect in 2016 via @infomgmt

Business intelligence continues to be one of the fastest-moving areas in the enterprise, and the techniques that organizations are using to drive adoption and get value from their data are multiplying. Here are ten top business intelligence trends to expect in 2016.

Interesting list on Information Management.

The Future of Big Data: How Data Lakes Open New Possibilities for Your Organization via @VanRijmenam

Via Mark van Rijmenam - A data lake enables organizations to bring together all kinds of data, combine the data sources, generate meaning from it and, of course, derive value out of it. But what can you do with a data lake? What are the advantages of a data lake and what are the challenges of a data lake? In this extensive post we took a deep dive into the opportunities of data lakes and we explored why they are the future of big data and what it means for your organization.

Interesting article and well worth a read if you are wondering what they provide and what the advantages are. I'm looking forward to the planned articles on Data Lakes too.

Saturday, 5 December 2015

Cheatsheet – Python & R codes for common Machine Learning Algorithms via @AnalyticsVidhya

From Analytics Vidhya - a collection of the 10 most commonly used machine learning algorithms with their code in Python and R.

A must-have cheatsheet for everyone, if for nothing other than to remind you of all of the ones you could easily use.

The Art of Analytics, Or What the Green-Haired People Can Teach Us via @datanami

It’s been said that a picture is worth a thousand words. But in this big data age, perhaps we should say that a picture is worth 1,000 terabytes.

Interesting article by Alex Woodie on Datanami

Friday, 4 December 2015

Virtual Reality Analytics Keynote Presentation via @Beraterzeitung

Amazing keynote presentation from Joerg Osarek about Virtual Reality Analytics.

I recommend you read this and start to think about and imagine what could be done with this kind of data/information. He rightly points out that there has to be an element of responsibility with this data as this really is crossing traditional privacy boundaries.

The top 12 Apache Hadoop challenges via @kumarchinnakali

via @kumarchinnakali - Hadoop is a large-scale distributed batch processing infrastructure. While it can be used on a single machine, its true power lies in its ability to scale to hundreds or thousands of computers, each with several processor cores.

Great article by Kumar Chinnakali. I find number 8 a particular worry, as sometimes you need to go live quickly, before everything changes and makes the system/solution worth less.

Thursday, 3 December 2015

WEBINAR: Big Data Blueprints - 8 December 2015

Pentaho Big Data Blueprints

Pentaho has developed a set of big data blueprints that empower organizations to generate real business value with Hadoop, NoSQL, and other emerging technologies.

Join Pentaho for this webinar for an overview of how to take advantage of three key design patterns - and the use cases they enable:

Streamlined Data Refinery to integrate and access blended, enriched data sets for high performance analytics
Creating a 360-Degree View of your customer to provide on-demand analytical views across key customer touch points
Internet of Things: Harnessing Machine and Sensor Data

Speaker: Wael Elrifai, Director for Enterprise Solutions, EMEA
Date: Tuesday, 8th December 2015
Time: 11am GMT / 12pm CET
(the session will take 45 minutes, and will be followed by a brief Q&A session)

Register here

WEBINAR: Data Scientist Workbench Accelerates Predictive Analytics - 8 December 2015

Overview

Title: Data Scientist Workbench Accelerates Predictive Analytics

Date: Tuesday, December 08, 2015

Time: 09:00 AM Pacific Standard Time

Duration: 1 hour

Summary

Data Scientist Workbench Accelerates Predictive Analytics

New technologies, Big Data and increasingly complex and multi-dimensional data relationships create a challenging environment for modern data scientists. In today’s DSC Webinar we will discuss a modern Predictive Analytics platform to help manage and accelerate all phases of your predictive analytics process lifecycle, from source to result.

In this latest DSC webinar you will learn how this agile digital workbench allows you to:

Streamline tedious and repetitive tasks
Access and prepare data for predictive analytics
Rapidly prototype and validate your models
Leverage your existing R and Python
Extract maximum value from Hadoop
Share results with the rest of the business
Quickly embed analytic results into business processes

Speakers: Dr. Martin Schmitz -- RapidMiner GmbH

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

WEBINAR: Hybrid Cloud and Big Data: A Perfect Marriage - 10 December 2015

Data Informed: Analytics on the Cloud 3-part webinar series

We are excited to announce this 3-part webinar series, Analytics on the Cloud, that will help you build high-performance analytics and Big Data solutions.

Part 3: Hybrid Cloud and Big Data — A Perfect Marriage
Thursday, December 10, 2015

In this session, uncover common enterprise use cases for big data, both on premise and in the cloud. You’ll also learn how IBM Cloud Infrastructure for Analytics addresses hybrid use cases through VPN connectivity and data transfer, as well as consistency and control over the software stack both on premise and in the IBM cloud.

Register here

WEBINAR: Trusted Techniques to Ensure the Security and Compliance of Big Data in the Cloud - 8 December 2015

We are excited to announce this 3-part webinar series, Analytics on the Cloud, that will help you build high-performance analytics and Big Data solutions.

Part 2: Trusted Techniques to Ensure the Security and Compliance of Big Data in the Cloud
Tuesday, December 8, 2015

Business discussions around storing and accessing data — particularly huge amounts of data — in the cloud inevitably will hone in on security. Before any organization signs off on deploying big data in the cloud, critical security and compliance concerns will need to be addressed.

In this session, review enterprise requirements for data security and compliance in the cloud. Understand how IBM Cloud Infrastructure for Analytics addresses these requirements from end to end — for data at rest and data in motion — round-trip through Hadoop in the cloud.

Data Science for Losers, Part 5 via @brakmic

In this part 5 Harris is taking us all through Apache Spark DataFrames. If you know DataFrames in Python or R you definitely have the advantage as he quite rightly points out. I'm looking forward to the next topic hopefully in a later blog entry.

You can find the blog article for part 5 here

Data Science for Losers, Part 4 via @brakmic

In this part 4 Harris is taking us all through Machine Learning. He explains the different types of machine learning and takes us through the very basics of scikit-learn. I look forward to his next blog.

You can find the blog article for part 4 here

Wednesday, 2 December 2015

WEBINAR: Redefining the Economics of Analytics - 8 December 2015

Complimentary Web Seminar
December 8, 2015
2 PM ET/11 AM PT
Brought to you by Information Management

As organizations struggle to make use of the ever growing volumes of data, they often confront legacy systems that are tough to work with, internal and external data silos that are messy and complex to integrate, and a corporate culture that may be slow to embrace analytics. On top of all this, you may be paying top dollar for an outdated analytics system that is mostly used to push data around. Is there a better way?

Please join this webcast to learn about how collective intelligence, native distributed analytics architecture, and enabling an analytics culture can help you embed analytics everywhere, innovate faster, and empower more people within your business to make better decisions.

Featured Presenters:


Moderator: Jim Ericson, Consultant Editor Emeritus, Information Management
Speaker: Tim Wassman, Director, Global Field Operations, Dell Statistica


The Role of KPIs in Managing Big Data via @infomgmt

KPIs have a dynamic relationship - information from one set of performance indicators can suddenly draw attention to the key role of another indicator – so we need to access them in real time, not in a historical report.

Interesting article which hopefully will make you think and maybe change the way you see it.

5 Data Trends That Will Impact You in 2016 via @infomgmt

Hortonworks Chief Technology Officer Scott Gnau shared his thoughts on what will be five of the most important data trends to impact the CIO in the coming year.

Interesting thoughts from Scott in Information Management.

Tuesday, 1 December 2015

Industrial IoT Outlook 2016 via @Intersog

Intersog provides a summary of the recent IoT research to better feel the pulse of the industry and see where it's headed in the future.

Interesting, well-thought-out article - definitely worth a read.

Cutting Through the Buzz of Big Data; 5 Big Data Myths Debunked via @bigcloudteam @Datafloq

The industry is developing at a rapid pace, with the technology improving month-on-month instead of year-on-year. There is such a buzz about Big Data that the narrative has almost taken on a life of its own – it has become this mythical being that can slay uncertainty and save any business from an untimely end. It's time to debunk some of these myths.

Interesting article, well worth a read.

All The Best Big Data Tools And How To Use Them

There are thousands of Big Data tools out there. All of them promising to save you time, money and help you uncover never-before-seen business insights. And while all that may be true, navigating this world of possible tools can be tricky when there are so many options.

Useful guide to understand what does what.

Monday, 30 November 2015

Microsoft's Graph wants to turn user data into business intelligence it can sell via @pcworld

How does data become information? Through context. And that’s what Microsoft’s new Microsoft Graph aims to do: Collect data points about you, then turn around and sell it to apps and services–with your permission, of course.

Interesting article from PCWorld.

Analyzing 1.1 billion NYC taxi and Uber trips, with a vengeance via @todd_schneider

An open-source exploration of the city's neighbourhoods, nightlife, airport traffic, and more, through the lens of publicly available taxi and Uber data by Todd Schneider

Great example of an analysis using public data, well worth going through.

Sunday, 29 November 2015

Secrets from winners of the @AnalyticsVidhya best ever Data Hackathon!

An excellent learning for beginners, this compilation is an exciting reference for all aspiring & professional Data Scientists to gear up! Well worth a read.

Beginner’s guide to Web Scraping in Python (using BeautifulSoup) via @AnalyticsVidhya

In this post on AnalyticsVidhya Sunil Ray takes us through how to web scrape in Python using BeautifulSoup. Nice clear article with code - a great place to start if you ever need to do this but don't know how.
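
To give a flavour of what scraping with BeautifulSoup looks like, here is a minimal sketch. It is not taken from Sunil's article: the HTML snippet and the `extract_links` helper are made up for illustration, and the page is parsed from an inline string rather than downloaded, so nothing here depends on a real website. It assumes the `bs4` package is installed.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a page you would normally download first.
HTML = """
<html><body>
  <h1>Example posts</h1>
  <ul>
    <li><a href="/post/1">First post</a></li>
    <li><a href="/post/2">Second post</a></li>
  </ul>
</body></html>
"""


def extract_links(html):
    """Return (link text, href) pairs for every anchor tag that has an href."""
    # "html.parser" is Python's built-in parser, so no extra parser library is needed.
    soup = BeautifulSoup(html, "html.parser")
    return [(a.get_text(strip=True), a["href"]) for a in soup.find_all("a", href=True)]


links = extract_links(HTML)
for text, href in links:
    print(text, href)
```

In a real scrape you would fetch the page first and pass its text to the same function; the parsing step stays the same.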

A recommendation system for blogs: Setting up the prerequisites [1/3] via @m__technologist

This is the first in a series of three blog posts where Thom Hopmans will elaborate on how we can build a recommendation engine for the readers on The Marketing Technologist (TMT). TMT currently has over fifty blog posts covering varying topics from Data Science to coding in ReactJS. Browsing through all the blog posts is time-consuming, especially as the number of posts is still increasing. Also, chances are readers are only interested in a select few blog posts that lie in their area of interest. If a recommendation engine is able to select those articles a user is interested in then this can definitely be classified as creating value from data and preventing information overload.

Nice start to the set of three blogs which takes you through some of the thought steps to follow as well as comments on how to do it in Python. Good to look at even if you are not into Python.
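
As a rough illustration of the idea, one common way to recommend related posts is to compare their text with TF-IDF weights and cosine similarity. This is only a sketch under that assumption and not necessarily the approach Thom takes in his series; the post names and texts in `POSTS` are invented, and it uses only the Python standard library.

```python
import math
from collections import Counter

# Hypothetical post summaries; the real TMT posts are not reproduced here.
POSTS = {
    "spark-intro": "spark python data processing tutorial",
    "react-forms": "reactjs forms components javascript",
    "spark-ml": "spark machine learning python pipeline",
}


def tfidf_vectors(docs):
    """Build a TF-IDF weighted term vector (a dict) for each document."""
    tokenised = {name: text.split() for name, text in docs.items()}
    n_docs = len(tokenised)
    # Number of documents each term appears in.
    doc_freq = Counter(term for tokens in tokenised.values() for term in set(tokens))
    vectors = {}
    for name, tokens in tokenised.items():
        counts = Counter(tokens)
        vectors[name] = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def recommend(current, docs, top_n=1):
    """Return the names of the top_n posts most similar to the current one."""
    vectors = tfidf_vectors(docs)
    scores = {
        name: cosine(vectors[current], vec)
        for name, vec in vectors.items()
        if name != current
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


print(recommend("spark-intro", POSTS))
```

With real posts you would want proper tokenisation and stop-word removal, but the ranking step would look much the same.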

Saturday, 28 November 2015

Kaggle Bike Sharing Demand Prediction – How I got in top 5 percentile of participants? via @AnalyticsVidhya

From AnalyticsVidhya, here's one of the top 5 percentile solutions to the Kaggle Bike Sharing Demand Prediction competition; take it as a reference for your next competition.

Includes R code. Definitely worth a bookmark and a look the next time you enter a competition on Kaggle.

Stream Processing with Apache Flink via @brakmic

Great blog entry from Harris explaining what Apache Flink is and how to process stream data with it. Includes examples, and the source code is on GitHub.

Friday, 27 November 2015

Get ignited with Apache Spark – Part 2 via @kumarchinnakali

Great post by Kumar Chinnakali discussing the Basics of Spark - Concepts like Resilient Distributed Datasets, Shared Variables, SparkContext, Transformations, Action, and Advantages of using Spark along with examples and when to use Spark.

The Data Science Industry: Who Does What via @DataCamp

Great Infographic from DataCamp.

Laetitia Van Cauwenberge has made the point on Data Science Central that one of the core competencies of the data scientist is to automate the process of data analysis, as well as to create applications that run automatically in the background, sometimes in real time, e.g. to find and bid on millions of Google keywords each day (eBay and Amazon do that, and most of these keywords have little or no historical performance data, so keyword aggregation algorithms - putting keywords in buckets - must be used to find the right bid based on expected conversion rates), buy or sell stocks, monitor networks and generate automated alerts sent to the right people (to warn about a potential fraud, etc.), or to recommend products to a user, identify optimum pricing, manage inventory, or identify fake reviews (a problem that Amazon and Yelp have failed to solve to this day).

Great blog post from DataCamp.
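
The keyword aggregation idea mentioned above can be sketched in a few lines. This is purely illustrative: the history data, the rule of bucketing by the last word of the keyword, and the value per conversion are all assumptions made up for the example, not anything described by Data Science Central, eBay or Amazon.

```python
from collections import defaultdict

# Hypothetical history: keyword -> (clicks, conversions).
# Rare keywords have little or no data of their own.
HISTORY = {
    "red running shoes": (1000, 30),
    "blue running shoes": (3, 0),
    "trail running shoes": (0, 0),
    "leather office chair": (500, 5),
}


def bucket_of(keyword):
    """Assumed bucketing rule: group keywords by their last word (the product type)."""
    return keyword.split()[-1]


def bucket_rates(history):
    """Pool clicks and conversions per bucket and return each bucket's conversion rate."""
    totals = defaultdict(lambda: [0, 0])
    for keyword, (clicks, conversions) in history.items():
        bucket = totals[bucket_of(keyword)]
        bucket[0] += clicks
        bucket[1] += conversions
    return {
        name: (conversions / clicks if clicks else 0.0)
        for name, (clicks, conversions) in totals.items()
    }


def suggested_bid(keyword, rates, value_per_conversion):
    """Bid per click = expected conversion rate of the keyword's bucket x value of a conversion."""
    return rates.get(bucket_of(keyword), 0.0) * value_per_conversion


rates = bucket_rates(HISTORY)
print(round(suggested_bid("trail running shoes", rates, 40.0), 2))
```

The point is that "trail running shoes" has no history of its own, yet it still gets a sensible bid because it borrows the pooled conversion rate of its bucket.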

Thursday, 26 November 2015

WEBINAR: Understand the Use Cases and Enterprise Requirements for Big Data in the Cloud - 2 December 2015

We are excited to announce this 3-part webinar series, Analytics on the Cloud, that will help you build high-performance analytics and Big Data solutions.

Part 1: Understand the Use Cases and Enterprise Requirements for Big Data in the Cloud
Wednesday, December 2, 2015

Cloud computing is an ideal match for big data projects — in the cloud, computing time and data storage are commodities. Yet, decision makers in charge of big data projects face important decisions, especially around which cloud option is the right choice for their needs.

This session will provide an overview of various use cases for big data on cloud, including a detailed review of enterprise requirements. You’ll then take a deep dive through a rich set of advanced analytics capabilities that allow you to analyze massive volumes of structured and unstructured data in their native formats.

WEBINAR: Data Sharing & Governance - 2 December 2015

Data Sharing & Governance

Complimentary Web Seminar
December 2, 2015
12 PM ET/9 AM PT
Brought to you by Information Management

Data sharing and governance … is your enterprise’s data getting shared everywhere without any controls?

Sharing data is one of the great challenges that organizations face today. We all know that this data is valuable, and that it has uses in many different scenarios. However, the reputational risk and possible legal exposure that comes with using and sharing data inappropriately cannot be underestimated. These issues exist in all types of organizations, and even between parts of the same organization, especially as they cross national boundaries. Governing the sharing of data is one of the critical challenges for today’s data professionals.

Attend this session to learn how the Chief Data Steward at one of High Tech’s largest and fastest growing companies rolled out its data sharing and governance program. Learn how Peggy McCoy and the NetApp Enterprise Data Management team developed a successful data sharing process along with insight into the critical tasks, relationships and policies that make this program successful and sustainable.

Featured Presenters:


Speaker: Peggy McCoy, Chief Data Steward, NetApp Inc.
Speaker: Aaron Zornes, Chief Research Officer, The MDM Institute & Conference Chairman at MDM & Data Governance Summit

Introduction to Spark with Python via @KDnuggets

Get a handle on using Python with Spark with this hands-on data processing tutorial. By Srini Kadamati, Data Scientist at Dataquest.io on KDnuggets.

Interesting tutorial. Please note it is 3 pages long.

The Power, Promise and Pitfalls of Big Data via @infomgmt

Big data’s limitless potential only can be realized if people are capable of managing information, interpreting it correctly, and acting wisely.

Interesting viewpoint article by Joe Lodewyck on Information Management. I think that even with all the tools, software and staff with the skills to use the data, we still need to make sure that the data is of good quality, and well organised, so we can actually use it and make correct conclusions/decisions based upon it.

Wednesday, 25 November 2015

WEBINAR: Big Data and the Connected Car - 2 December 2015

Azure is the most open, broad and flexible cloud platform for every customer's needs, regardless of the application, framework, data source or operating system their solution may require. Whether you’re interested in various Linux flavors, Docker, MongoDB, Hadoop or languages like Java, Python, PHP and Ruby, you will find first-class support for all.

Recent innovations in the Internet-enabled connected cars that we drive today have spawned a whole new set of opportunities and challenges for automakers. The opportunities come from the ability to capture detailed, current data on how drivers operate their vehicles and how those vehicles respond to that use. Join this webinar to learn how this data can become critical in uses such as preventative maintenance, product development, manufacturing optimization, infotainment & paid content, as well as recall avoidance. As usual, very hands-on approach, with lots of demos!

Dave Russell

Solution Engineer, Horton

WEBINAR: Jump over the Data Preparation Hurdle with Spark - 1 December 2015

Overview

Title: Jump over the Data Preparation Hurdle with Spark

Date: Tuesday, December 01, 2015

Time: 09:00 AM Pacific Standard Time

Duration: 1 hour

Summary

Jump over the Data Preparation Hurdle with Spark

Data scientists don’t scale. In using them to do manual data preparation, you’re missing a huge opportunity to extract the most value from your intellectual assets.

The good news? By automating and accelerating much of this raw data crunching and ETL work, you enable non-data scientists to do data preparation rapidly and simply—and ask their own questions and find their own answers. What’s more, in this new Big Data Discovery environment, answers come in minutes, not months. Data scientists are able to focus on Spark-driven advanced analytics that yield game-changing answers.

In this next DSC webinar, you will learn:

How to automate your data integration process to set up your organization to be truly data-driven
How to manage your data as a self-service feature at the speed of thought
How to effectively unearth big insights that effectively impact the bottom line in the most efficient cycles.

Speaker: Josh Och -- Platfora

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Is the 'Internet of Things' the Most Over-Hyped Trend In IT? via @infomgmt

Beecham Research analysts are warning companies planning to get into the Internet of Things (IoT) market “not to believe all the hype and over optimistic predictions.”

Interesting article by David Weldon on Information Management. From my perspective the potential is good for this technology and analytics, but IoT items are not well enough established to make the use of this data effective enough yet.

7 Must Watch Documentaries on Statistics and Machine Learning via @AnalyticsVidhya

This list was released by Manish Saraswat on AnalyticsVidhya. These movies reveal the smart use of data and machine learning to make our lives better.

It's a good idea to watch these as they might make you think more or even confirm something you've always believed.

Tuesday, 24 November 2015

How Engineering Company Siemens Creates Value for Their Customers Using Big Data Analytics via @Datafloq

Siemens is a 168-year-old engineering company that has prepared itself for the future. Recently, they have really moved forward and combined their engineering capability with great new analytical capabilities to really help their customers perform better. In this article they look at three examples of how they are changing the game and creating lasting value for their customers.

Goodbye Big Data, Hello Thick Data via @GreenBook @scribbett

Big Data is here to stay, but it’s only half the job - Thick Data fills the gaps and enables truly people-shaped or human-centred development and visceral business.

Interesting blog by Stephen Cribbett on Green Book. I can definitely see the need for data to be more people focussed - that's how to get the best value from it for sure.

Monday, 23 November 2015

“Shrinking bull’s-eye” algorithm speeds up complex modeling from days to hours via @mit

Algorithm may be applied to a broad range of complicated problems.

Great article from MIT News. This sounds very exciting - I can't wait to see the results of its use in a wider context.

Pinterest And Facebook Take Big Data To Another Level via @Forbes @BernardMarr

Big data is a critical cornerstone of most social media businesses and Pinterest is no exception. The underlying algorithms that make Pinterest successful and fun - the ones that suggest new pins you might like based on things you've liked before, for example - are an example of big data at its finest.

Great article (as always) from Bernard Marr on Forbes. Please note this is a 2 page article.

Sunday, 22 November 2015

Business Intelligence and The Curve of Predictive Analytics via @intelligent_app

Business Intelligence basically connects the computers and intelligent algorithms (IA) to work in tandem and produce predictions using historic data. They help the business owners make good decisions. It used to be a simple task at some point in the past, and making such analyses was easy to do.

Continue reading here on the Intelligent Analytics website

Mind your Database Ps and Qs via @drsql

A heartfelt plea to all those that create or share code/databases/anything else - documentation, structure and comments are vital if you want anyone else to be able to understand what you have done.

Great blog by Louis Davidson. I would go a bit further - if you have an ego and want everyone to think the best of you and that what you have done is great/brilliant/amazing then structure it well, leave comments in the code, provide clear documentation and then it will be easy and obvious to see just how clever you really are.

Saturday, 21 November 2015

11 Essential Tips for Effective Data Collection @chi2innovations

We live in an increasingly rich world of data – the amount of data that currently exists doubles every 18 months. That’s a phenomenal rate of growth and we’re just at the beginning of an incredible journey creating awesome intelligent applications that can handle these unimaginable amounts of data automatically.

Interesting blog. I don't necessarily agree with everything (not sure you need to start on paper), but it points out a number of things that should be obvious but might not be.

Facebook M — The Anti-Turing Test via @arikaleph

Facebook's new AI, called M, is said to have capabilities that far exceed those of competing AIs. Some people claim that it's actually human-assisted but M insists it's an AI. A Turing Test won't work here because M's objective is precisely to not pass a Turing test. This is a fun exploration of how to test what's really going on behind the scenes.

Very interesting post from Arik Sosman which can be found here.

Friday, 20 November 2015

Computer, respond to this email via @googleresearch

Last week, Google announced that the Inbox by Gmail mobile apps for Android and iOS will now include a free tool, Smart Reply, which uses AI to scan the contents of messages, pick three of a possible 20,000 common responses and suggest them to you. Sadly, Smart Reply no longer has an overwhelming tendency to suggest the response "I love you" to almost anything.

Great research blog from Google. I love the way the use of machine intelligence is going.

Needed: More women in data science via @Stanford

A recent gathering at Stanford on the emerging science of big data turned the usual gender ratio of science conferences on its head. What big data needs now is for more women to move into the field, said Persis Drell, dean of Stanford's School of Engineering. Drell and other female scientists said they've long had the experience of being surrounded by men on all sides at science conferences. So the inaugural Women in Data Science conference held at the Arrillaga Alumni Center Nov. 2 was noteworthy not only for its depth of experts but also because it was a rare all-female science meeting.

Great report on the conference from Stanford.

Thursday, 19 November 2015

A Call For An Analytics Web API Standard via @infomgmt

Wouldn't it be a dream to assemble Big Data, IoT and analytics in the cloud using different software vendors?

Interesting article from Information Management pointing out we have a pressing need for a standard for web-based APIs. Let's hope one can be agreed on soon to avoid rework for essentially the same thing.

Analytics Challenge Celebrates Top College Data Analytics Talent and Luring Top Campus Talent to the Field of Data Analytics via @infomgmt

Many universities and technology vendor companies are taking steps to make the field of data analytics more appealing, and offering students more real-world exposure to the power of data and business intelligence. A case in point is the Adobe Analytics Challenge.

A two-part article from Information Management focussing on how to get students into data and analytics.

Analytics Challenge Celebrates Top College Data Analytics Talent can be found here

Luring Top Campus Talent to the Field of Data Analytics can be found here

Wednesday, 18 November 2015

WEBINAR: Best Fit Engineering for SQL on Hadoop - 24 November 2015

Overview

Title: Best Fit Engineering for SQL on Hadoop

Date: Tuesday, November 24, 2015

Time: 09:00 AM Pacific Standard Time

Duration: 1 hour

Summary

Best Fit Engineering for SQL on Hadoop

Join us for our latest DSC Webinar series as we discuss how enterprises have increasingly large volumes of structured and semi-structured data generated by all sorts of applications. Much of that data is increasingly finding its way into Hadoop clusters for analytics because of its versatility and the economical, linear scalability of both data storage and compute. And SQL is still the best option for querying it:

SQL is the universal connector to many BI tools and technologies
Prevalent SQL skills overcome the Hadoop skills gap
Hadooponomics enables more analytics on more data at a much lower cost

Forrester recently concluded that organizations need to choose more than one SQL-on-Hadoop tool to satisfy all requirements. Hortonworks and Teradata agree in this “best fit engineering” approach designed to match the benefits of each tool set to map to actual workload requirements, while remaining true to 100% open source innovation.

You will learn about SQL on Hadoop best practices, including:

A brief history of SQL on Hadoop
Architecture and use cases for Hive and Presto
Technical deep dive and futures for Hive and Presto

Speakers:

Mark Shainman, Program Manager -- Teradata
Mark Lochbihler, Director, Partner Engineering -- Hortonworks

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central