Thursday 31 December 2015

5 Ways Machine Learning Reinvents IT Root Cause Analysis via @Data_Informed

5 Ways Machine Learning Reinvents IT Root Cause Analysis via Data Informed - Rob Markovich of +Moogsoft Inc. discusses how machine learning can automate early detection of service failures and improve IT situational awareness.

A use I hadn't thought of before reading this article.

All I Want for Christmas is Improved Analytics via @Data_Informed

All I Want for Christmas is Improved Analytics via  Data Informed - Ann Ponder of Teradata discusses how her holiday shopping experience has been – or, in some cases, should have been – improved by analytics.

I completely agree with her - intelligent integration of data could a) save money and b) give real results.

Wednesday 30 December 2015

Data Visualization, Analytics, and Paralysis by Analysis via @Data_Informed

Data Visualization, Analytics, and Paralysis by Analysis via Data Informed - Collin Sebastian of YouEye discusses the role of visualization in the context of data analytics and the need to spend less time analysing data and more time making decisions.

Interesting thoughts and well worth a read.

Quick Introduction to Boosting Algorithms in Machine Learning via @AnalyticsVidhya

Quick Introduction to Boosting Algorithms in Machine Learning by Sunil Ray on +Analytics Vidhya - For anyone who is puzzled by Boosting algorithms in machine learning - a simple guide that explains what they are and how to use them.

Tuesday 29 December 2015

A Complete Tutorial on SAS Macros For Faster Data Manipulation via @AnalyticsVidhya

A Complete Tutorial on SAS Macros For Faster Data Manipulation by Sunil Ray on +Analytics Vidhya - Macros in SAS provide an incredible way to automate a process. Read his blog for how you can use them to make your life so much easier.

A great blog from Sunil and well worth reading.

Monday 28 December 2015

Our Berkeley Data Science Capstone Project: Rap Analysis via @DataScienceCtrl

Our Berkeley Data Science Capstone Project: Rap Analysis via @DataScienceCtrl - Great guest blog from Data Science Central containing a data science exploration of rap lyrics and what it takes to make it onto the Billboard charts.

Love it and it is a great example of machine learning and predictive analytics.

File Formats in Apache HIVE via @acadgild

File Formats in Apache HIVE via @acadgild - This blog from AcadGild discusses the different file formats available in Apache Hive.  After reading this Blog you will get a clear understanding of the different file formats that are available in Hive and how and where to use them appropriately.

It contains great examples and should prove very useful to people.

Sunday 27 December 2015

Top 20 Data Science Skills via @Dan81989

Top 20 Data Science Skills via @Dan81989 - Great article by Daniel Levine on Smart Data Collective going through a more reasoned list of skills for data science. Data science is a mashup of skills ranging from computer science and statistics to machine learning and strong communication.

It was great for me to look at the first chart and realise I actually knew some of the top 8. Yes, I know I need to improve them (few of us can't find an area where our skills need work), but it still gave me some cheer. Look at yourself against the charts and think about what you are missing or need to improve upon.

Beyond the Pill: Data Is the New Drug via @recode @medableinc

Beyond the Pill: Data Is the New Drug via @recode @medableinc - Great and slightly scary article by Michelle Longmire, MD. In the very near future, most drugs will have both a chemical and digital component, as every pill will have a companion mobile app that collects patient-specific data. As millions of people use these apps, there's going to be an incredible new data stream to mine.

I find this a mixture of fascinating, scary and something to be wary of all at the same time.

Saturday 26 December 2015

Data Science for Losers, Part 7 - Using Azure ML via @brakmic

Data Science for Losers, Part 7 - Using Azure ML via @brakmic - In part 7 Harris takes us further into coding with Azure for machine learning. Make sure you have gone through part 6 first. It includes Python code, but I think if you don't know Python you can still understand what is happening.

Data serialization with avro in hive via @acadgild

Data serialization with avro in hive via @acadgild - This blog from AcadGild focuses on providing in-depth information on Avro in Hive. Here we have discussed the importance and necessity of Avro and how to implement it in Hive. Through this blog you will get a clear idea about Avro and its implementation in your Hadoop projects.

Very useful blog and gives great examples.

Friday 25 December 2015

Merry Christmas

No blog/twitter posts today - remember those less fortunate than yourself during this time of year and enjoy your Christmas!!

Thursday 24 December 2015

What happens when everyone can connect the dots via @radar

What happens when everyone can connect the dots via @radar - Everyone loves data, so it's no surprise that we've been innovating by orders of magnitude in data storage. But has analytics innovation kept up?

I think it was behind for a while but is trying its hardest to make up for any progress it missed.

Understanding Support Vector Machine algorithm from examples (along with code) via @AnalyticsVidhya

Understanding Support Vector Machine algorithm from examples (along with code) via +Analytics Vidhya - Did you know amongst all machine learning algorithms, Support Vector Machine (SVM) is the one that can deal with smaller datasets & create powerful models?

Great article with Python code.

Wednesday 23 December 2015

Exploring the process of insight generation via @radar @esimoudis

Exploring the process of insight generation via @radar @esimoudis - Sometimes insight arrives as a brilliant idea in the middle of the night or a lightning-bolt aha! moment. But Evangelos Simoudis says that for true insight you're better off creating an insight-generating process - one that includes a measurable action plan and domain knowledge. Here's how to deliver insight as a service.

Analytics without actions - why bother? via @radar and Akmal Chaudhri

Analytics without actions - why bother? via @radar - Streaming analytics are only worthwhile if the data leads to action. Akmal Chaudhri describes how the architectural choices you make can help ensure your fast data streams can be tied to data analysis.

Tuesday 22 December 2015

Predictive Analytics Requires a Customer-Obsessed Innovation Culture via @infomgmt

Predictive Analytics Requires a Customer-Obsessed Innovation Culture via +Information Management and Frederic Golin - There is a palpable excitement around predictive analytics these days, but I see a risk that, beyond the excitement of the demo and first implementations, a number of these advanced analytic tools remain shelfware.

5 Metaphors for Big Data and Why They Matter via @BernardMarr @DataInformed

5 Metaphors for Big Data and Why They Matter via +Bernard Marr +Ian Murphy (DataInformed) @DataInformed - Bernard Marr addresses some metaphors that are commonly used within the world of big data and whether they are an apt shorthand for the phenomena they describe.

We all need to be clear what we and others mean when all these metaphors are used - apple could mean a fruit but could also mean a computer/phone/tablet/watch manufacturer.

Monday 21 December 2015

How Bad Data Management Kills Revenue via @infomgmt

How Bad Data Management Kills Revenue via +Information Management - Not one to normally publicly gripe about a vendor, but a recent customer experience with an online purchase is a great example of why organizations can't ignore data management investments.

Great article and makes a great point.

Apache Spark speeds up big data decision-making via @RT_LClark @computerweekly

Apache Spark speeds up big data decision-making via @RT_LClark and @computerweekly - Spark, the open-source cluster computing framework from Apache, promises to complement Hadoop batch processing

Sunday 20 December 2015

Getting More Insights from Data: Nine Facts about the Practice of Data Science via @bobehayes

Getting More Insights from Data: Nine Facts about the Practice of Data Science via @bobehayes - The value of data is measured by what you do with it, and organizations are relying on data scientists to extract that value.

Interesting article by Bob Hayes on Business over Broadway.

The 37 Best Tools For Data Visualization via @7wData

The 37 Best Tools For Data Visualization via @7wData - Creating charts and infographics can be time-consuming. But these tools make it easier. It’s often said that data is the new world currency, and the web is the exchange bureau through which it’s traded. As consumers, we’re positively swimming in data; it’s everywhere from labels on food packaging design to World Health Organisation reports.

For number 31 (R) there are several ways to plot - base graphics, lattice and ggplot2.

Saturday 19 December 2015

R In Browser Coding Tutorials via @DataCamp

R In Browser Coding Tutorials via @DataCamp - three in-browser R coding tutorials from the DataCamp team:

Introduction to R programming
Intermediate R programming
A Hands-on Introduction to Statistics

Well worth a try.

Data Science for Losers, Part 6 - Azure ML via @brakmic

Data Science for Losers, Part 6 - Azure ML via @brakmic - In part 6 Harris takes us all through Azure Machine Learning. He points to a great course on edX too.

Friday 18 December 2015

22 data experts share their predictions for 2016 via @importio

22 data experts share their predictions for 2016 via +import.io - Some very interesting predictions - not sure I can disagree with any of them.

5 Trends That Will Drive Big Data in 2016 via @infomgmt

5 Trends That Will Drive Big Data in 2016 via +Information Management - MapR CEO and Co-founder John Schroeder sees an acceleration in big data deployments, and has crystallized his view of market trends into these five major predictions for 2016.

I have to agree with him - it's definitely going to become less centralised.

Thursday 17 December 2015

k-Fold Cross Validation made simple via @AnalyticsVidhya

k-Fold Cross Validation made simple via +Analytics Vidhya - This article introduces the science behind k-fold cross validation & its use in simple terms and explains its implementation in Python.

Nice tie-in with Kaggle competition entries.
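For anyone who wants to see the idea before reading the article, here is a minimal sketch of k-fold splitting in plain Python. It is not the code from the Analytics Vidhya post: the function names k_fold_indices and cross_validate are my own, and fit and score are placeholders you supply for whatever model you are testing.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) for each of k folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    indices = list(range(n_samples))
    # Spread the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Average a score over k folds; fit and score are caller-supplied."""
    scores = []
    for train, test in k_fold_indices(len(xs), k):
        # Train on k-1 folds, evaluate on the held-out fold.
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        scores.append(score(model,
                            [xs[i] for i in test],
                            [ys[i] for i in test]))
    return sum(scores) / len(scores)
```

Every sample lands in exactly one test fold, so each one is used for evaluation once and for training k-1 times. In practice you would normally shuffle the data first, which this sketch leaves out.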

Cracking The Hadoop Developer Interview via @simplilearn

Cracking The Hadoop Developer Interview via @simplilearn - Prepping for a Hadoop developer interview? This handy guide lists the most common questions asked in Hadoop Developer interviews, with model answers.

Wednesday 16 December 2015

Machine Intelligence In The Real World via @techcrunch

Machine Intelligence In The Real World via @techcrunch - Shivon Zilis divides ML companies into 8 groups: panopticons, lasers, alchemists, gateways, magic wands, navigators, agents, and pioneers (and provides examples of each) in order to define the landscape of machine learning companies.

I can't wait to read the next article on this subject.

Beyond the Venn diagram via @radar

Beyond the Venn diagram via @radar - Daniel Tunkelang identifies the essential skills for data scientists.

I have to agree that you are not going to find someone perfect for everything that you want.

Tuesday 15 December 2015

Recognising and rescuing a failing big data project via @radar

Radar has a very interesting article - Common pitfalls and best practices every manager should know.

I completely agree with this sentence near the end - Promptly implementing small pieces of a plan tends to lead to more successful implementations down the road, as those pieces can quickly prove business value.

Using Python and R together: 3 main approaches via @KDnuggets

Great article from KDnuggets - If data science and data scientists cannot decide which data to use to help them choose a language, here is an article on how to use BOTH. I also recommend reading the article linked at the end of this one - Integrating Python and R into a Data Analysis Pipeline, Part 1

Monday 14 December 2015

Big Data: 20 Free Big Data Sources Everyone Should Know via @BernardMarr

From Bernard Marr via Big Data Collective a great list of free data sources with descriptions.

I think a few others should be added to that list:

KDNuggets data repository
Kaggle competition datasets
Data Science Central  datasets - go look for them in the Apprenticeship area
Google Public Data Explorer (may be some crossover)
KDNuggets article on public datasets

Go find some that interest you and have fun playing with them.

Where Does Big Data Stop and Big Brother Start? via @Data_Informed and @BernardMarr

Bernard Marr discusses in Data Informed revelations of Russia’s data-enabled surveillance of its citizens and the potential for governments to use big data to erode civil rights.

He discusses something which could be quite frightening for us all - the rise of cars that are connected to the internet. It's kind of scary to think about the information that could be provided from accessing that data. Privacy is becoming less and less possible with our data and lives.

Sunday 13 December 2015

50 Useful Machine Learning & Prediction APIs via @KDnuggets

KDnuggets have created a list of 50 APIs selected from areas like machine learning, prediction, text analytics & classification, face recognition, language translation etc.

I've only done some limited things with APIs - this list gives me the impetus to go and play with some of them more.

How to avoid Over-fitting using Regularization? via @AnalyticsVidhya

Great post from Analytics Vidhya on how to keep a model from using more attributes than it needs, avoiding overfitting through regularisation. A simplified explanation that gives a quick, basic framework.

I recommend you read it as it may well help.
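To give a feel for what regularisation does, here is a small sketch of my own (not taken from the post) using the simplest case I could think of: ridge (L2) regularisation for a one-feature linear model with no intercept. The function name ridge_slope and the parameter lam are assumptions for illustration.

```python
def ridge_slope(xs, ys, lam):
    """Slope w of y ~ w*x (no intercept) that minimises
    sum((y - w*x)^2) + lam * w^2.
    """
    if lam < 0:
        raise ValueError("lam must be non-negative")
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    # Setting the derivative to zero gives w * (sxx + lam) = sxy.
    # Assumes sxx + lam is non-zero (at least one non-zero x, or lam > 0).
    return sxy / (sxx + lam)


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
print(ridge_slope(xs, ys, 0.0))   # ordinary least squares slope
print(ridge_slope(xs, ys, 14.0))  # penalised slope, pulled towards zero
```

With lam set to 0 this is ordinary least squares; as lam grows, the slope shrinks towards zero. That shrinkage is what stops a model from leaning too hard on any one attribute.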

Saturday 12 December 2015

Big Data at NASA via @DataScienceCtrl and @BernardMarr

Big Data at NASA via @DataScienceCtrl and @BernardMarr - Great article looking at the quantities of data generated at NASA and the transfer rates. Reads like a major implementation of Big Data that could be used as a template for anyone.

SLIDESHOW: 25 Top Degree Programs for Data Analytics Professionals via @infomgmt

Via Information Management - Want to cash in on the current hiring craze for big data and data analytics talent? These top college and university degree programs will give you a great start. Split into 2 slideshows:

Slideshow 1

Slideshow 2

Friday 11 December 2015

WEBINAR: Use Predictive Analytics and Hadoop to Turn the Promise of Big Data into Business Impact - 15 December 2015

Turn the Promise of Big Data into Business Impact 

Use Predictive Analytics and Hadoop to Transform Insight into Action 

December 15th, 11:00 am ET


As Hadoop continues to grow in popularity, organizations struggle to turn the promise of Big Data into business value, as shown in recent Gartner surveys. Predictive analytics is a natural platform to extract business value from Big Data. But there’s a major skills gap that inhibits adoption, because the Hadoop architecture requires specialized coding skills in addition to data science expertise.

In our program, experts from Gartner and RapidMiner will help you close that gap. They'll discuss:

  • Hadoop growth and trends 
  • The challenges with production deployments 
  • The pros and cons of four different methods of extracting predictive analytics value from Hadoop 
  • And how to solve the skills gap issue by automatically translating predictive analytics processes into the many languages of Hadoop.


You’ll also find out how a code-free, drag-and-drop predictive analytics product makes creating and executing predictive analytics on Hadoop a fast and simplified process.

Whether you’re just beginning to work with predictive analytics or already have an experienced data science team, this webinar will show you it’s indeed possible to turn the promise of Big Data into business value today.

Register here

Analyzing the world’s news: Exploring the GDELT Project through Google BigQuery via @radar

From O'Reilly Radar - How to analyse, visualize, and even forecast human society using global news coverage - exploring the GDELT Project through Google BigQuery.

Very interesting article. I had no idea the inventory existed and that Google BigQuery was there to help these kinds of analyses.

What to look for in a data scientist via @radar

By Jerry Overton from O'Reilly "What's commonly expected from a data scientist is a combination of subject matter expertise, mathematics, and computer science. This is a tall order...[however] the skillset you need to be effective, in practice, tends to be more specific and much more attainable. This approach changes both what you look for from data science and what you look for in a data scientist."

A great summary of a difference between a Data Scientist and a Machine.

Thursday 10 December 2015

WEBINAR: Make Beautiful Interactive Data Visualizations Without Writing JavaScript - 15 December 2015




Hassle Free Data Science Apps

Build Richly Interactive Visualizations on Streaming & Big Data  with Open Source

Make Beautiful Interactive Data Visualizations Without Writing JavaScript
Date: December 15th, 2015
Time: 2 PM CT, 12 Noon PT, 3 PM ET

Data visualization is where your work comes to fruition - without communication, your insights don't turn into action, and your organization won't realize the value of your analytical work.

But creating and deploying data science apps is hard. You're a data scientist - not a web developer or designer. There has to be a better way.

That's why we created Bokeh, an interactive visualization framework for Python. Over the past 6 months, we've added a ton of powerful features and dramatically improved ease of use. On December 15th, Continuum Analytics CTO Peter Wang & Bokeh lead developer Bryan Van de Ven will present a webinar and show you how to create rich, interactive visualizations in the browser - without writing a line of JavaScript or HTML.

Register here

In the webinar, you'll learn to:

  • Use the Bokeh Visualization Framework to Easily Make Data Science Apps
  • Reproduce the Famous GapMinder Example - No JavaScript or HTML Required
  • Transform & Visualize Streaming Data with Scikit-Learn and Bokeh

Join us and learn to create beautiful, interactive visualizations, without the hassle.

Peter and Bryan will also conduct a live Q&A session after the presentation, so you can get answers to your toughest data visualization questions.

Presenters

PETER WANG

Peter Wang is the CTO and Co-founder of Continuum Analytics and the creator of Bokeh.

He has been developing commercial scientific computing and visualization software for over 15 years. He has software design and development experience across a broad variety of domains.

As a creator of the PyData conference, he devotes time and energy to growing the Python data community, and advocating and teaching Python at conferences worldwide.

Everyone who signs up will receive the recording, slides, and links to the notebooks on Anaconda Cloud.

BRYAN VAN DE VEN

Bryan Van de Ven is the lead developer on the Bokeh project. 

He holds an undergraduate degree in Computer Science & Mathematics from UT Austin, and a Master's degree in Physics from UCLA.

Previously Bryan developed data exploration and visualization software for sonar feature detection, financial risk modeling, and fluid mixing simulation.

WEBINAR: Testing the waters of today’s data lake - 16 December 2015




Data Lake – Five Tips to Navigate the Dangerous Waters
Date: Wednesday, December 16
Time: 11 a.m. ET (60 min)
Data is inherently fast. It flies into your data warehouse in milliseconds, it’s altered in nanoseconds. And yet when it comes to transforming dark nebulous data into consumable and actionable insight you’re moving at the pace of days and hours.
And it’s not just the speed to insight. Analysis should be responsive, ready to shift at a moment’s notice not tied down by legacy infrastructure ill-equipped to handle the data of today let alone the future.
Data lakes are the newest method for storing and managing data. They offer improved speed, accessibility, and agility, leading to improved insights. But without the proper approach, a data lake quickly becomes a data swamp. Join us for a live webinar designed to help you:
  • Learn about what a data lake is and what it isn’t
  • Optimize your data lake for speed and agility to insight
  • Ensure even those without programming skills can leverage the data lake
  • Understand the new approach to governance that the data lake is driving
Presenter:
Mark Marinelli
Chief Technology Officer, Lavastorm 

Register here

WEBINAR: How Flextronics Uses Data Visualization and Analytics to Improve Customer Satisfaction - 15 December 2015




Overview

Title: How Flextronics Uses Data Visualization and Analytics to Improve Customer Satisfaction

Date: Tuesday, December 15, 2015

Time: 09:00 AM Pacific Standard Time

Duration: 1 hour

Summary

Flexibility in adapting to changing global markets and customer needs is necessary to stay competitive, and the Flextronics analytics team is tasked with making sure the Flex management team has accurate and up-to-date analytics to optimize performance, efficiency, and customer service.

In our latest DSC webinar series, Joel Woods from Flextronics’ Global Services and Solutions will share success stories around analytics on repairs and refurbishment of customer products utilizing analytics and data visualization from Tableau and Alteryx.

You will learn how to:

  • Use data analytics to improve cost savings
  • Resolve common data challenges such as blending disparate data sources
  • Deliver automated and on-demand reporting to clients
  • Provide visualizations that surface the analytics that matter to both internal teams and customers


About Flextronics:

Flextronics is an industry leading end-to-end supply chain solutions company with $26 billion in sales, generated from helping customers design, build, ship, and service their products through an unparalleled network of facilities in approximately 30 countries and across four continents.


Speakers:

Ross Perez, Sr Product Manager - Tableau
Joel Woods, Advanced Analytics Lead - Flex Inc.
Maimoona Block, Alliance Manager - Alteryx

Hosted by: Bill Vorhies, Editorial Director - Data Science Central

Register here

WEBINAR: Engaging the Business: Agile, Collaborative Approaches to Data Usability - 15 December 2015



Engaging the Business: Agile, Collaborative Approaches to Data Usability

TDWI Speaker: David Loshin, President of Knowledge Integrity

Date: Tuesday, December 15, 2015

Time: 9:00 a.m. PT, 12:00 p.m. ET



Webinar Abstract
The term “data-driven” has become an accepted principle for modern organizations, but to drive modern, agile businesses, each data consumer’s view of enterprise data must both align with individual data quality and usability criteria and remain consistent with other data users in the organization. While traditional data quality/data preparation tools were intended to ensure accuracy and trust, the conventional wisdom centred on a technical, IT-centric usage model.
However, two emerging realities are rapidly changing the way we think about business data usability: many data sources and exploding data volumes, and a growing sophistication regarding information and analytics among business users. Empowering business users to control and satisfy their own data usability needs implies changes to the requirements for data discovery, profiling, and quality tools. It means providing tools allowing users to eliminate the dependence on the IT department while letting them examine the data, define their own data expectations, and monitor the conformance to their unique sets of business rules.
We are beginning to see the segregation of duties as part of an agile, collaborative activity. Business owners are increasingly expected to be accountable for their data sets’ compliance with business rules, while the IT role has transitioned to infrastructure management to ensure that business rules can be executed. In this webinar we examine the changing face of data preparation and data quality tools.
Attendees will learn about:
  • Changing requirements for data preparation, data profiling, and data quality
  • Collaboration between business and IT staff
  • Shifts in methods of interaction for data usability tools
  • Pushing data stewardship to the business edge
  • Accountability for managing data risks

20 Big Data Companies Leading the Way via @Datamation

Article from Datamation - These Big Data companies are pushing the nascent analytics market toward greater adoption, with a wildly diverse array of tools and solutions.  Please note the list is alphabetical and not an indication of anything else.

Mostly companies that I would have expected to be on this list.


To MDM Or Not To MDM? That Is The Question… via @infomgmt

Interesting article from Information Management - Most of us have argued before now that MDM is a must, but it turns out that there are exceptions.

I find it an interesting article, but there can be a halfway house to MDM. You still need documentation, and as part of that documentation you still need to understand the interfaces between the data - where does it come from and what is the system of record?

Wednesday 9 December 2015

The Internet of Things Invigorates Operational Intelligence via @infomgmt

From Information Management - Operational intelligence is a set of event-centred information and analytic processes operating across an organization that enable people to use that event information to take effective actions and make optimal decisions.

Interesting article which looks at one possible use of IoT (which is very exciting as there are so many potential uses with lots of data to match).

Gawker Media’s Data Guru Presents the Case for Deleting Data via @ellisbooker @Data_Informed

Great article from Data Informed - Josh Laurito of Gawker Media Group explained Gawker’s decision to limit data collection and the implications this decision has had for the business. 

A great idea if you can find anyone brave enough to do it.  I completely agree that we have and collect too much data (which then is not always good data) but it's often a safety blanket so many wouldn't be without it.

Tuesday 8 December 2015

7 Steps to Mastering Machine Learning With Python via @KDnuggets

From KDnuggets - There are many Python machine learning resources freely available online. Where to begin? How to proceed? Go from zero to Python machine learning hero in 7 steps!

I love this article. I'm more of an R person as my Python is very basic but this gives me the confidence to try to do more in Python.

Bridging The Big Data Skills Gap With Online Training via @InformationWeek

From Information Week - Big data technology vendors are helping to close the skills gap when it comes to training the next crop of data scientists. Here's what they are offering.

A great guide to all the free training out there.

Monday 7 December 2015

50 enterprise startups to bet your career on in 2016 via @BI_Europe

Business Insider UK has a great list of the top 50 startups, which contains a surprisingly large number of company names many of us will be familiar with. Look down it and see just how many you use or recognise.


Skill-Based Approach to Improve the Practice of Data Science via @bobehayes

From Bob Hayes - One way to improve the practice of data science is to learn data skills that are essential for good analytic project outcomes. 

Great and comprehensive analysis of the skills, and of proficiency in those skills, on Business over Broadway.

Sunday 6 December 2015

Ten Top Business Intelligence Trends to Expect in 2016 via @infomgmt

Business intelligence continues to be one of the fastest-moving areas in the enterprise, and the techniques that organizations are using to drive adoption and get value from their data are multiplying. Here are ten top business intelligence trends to expect in 2016.

Interesting list on Information Management.

The Future of Big Data: How Data Lakes Open New Possibilities for Your Organization via @VanRijmenam

Via Mark van Rijmenam - A data lake enables organizations to bring together all kinds of data, combine the data sources, generate meaning from it and, of course, derive value out of it. But what can you do with a data lake? What are the advantages of a data lake and what are the challenges of a data lake? In this extensive post we took a deep dive into the opportunities of data lakes and we explored why they are the future of big data and what it means for your organization.

Interesting article and well worth a read if you are wondering what they provide and what the advantages are. I'm looking forward to the planned articles on Data Lakes too.

Saturday 5 December 2015

Cheatsheet – Python & R codes for common Machine Learning Algorithms via @AnalyticsVidhya

From Analytics Vidhya - a collection of the 10 most commonly used machine learning algorithms with their codes in Python and R.

A must-have cheatsheet for everyone, if for nothing other than to remind you of all of the ones you could easily use.

The Art of Analytics, Or What the Green-Haired People Can Teach Us via @datanami

It’s been said that a picture is worth a thousand words. But in this big data age, perhaps we should say that a picture is worth 1,000 terabytes.

Interesting article by Alex Woodie on Datanami.

Friday 4 December 2015

Virtual Reality Analytics Keynote Presentation via @Beraterzeitung

Amazing keynote presentation from Joerg Osarek about Virtual Reality Analytics.

I recommend you read this and start to think about and imagine what could be done with this kind of data/information. He rightly points out that there has to be an element of responsibility with this data as this really is crossing traditional privacy boundaries.

The top 12 Apache Hadoop challenges via @kumarchinnakali

Via @kumarchinnakali - Hadoop is a large-scale distributed batch processing infrastructure. While it can be used on a single machine, its true power lies in its ability to scale to hundreds or thousands of computers, each with several processor cores.

Great article by Kumar Chinnakali. I find number 8 a particular worry, as sometimes you need to go live quickly, before everything changes and the system/solution becomes worth less.

Thursday 3 December 2015

WEBINAR: Big Data Blueprints - 8 December 2015

Pentaho Big Data Blueprints


Pentaho has developed a set of big data blueprints that empower organizations to generate real business value with Hadoop, NoSQL, and other emerging technologies.

Join Pentaho for this webinar for an overview of how to take advantage of three key design patterns - and the use cases they enable:


  1. Streamlined Data Refinery to integrate and access blended, enriched data sets for high performance analytics 
  2. Creating a 360-Degree View of your customer to provide on-demand analytical views across key customer touch points 
  3. Internet of Things: Harnessing Machine and Sensor Data


Speaker: Wael Elrifai, Director for Enterprise Solutions, EMEA
Date: Tuesday, 8th December 2015
Time: 11am GMT / 12pm CET
(the session will take 45 minutes, and will be followed by a brief Q&A session)

Register here

WEBINAR: Data Scientist Workbench Accelerates Predictive Analytics - 8 December 2015



Overview
Title: Data Scientist Workbench Accelerates Predictive Analytics
Date: Tuesday, December 08, 2015
Time: 09:00 AM Pacific Standard Time
Duration: 1 hour
Summary
Data Scientist Workbench Accelerates Predictive Analytics
New technologies, Big Data and increasingly complex and multi-dimensional data relationships create a challenging environment for modern data scientists. In today’s DSC Webinar we will discuss a modern Predictive Analytics platform to help manage and accelerate all phases of your predictive analytics process lifecycle, from source to result.
In this latest DSC webinar you will learn how this agile digital workbench allows you to:
  • Streamline tedious and repetitive tasks
  • Access and prepare data for predictive analytics
  • Rapidly prototype and validate your models
  • Leverage your existing R and Python
  • Extract maximum value from Hadoop
  • Share results with the rest of the business
  • Quickly embed analytic results into business processes

Speakers: Dr. Martin Schmitz -- RapidMiner GmbH
Hosted by: Bill Vorhies, Editorial Director -- Data Science Central
Register here

WEBINAR: Hybrid Cloud and Big Data: A Perfect Marriage - 10 December 2015

Data Informed: Analytics on the Cloud 3-part webinar series

We are excited to announce this 3-part webinar series, Analytics on the Cloud, that will help you build high-performance analytics and Big Data solutions.

Part 3: Hybrid Cloud and Big Data — A Perfect Marriage 
Thursday, December 10, 2015 

In this session, uncover common enterprise use cases for big data, both on premise and in the cloud. You’ll also learn how IBM Cloud Infrastructure for Analytics addresses hybrid use cases through VPN connectivity and data transfer, as well as consistency and control over the software stack both on premise and in the IBM cloud.

Register here

WEBINAR: Trusted Techniques to Ensure the Security and Compliance of Big Data in the Cloud - 8 December 2015

Data Informed: Analytics on the Cloud 3-part webinar series

We are excited to announce this 3-part webinar series, Analytics on the Cloud, that will help you build high-performance analytics and Big Data solutions.

Part 2: Trusted Techniques to Ensure the Security and Compliance of Big Data in the Cloud
Tuesday, December 8, 2015

Business discussions around storing and accessing data, particularly huge amounts of data, in the cloud inevitably will home in on security. Before any organization signs off on deploying big data in the cloud, critical security and compliance concerns will need to be addressed.

In this session, review enterprise requirements for data security and compliance in the cloud. Understand how IBM Cloud Infrastructure for Analytics addresses these requirements from end to end — for data at rest and data in motion — round-trip through Hadoop in the cloud.

Register here

Data Science for Losers, Part 5 via @brakmic

In part 5, Harris takes us through Apache Spark DataFrames. If you know DataFrames in Python or R you definitely have the advantage, as he quite rightly points out. I'm looking forward to the next topic, hopefully in a later blog entry.

You can find the blog article for part 5 here

Data Science for Losers, Part 4 via @brakmic

In part 4, Harris takes us through Machine Learning. He explains the different types of machine learning and takes us through the very basics of scikit-learn. I look forward to his next blog.

You can find the blog article for part 4 here

Wednesday 2 December 2015

WEBINAR: Redefining the Economics of Analytics - 8 December 2015



Complimentary Web Seminar
December 8, 2015
2 PM ET/11 AM PT
Brought to you by Information Management
As organizations struggle to make use of the ever growing volumes of data, they often confront legacy systems that are tough to work with, internal and external data silos that are messy and complex to integrate, and a corporate culture that may be slow to embrace analytics. On top of all this, you may be paying top dollar for an outdated analytics system that is mostly used to push data around. Is there a better way?
Please join this webcast to learn about how collective intelligence, native distributed analytics architecture, and enabling an analytics culture can help you embed analytics everywhere, innovate faster, and empower more people within your business to make better decisions.
Featured Presenters:
Moderator: Jim Ericson, Consultant, Editor Emeritus, Information Management
Speaker: Tim Wassman, Director, Global Field Operations, Dell Statistica

Register here

The Role of KPIs in Managing Big Data via @infomgmt

KPIs have a dynamic relationship - information from one set of performance indicators can suddenly draw attention to the key role of another indicator – so we need to access them in real time, not in a historical report.

Interesting article which will hopefully make you think, and maybe change the way you see KPIs.

5 Data Trends That Will Impact You in 2016 via @infomgmt

Hortonworks Chief Technology Officer Scott Gnau shared his thoughts on what will be five of the most important data trends to impact the CIO in the coming year.

Interesting thoughts from Scott in Information Management.

Tuesday 1 December 2015

Industrial IoT Outlook 2016 via @Intersog

Intersog provides a summary of the recent IoT research to better feel the pulse of the industry and see where it's headed in the future.

Interesting, well-thought-out article - definitely worth a read.

Cutting Through the Buzz of Big Data; 5 Big Data Myths Debunked via @bigcloudteam @Datafloq

The industry is developing at a rapid pace, with the technology improving month-on-month instead of year-on-year. There is such a buzz about Big Data that the narrative has almost taken on a life of its own – it has become this mythical being that can slay uncertainty and save any business from an untimely end. It's time to debunk some of these myths.

Interesting article, well worth a read.


All The Best Big Data Tools And How To Use Them

There are thousands of Big Data tools out there, all of them promising to save you time and money and help you uncover never-before-seen business insights. And while all that may be true, navigating this world of possible tools can be tricky when there are so many options.

Useful guide to understanding what does what.