Thursday 31 March 2016

Embedded Analytics Becoming the Strategy of Choice for Many via @infomgmt

Embedded Analytics Becoming the Strategy of Choice for Many by Bob Violino via +Information Management  - The adoption rate of embedded analytics among business users is twice that of traditional business intelligence (BI) tools, according to a new study.


Tensor methods to solve machine learning challenges via @Oreilly

Tensor methods to solve machine learning challenges from +O'Reilly - David Beyer and Anima Anandkumar discuss high-dimensional learning of probabilistic latent variable models and the design and analysis of tensor algorithms.

Interesting read.

Tuesday 29 March 2016

WEBINAR: Increase your ROI with Hadoop in 6 Months - 6 April 2016

presented by Dell, Intel, and Cloudera

Are you struggling to validate the added costs of a Hadoop implementation? Are you struggling to manage your growing data?

The costs of implementing Hadoop may be more beneficial than you anticipate. Dell and Intel recently commissioned a study with Forrester Research to determine the Total Economic Impact of the Dell | Cloudera Apache Hadoop Solution, accelerated by Intel. The study determined customers can see a 6-month payback when implementing the Dell | Cloudera solution.

Join Dell, Intel and Cloudera, three big data market leaders, to understand how to begin a simplified and cost-effective big data journey and to hear case studies that demonstrate how users have benefited from the Dell | Cloudera Apache Hadoop Solution.

Speakers

Clarke Patterson
Senior Director, Product Marketing
Cloudera

Glenn Keels
Executive Director, Product Marketing, Engineering Solutions
Dell

Register and you will learn:

  • How Forrester Research showed a 6-month payback period on the Dell | Cloudera Apache Hadoop Solution
  • How Hadoop installations can achieve a 96% ROI in a short period of time
  • What you need to stand up a Hadoop cluster quickly


Register here

Introduction To Bulk Deletion Of Column Values In Hadoop Development With MapReduce via @Datafloq

Introduction To Bulk Deletion Of Column Values In Hadoop Development With MapReduce by Evan Gilbort via +Datafloq - For all the technical readers among you, here is an article about how to delete column values in bulk by using HBase bulk loading with Hadoop MapReduce. Proficient Hadoop developers share what is required for bulk column deletion in Hadoop development. You can follow their steps to see how they do it and benefit from their insights.

Very technical but very useful.

Monday 28 March 2016

Don’t Expect Your DBA to Do a Hadoop Expert’s Job via @Data_Informed

Don’t Expect Your DBA to Do a Hadoop Expert’s Job by Ron Bodkin via @Data_Informed - Knowing the difference between DBAs and Hadoop administrators is essential to maximizing the return on your big data investment.

LinkedIn open sources its WhereHows data mining software via @ZDNet

LinkedIn open sources its WhereHows data mining software by Larry Dignan via +ZDNet  -  LinkedIn said it will open source an internal application called WhereHows, which is a data mining portal for enterprise information. Technically, LinkedIn calls WhereHows “a data discovery lineage portal.” From a business perspective, WhereHows is designed to surface data from multiple stores via metadata.

Sunday 27 March 2016

In Support of Technology 'Agnosticism' via @infomgmt

In Support of Technology 'Agnosticism' by Wayne Citrin via +Information Management  - With the proliferation of new languages and platforms, the ability to be technology-agnostic has evolved from a “nice to have” to a “need to have” — and developers face this reality daily.

Interesting. Please note this is a 2-page article.

Think You Want To Be "Data-Driven"? Insight Is The New Data via @infomgmt

Think You Want To Be "Data-Driven"? Insight Is The New Data by Brian Hopkins via +Information Management  - While 74% of firms say they want to be “data-driven,” only 29% say they are good at connecting analytics to action.

Interesting.

Saturday 26 March 2016

Study Reveals 'Massive' Skills Gap for NoSQL, Apache Cassandra Pros via @infomgmt

Study Reveals 'Massive' Skills Gap for NoSQL, Apache Cassandra Pros by Bob Violino via +Information Management  - DataStax surveyed more than 250 members of its DataStax Academy, and found increasing demand for NoSQL database experts, specifically those trained on Apache Cassandra.

A good indication of areas to study.

Are You Asking All the Wrong Questions About Apache Spark? via @infomgmt

Are You Asking All the Wrong Questions About Apache Spark? by David Weldon via +Information Management  - Like the adoption of any new technology, there are always important questions that the IT leader should ask before starting the implementation.

A view of questions that should be asked - I'm sure we can all think of more.

Friday 25 March 2016

Drive Influence with Uplift Modeling via @Data_Informed

Drive Influence with Uplift Modeling by Eric Siegel via @Data_Informed - Predictive Analytics World founder Dr. Eric Siegel discusses the role of predictive analytics in uplift modeling and influencing customer behaviour.

Good article and worth a read.

The Internet of Things and the Necessity of Fog Computing via @Data_Informed

The Internet of Things and the Necessity of Fog Computing by Jelani Harper via @Data_Informed - As the Internet of Things matures, traditional methods of processing and analysing data will struggle with constantly arriving streaming data. The fog model can provide greater access to data and speed time to action.

Interesting article.


Thursday 24 March 2016

WEBINAR: Could you Spot the Fox in your Henhouse? Protect Data Against Insider Threats - 29 March 2016


Complimentary Web Seminar
March 29, 2016
2 PM ET/11 AM PT
Brought to you by Information Management

Did you know that approximately 55% of threats to sensitive data come from insiders? Additionally, most organizations don't have the capabilities they need to monitor end-user and privileged-user behavior or safeguard sensitive data from insiders. Join this panel-style webcast to get the most current information on insider threats and learn about the 3 capabilities you need to secure sensitive data against insiders.

Register here

4 Steps for Thinking Critically About Data Measurements via Harvard Business Review

4 Steps for Thinking Critically About Data Measurements by Thomas C. Redman via +Harvard Business Review - Four Steps for Thinking Critically About Data Measurements.

Interesting read.

Machine-Learning Algorithm Identifies Tweets Sent Under the Influence of Alcohol via MIT Technology Review

Machine-Learning Algorithm Identifies Tweets Sent Under the Influence of Alcohol via MIT Technology Review - Sending your ex-partner a teary-eyed tweet at 1 a.m. after a bottle of Chardonnay isn't necessarily the best way of achieving reconciliation.

There is nowhere to hide thanks to a new machine learning algorithm.

Wednesday 23 March 2016

SLIDESHOW: Cashing In: The 18 Top Paying Big Data Certifications via @infomgmt

Cashing In: The 18 Top Paying Big Data Certifications by David Weldon via +Information Management  -  It’s good to be a data professional these days. Hiring demand is at record levels. Pay rates are getting there too. And most any data certification will earn you a bump in your paycheck. Here’s a look at what the top data certifications are earning in pay premiums.

Worth a look.

11 Top books for IOT, AI, Big Data and Gamification via @jamsovaluesmart

11 Top books for IOT, AI, Big Data and Gamification by James Doyle via +JAMSO - Big Data is part of a broader puzzle. How will Big Data link to IOT, AI and be applied within cognitive science and gamification?

Useful list of books.

Tuesday 22 March 2016

Interactive plotting with rbokeh via @rbloggers

Interactive plotting with rbokeh by Teja Kodali on @rbloggers -  Teja shows us how you can use rbokeh to build interactive graphs and maps in R.

Very useful and helps you do great charts.

Oracle Cloud Growth Shows Promise as Company Tops Estimates via @infomgmt

Oracle Cloud Growth Shows Promise as Company Tops Estimates by Brian Womack via +Information Management  - Oracle, facing challenges similar to those of longtime rivals IBM and Microsoft Corp., is starting to show signs of success in the cloud.

Good to see that cloud is starting to make inroads for Oracle as that also suggests it is making inroads generally.

Monday 21 March 2016

Get Ready For BI/Analytics Vendor Landscape Reshuffle via @infomgmt

Get Ready For BI/Analytics Vendor Landscape Reshuffle by Boris Evelson via +Information Management  - Boris is now becoming convinced that there will be some Business Intelligence (BI) and analytics vendor shake ups in 2016.

There certainly are too many vendors and so it is inevitable that there will be some that fail or are taken over.

SAS Analytics Juggernaut Keeps on Truckin’ via @infomgmt

SAS Analytics Juggernaut Keeps on Truckin’ by David Menninger via +Information Management  - SAS has been a dominant player in the analytics marketplace for years, celebrating its 40th anniversary this year and reporting US$3.16 billion in 2015 revenue.

It seems they are not a has-been and are still going strong.

Sunday 20 March 2016

Free Business Analytics Content – Thanks to Wikipedia – Parts 1 to 4 via @Datafloq

Free Business Analytics Content – Thanks to Wikipedia – Parts 1 to 4 via +Datafloq

Part 1

Part 2

Part 3

Part 4


It seems a shame not to read this :-)

How to Realize Real-Time Analytics and IoT Monetisation with Kafka via @Datafloq @ranadipa

How to Realize Real-Time Analytics and IoT Monetization with Kafka by Raj Nadipalli via +Datafloq  - Creating the architecture to support real-time data analytics or monetisation of Internet of Things data is not easy. What is needed is a fast, reliable platform that can support IoT feeds and enable functionality like real-time financial metrics or geolocation of inventory/goods or credit card fraud detection. Fortunately there is Apache Kafka, but what is it?

Good summary of Kafka if you don't understand what it does.

Saturday 19 March 2016

How Hadoop Has Truly Revolutionised IT via @Datafloq @GoodStratTweet

How Hadoop Has Truly Revolutionised IT by Martyn Jones via +Datafloq  - This is the story of how the amazing Hadoop ecosphere revolutionised IT. Before the advent of Hadoop and its ecosphere, IT was a desperate wasteland of failed opportunities, archaic technology and broken promises. In the past years, Hadoop has revolutionized the Information Technology industry, resulting in new applications that can benefit from Data.

Sometimes it's worth reading about where we have already come so we know how great an achievement it already is.

How to Close the Big Data Talent Gap at Your Organization via @vanrijmenam @Datafloq

How to Close the Big Data Talent Gap at Your Organization by Mark van Rijmenam via +Datafloq  - Big data offers many benefits for organizations in all industries, but unfortunately a lot of companies don’t reap these benefits yet. The reason is not that they don’t want to start with big data, nor that they don’t understand what big data is. The challenge many companies face is attracting the right big data talent. So what should you do in order to attract the right talent and truly benefit from Big Data?

Interesting points on how to get and keep big data talent.

Friday 18 March 2016

WEBINAR: The Journey to Open Data Science - 23 March 2016



Deliver Innovation, Collaboration, and Interoperability


Open data science languages - R and Python - offer tremendous advantages over legacy, proprietary products like SAS and MATLAB. You can embrace modern innovation, attract a new generation of data scientists, and go from ad hoc analysis to production models in one platform that embraces the open source ecosystem.

But how does your enterprise make this transition without descending into anarchy? How can you embrace open source without entering into a quagmire of technical, process, and legal issues? How can you embrace R, Python, and their thousands of powerful analytic packages without their accompanying governance and legal risks? How do you see through the legacy vendor FUD and make open source work?

We're here to help - Continuum Analytics VP Products/CMO Michele Chambers and Sr. Data Scientist Christine Doig will help you embark on your enterprise's journey to open data science on March 23rd.

You'll learn how to:

  • Drive collaboration and true data science teamwork through open data science
  • Mitigate legal risk through indemnification and appropriate package selection
  • Democratize innovation through broad access to open data science tools
  • Bring advanced analytics to Excel-loving analysts with AnacondaXL

Christine and Michele will hold a Q&A session after the webinar, so tune in and get your questions answered.

Presenters:

MICHELE CHAMBERS @MCAnalytics

Michele Chambers is the VP Products and CMO at Continuum Analytics. She is the author of Big Data Big Analytics, Modern Analytics Methodologies, and Advanced Analytics Methodologies.

Michele is a 20-year veteran in advanced analytics, having served as the COO/President at RapidMiner, VP/GM of Advanced Analytics at Netezza (IBM), and Chief Strategy & Products Officer at Revolution Analytics (Microsoft).

She holds an MBA from Duke University and a BS in Computer Engineering.

CHRISTINE DOIG @ch_doig

Christine Doig is a Senior Data Scientist at Continuum Analytics, where she worked on MEMEX, a DARPA-funded project helping stop human trafficking.

She has 5+ years of experience in analytics, operations research, and machine learning in a variety of industries, including energy, manufacturing, and banking.

Christine holds an MS in Industrial Engineering from the Polytechnic University of Catalonia in Barcelona. She is an open source advocate and has spoken at PyData, EuroPython, SciPy, PyCon, and many other conferences.

Register here

When Does Education Level Matter in Data Science? via @bobehayes

When Does Education Level Matter in Data Science? by Bob Hayes @bobehayes - Getting an advanced degree can help improve your data science skills, but only for some job roles.

Interesting look at education vs job role in data science.

An Analytics Toolkit

An Analytics Toolkit by Kevin Gray - Kevin Gray explains the use of many of the various analytical tools available for researchers and data scientists to ply their trade.

A must read and worth the time investment to go through.

Thursday 17 March 2016

WEBINAR: Planes, Trains, and Automobiles - A Data Scientist’s Guide to Modeling Engine Degradation - 22 March 2016


With the growth of connected “things”, industries are presented with huge opportunities to leverage sensor data to improve their operations, products and services. With the proliferation of these devices, competitive advantages will develop from appropriate leveraging of the deluge of data. From connected appliances to jet engines, industries are already undergoing massive transformations.  Critical to success is the ability to not only collect data from sensors, but to also leverage big data technologies and data science expertise to extract actionable insights from the data.

It is critical to be able to model degradation of a machine to prevent catastrophic events and adjust maintenance scheduling. This is true in industries including oil and gas, transportation and even consumer products.

In this latest DSC webinar, the Pivotal Data Science team will present a data-driven approach to detecting and tracking jet-engine degradation using simulated sensor data. In particular we will focus on (1) data integration and cleansing, (2) transformation of time series data from sensors into meaningful features for modeling and (3) the algorithms used to build models to identify engine degradation patterns.

Speakers:

  • Sarah Aerni, Principal Data Scientist  -- Pivotal
  • April Song, Principal Data Scientist -- Pivotal


Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Register here

Analytics 3.0 and Data-Driven Transformation via @Data_Informed

Analytics 3.0 and Data-Driven Transformation by Chandramohan Kannusamy via @Data_Informed  - The advent of mobile, the Internet of Things, and cloud has reinforced the need for a new era of analytics to solve challenges in the customer, product, operations, and marketing domains. And new niche startups armed with data and digital weapons are all set to shake up the market with a wave of digital disruptions.

Good summary and vision of the next generation of analytics.

Study Finds the Greatest Challenge to Tech Training: Lack of Data via @infomgmt

Study Finds the Greatest Challenge to Tech Training: Lack of Data by David Weldon via +Information Management  -  A new study of the technology workforce finds that one of the largest challenges to tech training in general is one that data analysts can certainly appreciate – a lack of data on what works.

I have certainly spent substantial time in the past creating new data or scrubbing production data ready to be used in training. Only my own good practice has saved time going forward as I had saved all my scripts.

Wednesday 16 March 2016

The Data Science Process via @kdnuggets

The Data Science Process by Springboard via +KDnuggets  - What does a day in the data science life look like? Here is a very helpful framework that is both a way to understand what data scientists do, and a cheat sheet to break down any data science problem.

I would add to it that you need to take care of some of the international data like currencies and languages (as well as time as mentioned in the article), and defaults.
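
To illustrate the kind of care I mean, here is a minimal Python sketch of normalising currency, time zone and a missing default before analysis. It is only an illustration under stated assumptions: the exchange rates are made-up values, and the record layout, `RATES_TO_USD`, `DEFAULT_CURRENCY` and `normalise` are hypothetical names of my own, not anything from the article.

```python
from datetime import datetime, timezone
from decimal import Decimal

# Hypothetical fixed rates to a reporting currency (USD); illustrative values only.
RATES_TO_USD = {
    "USD": Decimal("1.00"),
    "GBP": Decimal("1.40"),
    "EUR": Decimal("1.10"),
}

# Assumed default applied when a record carries no currency at all.
DEFAULT_CURRENCY = "USD"


def normalise(record):
    """Return the amount in USD and the timestamp in UTC for one record."""
    currency = record.get("currency") or DEFAULT_CURRENCY
    # Decimal avoids binary floating-point rounding on money values.
    amount_usd = Decimal(record["amount"]) * RATES_TO_USD[currency]
    # The timestamp is expected to be ISO 8601 with an explicit UTC offset;
    # a naive timestamp would be interpreted as local time by astimezone().
    ts_utc = datetime.fromisoformat(record["timestamp"]).astimezone(timezone.utc)
    return {"amount_usd": amount_usd, "timestamp_utc": ts_utc}


if __name__ == "__main__":
    sale = {"amount": "10.00", "currency": "GBP",
            "timestamp": "2016-03-16T09:30:00+01:00"}
    print(normalise(sale))
```

In practice the rates would come from a dated rate table rather than a constant, and the default would be agreed with whoever owns the data rather than assumed.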

Automated Data Science and Data Mining via @kdnuggets

Automated Data Science and Data Mining by Gregory Piatetsky via +KDnuggets  - Automated Data Science is becoming more popular. Here is his initial list of automated Data Science and Data Mining platforms.

I'm sure over time the list will grow and grow.

Tuesday 15 March 2016

Facebook’s AI is learning by reading loads of children’s books via @NewScientist

Facebook’s AI is learning by reading loads of children’s books by Aviva Rutkin via +New Scientist  - If neural networks are modeled off the processes of the human brain, training them with the same source material we use to teach children has a certain appeal. Facebook recently released a paper on success they've had with this approach. The illustrated results are impressive—the model was asked to fill in a blank, mad-lib style, and did so correctly with "frightened".

Very interesting.

12 Drivers of BigData Analytics by Vishal Kumar

12 Drivers of BigData Analytics by Vishal Kumar - So, why is Vishal writing another blog on the importance of BigData & Analytics? A couple of days back he bumped into an executive, and small talk turned into an hour-long conversation about the business justification for starting a BigData initiative.

Well worth reading.

A Practical Guide to Anonymizing Datasets with Python & Faker via @DistrictDataLab

A Practical Guide to Anonymising Datasets with Python & Faker by Benjamin Bengfort via @DistrictDataLab - How Not to Lose Friends and Alienate People.

A very useful blog if you need to create some anonymous data.

Monday 14 March 2016

WEBINAR: Scaling for Success in Today's Data Driven World - 24 March 2016

Best Practices from Matt Davis, Sr. Site Reliability Engineer at OpenX

DATE: Thursday, March 24, 2016
TIME: 1:00 PM ET

For many companies, as new sources of data flow into the organization, the business opportunities are limited only by imagination, provided you have the right tools and platforms in place to handle the data and deliver on its promise.

Unfortunately many of these tools seem difficult to test, are changing rapidly and require a new skill set that teams may not have in place. The costs and risks associated with downtime from implementing or using the wrong solution are higher than ever.  Gartner published that the industry-recognized average cost of downtime is equivalent to roughly $5,600 each minute (over $300,000 per hour).

The risk is betting on the wrong infrastructure or not taking into account the requirement for agility and speed, scalability and consistency. Ultimately it comes down to the level of resiliency your business needs to be successful.

Join Matt Davis, Sr. Site Reliability Engineer at OpenX (a global leader in digital and mobile advertising technology), Heather McKelvey, VP of Engineering at Basho Technologies, and David Rubinstein, Editor-in-Chief of SD Times, for a discussion of how OpenX is addressing the ever-increasing need and expectation for personalization, the growth of digital engagement, and the need to leverage Big Data for competitive advantage.

Among the topics they will address are:

  • Best practices around the new tools, methods and software stacks that will be needed to support the development of applications to capitalize on the deluge of data
  • The tradeoffs between speed, scalability and consistency, and options for how your business can balance them
  • Some enlightening Basho research results that highlight some of the costs and implications of database failure

Register here

Google Unveils Neural Network with “Superhuman” Ability to Determine the Location of Almost Any Image via MIT Technology Review

Google Unveils Neural Network with “Superhuman” Ability to Determine the Location of Almost Any Image  via MIT Technology Review - There are all kinds of interesting use cases for this. It even works indoors!

Wow.  Please note on this site you can only access 5 free articles per month.

Top Big Data Processing Frameworks via @KDnuggets

Top Big Data Processing Frameworks by Matthew Mayo via +KDnuggets  - A discussion of 5 Big Data processing frameworks: Hadoop, Spark, Flink, Storm, and Samza. An overview of each is given and comparative insights are provided, along with links to external resources on particular related topics.

Really interesting and helps if you are not familiar with them.

Sunday 13 March 2016

Growing Complexity of Data Integration, Governance Could Harm Companies via @infomgmt

Growing Complexity of Data Integration, Governance Could Harm Companies by Bob Violino via +Information Management  - The increasing complexity of the enterprise resource planning (ERP) application portfolio is driving the need for a “defined postmodern” application integration strategy, according to Gartner Inc.

I think that hybrid environments increase the need for good data integration and governance - if we don't do that we will lose control, which will make the data and any results pretty worthless.

Why pandas users should be excited about Apache Arrow via @wesmckinn

Why pandas users should be excited about Apache Arrow by Wes McKinney @wesmckinn - Good explanation from Wes and a must read if you are a Python user who uses pandas.

Saturday 12 March 2016

The most important thing to know in Cassandra data modeling: The primary key via @PlanetCassandra

The most important thing to know in Cassandra data modeling: The primary key by Patrick McFadin, chief evangelist for Apache Cassandra via +PlanetCassandra  - Here is an explanation why.

Great article and a must read.

Developers vs Data Scientists - Different Approaches to a Common Goal via @infomgmt

Developers vs Data Scientists - Different Approaches to a Common Goal by Steve Miller and Bryan Senseman via +Information Management  - This article discusses the different approaches for essentially trying to achieve the same goal, and the learnings that can come from that.

CIOs Told to Upgrade Systems, Transform Models, and Embrace Analytics via @infomgmt

CIOs Told to Upgrade Systems, Transform Models, and Embrace Analytics by Bob Violino via +Information Management  - IT leaders need to modernize core systems, industrialise analytics capabilities and use autonomic platforms to transform IT operating models and infrastructures, according to a new study from Deloitte Consulting.

Interesting insight.

Friday 11 March 2016

WEBINAR: Bring the Benefits of Cloud Managed Services to your Oracle Applications - 15 March 2016

Webinar Event Details

Date: Tuesday, March 15, 2016
Time: Noon ET/ 9:00 am PT
Duration: 60 minutes (including Q&A)

What You'll Learn

The value of cloud for many data intensive applications is clear. But what about running Oracle workloads on cloud?

Is it safe to move mission-critical business applications like Oracle E-Business Suite, PeopleSoft, JD Edwards, etc. to the cloud? How long will it take you to prepare Oracle instances for development, test, or production? What about Oracle Database and Middleware license implications?

In this session, we will explain the value proposition of IBM’s Cloud Managed Services for Oracle Applications PaaS platform.

Attend this webcast to learn how IBM can help you:

• Reduce costs by using a managed cloud infrastructure and a staff experienced with Oracle

• Realize faster service delivery and time to value for Oracle-based business processes by enabling more rapid systems provisioning

• Improve the resiliency of Oracle environments through high-availability and disaster recovery options

Register today to begin enjoying the advantages and value cloud can bring to your Oracle applications.

Register here

Simple Yet Powerful Excel Tricks for Analyzing Data via @AnalyticsVidhya

Simple Yet Powerful Excel Tricks for Analyzing Data by Sunil Ray via +Analytics Vidhya - If you are keen to upgrade your data analysis skills, then you MUST go through this article, providing powerful tips & tricks for Excel & saving a lot of your time too.

Essential especially if like me you are a little rusty on Excel and need a quick reminder.

What Employers Now Want In Data & Analytics Leaders via @infomgmt

What Employers Now Want In Data & Analytics Leaders by Justin Cerilli via +Information Management  - Analytics leaders are now tasked to use data & analytics to drive employee enablement, cross-selling, operations improvement, culture change and innovation.

I think the storytelling and navigating political environments are key - often you need to be able to tell the story well, and if you can't navigate the politics you are doomed to failure.

Thursday 10 March 2016

WEBINAR: How Data Can Protect You From Cognitive Bias - 15 March 2016





The smarter you are, the stronger your cognitive bias. Data is a proven way to protect yourself from cognitive bias and can even augment your intelligence.

In this latest Data Science Central Webinar event we will educate and entertain you with learnings about cognitive bias and how data, when used correctly, can improve your decisions. It will also explore how to systematically analyse data using your visual system which results in better decisions.

Speaker: Jock Mackinlay, VP Visual Analysis -- Tableau

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Register here

How different SQL-on-Hadoop engines satisfy BI workloads via @CIOonline

How different SQL-on-Hadoop engines satisfy BI workloads by Thor Olavsrud via @CIOonline  - A new benchmark of SQL-on-Hadoop engines Impala, Spark and Hive finds they each have their own strengths and weaknesses when it comes to Business Intelligence (BI) workloads.

A must read if you are going to do this.

Scala vs Python- Which one to choose for Spark Programming? by @dezyreonline

Scala vs Python- Which one to choose for Spark Programming? by @dezyreonline - Choosing a programming language for Apache Spark is a subjective matter, because the reasons why a particular data scientist or data analyst likes Python or Scala for Apache Spark might not always apply to others. Data experts decide which language is a better fit for Apache Spark programming based on their use cases or the particular kind of big data application to be developed.

Very clear and useful for anyone needing to make this choice.

Wednesday 9 March 2016

WEBINAR: How Data Can Protect You From Cognitive Bias - 15 March 2016



Overview

Title: How Data Can Protect You From Cognitive Bias

Date: Tuesday, March 15, 2016

Time: 09:00 AM Pacific Daylight Time

Duration: 1 hour

Summary


The smarter you are, the stronger your cognitive bias. Data is a proven way to protect yourself from cognitive bias and can even augment your intelligence.

In this latest Data Science Central Webinar event we will educate and entertain you with learnings about cognitive bias and how data, when used correctly, can improve your decisions. It will also explore how to systematically analyze data using your visual system which results in better decisions.

Speaker: Jock Mackinlay, VP Visual Analysis -- Tableau

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central



Register here

Zookeeper and Oozie: Hadoop Workflow and Cluster Managers by @dezyreonline

Zookeeper and Oozie: Hadoop Workflow and Cluster Managers by @dezyreonline -  Four core modules form the Hadoop Ecosystem: Hadoop Common, HDFS, YARN and MapReduce. Other components can also run on top of these modules alongside Hadoop, of which Zookeeper and Oozie are widely used Hadoop admin tools.

Great overview.

IOT Is The Killer App For Big Data via @forbes

IOT Is The Killer App For Big Data by Mike Kavis via +Forbes  - This great article explains why IOT and Big Data go hand in hand.

Great article and well worth the time to read and understand. Please note this is a 2 page article.

Tuesday 8 March 2016

Internet-of-Things becomes Internet-of-Everything

Internet-of-Things becomes Internet-of-Everything by Per Richtun via +Information Management  - What we see now is that we still gather data from remote devices and sensors, but the data can be used to trigger action. To execute business processes. Or influence already running processes.

I agree with him - it's going to drive so many things going forward.

IoT and Asset Management: An interdependent relationship? via @infomgmt

IoT and Asset Management: An interdependent relationship? by ulinnig via +Information Management  - In IoT terms, how do you harness its value without asset management? I think the answer is, you don’t. IoT needs EAM, and vice-versa. Without that interdependence, only minimal value can be derived.

Interesting thoughts.

Monday 7 March 2016

Using Business Analytics to Make the Most of Data in 2016 via @infomgmt


Using Business Analytics to Make the Most of Data in 2016 by David Menninger via +Information Management  - Organizations generate data continuously, and they should analyse and refine it continuously – that is, optimize it – to improve their actions, decisions and processes.

He is completely right - if you do not simplify the data acquisition and reporting, the effort to collect and process the data makes the benefit so much smaller.

Big Data and Information Optimization in 2016 via @infomgmt

Big Data and Information Optimization in 2016 by David Menninger via +Information Management  - The big data market continues to expand and enable new types of analyses, new business models and new revenues streams for organizations that implement these capabilities.

Very insightful blog by David Menninger. He is completely right about needing to analyse in-memory data - some data is already out of date by the time it has been written to disk.

Sunday 6 March 2016

Architecting and Structuring a Big Data Ecosystem via @Data_Informed

Architecting and Structuring a Big Data Ecosystem by Sabina Schneider and Alejandro de la Viña via @Data_Informed - Sabina Schneider and Alejandro de la Viña of Globant discuss challenges that often accompany big data ecosystem projects.

Very interesting article which will give you some very useful insights.

21 Must-Know Data Science Interview Questions and Answers via @kdnuggets

21 Must-Know Data Science Interview Questions and Answers by Gregory Piatetsky via @kdnuggets - 20 Questions to Detect Fake Data Scientists has been very popular on KDnuggets - most viewed in the month of January. However these questions were lacking answers, so KDnuggets Editors got together and wrote the answers to these questions. I also added one more critical question - number 21, which was omitted from the 20 questions post.

It has been split into two parts because it is so long.

Part 1

Part 2

Saturday 5 March 2016

Google releases new developer tool for analytics via @sdtimes

Google releases new developer tool for analytics by Christina Mulligan via @sdtimes  - Google is updating its analytics portfolio to keep up with the ever-changing Web. The company announced Autotrack for analytics.js, a new solution designed to give developers new tools to track their data.

Interesting update.


Top Embedded Analytics Business Intelligence Software via @PredAnalytics

Top Embedded Analytics Business Intelligence Software by imanuel via @PredAnalytics - Embedded Analytics is an approach to simplify Business Intelligence by embedding it directly into operational applications and business processes. Embedded analytics is using the reporting and analytic capabilities in transactional business applications. Embedded analytics capabilities can reside outside the application, reusing the analytic infrastructure built by many enterprises, and is easily accessible from inside the application. Sisense, Pentaho, Microstrategy, Yellowfin, …

Great article with examples within - a great first place to look at this area and what is available.

Friday 4 March 2016

What Qualifies A BI Vendor As A Native Hadoop BI Platform? via @infomgmt

What Qualifies A BI Vendor As A Native Hadoop BI Platform? by Boris Evelson via +Information Management  - Basically, to most of the BI vendors Hadoop is just another data source. Let's now see what qualifies a BI vendor as a "Native Hadoop BI Platform".

I never realised it was so complicated, yet it feels simple to achieve - worth understanding.

Deleting Data Vs. Destroying Data: The Difference Can Be Damning via @infomgmt

Deleting Data Vs. Destroying Data: The Difference Can Be Damning by Pat Clawson via +Information Management  - ‘Deleting’ data only removes pointers to the data – creating the illusion that the data has been removed, when it can still be accessed and retrieved.

A good reminder of the difference between the two concepts.

Thursday 3 March 2016

APACHE ARROW: LINING UP THE DUCKS IN A ROW… OR COLUMN via @opendoorlabs

APACHE ARROW: LINING UP THE DUCKS IN A ROW… OR COLUMN by Tony Baer via @opendoorlabs  - Just released as a top-level project, Apache Arrow provides a unified data layer for the increasing numbers of in-memory analytics engines to build on. It will provide a significant speed boost to Spark, Storm, Drill, and most of the engines you're familiar with, which will all integrate with Arrow out of the gate.

Great article, worth reading and thinking about a bit more.

Serving Analytics the Right Way via @opendoorlabs

Serving Analytics the Right Way by Kevin Teh via @opendoorlabs - Opendoor Data Scientist Kevin Teh talks about the right and the wrong ways to service the analytical needs of an organization. Don't: force analysts to handle one off requests or use spreadsheets. Do: have a shared model that helps users self-serve, and publish data to places where people can't help but consume it.

Great suggestions and great examples to back up the article contents. Recommended read.

Wednesday 2 March 2016

11 Important Model Evaluation Techniques Everyone Should Know via @DataScienceCtrl

11 Important Model Evaluation Techniques Everyone Should Know by Laetitia Van Cauwenberge  via @DataScienceCtrl - Model evaluation metrics are used to assess goodness of fit between model and data, to compare different models, in the context of model selection, and to predict how predictions (associated with a specific model and data set) are expected to be accurate. This article discusses some of them.

Very good article and well worth reading.

Interactive plotting with rbokeh via @rbloggers

Interactive plotting with rbokeh by Teja Kodali via @rbloggers - great blog explaining how to use rbokeh in R to produce interactive graphs and maps.

Includes R code and examples; recommended reading if you use R.

Tuesday 1 March 2016

Best way to learn kNN Algorithm using R Programming via @AnalyticsVidhya

Best way to learn kNN Algorithm using R Programming by Payel Roy Choudhury via +Analytics Vidhya  - Here's your comprehensive guide to kNN algorithm using an interesting example and a case study demonstrating the process to apply kNN algorithm in building models.

Great article with R code and examples.

Neck Deep In Data, More Firms Put Their Head In the Clouds via @infomgmt

Neck Deep In Data, More Firms Put Their Head In the Clouds by Bob Violino via +Information Management  - Continuous data proliferation is pushing organizations toward cloud storage models, according to research firm Frost & Sullivan.

It seems that cloud is growing and is here to stay.