Sunday, 30 August 2015

Demand for Big Data Analytics Software Still Accelerating via @infomgmt

The big data software market -- including business intelligence and analytics solutions -- will grow nearly sixfold by 2019, according to a recent report from Ovum. Article from Information Management.

Data and Analytics in the Cloud Is Real Today via @infomgmt

Private and hybrid cloud implementations of data and analytics often coincide with large data integration efforts, says this blog from Information Management.

Saturday, 29 August 2015

Tips to prepare an outstanding CV for data science roles via @AnalyticsVidhya

The CV is what makes your first impression. This great article from Analytics Vidhya aims to provide you with some thoughts to make your CV stand out from the stack of CVs for any data science role.

21 Business intelligence and analytics terms you should know via @7wdata

A great list of high-level BI terms that we should all know. A good place for beginners to start.

Friday, 28 August 2015

Machine Learning for Programmers: Leap from developer to machine learning practitioner via @TeachTheMachine

Practical guide to help software developers get started with machine learning, written by Jason Brownlee.

A Beginner’s Guide to Eigenvectors, PCA, Covariance and Entropy via @deeplearning4j

Easy to follow introduction to eigenvectors and their relationship to matrices. For the most part, this is a plain English tutorial that continues with covariance, principal component analysis, and information entropy.

A brilliant resource from Deep Learning for Java and well worth bookmarking. I so wish this was around when I was doing my Mining Massive Datasets course.
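To make the link between covariance and eigenvectors concrete, here is a minimal Python sketch of my own (it is not taken from the tutorial). It builds the covariance matrix of some 2-D points, finds the dominant eigenvector by power iteration (which is the first principal component), and computes Shannon entropy. The function names and the sample points are hypothetical, and power iteration is just one simple way to find an eigenvector.

```python
import math

def covariance_matrix(points):
    """Return the 2x2 sample covariance matrix of a list of (x, y) pairs."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def principal_component(cov, iterations=200):
    """Approximate the dominant eigenvalue/eigenvector of a 2x2 matrix.

    Power iteration: multiply a vector by the matrix and renormalise, over
    and over. Assumes the matrix is non-zero and that the starting vector
    is not orthogonal to the dominant eigenvector.
    """
    v = [1.0, 1.0]
    for _ in range(iterations):
        w = [cov[0][0] * v[0] + cov[0][1] * v[1],
             cov[1][0] * v[0] + cov[1][1] * v[1]]
        norm = math.hypot(w[0], w[1])
        v = [w[0] / norm, w[1] / norm]
    # Rayleigh quotient v^T C v gives the eigenvalue for the unit vector v.
    cv = [cov[0][0] * v[0] + cov[0][1] * v[1],
          cov[1][0] * v[0] + cov[1][1] * v[1]]
    eigenvalue = v[0] * cv[0] + v[1] * cv[1]
    return eigenvalue, v

def entropy(probabilities):
    """Shannon entropy in bits of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical data lying on the line y = 2x.
points = [(1, 2), (2, 4), (3, 6), (4, 8)]
variance, direction = principal_component(covariance_matrix(points))
print(variance, direction)
print(entropy([0.5, 0.5]))
```

The eigenvector points along the direction in which the data varies most, and the eigenvalue is the variance along that direction.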

Thursday, 27 August 2015

WEBINAR: The Role of Data Wrangling in Driving Hadoop Adoption - 1 Sept 2015

Event Information: The Role of Data Wrangling in Driving Hadoop Adoption


Registration is required to join this event. If you have not registered, please do so now.

Date and time:	Tuesday, September 1, 2015 3:00 pm Central Daylight Time (Chicago, GMT-05:00)
	Tuesday, September 1, 2015 1:00 pm Pacific Daylight Time (San Francisco, GMT-07:00)
Duration:	1 hour
Description:	The Briefing Room with Mark Madsen and Trifacta

Like all enterprise software solutions, Hadoop must deliver business value in order to be a success. Much of the innovation around the big data industry these days therefore addresses usability. While there will always be a technical side to the Hadoop equation, the need for user-friendly tools to manage the data will continue to focus on business users. That’s why self-service data preparation or "data wrangling" is a serious and growing trend, one which promises to move Hadoop beyond the early adopter phase and more into the mainstream of business.

Register for this episode of The Briefing Room to hear veteran Analyst Mark Madsen of Third Nature explain why business users will play an increasingly important role in the evolution of big data. He’ll be briefed by Trifacta's Will Davis and Alon Bartur, who will demonstrate how Trifacta's solution empowers business users to “wrangle” data of all shapes and sizes faster and easier than ever before. They’ll discuss why a new approach to accessing and preparing diverse data is required and how it can accelerate and broaden the use of big data within organizations.

Host: Eric Kavanagh, CEO, The Bloor Group
Analyst: Mark Madsen, CEO, Third Nature
Guest: Will Davis, Director Product Marketing, Trifacta
Guest: Alon Bartur, Principal Product Manager, Trifacta

All episodes of The Briefing Room are archived here: http://www.insideanalysis.com/webcasts/the-briefing-room/recent-episodes/

Performing ANOVA Test in R: Results and Interpretation via @mcpasin

Great tutorial on the use of and interpretation of ANOVA for R by Marco Pasin. Well worth a read and bookmark.
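For anyone who wants to see what sits behind the F value in an ANOVA table, here is a rough Python sketch of a one-way ANOVA F statistic. It is my own illustration and not code from Marco's tutorial, which uses R. The function name and the sample groups are made up, and the sketch stops at the F statistic because turning it into a p-value needs the F distribution, which is not in the Python standard library.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    all_values = [x for group in groups for x in group]
    n = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(group) * (mean - grand_mean) ** 2
                     for group, mean in zip(groups, group_means))
    # Variation of each observation around its own group mean.
    ss_within = sum((x - mean) ** 2
                    for group, mean in zip(groups, group_means)
                    for x in group)

    df_between = k - 1
    df_within = n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical measurements for three treatments.
groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
print(one_way_anova_f(groups))
```

A large F means the group means differ by more than the spread within the groups would suggest.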

Best way to learn kNN Algorithm using R Programming via @AnalyticsVidhya

Let's look at the kNN algorithm using an interesting example and a case study to demonstrate the process of applying kNN when building models. Great blog/tutorial from Analytics Vidhya. I wish I could have used this when doing a Mining Massive Datasets course.
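The tutorial works in R, but the idea is small enough to sketch in a few lines of Python. This is my own minimal illustration, assuming numeric features, Euclidean distance and a simple majority vote. The function name and the toy data are hypothetical, and `math.dist` needs Python 3.8 or later.

```python
import math
from collections import Counter

def knn_predict(training, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    `training` is a list of (features, label) pairs, where features is a
    tuple of numbers with the same length as `query`.
    """
    by_distance = sorted(training, key=lambda item: math.dist(item[0], query))
    nearest_labels = [label for _, label in by_distance[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Hypothetical two-cluster training set.
training = [
    ((1, 1), "a"), ((1, 2), "a"), ((2, 1), "a"),
    ((8, 8), "b"), ((8, 9), "b"), ((9, 8), "b"),
]
print(knn_predict(training, (2, 2)))
print(knn_predict(training, (7, 7)))
```

In practice you would scale the features first, since a feature with a large range will otherwise dominate the distance.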

Wednesday, 26 August 2015

The Rise Of The Chief Data Officer via BigDataMadeSimple

Sometimes change has to be accompanied by numbers. That is the foundation on which the Big Data revolution is built. Read this interesting article from Big Data Made Simple.

Importing Data Into R – Part Two via @DataCamp

Part 2 of the excellent blog/tutorial on importing data into R by the Data Camp team. One to bookmark and keep for sure.

Tuesday, 25 August 2015

5 Ways Big Data Disrupts Your Existing Data Warehouse (In A Good Way) via @infomgmt

The 'Data-First' approach -- combining data lakes with big data -- changes the way businesses think about their existing data stores. Here's how in this article from Information Management. I completely agree with #3 Data Management is key.

8 Steps to Business Intelligence Success via @infomgmt

Here's how to hit the road running with a new or revised business intelligence initiative. Great blog from Information Management containing steps that should be obvious but are not.

Monday, 24 August 2015

Optimisation Analytics Comes to the Mass Market via @infomgmt

Once the preserve of data scientists and operations research specialists, optimisation will become mainstream in general purpose business analytics over the next five years.

Great blog from Information Management.

The Challenges and Opportunities of Big-Data-as-a-Service via @Datafloq

The relevance of BDaaS is gradually spreading across industries and its visibility is on the rise. Its usage is becoming more common in sectors like business, health, finance, retail, governance and telecommunications. With more joining the bandwagon, it is on its way to becoming the next face of the information revolution. But what are some of the challenges and opportunities of BDaaS?

Great article from Datafloq with great diagrams to make it clearer.

Sunday, 23 August 2015

How to choose the right data science / analytics / big data training? via @AnalyticsVidhya

Great guide to help you work out what training you need and where to go to in order to get it.

Click through from the guide to their training listing page. I can recommend the Johns Hopkins courses on Coursera - I have done all but one of them (which I shall do in September)

5 step checklist of multiple linear regression via [Data-Mania.com]

Read this excellent checklist and all the help around it from Data-Mania. If you are not signed up to her site I strongly recommend that you do.

Saturday, 22 August 2015

Analytics and the Customer Lifecycle Management: Fixing the Disconnect via @infomgmt

Data analytics is everywhere, but most efforts fail to address customer lifecycle management. Here's how to set things right from Information Management.

This is critical to get right, as I have seen massive customer databases or tables containing mostly out-of-date or duplicate information. That gives wrong answers to certain questions, which can be expensive.

Don't Throw Hadoop at Every BI Challenge via @infomgmt

While deploying BI on Hadoop offers multiple benefits, you'll also face a range of challenges.

Great blog from Information Management.

I think point 1 and the very last bullet point are very apt:

There is no way Hadoop will work for your BI if everyone thinks they own it - there has to be ONE owner and many others that march to the same tune.

Data Governance is key on all projects and types of project - if the governance is not in place and quality is not assured then the results will be worthless or wrong.

Friday, 21 August 2015

7 Types of Regression Techniques you should know! via @AnalyticsVidhya

Here are the 7 types of regression techniques that a data scientist should know from the blog on Analytics Vidhya. Great blog and well worth a read.

38 Seminal Articles Every Data Scientist Should Read via @DataScienceCtrl

Here is a selection containing both external and internal papers, focusing on various technical aspects of data science and big data. From Data Science Central. A definite add to your favourites I'm sure.

Thursday, 20 August 2015

WEBINAR: The Key to Big Data Modeling: Collaboration - 26 Aug 2015

Collaboration is the Key

Some claim that, in the age of Big Data, data modeling is less important or even not needed. However, with the increased complexity of the data landscape, it is actually more important to incorporate data modeling in order to understand the nature of the data and how they are interrelated. Data modeling must adapt to handle the increasingly complex enterprise landscape including Big Data.

The Key to Big Data Modeling: Collaboration
Wednesday, August 26, 2015
11:00am Pacific / 2:00pm Eastern

One of the key data modelling issues for Big Data is how to foster collaboration between new groups, such as data scientists, and traditional data management groups. There are often different paradigms, and yet it is critical to have a common understanding of data and semantics between different parts of an organization. In this session, Len Silverston will discuss:

How Big Data has changed our landscape and affected data modeling
How to conduct data modeling in a more ‘agile’ way for Big Data environments
How we can collaborate effectively within an organization, even with differing perspectives

WEBINAR: Build Smarter Applications Fuelled by Data with IBM and Apache Spark - 25 Aug 2015


Build Smarter Applications Fueled by Data with IBM and Apache(r) Spark™
Tuesday, August 25, 2015 01:00 PM EDT

The combination of data and design is revolutionizing data science today. It’s not just about data access anymore. It is about embedding analytics fueled by data into the fabric of business and society. It’s also about data scientists and data engineers. IBM is committed to educating these data professionals worldwide on Apache(r) Spark(tm) technology, to help data scientists build models quickly, and iterate faster. IBM sees Spark as the analytics operating system upon which developers of all types, from startups to giant corporations, can build analytics. It’s about innovation, to drive intelligence into every business application including: IoT, web, mobile, social, business process and more. Combining data, design and speed, IBM and Spark are creating a new blueprint of innovation together. This is the start of something big.

Join us to learn how smarter applications fueled by data are powering the enterprise today, combining the power of data, simplicity of design and speed of innovation.

Presenters:
Kimberly Madia, World Wide Product Marketing Manager, IBM
Karen J. Bannan, Moderator and Technology and Business Journalist

6 Signs You're Going to Fail At Big Data via @infomgmt

Instead of discussing successes, it's often more valuable to learn from mistakes. Such is the case with big data -- where it's essential to avoid these six common mistakes.

Interesting analysis piece from Information Management.

Addressing the Predictive Analytics Skills Gap via @infomgmt

It takes a team with domain knowledge, statistical and mathematical knowledge, and technical knowledge to integrate predictive analytics into other technology systems and line of business (LoB) operations.

Interesting blog from Information Management.

Wednesday, 19 August 2015

Getting smart with Machine Learning – AdaBoost and Gradient Boost via @AnalyticsVidhya

Boosting is one of the most powerful tools used in machine learning. Let's get smart with Machine Learning with AdaBoost and Gradient Boost.

Great article to try and explain boosting in simple terms. I would say if you have a dataset you know well then just try it and see what difference it makes against the results you already have.
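If you want to see the mechanics before trying it on your own data, here is a small Python sketch of AdaBoost with one-dimensional decision stumps. It is my own toy illustration rather than the article's code, it covers AdaBoost only (not Gradient Boost), and the function names and the ten-point dataset are made up for the example. Labels are assumed to be +1 or -1.

```python
import math

def stump_predict(x, threshold, polarity):
    """A decision stump: predict `polarity` at or below the threshold."""
    return polarity if x <= threshold else -polarity

def best_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost_train(xs, ys, rounds=3):
    """Train an ensemble of weighted stumps on labels in {+1, -1}."""
    n = len(xs)
    weights = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = best_stump(xs, ys, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified points, lower the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(alpha * stump_predict(x, threshold, polarity)
                for alpha, threshold, polarity in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical data that no single stump can separate.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
model = adaboost_train(xs, ys, rounds=3)
print([adaboost_predict(model, x) for x in xs])
```

The point to notice is the reweighting step: each round concentrates on the points the previous stumps got wrong, which is how several weak rules add up to a stronger one.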

For Analytics to Have an Impact, Keep it Simple via @Data_Informed

Insight into the analytics process can boost decision makers’ confidence in the results. Tyler H. McCormick, Cynthia Rudin, Dmitry Malioutov and Kush Varshney offer tips for how to make analytics transparent and, therefore, more impactful.

Great article containing tips from Data Informed.

Tuesday, 18 August 2015

Hot? Warm? Cold? Which Data Should You Move to Hadoop? via @Data_Informed

William Peterson of MapR lists considerations and steps that can minimize disruption to your business while offloading data to Hadoop.

A great list of clear steps to move to Hadoop with as little disruption as possible.

Insights-as-a-Service Grows with Focus on Real Time via @Data_Informed

As organizations eye time to insight as a key business differentiator, insights-as-a-solution offerings rise to meet the need for speed, writes Jamie Thomas.

Interesting article from Data Informed

Monday, 17 August 2015

WEBINAR: Taming the Beast: Extracting Value from Hadoop - 20 Aug 2015

Taming the Beast:
Extracting Value from Hadoop

Thursday, August 20 at 8am PT / 11am ET

About this Webinar

After deploying a data lake, organizations often reflect "I bet the farm on Hadoop, now what?" They have broken down the silos, mashed up structured and multi-structured data, and set up the Hadoop clusters. However, the investment has yet to pay off. The CIO wants answers. The CMO wants actionable information. The CEO wants results. Organizations need information on how to deliver on the promise of big data analytics.

What are the pitfalls to avoid?
How are other organizations succeeding?
What are best practices for implementing advanced/modern analytics?

Join Dr. Ingo Mierswa, CTO at RapidMiner; John Myers, Managing Research Director at leading IT analyst firm Enterprise Management Associates (EMA); and Lyndsay Wise, Research Director at EMA, for a discussion on how to close the loop between predictive insights and action using big data analytics.

Attendees will gain insight on

How to give yourself a Hadoop reality check for those stuck in the hype of the data lake
Empowering analysts to anticipate the opportunities and risks of big data analytics
Guidance on monetizing insights buried in your multi-structured data
Building and deploying predictive models spanning cloud and on-premise environments

Who Should Attend

Members of the "Collaborative Team," including:

Business users
Business analysts
Data scientists
IT professionals

We look forward to your participation!

-RapidMiner

WEBINAR: Leveraging Data for Effective Data Visualization - August 19 2015

Be sure data is verified before it's visualized.

You’re invited to this free webinar:

Leveraging Data for Effective Data Visualization

Date: Wednesday, August 19 Time: 11 a.m. ET (60 min)

Data visualization tools empower Business Analysts to synthesize millions of variables and piles of spreadsheets into functional dashboards. Unfortunately, in many companies, the need for better data is not part of the drive for better dashboards.

The reality is, today’s data visualization tools are only as good as the data they reflect. Helping users consolidate, transform and deliver the most accurate and up-to-date information is critical to leveraging your dashboards and the data behind them. In this live webinar, you’ll learn:

- actionable steps to improving data prep for data visualization
- why agile data governance and management is key to data visualization success
- strategies for adopting an agile, self-service approach to data access, analytics and visualization.

Presenter:

Lyndsay Wise - Research Director, Business Intelligence and Data Warehousing, EMA

Lyndsay Wise joined EMA in 2015 as Research Director for Business Intelligence (BI) and Data Warehousing, focusing on data integration, data governance, cloud technologies, data visualization, analytics, and collaboration. In 2007, Lyndsay founded WiseAnalytics, a boutique analyst and consulting firm focused on business intelligence for small and mid-sized organizations. She provided consulting services as well as industry research into leading technologies, market trends, BI products and vendors, mid-market needs, and data visualization. She has over 10 years experience in software research, BI consulting, and strategy development, specializing in software evaluation and best-fit solution selection. Lyndsay is also the author of Using Open Source Platforms for Business Intelligence: Avoid Pitfalls and Maximize ROI.

Analytics Success Requires 3 Types of People via @infomgmt

Leaders must first recognize that analytics skill sets must be developed in all of their people, not just the data analysts.

Interesting article from Information Management.

Essentials of Machine Learning Algorithms (with Python and R Codes) via @AnalyticsVidhya

If you are an aspiring data scientist or a machine learning enthusiast, this would be one of the most useful guides in your journey. Here are the various machine learning algorithms along with R & Python codes to run them. Get ready to explore them.

Amazing guide containing machine learning algorithms from Analytics Vidhya - definitely something to check out.

Sunday, 16 August 2015

SLIDESHOW: 8 Data Science Job and Career Skills via @infomgmt

Whether you’re a student or a business professional looking to make a career change, Airbnb Data Scientist Dave Holtz says there are eight core competencies you’ll need to succeed in the field of data science.

Slideshow from Information Management.

Marketing Analytics: Essentials of Cross-Selling and Upselling (with a case study) via @AnalyticsVidhya

Cross-selling and up-selling are among the most prominent strategies used in any company's marketing. Here is how marketing analytics is driving these, via Analytics Vidhya.

Saturday, 15 August 2015

What We’ve Learned About Sharing Our Data Analysis via @jsvine

Publishing reproducible data analysis is an expectation in many domains and is growing in popularity. Here's a good overview of what that means exactly and how one news team is accomplishing it.

I would add to his article the following:

You can learn about Reproducible Research with R in this excellent free course from Coursera and Johns Hopkins University.

You can create an R Markdown document, which merges R code into a document that is also a report of the analysis you have done. It can then be published on RPubs for all to see.

The New Science of Sentencing via @MarshallProj

Should prison sentences be based on crimes that haven't been committed yet? Excellent article that explores some profound impacts that data has on society.

Friday, 14 August 2015

WEBINAR: State of the Union: Mobile Web Performance - Aug 19 2015

Wednesday, Aug 19, 2015, 10AM PST / 1PM EST

Webinar Speaker

Tammy Everts
Senior Researcher & Evangelist

Dive into the latest research into the mobile performance of the world’s most popular e-commerce sites as we seek to answer the question: In the fight to offer shoppers the richest possible content on mobile devices, are retailers helping or hurting the user experience?

This webinar looks at performance metrics such as load time, time to interact, page size, page composition, and adoption of performance best practices.

In this webinar, we will cover:

What mobile shoppers care about in their online experiences
Three worst practices you should avoid
Three best practices you should adopt

WEBINAR: When is the right time for real-time? Architectural best practices for Hadoop - 18 August 2015

Title: When is the right time for real-time? Architectural best practices for Hadoop

Date: Tuesday, August 18, 2015

Time: 09:00 AM Pacific Daylight Time

Duration: 1 hour

Summary

Please join us on August 18, 2015 at 9am PDT for our latest Data Science Central Webinar Series: When is the right time for real-time? Architectural best practices for Hadoop sponsored by MapR and ThinkBig, a Teradata Company.

Real-time processing is an important part of your Hadoop architecture, but is it always the best approach to analytics? Join us for our latest DSC Webinar with experts from MapR and Think Big, as we delve into the decision making process around Hadoop real-time and batch processes. You will learn the ins and outs of low-latency design for analytics, as well as see how these designs get implemented in the real world.

You will learn:

Useful design patterns for building your Hadoop stack that best serves low-latency requirements
Pitfalls to avoid when choosing your real-time processing option
Real customer examples highlighting decision-making processes for both real-time and batch processing

Panelist:
Steve Wooledge, Vice President, Product Marketing -- MapR
Bill Kornfeld, Director, R&D -- Think Big, a Teradata Company

Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

How Data Management Best Practices can enhance the Quality of your Data? via @habiledata

Blog discussing how Data Management can affect your Data Quality.

I completely agree that a vast amount of money is wasted by making business decisions based on bad data.

The 10 best cities to find a big data job in the US via @DataScienceCtrl @BernardMarr

Blog from Bernard Marr in Data Science Central of the 10 best cities to find Big Data jobs in the US.

I agree with Vincent Granville's comment and would add Austin TX to the list.

Thursday, 13 August 2015

WEBINAR: Data Mining: Failure to Launch -- How to Get Predictive Modeling Off the Ground, and Into Orbit - 18 Aug 2015

Tue, Aug 18, 2015 4:00 PM - 5:30 PM BST

WHAT'S COVERED
The vast majority of BI professionals and Big Data enthusiasts are excited about the prospects of data mining and predictive analytics, but are fully mystified about where to begin or even how to prepare. Of those who did initiate a modeling initiative, a recent data mining industry survey of predictive modeling practitioners reports that 51% of data mining projects either never left the ground, did not realize value or the ultimate results were not measurable.

Despite its elusive nature, data mining technology has surpassed the flash-in-the pan “miracle tool” stigma with widespread and sustained success stories highlighted in mainstream publications, along with recurring case studies of improved operational efficiencies, enhanced business intelligence and residual payback. For any organization with annual revenues more than $50 million, employing predictive analytics is not a matter of whether, but when.

Attend this free webinar to learn how to get started with data mining and overcome limitations that cause data mining projects to fall short of their potential.

WHAT'S DELIVERED
This webinar is intended for stakeholders, functional managers and business practitioners in business, industry, government and academia, who have made substantial investments in data collection, storage, retrieval, visualization and basic analysis but may not have the technical or strategic experience necessary to chart an effective roadmap to uncover the valuable predictive insights hidden within their existing data. No prior knowledge is required. The webinar will cover:

- How and where to get started
- Why failure to implement is so common, and why pitfalls are so avoidable
- Case summaries that reveal the rewards of proper design and implementation
- Why establishing an internal predictive modeling practice is within your reach
- Live participant polls and an interactive guru session with the expert
- Resources and direction on how to move forward with confidence
- And more...

7 Steps of Data Exploration & Preparation – Part 2 via @AnalyticsVidhya

Why do missing values occur in our data, and why is treating them necessary?

Great blog by Analytics Vidhya.

I would add the following to their section on why data has missing values. If you have any control over how the data is obtained, it is critical that care is taken.

If the data comes via web screens that obtain input from users, make sure they are presented with a list of drop down values to choose from.
If data is likely to be missing, provide an input option of "Not Applicable".
Use default values where possible.

These also improve performance for querying any physical data table.
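As a rough illustration of those three points, here is a short Python sketch of how a form handler might enforce them. Everything in it is hypothetical: the field names, the lists of choices and the default are invented for the example, and a real web form would enforce the drop-down on the screen as well as on the server.

```python
NOT_APPLICABLE = "Not Applicable"

# Hypothetical drop-down lists: each field accepts only these values.
FIELD_CHOICES = {
    "title": ["Mr", "Mrs", "Ms", "Dr", NOT_APPLICABLE],
    "contact_method": ["Email", "Phone", "Post", NOT_APPLICABLE],
}

# Hypothetical defaults used when the user leaves a field empty.
FIELD_DEFAULTS = {"contact_method": "Email"}

def validate_submission(form):
    """Return a cleaned record with no missing or free-text values."""
    cleaned = {}
    for field, choices in FIELD_CHOICES.items():
        value = form.get(field)
        if value is None or value == "":
            # Prefer a default; otherwise record an explicit "Not Applicable".
            value = FIELD_DEFAULTS.get(field, NOT_APPLICABLE)
        if value not in choices:
            raise ValueError(f"{field!r} must be one of {choices}, got {value!r}")
        cleaned[field] = value
    return cleaned

print(validate_submission({"title": "Dr"}))
print(validate_submission({}))
```

The stored record then never contains an empty field or a misspelt free-text value, so there is nothing to treat as missing later on.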

Slowing Hadoop Growth? Latest Data Suggests Otherwise via @infomgmt

In the fast-growth big data market, some pundits spent recent months wondering if Hadoop's rapid rise was set for a slowdown. At least for the moment, Hortonworks has silenced those critics.

Article in Information Management

Wednesday, 12 August 2015

Handling Missing Data via @OReillyMedia @jakevdp

In most data science tutorials, data is presented as clean and homogeneous. In the real world, getting pristine data is cause for celebration. In the latest instalment of the Python Data Science Handbook (Early Release), Jake VanderPlas looks at how to use built-in Pandas tools for handling missing data in Python. Great article on Python functionality for something that we all have issues with.

How Experian Is Using Big Data via @infomgmt

Experian deploys MapR’s distribution of Hadoop, Syncsort’s DMX-h data integration platform and other big data technologies to support its business. Great case study from Information Management.

Tuesday, 11 August 2015

Get Knowledge from Best Ever Data Science Discussions on Reddit via @AnalyticsVidhya

There are things that only experience can teach, and these data science discussions on Reddit exemplify that. Great blog from Analytics Vidhya.

Monday, 10 August 2015

Special Report: How to Use Predictive Modeling to Pick Your Best Prospects & Boost ROI Up to 172% via @MarketingSherpa

What if you could better predict which of your past customers are your best prospects to purchase again? You can with predictive modelling. See how you can use predictive modelling - the Holy Grail of direct marketing - to wrestle with and segment mounds of customer data.

Great guide from Marketing Sherpa.

IBM Bolsters Spark for Analytics on Linux Mainframes via @infomgmt

IBM continues to invest in Apache Spark - an open source platform for big data analytics. The latest moves involve Apache Spark for Linux running on IBM mainframes, plus partnerships with three data-mining software companies.

An interesting development. Article from Information Management.

Sunday, 9 August 2015

Let's Break All The Data Rules!

Companies that challenge pre-existing rules are winning. Some quick back-of-the-napkin stats show that non-technology companies that break data rules and think insight first outperform their S&P cohorts by almost 20%.

Great blog from Michele Goetz on Information Management. I particularly like #3, which makes things easier and more flexible whilst not being a free-for-all.

Why Data-Driven Cultures Outperform Rivals

Data-driven organizations innovate more quickly and can anticipate the needs of their customers, continuously improving and developing the next generation of products and services. That drives significant incremental revenue over competitors.

A reason to work hard at getting it right. Article from Information Management.

Saturday, 8 August 2015

How Data Is Redefining CIO & Chief Data Officer Roles via @infomgmt

Heidrick & Struggles Partner Paul Groce describes in Information Management the evolving talent landscape for CIOs, CTOs and chief data officers in the age of data-driven business leadership. Plus, the latest on Chief Information Security Officer (CISO) roles.

15 Questions All R Users Have About Plots via @DataCamp

Great blog post from DataCamp giving you the low down on plots within R. A great reminder even for experienced R users of some things you may have not used for a while.

Is A Data Lake THE Answer? Think Again. Here Comes Elastic Analytics via @infomgmt

In Information Management, Brian Hopkins's blog looks at something which bears some resemblance to a data lake, but sits in a cloud and gives analytics on demand.

I can see things going the way Brian Hopkins describes in his blog, but the organisation has to be in the right place to achieve something like this and I worry that some organisations are too stuck in old ways to make this big a shift.

Friday, 7 August 2015

WEBINAR: In-Memory Processing for High Performance Analytics - 11 August 2015

Summary

In-memory database processing is a hot topic in the market, and for good reason. It can deliver performance and system efficiencies for your analytics which in turn can yield business benefits. But adopting in-memory requires consideration of multiple issues such as product cost, expected performance benefits, and ongoing database/memory management. Not all in-memory database technologies are created equal.

Join us in this informational webinar where guest speaker Noel Yuhanna, principal analyst at Forrester Research, will share his current research and market insights on in-memory database technologies and trends. Imad Birouty, director of product marketing at Teradata Corporation, will then share Teradata's approach to in-memory and how advanced engineering techniques help companies gain the most performance at the lowest cost.

Featuring:

Today's Speakers:

Noel Yuhanna, Principal Analyst, Forrester

Imad Birouty, Director of Product Marketing, Teradata

How to Ensure a Successful Transition to the Cloud via @Data_Informed

Cloud offers myriad benefits and can be a strategic differentiator for companies. Bill Shute of Viewpointe helps you assess cloud options and offers tips for selecting the right cloud provider.

We need to think and plan carefully when implementing a cloud infrastructure, and seek experienced help if we are not confident. Better to be careful than get it wrong.

Taking Business Intelligence to a Whole New Platform via @Data_Informed @jamesafisher

James Fisher of Qlik discusses the growing adoption of data-discovery solutions in the BI market and the advantages of a platform approach to data analytics.

As the world of data transforms (big data, data lakes, Hadoop, etc.) so must the BI we use to analyse the data. If BI doesn't try to keep up, then more and more we will see the rise of tools written and developed in house that waste time and resources.

Thursday, 6 August 2015

Can’t Find a Data Scientist? Turn to a Business Analyst via @Data_Informed @cchristopher

With a shortage of data scientists to meet demand, businesses should look to business analysts to take on many tasks that previously were the responsibility of the data scientist, writes Chael Christopher of New Vantage Partners.

I do agree with him to a point. I would add that there are a lot of free online courses (MOOCs), for example the Data Science specialisation by Johns Hopkins, which a business analyst could take to build up knowledge in the areas where they need it.

10 Top Commercial Hadoop Platforms via @Data_Informed @BernardMarr

Bernard Marr shares his view of several commercial Hadoop distributions on Data Informed.

Many of the companies that offer Hadoop (including some on his list) offer their own version of the open source software. As he points out, many of them run within their own cloud and are therefore sold via a subscription. I have a concern as to how easy it would be to then change vendor and not lose functionality or data in some form. There is also the risk of getting tied to a vendor and having to pay increasingly higher costs. Something that needs to be taken into account when picking a vendor.

Wednesday, 5 August 2015

The APPLY family of functions in R via @DataCamp

A great tutorial from the blog at DataCamp explaining the APPLY family of functions. Something everyone who uses R should know and understand if they want to start manipulating data in some way - I've used it to swap rows and columns around.

Earlier Generation BI Needs A Tune Up via @infomgmt @bevelson

It's time for systems of insight and next-generation business intelligence, says Forrester's Boris Evelson in this blog on Information Management.

I have to agree with him. Big Data, IoT and all the other current and future trends around data have caused a lot of change along the lines of Hadoop, Spark, MySQL, etc., but I don't see equivalent changes in the BI tools area. Yes, the tools have been updated to handle some of these developments, but they haven't changed at the same pace.

Tuesday, 4 August 2015

Internet of Things (IoT) Unlocks Revenue Growth Opportunities via @infomgmt

More than 80% of 795 companies surveyed by consulting firm Tata Consultancy Services (TCS) increased revenue by investing in the Internet of Things (IoT).

Interesting numbers in this article from Information Management.

9 Master Data Management & Data Governance Trends to Track via @infomgmt

Businesses must navigate nine Master Data Management (MDM) trends to enhance data governance, customer service, supply chain management and more, according to Aaron Zornes, chief research officer at The MDM Institute. Article here on Information Management.

Some interesting trends to think about.

The Rise of NoSQL via @infomgmt

The continual increase in unstructured big data from the Internet of Things, the changeable requirements for developing successful mobile apps and the trend for user-generated content are paving the way for NoSQL databases to prove their value.

Good blog from Information Management.

Predictive Analytics Enters the Business Mainstream via @infomgmt

What the latest research reveals about predictive analytics adoption trends and outcomes, from Information Management.

Monday, 3 August 2015

SLIDESHOW: 8 Data Governance Design Principles via @infomgmt @1stSanFrancisco

Follow these key steps from Angie Pribor of First San Francisco Partners.

I have to agree with #8 - you think it has been communicated adequately but it never seems to have been, so communicate it more.

Improve Customer Experience: Make Big Data an Actionable Asset via @BigData_Review @UnboundID

We’re generating data at a staggering pace, creating more than 90% of the total amount of information that exists in the world in the last few years. This tremendous wealth of data has the potential to provide companies with highly valuable insights.

Good article with things that should be common sense but often aren't.

What’s the difference between Causality and Correlation? via @AnalyticsVidhya

Do you end up using the words Causation and Correlation interchangeably? These similar sounding names have different fundamental implications. Great blog from Analytics Vidhya explaining the difference between the two words.
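To make the distinction concrete, here is a minimal sketch of my own (it is not taken from the Analytics Vidhya post). It assumes a made-up scenario where temperature drives both ice cream sales and sunburn cases; the variable names, the numbers and the `partial_corr` helper are all hypothetical. The two outcomes come out strongly correlated even though neither causes the other, and the correlation largely disappears once the shared driver is accounted for.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def partial_corr(a, b, c):
    """Correlation of a and b after removing the linear effect of c."""
    r_ab = pearson(a, b)
    r_ac = pearson(a, c)
    r_bc = pearson(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


# Hypothetical data: temperature is the hidden common cause (confounder).
rng = random.Random(0)  # fixed seed so the run is repeatable
temperature = [rng.gauss(20, 3) for _ in range(1000)]
ice_cream = [t + rng.gauss(0, 1) for t in temperature]  # depends only on temperature
sunburn = [t + rng.gauss(0, 1) for t in temperature]    # depends only on temperature

r = pearson(ice_cream, sunburn)
r_partial = partial_corr(ice_cream, sunburn, temperature)

print(f"correlation(ice_cream, sunburn) = {r:.2f}")
print(f"same, controlling for temperature = {r_partial:.2f}")
```

The first number should be high and the second close to zero, which is the point: a strong correlation on its own says nothing about which variable, if any, is doing the causing.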

Sunday, 2 August 2015

Optimize Cost of Enterprise Data Warehouse with Apache™ Hadoop via @CIGNEX

The Enterprise Data Warehouse built using Teradata, Oracle, DB2 or other DBMS is undergoing a revolutionary change. As the sources of data become rich and diverse, storing them in a traditional EDW is not the optimal solution.

Interesting blog from Cignex.

8 Objectives for Your MDM Strategy via @infomgmt

Experts at the MDM & Data Governance Summit in San Francisco deliver timely guidance. In this article they consider the situation at Cargill Inc., the 150-year-old provider of food, agriculture, financial and industrial solutions worldwide. Armed with $134.9 billion in annual revenues and roughly 152,000 employees, Cargill leverages MDM best practices to speed decisions and squeeze costs out of its supply chain, according to its Data Management Lead Brad Williams.

Interesting article.

How to Formulate Your Internet of Things (IoT) Strategy via @infomgmt

Having an Internet of Things (IoT) foundation is critical to enabling connected products, assets and supply chains, according to a new report from International Data Corp. (IDC).

Interesting article from Information Management.

Saturday, 1 August 2015

The truth about MapReduce performance on SSDs by @yanpeichen and @kashkamb via @radar

It is well-known that solid-state drives (SSDs) are fast and expensive. But exactly how much faster — and more expensive — are they than the hard disk drives (HDDs) they're supposed to replace? And does anything change for big data?

Great article by Yanpei and Karthik where they show that the cost-per-performance of SSDs is approaching parity with that of HDDs.

Watson Thinks It Can Critique Your Writing via @ExtremeTech

IBM has just unveiled a new (experimental) tool in the Watson arsenal — the Tone Analyser. It allows Watson to scan a piece of text and tell you what the tone of the writing is based on word use. Link to Dataversity article which links to the original article on ExtremeTech.

Strange tool which I guess could be useful too.