Showing posts with label HIVE. Show all posts

Friday, 28 February 2020

Presto-powered S3 data warehouse on Kubernetes by @joshua_robinson via @Medium

Joshua Robinson offers a tutorial on setting up a Presto data warehouse with Docker that can query data on a FlashBlade S3 object store, plus a follow-up tutorial explaining how to move everything, including the Hive Metastore, to run in Kubernetes.

This is a very useful read and might help you get something working faster than you had planned.
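For flavour, here is a minimal sketch of what a Presto Hive-connector catalog file pointing at an S3-compatible store might look like. This is not taken from the tutorial; the metastore URI, endpoint and credentials are placeholders you would replace with your own:

```properties
# etc/catalog/hive.properties -- illustrative values only
connector.name=hive-hadoop2
hive.metastore.uri=thrift://hive-metastore:9083
hive.s3.endpoint=http://flashblade.example.com
hive.s3.aws-access-key=YOUR_ACCESS_KEY
hive.s3.aws-secret-key=YOUR_SECRET_KEY
hive.s3.path-style-access=true
```

With a catalog like this in place, Presto can query tables whose data lives in S3 buckets while the Hive Metastore tracks their schemas.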

Thursday, 9 November 2017

Tuesday, 24 October 2017

Beyond Hadoop by James Ovendon via @iegroup

A company once synonymous with big data is on its way out, but what comes next?

Interesting.  So people are starting to use alternatives to Hadoop, or to use it for different purposes.

Sunday, 28 August 2016

Data Partitioning in Big Data Application with Apache Hive by Vijay Aegis via CodeInnovationsBlog

Professionals from a big data consulting company introduce the concept of partitioning in big data applications. Read the post in full to understand how to partition data in such applications using Apache Hive; if you don't know how to do it, the experts walk you through it.

Useful blog.
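To give a flavour of what the post covers, here is a minimal partitioning sketch in HiveQL; the table and column names are made up, not taken from the post:

```sql
-- A table partitioned by date: each distinct view_date becomes its own
-- directory on disk, so queries filtering on view_date scan less data.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING
)
PARTITIONED BY (view_date STRING)
STORED AS ORC;

-- Static partition insert: the partition value is given explicitly.
INSERT INTO TABLE page_views PARTITION (view_date = '2016-08-28')
SELECT user_id, url FROM staging_page_views;

-- Dynamic partitioning: Hive derives the partition from the last column.
SET hive.exec.dynamic.partition.mode=nonstrict;
INSERT INTO TABLE page_views PARTITION (view_date)
SELECT user_id, url, view_date FROM staging_page_views;
```

A query such as `SELECT count(*) FROM page_views WHERE view_date = '2016-08-28'` then reads only that one partition rather than the whole table.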

Thursday, 5 May 2016

10 new exciting features in Apache Hive 2.0.0 via BigDataMadeSimple

10 new exciting features in Apache Hive 2.0.0 by Kumar Chinnakali via BigDataMadeSimple - The Apache Hive community has announced the availability of Apache Hive 2.0.0, its largest release yet. It brings great and exciting improvements across new functionality, performance, optimization, security, and usability.

Thursday, 10 March 2016

How different SQL-on-Hadoop engines satisfy BI workloads via @CIOonline

How different SQL-on-Hadoop engines satisfy BI workloads by Thor Olavsrud via @CIOonline  - A new benchmark of SQL-on-Hadoop engines Impala, Spark and Hive finds they each have their own strengths and weaknesses when it comes to Business Intelligence (BI) workloads.

A must-read if you are planning to do this.

Friday, 26 February 2016

WEBINAR: Predictive Analytics Deployment to Mainframe or Hadoop - 3 March 2016




Predictive Analytics Deployment to Mainframe or Hadoop
Thursday, March 3, 2016 7:00:00 PM GMT - 8:00:00 PM GMT
The big challenge for analytics-driven organizations today is closing the gap between deriving an analytic result and getting the ROI. Organizations need a consistent and efficient way to deploy analytic results into everything from systems of record like mainframes to modern big data infrastructure. 

Join James Taylor, CEO of Decision Management Solutions and Michael Zeller, CEO of Zementis, in this live webinar to learn how the Predictive Model Markup Language (PMML) provides an XML standard that streamlines the deployment of predictive analytic models. With PMML a model can be developed in one tool or language, whether open source like R or commercial predictive analytics products, and easily migrated to a wide range of operational systems including mainframes like IBM zSystems and new data infrastructure like Hadoop with Spark / Storm / Hive. 
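As a flavour of what a PMML document looks like, here is a minimal sketch encoding a simple linear regression, y = 2x + 1. The element names follow the PMML 4.2 schema, but this fragment is illustrative rather than validated:

```xml
<PMML xmlns="http://www.dmg.org/PMML-4_2" version="4.2">
  <Header description="Illustrative linear model: y = 2x + 1"/>
  <DataDictionary numberOfFields="2">
    <DataField name="x" optype="continuous" dataType="double"/>
    <DataField name="y" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel modelName="example" functionName="regression">
    <MiningSchema>
      <MiningField name="x"/>
      <MiningField name="y" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.0">
      <NumericPredictor name="x" coefficient="2.0"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
```

Because the model is just declarative XML like this, any PMML-aware scoring engine, whether on a mainframe or a Hadoop cluster, can execute it without the original training tool.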

You will learn:
• The challenges of an increasingly complex analytic environment.
• How analytics increase the value of legacy systems of record, mainframes and big data infrastructure.
• Why PMML is the critical glue between heterogeneous analytics environments.

The presenters will use case studies to outline this proven, standard-based approach to analytics deployment in today's complex predictive analytics environments.

All registrants will receive a copy of the new paper by James Taylor, "Standards-based Deployment of Predictive Analytics - Using a standards-based approach to deploy predictive analytics on operational systems from mainframes to Hadoop."

Register today! This is a live webinar with a Q&A following. If you would like to attend but can't make the webinar time, please register to receive a copy of the white paper, presentation and a link to the recording.

Presenters:

James Taylor and Michael Zeller
James Taylor is the CEO of Decision Management Solutions, experts in decision management and decision modeling. He provides strategic consulting, working with clients to adopt decision modeling, predictive analytics and business rules. James is the author of multiple books and articles and writes a regular blog at JT on EDM. James is a contributor to the BABOK® Guide on decision modeling and is a co-submitter of the new Decision Model and Notation (DMN) standard.

Michael Zeller is the CEO and Co-Founder of Zementis. Mike has extensive experience in strategic implementation of technology, business process improvement and systems integration. He strives to provide customers with innovative business solutions tailored to their unique needs. He also serves on the Board of Directors of Tech San Diego and as Secretary/Treasurer on the Executive Committee of ACM SIGKDD, the premier international organization for data mining.

Register HERE


Sunday, 7 February 2016

How much SQL is required to learn Hadoop? via @dezyreonline

How much SQL is required to learn Hadoop? via +DeZyre  -  With widespread enterprise adoption, learning Hadoop is gaining traction as it can lead to lucrative career opportunities. There are several hurdles and pitfalls students and professionals come across while learning Hadoop. This post provides a detailed explanation of how SQL skills can help professionals learn Hadoop.

Interesting thoughts. I can certainly cope with SQL fine, but I'm not as strong at Java. It's nice to read confirmation that I can survive with the skills I have.

Tuesday, 5 January 2016

Learn Big Data Analytics using Top YouTube Videos, TED Talks & other resources via @AnalyticsVidhya

Learn Big Data Analytics using Top YouTube Videos, TED Talks & other resources via +Analytics Vidhya - by Manish Saraswat, this is a comprehensive list of excellent videos and resources to get you inspired and going on the subject. Most are YouTube videos, and each comes with a write-up suggesting its intended audience.

Monday, 28 December 2015

File Formats in Apache HIVE via @acadgild

File Formats in Apache HIVE via @acadgild - This blog from AcadGild discusses the different file formats available in Apache Hive. After reading it you will have a clear understanding of the file formats available in Hive and how and where to use each of them appropriately.

It contains great examples and should prove very useful to people.
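As a quick taste of the topic, the storage format is chosen per table in the DDL; this HiveQL sketch is illustrative and not taken from the blog:

```sql
-- Same schema, different on-disk formats (illustrative table names):
CREATE TABLE logs_text (msg STRING) STORED AS TEXTFILE;     -- plain rows, human-readable
CREATE TABLE logs_seq  (msg STRING) STORED AS SEQUENCEFILE; -- binary key/value pairs
CREATE TABLE logs_rc   (msg STRING) STORED AS RCFILE;       -- early columnar format
CREATE TABLE logs_orc  (msg STRING) STORED AS ORC;          -- optimized columnar format

-- Converting data between formats is just an INSERT ... SELECT:
INSERT INTO TABLE logs_orc SELECT msg FROM logs_text;
```

The trade-offs (compression, splittability, column pruning) are what the blog walks through format by format.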

Saturday, 26 December 2015

Data serialization with avro in hive via @acadgild

Data serialization with avro in hive via @acadgild - This blog from AcadGild focuses on providing in-depth information about Avro in Hive. It discusses the importance and necessity of Avro and how to implement it in Hive. Through this blog you will get a clear idea of Avro and how to use it in your Hadoop projects.

A very useful blog with great examples.
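For flavour, here is a minimal sketch of an Avro-backed Hive table; the table name and schema are made up, not taken from the blog:

```sql
-- Illustrative: a Hive table backed by the Avro SerDe, with the Avro
-- schema supplied inline (avro.schema.url pointing at a schema file
-- also works).
CREATE TABLE users_avro
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.avro.AvroSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.avro.AvroContainerOutputFormat'
TBLPROPERTIES ('avro.schema.literal' = '{
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "id",   "type": "long"},
    {"name": "name", "type": "string"}
  ]
}');
```

In recent Hive versions the shorthand `STORED AS AVRO` achieves the same thing, with Hive deriving the Avro schema from the table columns.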

Wednesday, 18 November 2015

WEBINAR: Best Fit Engineering for SQL on Hadoop - 24 November 2015


Overview
Title: Best Fit Engineering for SQL on Hadoop
Date: Tuesday, November 24, 2015
Time: 09:00 AM Pacific Standard Time
Duration: 1 hour
Summary

Best Fit Engineering for SQL on Hadoop
Join us for our latest DSC Webinar series as we discuss how enterprises have ever larger volumes of structured and semi-structured data generated by all sorts of applications.  Much of that data is increasingly finding its way into Hadoop clusters for analytics because of Hadoop's versatility and the economical, linear scalability of both data storage and compute.  And SQL is still the best option for querying it:
  • SQL is the universal connector to many BI tools and technologies
  • Prevalent SQL skills overcome the Hadoop skills gap
  • Hadooponomics enables more analytics on more data at a much lower cost
Forrester recently concluded that organizations need to choose more than one SQL-on-Hadoop tool to satisfy all requirements. Hortonworks and Teradata agree on this “best fit engineering” approach, designed to match the benefits of each tool set to actual workload requirements, while remaining true to 100% open source innovation.
You will learn about SQL on Hadoop best practices, including:
  • A brief history of SQL on Hadoop
  • Architecture and use cases for Hive and Presto
  • Technical deep dive and futures for Hive and Presto 
Speakers: 
Mark Shainman, Program Manager -- Teradata
Mark Lochbihler, Director, Partner Engineering -- Hortonworks
Hosted by: Bill Vorhies, Editorial Director -- Data Science Central

Teradata Hortonworks logo

Register here




Thursday, 4 June 2015

SQL and Hadoop: It's complicated

With the 1.0 release of Apache Drill and a new 1.2 release of Apache Hive, everything you thought you knew about SQL-on-Hadoop might just have become obsolete

Saturday, 14 February 2015

How Big Data Pieces, Technology, and Animals fit together

A great summary of Big Data from +KDnuggets pulling together information from a variety of sources to explain what the names of all the components that make it up mean.  I hadn't realised just how all of these connect, or how many of the names come from animals.

Read it here on +KDnuggets

Friday, 9 January 2015

How Facebook uses Hadoop and Hive

Facebook is one of Hadoop and big data’s biggest champions, and it claims to operate the largest single Hadoop Distributed Filesystem (HDFS) cluster anywhere, with more than 100 petabytes of disk space in a single system as of July 2012. Just one of the several Hadoop clusters operated by the company spans more than 4,000 machines. With Facebook Messages, Facebook deployed its first ever user-facing application built on the Apache Hadoop platform; it runs on Apache HBase, a database-like layer built on Hadoop designed to support billions of messages per day.

Continue reading here from +Ashwani Agarwal

Tuesday, 21 October 2014