Showing posts with label APACHE. Show all posts

Wednesday, 12 October 2022

WEBINAR - A hands-on look at the structure of an Apache Iceberg Table 18 October 2022

 
Date: Tuesday, October 18, 2022
Time: 10 AM PT / 1 PM ET

Hi there,

Apache Iceberg is an open table format for large-scale analytics on the data lake. In this webinar, we’ll take a deep dive into the internal structure of an Apache Iceberg table from a practical point of view.

In this webinar, we will:

  • Look at the various components of an Apache Iceberg table
  • Perform CREATE, INSERT, and UPDATE on an Iceberg table and see what happens underneath
  • Learn about Iceberg metadata tables
  • Walk through example use cases for leveraging metadata tables

Register for this webinar to gain a practical understanding of the structure of an Apache Iceberg table.
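
As a rough picture of the components listed above, here is a toy Python model of how an Iceberg table layers its metadata: the table tracks snapshots, each snapshot points at manifests, and each manifest lists data files. This is only a sketch under simplifying assumptions, not Iceberg's API or file formats. Every name in it (`ToyTable`, `append`, `overwrite`, `history`, the file paths) is made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str
    record_count: int


@dataclass(frozen=True)
class Manifest:
    # Tracks a group of data files (real manifests also store per-file stats).
    data_files: Tuple[DataFile, ...]


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    operation: str
    # Stands in for the manifest list: every manifest visible in this snapshot.
    manifests: Tuple[Manifest, ...]


@dataclass
class ToyTable:
    # Stands in for the table metadata file, which records all snapshots.
    snapshots: List[Snapshot] = field(default_factory=list)

    def _commit(self, operation: str, manifests: Tuple[Manifest, ...]) -> Snapshot:
        # Every write produces a new immutable snapshot; old ones are kept.
        snapshot = Snapshot(len(self.snapshots) + 1, operation, manifests)
        self.snapshots.append(snapshot)
        return snapshot

    def _current_manifests(self) -> Tuple[Manifest, ...]:
        return self.snapshots[-1].manifests if self.snapshots else ()

    def append(self, files: List[DataFile]) -> Snapshot:
        # INSERT: existing manifests are reused and one new manifest is added.
        new_manifest = Manifest(tuple(files))
        return self._commit("append", self._current_manifests() + (new_manifest,))

    def overwrite(self, old_path: str, new_file: DataFile) -> Snapshot:
        # Copy-on-write UPDATE (an assumption of this sketch): the old file
        # drops out of the new snapshot and a rewritten file takes its place.
        kept = []
        for manifest in self._current_manifests():
            remaining = tuple(f for f in manifest.data_files if f.path != old_path)
            if remaining:
                kept.append(Manifest(remaining))
        return self._commit("overwrite", tuple(kept) + (Manifest((new_file,)),))

    def live_files(self, snapshot: Optional[Snapshot] = None) -> List[str]:
        # Files visible in the given snapshot (defaults to the current one).
        if snapshot is None:
            if not self.snapshots:
                return []
            snapshot = self.snapshots[-1]
        return [f.path for m in snapshot.manifests for f in m.data_files]

    def history(self) -> List[Tuple[int, str, int]]:
        # Loosely analogous to querying a snapshot-level metadata table.
        return [(s.snapshot_id, s.operation, len(self.live_files(s)))
                for s in self.snapshots]


table = ToyTable()                                   # CREATE: no snapshots yet
table.append([DataFile("data/a.parquet", 2)])        # INSERT
table.append([DataFile("data/b.parquet", 3)])        # INSERT
table.overwrite("data/a.parquet", DataFile("data/a2.parquet", 2))  # UPDATE

print(table.history())
print(table.live_files())
print(table.live_files(table.snapshots[1]))          # read an older snapshot
```

Because older snapshots are never modified, reading `table.snapshots[1]` still shows the files as they were before the update, which is the idea behind inspecting a table's history through its metadata.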

 
 
 

Speaker: Dipankar Mazumdar, Developer Advocate -- Dremio

Register

Monday, 5 September 2022

WEBINAR: The Life of an Apache Iceberg Query - 13 September 2022

 
Date: Tuesday, September 13, 2022
Time: 10 AM PT / 1 PM ET

Hi there,

Apache Iceberg offers the tools for query engines to make fast and efficient query plans on your data lakehouse. In this webinar, we’ll learn how Iceberg queries play out through planning and execution.

In this webinar we will learn:

  • The step-by-step process of an INSERT query
  • The step-by-step process of a DELETE query
  • The step-by-step process of an UPSERT/MERGE query
  • The step-by-step process of a SELECT query
  • Understanding of Iceberg delete files

Register for this webinar to gain a deeper understanding of how Iceberg brings fast queries to your data lakehouse.
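
To give a feel for the delete files mentioned in the list above, here is a toy Python sketch of the merge-on-read idea behind position deletes: a DELETE records which row positions in which data files are gone, and a later SELECT skips those positions while scanning. It is a simplified illustration, not Iceberg's implementation. The function names, the in-memory dictionaries, and the file paths are all assumptions, and Iceberg's equality delete files are not modelled.

```python
from typing import Callable, Dict, Iterator, List, Set, Tuple

# Data file path -> rows in file order (stand-in for Parquet files on disk).
DataFiles = Dict[str, List[dict]]
# (data file path, row position) pairs, as a position delete file would record.
PositionDeletes = Set[Tuple[str, int]]


def delete_where(data_files: DataFiles,
                 deletes: PositionDeletes,
                 predicate: Callable[[dict], bool]) -> PositionDeletes:
    """DELETE without rewriting data: note the position of each matching row."""
    new_deletes = set(deletes)
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if predicate(row):
                new_deletes.add((path, pos))
    return new_deletes


def scan(data_files: DataFiles, deletes: PositionDeletes) -> Iterator[dict]:
    """SELECT: read every data file, skipping rows marked as deleted."""
    for path, rows in data_files.items():
        for pos, row in enumerate(rows):
            if (path, pos) not in deletes:
                yield row


data_files = {
    "data/a.parquet": [{"id": 1}, {"id": 2}],
    "data/b.parquet": [{"id": 3}],
}

# The data files are left untouched; only the set of deletes grows.
deletes = delete_where(data_files, set(), lambda row: row["id"] == 2)
visible = list(scan(data_files, deletes))

print(deletes)
print(visible)
```

The trade-off this sketch hints at is that deletes become cheap to write, since no data file is rewritten, while reads do a little extra work to filter out the deleted positions.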

 
 
 

Speaker: Alex Merced, Developer Advocate -- Dremio

Register

Tuesday, 5 July 2022

WEBINAR: Laying the foundation of a Data Lakehouse with AWS Glue, Apache Iceberg and Dremio - 14 July 2022

 
Date: Thursday, July 14, 2022
Time: 9 AM PT / 12 PM ET

Hi there,

Moving analytical workloads from the data warehouse to the data lakehouse can save money, make more of your data accessible to your consumers, and provide a better user experience for your data.

Three key technologies that enable you to build the foundation of a data lakehouse are AWS Glue, Apache Iceberg, and Dremio.

In this webinar, we’ll go through the benefits of a data lakehouse architecture, then dive into a live demo where we’ll create Apache Iceberg tables using AWS Glue and run blazing-fast analytics on them using Dremio.

 
 
 

Speaker: Alex Merced, Developer Advocate -- Dremio

Deploy Dremio
 

Monday, 20 June 2022

WEBINAR: Apache Iceberg: An Architectural Look Under the Covers - 28 June 2022

 

 
Date: Tuesday, June 28, 2022
Time: 9:00 AM PT | 12:00 PM ET
Speaker: Jason Hughes, Director of Product Management -- Dremio

Data lakes have been built with a desire to democratize data: to allow more and more people, tools, and applications to make use of data. A key capability needed to achieve this is hiding the complexity of underlying data structures and physical data storage from users. The de facto standard has been the Hive table format, released by Facebook in 2009, which addresses some of these problems but falls short at data, user, and application scale. So what is the answer? Apache Iceberg.

The Apache Iceberg table format is now used and contributed to by many leading tech companies, such as Netflix, Apple, Airbnb, LinkedIn, Dremio, Expedia, and AWS.

Join Jason Hughes, Technical Director at Dremio, for this webinar to learn the architectural details of why the Hive table format falls short and how the Iceberg table format resolves these shortcomings, as well as the benefits that stem from Iceberg’s approach.

 
 

You will learn:

  • The issues that arise when using the Hive table format at scale, and why we need a new table format
  • How a straightforward, elegant change in table format structure has enormous positive effects
  • The underlying architecture of an Apache Iceberg table, how a query against an Iceberg table works, and how the table’s underlying structure changes as CRUD operations are done on it
  • The resulting benefits of this architectural design

Register
 

Monday, 23 May 2022

How to Run Airflow Locally With Docker by Giorgos Myrianthous via @TDataScience

A step-by-step guide for running Airflow with Docker on your local machine.

This is very clear and easy to follow. Well worth a bookmark.

Monday, 12 November 2018

WEBINAR: Scaling Big Data Pipelines in Apache Spark, No Coding Required - 15 November 2018


Various companies across multiple industries collect and house vast amounts of data. However, most face the same challenge: the ability to process big data and quickly find insight within its framework. Introducing KnowledgeSTUDIO with Apache Spark, the ultimate solution for both data scientists and data analysts. The graphical user interface with Big Data capabilities allows organizations to build pipelines seamlessly.
Join us and learn how users of KnowledgeSTUDIO for Apache Spark, a wizard-driven productivity tool for building Spark workflows, have overcome these challenges.

Learn how data science teams can:

  • Utilise interactive workflows with an automated design canvas for building, displaying, refreshing, and reusing analytic models
  • Automatically generate code that can be customised and incorporated into production scripts
  • Include manually written code within the graphical workflow
  • Leverage advanced modelling with open source packages such as Spark ML and Spark SQL
  • Avoid overhead costs of parallelisation when datasets are very small
  • Build, explore data segments, and discover relationships using patented Decision Tree technology

REGISTER NOW

Wednesday, 24 January 2018

WEBINAR: Matei Zaharia’s Predictions for 2018: Big Data and AI Highlights - 31 Jan 2018

Overview
Title: Matei Zaharia’s Predictions for 2018: Big Data and AI Highlights
Date: Wednesday, January 31, 2018
Time: 09:00 AM Pacific Standard Time
Duration: 1 hour
Summary
Over the past few years, AI and big data have powered numerous technologies that have changed the way we live, from autonomous cars to conversational systems to personalization. As a result, the excitement around these technologies has spiked. But how can we separate the hype from reality, and which advances will make an impact in practice next?
In this DSC webinar, Databricks co-founder and Stanford computer science professor Matei Zaharia, who started the Apache Spark project in 2009, will share his perspective on which big data and AI trends will come to fruition in 2018. He will discuss how centering organizations around high-quality data will be the main driver of AI, which AI applications are seeing broad success in practice, and how new technologies including deep learning, data marketplaces, and cloud computing will affect the computing landscape.
Join this webinar to learn about:
  • The current state of big data and AI
  • Some of the new innovations taking place in research
  • Key challenges that companies face in getting value from data and AI
  • Matei’s predictions for 2018 for how companies and the technology industry will overcome these challenges
Speaker: Matei Zaharia, Co-founder and Chief Technologist -- Databricks
Hosted by: Bill Vorhies, Editorial Director -- Data Science Central
Register here