Data: July 2020

Friday, 31 July 2020

10 big data blunders businesses should avoid by Sara Brown via @MITSloan

Big data is a promising investment for firms, but embracing data can also bring confusion and potential minefields - everything from where companies should be spending money to how they should be staffing their data teams.

This was an interesting read, and the list is a good basis for working out which mistakes to avoid.

Wednesday, 29 July 2020

WEBINAR: Natural Language Trends in Visual Analysis 6 August 2020

Data Science Central Webinar Series Event

Natural Language Trends in Visual Analysis
Join us for this latest DSC Webinar on Aug 6th, 2020

Natural language processing has garnered interest in helping people interact with computer systems to make sense and meaning of the world. In the area of visual analytics, natural language has been shown to help improve the overall cognition of visualization tasks.

In this latest Data Science Central webinar, Vidya will discuss how natural language can be leveraged in various aspects of the analytical workflow ranging from smarter data transformations, visual encodings, autocompletion to supporting analytical intent. More recently, chatbot systems have garnered interest as conversational interfaces for a variety of tasks. Machine learning approaches have proven to be promising for approximating the heuristics and conversational cues for continuous learning in a chatbot interface.

Vidya will explore the implications for these data-driven approaches in broadening the scope for visual analysis workflows. She will also discuss the future directions for research and innovation in this space.

Speaker:
Vidya Setlur, Principal Research Scientist -- Tableau

Hosted by: Stephanie Glen, Editorial Director -- Data Science Central

Title:	Natural Language Trends in Visual Analysis
Date:	Thursday, Aug 6th, 2020
Time:	9:00 AM - 10:00 AM PDT

Space is limited so please register early:

Reserve your Webinar seat now

Snorkel is a fundamentally new interface to ML without hand-labeled training data by/via @jthandy

Snorkel has the directness of rules with the flexibility of ML.

This looks really interesting and I like the idea of combining both approaches. Something to watch out for in the future I think.
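
To make the "rules plus ML" idea a little more concrete, here is a minimal sketch of weak labeling in plain Python. It does not use the Snorkel library: the labeling functions, label constants and example messages are all hypothetical, and the simple majority vote is a swapped-in simplification (Snorkel itself learns how much to trust each labeling function rather than just counting votes).

```python
from collections import Counter

# Hypothetical label values for a toy spam/ham task.
ABSTAIN, HAM, SPAM = -1, 0, 1


def lf_contains_link(text):
    """Rule: messages containing a link are probably spam."""
    return SPAM if "http" in text.lower() else ABSTAIN


def lf_mentions_prize(text):
    """Rule: messages mentioning a prize are probably spam."""
    return SPAM if "prize" in text.lower() else ABSTAIN


def lf_short_message(text):
    """Rule: very short messages are probably ordinary chat."""
    return HAM if len(text.split()) <= 4 else ABSTAIN


LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_message]


def weak_label(text):
    """Combine the rules by majority vote; abstain on no votes or a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return ABSTAIN
    return counts[0][0]


for message in [
    "Claim your prize at http://example.com today friends",
    "See you soon",
    "Prize inside",
]:
    print(message, "->", weak_label(message))
```

The noisy labels produced this way would then be used to train an ordinary model, which is where the flexibility of ML comes back in.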

Monday, 27 July 2020

The Rise of DataOps (from the ashes of Data Governance) by @ryano144 via @TDataScience

This makes a comparison between software engineering and data analysis, which shows that source control management is the fundamental transition that allows the practice to go from hobby to profession. Source control management provides reproducibility, which is the core fundamental requirement of any engineering discipline.

Definitely worth a read and an in-depth think about how you can implement the various techniques mentioned in this article.

Friday, 24 July 2020

The Frameworks that Google, DeepMind, Microsoft and Uber Use to Train Deep Learning Models at Scale by @jrdothoughts via @Medium

GPipe, Horovod, TF-Replicator and DeepSpeed combine cutting edge aspects of deep learning research and infrastructure to scale the training of deep learning models.

I found this fascinating. I hadn't quite connected the dots between these frameworks in this way before.

Wednesday, 22 July 2020

Change The Way You Write Python Code With One Extra Character @DorelMasasa in @thestartup_

One small syntax change, one giant step for your coding skills.

Some great code suggestions that will make a big difference to Python code.

Monday, 20 July 2020

Pandas DataFrame (Python): 10 useful tricks by Maurizio Sluijmers via @gitconnected

10 basic tricks to make your pandas life a bit easier.

Some great tricks from Maurizio, and code snippets are embedded in the article.

WEBINAR: Go from Data, to Data Prep, to Data Science 29 July 2020

July 29, 2020

11:00 am ET - 45 min including Q&A

It goes without saying, in order to train data science models to produce predictive forecasts, you need data. But the process of getting the data into models is not always cut and dry, as data science and analytics teams continue to struggle with getting the right kind of data, in the proper format, for the appropriate analysis. As a result, teams end up spending more time uncovering and preparing data for data science models than they do on refining the actual models. But this doesn't have to be the case.

Through the powerful combination of Snowflake and DataRobot, it's now easy for data users to leverage the leading cloud data platform to quickly build, train and deploy data science models. Want to hear how?

Register today to hear from Josh Klaben-Finegold, Product Manager at DataRobot and Mike Klaczynski, Director of Product Marketing at Snowflake, as they discuss how you can conduct enterprise self-service data prep for data science in just a few clicks.

Join and learn how:

Snowflake + DataRobot empower users to collaborate, prepare, and process data for machine learning at scale, with enterprise governance
You can easily prepare your data for feature engineering
Leveraging the power of both Snowflake + DataRobot together is easy and seamless via Snowflake Partner Connect - demo included!

Friday, 17 July 2020

Data Prep Still Dominates Data Scientists’ Time, Survey Finds by Alex Woodie via @datanami

Data scientists spend about 45% of their time on data preparation tasks, including loading and cleaning data, according to a survey of data scientists conducted by Anaconda. The company also analyzed the gap between what data scientists learn as students, and what the enterprises demand.

Yes, it does take time, but if you prepare your data right then the results will be good.

Wednesday, 15 July 2020

Neuromorphic computing finds new life in machine learning via @ZDNet and @TiernanRayTech

Neuromorphic computing has had little practical success in building machines that can tackle standard tests such as logistic regression or image recognition. But work by prominent researchers is combining the best of machine learning with simulated networks of spiking neurons, bringing new hope for neuromorphic breakthroughs.

Wow - this was completely fascinating.

Monday, 13 July 2020

Scope and Impact of AI in Agriculture by Yogita Kinha via @kdnuggets

The major advantage of focusing on AI-based methods is that they tackle each of the challenges faced by farmers from seed sowing to harvesting of crops separately and rather than generalising, provide customised solutions to a specific problem.

An interesting read and an eye-opener in something that affects us all.

Top 9 Data Science certifications to know about in 2020 by @rpdesai24 via @TDataScience

Some of the best data science certification programs worth considering.

Some of these look great and would be a very low-cost way to get some kind of qualification or certification.

Friday, 10 July 2020

Why Statistics Don’t Capture The Full Extent Of The Systemic Bias In Policing by Laura Bronner via @FiveThirtyEight

Because of a statistical quirk called “collider bias,” the criminal justice system may be even more racially biased than studies suggest. Here's how collider bias works, including charts that clearly show the problem.

This was interesting, and I'm sure the same problems are repeated in other areas too. Another bias to try to remove.
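
For anyone who wants to see collider bias in action, here is a toy simulation in plain Python. It is only a sketch under made-up assumptions: the two traits, the selection rule and the threshold are hypothetical and do not model policing data or reproduce the article's analysis. Two traits are generated independently, and a record only enters the "observed" sample when their sum is high enough, so selection depends on both of them.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n=20000, threshold=1.0, seed=0):
    """Return the correlation in the whole population and in the selected subset."""
    rng = random.Random(seed)
    # Two traits drawn independently, so they are unrelated by construction.
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    # Selection (the collider) depends on both traits.
    selected = [(x, y) for x, y in zip(a, b) if x + y > threshold]
    sel_a = [x for x, _ in selected]
    sel_b = [y for _, y in selected]
    return pearson(a, b), pearson(sel_a, sel_b)


full_corr, selected_corr = simulate()
print(f"whole population: {full_corr:+.3f}")
print(f"selected only:    {selected_corr:+.3f}")
```

In the whole population the correlation is close to zero, but inside the selected group the two traits appear clearly negatively related, even though nothing connects them. Any analysis that only sees the selected records inherits that distortion.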

Wednesday, 8 July 2020

WEBINAR: DataOps: How Bell Canada Powers their Business with Data - 15 July 2020

Data Science Central Webinar Series Event

DataOps: How Bell Canada Powers their Business with Data
Join us for the latest DSC Webinar on July 15th, 2020

Agile data management has become a necessity for organizations that need to maximize the value of their data. The business is focused on outcomes but the bulk of the effort is built around data processing and delivery. Demand for data outstrips the capacity of IT organizations and data engineering teams to deliver. New data management practices that adapt the practices of DevOps to support data operations (DataOps) are the key to agility in data management. The enabling technologies exist today and data management practices are moving quickly toward a future of DataOps. DataOps is an automated, process-oriented methodology, used by analytic and data teams, to improve the quality and reduce the cycle time of data analytics.

In this latest Data Science Central webinar, you will learn:

How to identify the most impactful bottlenecks sitting in the way of streamlined data processing
How to evaluate multiple strategies for improving data processing outcomes and their relative impact
Where to prioritize people, process, and technology changes to maximize impact
How Bell revolutionized their data delivery framework by incorporating DataOps principles and technology

Featured Speakers:
Johnathan Bald, Sr. Director of Sales -- Hitachi Vantara
Jude Vanniasinghe, Sr. Manager of Business Intelligence -- Bell

Presentation Moderator:
Mike Williams, Global Solution Lead, Analytics and IoT -- Hitachi Vantara

Hosted by: Sean Welch, Host and Producer -- Data Science Central

Title:	DataOps: How Bell Canada Powers their Business with Data
Date:	Wednesday, July 15th, 2020
Time:	9 AM - 10 AM PDT

Space is limited so please register early:

Reserve your Webinar seat now

WEBINAR: ETL & Advanced Machine Learning - Open Source, No Code Required - 16 July 2020

Sponsored News from Data Science Central

Data practitioners create value for their organization through ETL, visualization, machine learning and deployment. Flexible and reliable working tools are as important as the ability to collaborate in-house and with the community. KNIME is an open source platform that covers this entire life cycle, free, and easy to install and use.

Join the free webinar on July 16, 1:30 PM
It's happening in two time zones: Americas (CDT) and Asia/Europe (CEST).

Register Now

Join the team of Data Scientists for a quick and practical introduction to KNIME Analytics Platform.

What you will learn in this session:

Reduce the time needed to automate ETL.
Integrate new data science methods, from simple to sophisticated such as deep learning, advanced ML, text mining, and time series.
Explore a modern, open source data science platform with a visual workflow editor.

What is the Zero Trust Model (ZTM) @BanafaAhmed via @Datafloq

The Zero Trust Model of information security simplifies how information security is conceptualized by assuming there are no longer “trusted” interfaces, applications, traffic, networks, or users. It takes the old model— “trust but verify”—and inverts it because recent breaches have proven that when an organisation trusts, it doesn’t verify. The zero-trust model of information security means “verify and never trust.”

I found this fascinating as this is not my area of expertise. When you think about it, the approach makes perfect sense.

Tuesday, 7 July 2020

WEBINAR: How Data Science is Changing Football: What Every Industry Can Learn! - 14 July 2020

July 14, 11:00 am ET

45 min including Q&A

Football and other sports have changed dramatically by adopting Artificial Intelligence - and will disrupt every business industry. Decisions across the organization benefit greatly by injecting data science, which more accurately comprehends the complexities of real-life information. Predictions and recommendations are improved for use cases off the field (marketing, ticket pricing), in-game strategy, and recruiting players.

This webinar will lead a lively discussion on how winning in sports translates to winning across other industries, overcoming cultural resistance, and doing analytics at scale and velocity to win the race.

Monday, 6 July 2020

The Most Important Fundamentals of PyTorch you Should Know by Kevin Vu via @Exxactcorp

PyTorch is a constantly developing deep learning framework with many exciting additions and features. We review its basic elements and show an example of building a simple Deep Neural Network (DNN) step-by-step.

This was incredibly clear and very useful as it contained code examples for you to learn from. Definitely recommended.

Friday, 3 July 2020

Introduction to Datapane: A Python Library to Build Interactive Reports by @KhuyenTran16 via @TDataScience

Simple Framework to Create Beautiful Reports and Share your Analysis Results with your Team

This looks like a great way to produce polished reports from your Python code. Definitely worth a bookmark and some applause. It also includes some useful code snippets.

Wednesday, 1 July 2020

Why successful AI needs fast data access via @theregister

Spending on artificial intelligence systems will grow from $37.5bn worldwide in 2019 to $97.9bn in 2023, according to IDC. And use cases cover everything from ERP, manufacturing software and content management to automated customer service agents, threat intelligence and fraud.

Not just fast data access but often lots of processing power too.
