Data: TEXT

Wednesday, 28 September 2022

4 Tools to Automatically Extract Data from Datetime in Python by Khuyen Tran via @TDataScience

How to Extract Datetime From Text and Data From Datetime.

This is just SO useful.
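Here is a minimal standard-library sketch that touches both halves of that subtitle. It is not one of the four tools the article covers. It assumes the dates in the text are written in ISO `YYYY-MM-DD` form. `extract_dates` and `date_parts` are hypothetical helper names.

```python
import re
from datetime import datetime

# Assumption: dates in the text use the ISO YYYY-MM-DD format.
ISO_DATE = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")

def extract_dates(text):
    """Find ISO-formatted dates in free text and parse them into datetime objects."""
    dates = []
    for candidate in ISO_DATE.findall(text):
        try:
            dates.append(datetime.strptime(candidate, "%Y-%m-%d"))
        except ValueError:
            # Skip strings that look like dates but are not valid (e.g. 2022-13-40).
            continue
    return dates

def date_parts(dt):
    """Extract commonly used features from a datetime."""
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "weekday": dt.strftime("%A"),  # day name, in the current locale
        "quarter": (dt.month - 1) // 3 + 1,
        "is_weekend": dt.weekday() >= 5,  # Monday is 0, so 5 and 6 are Sat/Sun
    }

text = "The post went up on 2022-09-28 and was updated on 2022-10-01."
for dt in extract_dates(text):
    print(dt.date(), date_parts(dt))
```

For messier formats such as "28 September 2022", you would need more patterns or a dedicated parsing library. The regular expression here only catches the single format it was written for.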

Tuesday, 26 July 2022

WEBINAR: Extract Data from PDFs at Scale - 4 August 2022

Auto-extraction of unstructured and image PDF data is here.

Live Webinar: Extract Data from PDFs at Scale

Much of the valuable data locked in your PDFs is unstructured, contained in images, or both. Until now, analyzing that data took manual entry and transcription — time-consuming and expensive.

But now, it’s finally possible to extract that data automatically, without sacrificing efficiency for accuracy. In this live interactive conversation, our own VP of Data Science, Adam Blacke, reveals:

	How much valuable data is hidden in your organization’s PDFs — data you couldn’t previously access

	How new automated OCR breakthroughs can parse that data in a blink, with drag-and-drop ease

	How these new efficiencies are transforming company after company — and how yours can be next

Join the webinar

Date

Thursday, Aug. 4, 2022

Time

9 a.m. Pacific

Speakers

		Adam Blacke, VP of Data Science, Alteryx

		Chris deMontmollin, Product Marketing Manager, Alteryx

Wednesday, 25 May 2022

Why I Stopped Dumping DataFrames to a CSV and Why You Should Too by Avi Chawla via @TDataScience

It’s time to say goodbye to pd.to_csv() and pd.read_csv()

This was really interesting, as I had been defaulting to CSV whether I actually needed it or not.
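One drawback often raised against CSV is that it stores only text, so column types are lost on the way back in. The article itself works with pandas DataFrames. This sketch uses only the standard library instead: a small list of dicts stands in for a DataFrame, and `pickle` stands in for a binary format that keeps Python types.

```python
import csv
import io
import pickle
from datetime import date

# A tiny "table" as a list of dicts, standing in for a DataFrame.
rows = [
    {"id": 1, "price": 9.99, "released": date(2022, 5, 25), "in_stock": True},
    {"id": 2, "price": 4.50, "released": date(2022, 5, 2), "in_stock": False},
]

def csv_round_trip(records):
    """Write records to CSV text in memory and read them back."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]))
    writer.writeheader()
    writer.writerows(records)
    buffer.seek(0)
    return list(csv.DictReader(buffer))

def pickle_round_trip(records):
    """Serialise records with pickle and load them back."""
    return pickle.loads(pickle.dumps(records))

from_csv = csv_round_trip(rows)
from_pickle = pickle_round_trip(rows)

print(type(from_csv[0]["price"]).__name__)     # str: CSV stores only text
print(type(from_pickle[0]["price"]).__name__)  # float: the type survives
```

Note the quiet trap in the CSV version: the second row's `in_stock` comes back as the string "False", which is truthy in Python.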

Monday, 2 May 2022

Boost Performance of Text Classification tasks with Easy Data Augmentation by Satyam Kumar via @TDataScience

Text data augmentation for NLP tasks.

Interesting thoughts and definitely something to try.
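Easy Data Augmentation, as introduced by Wei and Zou, combines four simple operations: synonym replacement, random insertion, random swap and random deletion. This minimal sketch covers only the two that need no synonym dictionary. It is an illustration under that simplification, not the article's code.

```python
import random

def random_swap(words, n_swaps, rng):
    """Swap the positions of two randomly chosen words, n_swaps times."""
    words = list(words)  # copy so the caller's list is not mutated
    if len(words) < 2:
        return words
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(words)), 2)
        words[i], words[j] = words[j], words[i]
    return words

def random_deletion(words, p, rng):
    """Drop each word with probability p, always keeping at least one word."""
    if len(words) <= 1:
        return list(words)
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

rng = random.Random(0)  # seeded so the augmentations are reproducible
sentence = "easy data augmentation boosts text classification".split()
print(" ".join(random_swap(sentence, n_swaps=1, rng=rng)))
print(" ".join(random_deletion(sentence, p=0.2, rng=rng)))
```

Each augmented sentence keeps the original label. This is why these operations can enlarge a small labelled training set cheaply.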

Monday, 21 February 2022

Python’s F-Strings Are A Lot More Useful Than You Might Have Thought by @emmettboudgie via @TDataScience

Some cool things most people do not realize f-strings can do in Python.

Interesting to read and think about, as I had no idea about some of these things.
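Here are a few lesser-known f-string features of the kind the article describes. They are not necessarily the exact ones it covers. The variable names are illustrative, and the `=` specifier needs Python 3.8 or later.

```python
from datetime import date

name = "ada"
total = 1234567
ratio = 0.256
posted = date(2022, 2, 21)

print(f"{name=}")            # self-documenting expression: name='ada'
print(f"{total:,}")          # thousands separator: 1,234,567
print(f"{ratio:.1%}")        # percentage with one decimal place: 25.6%
print(f"{name!r:>10}")       # repr() of the value, right-aligned in 10 characters
print(f"{posted:%d %B %Y}")  # date format codes inside the braces: 21 February 2022
```

The `%B` month name follows the current locale. The output shown is for the default English/C locale.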

Monday, 31 January 2022

What’s in an F-String? by Murtaza Ali via @TDataScience

An overview of Python’s method for combining strings and variables and why you should use it.

A very useful article and well worth a bookmark.
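For comparison, this small sketch builds the same sentence with string concatenation, `%` formatting, `str.format()` and an f-string. The variable names are illustrative.

```python
language = "Python"
version = 3.6

# Three older ways to build the same sentence...
concatenated = "f-strings arrived in " + language + " " + str(version)
percent_style = "f-strings arrived in %s %s" % (language, version)
format_method = "f-strings arrived in {} {}".format(language, version)

# ...and the f-string version, where expressions sit directly inside the braces.
f_string = f"f-strings arrived in {language} {version}"

assert concatenated == percent_style == format_method == f_string
print(f_string)  # f-strings arrived in Python 3.6
```

All four produce identical text. The f-string is usually the easiest to read because each value appears exactly where it lands in the output.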

Monday, 24 January 2022

Python Single vs. Double Quotes — Which Should You Use And Why? by/via Better Data Science

What are the differences between Python single and double quotes?

I think it is very important to be consistent: use double quotes for text if you can, so an apostrophe in the text needs no escaping; otherwise, escape any single quote that is part of the text.
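The quote style makes no difference to the resulting string, which is why consistency is the main concern. PEP 8 likewise leaves the choice open but suggests using the other quote character to avoid backslashes. This quick check shows the equivalences.

```python
# Single and double quotes create exactly the same kind of str object.
assert 'data' == "data"

# Double quotes let an apostrophe sit inside the text without escaping...
double = "It's consistent"
# ...whereas single quotes need the apostrophe escaped with a backslash.
single = 'It\'s consistent'
assert double == single

# The reverse applies to text that contains double quotes.
quoted = 'She said "use double quotes"'
escaped = "She said \"use double quotes\""
assert quoted == escaped

# Triple quotes avoid escaping either kind and can also span lines.
both = """It's a "quoted" word"""
print(both)  # It's a "quoted" word
```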

Wednesday, 3 November 2021

The Match-Case In Python 3.10 Is Not That Simple by Christopher Tao via @TDataScience

7 examples to show the “MATCH case” is not “SWITCH case”

This is really useful and cleverly shows the differences between the two constructs.

Monday, 25 October 2021

WEBINAR: Maximizing Data Labeling Operations in High-Stakes Industries: Tips for Tools and Teams - 2 November 2021

Maximizing Data Labeling Operations in High-Stakes Industries

Are you interested in learning about overcoming data annotation challenges like scaling teams, labeling complex data, and handling edge cases?

There's an art and science to choosing the processes and teams used to extract and structure the data found in images, video, and documents for AI and business insights. In this interactive LinkedIn Live chat, we're talking to Alberto Rizzoli, CEO & Co-founder of V7 Labs, a data labeling platform for text and visual data. CloudFactory and V7 regularly collaborate to optimize data labeling operations for customers and build high-quality datasets for global innovators.

Join us on November 2 at 11 am ET / 4 pm BST: Maximizing Data Labeling Operations in High-Stakes Industries: Tips for Tools and Teams

Here are a few topics we plan to discuss with V7:

•	Fascinating real-world examples of computer vision development in agriculture and healthcare
•	Maximizing data operations resources and scalability by combining SMEs like medical doctors and experienced data annotators during the AI lifecycle
•	Optimizing human and computer collaboration to process edge cases that baffle text annotation tools like optical character recognition (OCR)
•	Preparing for data annotation challenges by choosing proven tools, processes, and human-in-the-loop workforces

P.S. Have questions? Contact CloudFactory anytime here. You might enjoy learning about CloudFactory's collaboration with V7 on Covid-19 AI training data.

Wednesday, 9 October 2019

The Seven Patterns Of AI by Kathleen Walch via @forbes

AI use cases tend to fall into one or more of these seven common categories. Kathleen Walch explains in this article from Forbes.

This is a great list, and I think it could be used to work out what COULD be done and then to plan a roadmap for the future.

Monday, 29 July 2019

How Etsy taught style to an algorithm by/via @FastCompany

Is it romantic or rustic? Boho or minimal? Etsy needed to offer searchers a way to find goods that matched their style aesthetics, but since descriptions aren’t uniform and don’t always describe the style, text mining the descriptions wasn’t enough. Colour and patterns don’t reliably predict style, so image recognition alone didn’t do it either. Enter a model that blends text analysis with image recognition based on 43 human-identified styles.

I love this real-life example detailing the steps they took to work out how to do this. It is definitely a methodology that other organisations could use to do something similar.

Wednesday, 1 May 2019

How algorithms know what you’ll type next by Wessel Stoop and Antal van den Bosch via @puddingviz

This tutorial explains how text predictors work.

This is very clear and easy to follow as you work through the Twitter example they use. Once you have worked out how it works, you can reuse similar code elsewhere.
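The simplest form of next-word prediction counts which word most often follows each word, which is called a bigram model. Here is a minimal sketch of that idea. It is not necessarily the exact method the tutorial builds up to, and the helper names are hypothetical.

```python
from collections import Counter, defaultdict

def train_bigrams(text):
    """Count, for every word, which words follow it and how often."""
    words = text.lower().split()
    follows = defaultdict(Counter)
    for current, following in zip(words, words[1:]):
        follows[current][following] += 1
    return follows

def predict_next(follows, word, k=3):
    """Return up to k most frequent next words, most likely first."""
    counts = follows.get(word.lower(), Counter())
    return [candidate for candidate, _ in counts.most_common(k)]

corpus = "i love data . i love python . i like tea ."
model = train_bigrams(corpus)
print(predict_next(model, "I"))     # ['love', 'like']
print(predict_next(model, "love"))  # ['data', 'python']
```

Real predictors condition on longer contexts and far larger corpora. The principle of ranking candidates by how often they followed the context before is the same.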

Friday, 1 March 2019

OpenAI’s new multitalented AI writes, translates, and slanders by James Vincent via @verge

OpenAI is said to have trained an unsupervised language model that can read and write at a level that's never been seen before. It's called GPT-2, and they say it's so good they're afraid to release it. This article in The Verge explores the claims and the presumed dangers, including samples of GPT-2's capabilities. Follow the links for more info, code and related articles.

I really liked this article; it was a great read and definitely made me think.

Thursday, 25 October 2018

The Main Approaches to Natural Language Processing Tasks by Matthew Mayo via @kdnuggets

Let's have a look at the main approaches to NLP tasks that we have at our disposal. We will then have a look at the concrete NLP tasks we can tackle with said approaches.

Good lists of approaches, with examples, that both learners and more experienced practitioners will find useful to keep on hand as a reminder of them all.

Tuesday, 8 May 2018

WEBINAR: Combining Human Intelligence with ML for NLP and Speech - 17 May 2018

Overview

Title: Combining Human Intelligence with Machine Learning for NLP and Speech

Date: Thursday, May 17, 2018

Time: 09:00 AM Pacific Daylight Time

Duration: 1 hour

Summary

Executing successful Natural Language Processing (NLP) and Speech projects in the real world is complicated. It is often difficult to find the right volume of raw data to annotate, especially if some categories/words/topics are very rare in the data. It is also difficult to find and manage the right people to annotate, transcribe or create the data, especially when the use case requires domain expertise or certain languages and accents.

Join this latest Data Science Central webinar and learn how to incorporate better active learning and annotation strategies into your NLP projects to achieve better results in your NLP and Speech applications. This webinar will include a brief demo of the Figure Eight platform to show how to generate high-quality, human-annotated training data and incorporate that training data into human-in-the-loop machine learning systems that you can run in your own environment.

Speaker:

Robert Munro, Chief Technology Officer -- Figure Eight

Hosted by:
Bill Vorhies, Editorial Director -- Data Science Central

Sunday, 29 April 2018

Understanding Feature Engineering - 4-part article by Dipanjan Sarkar via @TDataScience

A great 4-part series; you really need to set some time aside so you can sit down and read these:

1 - Strategies for working with continuous, numerical data

2 - Strategies for working with discrete, categorical data

3 - Traditional strategies for taming unstructured, textual data

4 - Newer, advanced strategies for taming unstructured, textual data
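One of the traditional strategies for textual data is the bag-of-words model. Each document becomes a vector of word counts over a shared vocabulary. Here is a minimal standard-library sketch. The tokenizer is a deliberately crude assumption, and the series itself works with its own tools.

```python
import re
from collections import Counter

def tokenize(text):
    """Lower-case the text and keep runs of letters and digits as tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def bag_of_words(documents):
    """Build a shared, sorted vocabulary and one count vector per document."""
    tokenized = [tokenize(doc) for doc in documents]
    vocabulary = sorted({token for tokens in tokenized for token in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors

docs = ["The sky is blue.", "The sun is bright, the sky is clear."]
vocab, matrix = bag_of_words(docs)
print(vocab)   # ['blue', 'bright', 'clear', 'is', 'sky', 'sun', 'the']
print(matrix)  # [[1, 0, 0, 1, 1, 0, 1], [0, 1, 1, 2, 1, 1, 2]]
```

Word order is thrown away, which is the model's main limitation. It is a large part of what the newer strategies in part 4 try to address.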

Saturday, 13 January 2018

Google’s voice-generating AI is now indistinguishable from humans by @davegershgorn via @qz

In this paper, Google researchers explain a text-to-speech system called Tacotron 2, which they claim achieves near-human accuracy at imitating audio of a person speaking from text.

This is a really exciting development and well worth keeping an eye on.

Wednesday, 13 July 2016

Text Mining 101: Topic Modelling by Goutam Nair via @kdnuggets

We introduce the concept of topic modelling and explain two methods: Latent Dirichlet Allocation and TextRank. The techniques are ingenious in how they work – try them yourself.
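TextRank runs PageRank over a graph whose nodes are words, linking words that occur near each other. The highest-scoring words become keywords. This is a simplified sketch of that idea. The original method also filters candidates by part of speech, and the tiny stopword list here is an illustrative assumption.

```python
from collections import defaultdict

# Illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "to", "is", "are"}

def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank words with a simplified TextRank: PageRank over a co-occurrence graph."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOPWORDS]
    # Link every pair of distinct words that appear within `window` positions.
    neighbours = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + window]:
            if other != word:
                neighbours[word].add(other)
                neighbours[other].add(word)
    # Repeat the PageRank update a fixed number of times so the scores settle.
    scores = {word: 1.0 for word in neighbours}
    for _ in range(iterations):
        scores = {
            word: (1 - damping)
            + damping * sum(scores[n] / len(neighbours[n]) for n in neighbours[word])
            for word in neighbours
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

print(textrank_keywords("Topic models find topics in text. Topic models group words into topics."))
```

Words that connect to many other words accumulate score from all of them. This is why a hub term tends to rise to the top of the ranking.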

I found this really interesting.

Wednesday, 24 February 2016

Topic Modeling Large Amounts of Text Data via @Data_Informed

Topic Modeling Large Amounts of Text Data by Frank D. Evans via @Data_Informed - Exaptive Data Scientist Frank Evans discusses how to use Spark to glean insights from large sets of unstructured text data.

A really worthwhile read, as we all struggle with unstructured data.

Thursday, 11 February 2016

WEBINAR: Text Analytics Delivers Game-Changing Customer Insights - 16 February 2016

Text Analytics Delivers Game-Changing Customer Insights

Join us to learn how text analytics can help you discover the hidden social insights that can transform your business.

Date: February 16, 2016
Time: 11 AM ET

To remain competitive, businesses need to operate at the speed of social. At least 80% of enterprise data is unstructured, contained in the myriad text-based social conversations that are happening every day. Unlocking the hidden value of text through predictive analytics is imperative for understanding customers’ opinions and needs to make better, more informed business decisions.

During this webinar RapidMiner and Aylien will explore the power of social content by analyzing data captured from thousands of tweets referencing Super Bowl 50 ads to determine viewer sentiments and predict potential trends in brand adoption.

Attend this webinar to:

Learn how to leverage predictive and text analytics for: understanding your clients, improving customer satisfaction, and optimizing marketing spend
Learn how to quickly make sense of social media data across thousands of responses using sentiment analysis and predictive modeling
Understand the impact of predictive and text analytics on business opportunities
Learn how to share and communicate customer insights through data visualization

Can’t attend? Register anyway, and we will send you the recording of the webinar after the event.