Authors Julia Silge and David Robinson discuss the power of tidy data principles, sentiment lexicons, and what they're up to at Stack Overflow.
“Many of us who work in analytical fields are not trained in even simple interpretation of natural language,” write Julia Silge, Ph.D., and David Robinson, Ph.D., in their newly released book Text Mining with R: A Tidy Approach. The applications of text mining are numerous and varied, though: sentiment analysis can assess the emotional content of text, frequency measurements can identify a document’s most important terms, analysis can explore relationships and connections between words, and topic modeling can classify and cluster similar documents.
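As a rough illustration of two of the applications listed above (term frequency and lexicon-based sentiment), here is a minimal sketch in Python rather than R. It does not use the tidytext package; it only mimics the one-token-per-row idea with standard-library tools. The sample documents, the `LEXICON` scores, and the function names are all made up for the example.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: word -> sentiment score (an assumption for this sketch).
LEXICON = {"good": 1, "great": 1, "bad": -1, "awful": -1}


def tidy_tokens(documents):
    """Return one (doc_id, word) row per token, lowercased."""
    rows = []
    for doc_id, text in documents.items():
        for word in re.findall(r"[a-z']+", text.lower()):
            rows.append((doc_id, word))
    return rows


def term_counts(rows):
    """Count how often each (doc_id, word) pair occurs."""
    return Counter(rows)


def sentiment_by_doc(rows, lexicon):
    """Sum lexicon scores per document; words not in the lexicon score 0."""
    scores = {}
    for doc_id, word in rows:
        scores[doc_id] = scores.get(doc_id, 0) + lexicon.get(word, 0)
    return scores


# Made-up sample documents.
docs = {
    "a": "Great food, great service.",
    "b": "Bad food and awful service.",
}
rows = tidy_tokens(docs)
counts = term_counts(rows)
scores = sentiment_by_doc(rows, LEXICON)
```

Keeping the data as one row per token is what makes both steps simple: counting and scoring are each a single pass over the same rows.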
I recently caught up with Silge and Robinson to discuss how they’re using text mining on job postings at Stack Overflow, some of the challenges and best practices they’ve experienced when mining text, and how their tidytext package for R Continue reading "R’s tidytext turns messy text into valuable insight"
Recapping winners of the Strata San Jose Startup Showcase.
Every five years, we invent a new technology that, when sprinkled atop existing business problems, acts as a panacea for managers. In the ’90s it was the Web, followed quickly by SaaS, mobility, clouds, data, and now AI.
But there's a bigger underlying pattern. Web and SaaS gave us interfaces anyone could use. Mobility made them ubiquitous, taking us away from the workday—we check our phones dozens of times a day, and, often, they're the last thing we look at before sleep and the first thing we grab upon waking. Clouds gave us elastic, on-demand computing. Big data gave clouds something to do. And AI is a set of algorithms that make sense of that big data, teasing threads of gold from the digital hay.
Take, for example, the winners of the Strata San Jose Startup Showcase. From a field of Continue reading "Data science startups focus on AI-enabled efficiency"
Manage resources and fixtures with Spock's lifecycle hooks.
Testing persistence is one of the most frequently encountered types of integration test.
If done incorrectly, it can mean death to a suite of tests, because they will run slowly and be incredibly brittle.
One of the most common antipatterns encountered is to test everything by reference to a single monolithic fixture containing an ever-growing volume of data that attempts to cater to every corner-case.
The fixture will soon become the victim of combinatorial explosion—there are so many combinations of entities in various states required that the sheer amount of data becomes overwhelming and impossible to monitor.
Tests based on monolithic fixtures tend to be replete with false moniker testing: "I know the pantoffler widget is the one set up with three variants, one of which has a negative price modifier and is zero-rated for tax."
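One alternative to a monolithic fixture is for each test to create only the rows it needs, with lifecycle hooks handling setup and teardown. The sketch below shows that idea in Python, using `unittest`'s `setUp` and `tearDown` as stand-ins for Spock's `setup()` and `cleanup()` fixture methods. It is not Spock code; the `WidgetStore` class, the `variant` table, and the test names are hypothetical.

```python
import sqlite3
import unittest


class WidgetStore:
    """Hypothetical persistence class under test."""

    def __init__(self, conn):
        self.conn = conn

    def variant_count(self, widget_name):
        row = self.conn.execute(
            "SELECT COUNT(*) FROM variant WHERE widget = ?", (widget_name,)
        ).fetchone()
        return row[0]


class VariantCountTest(unittest.TestCase):
    def setUp(self):
        # Runs before every test: a fresh in-memory database with an empty schema.
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute(
            "CREATE TABLE variant (widget TEXT, price_modifier REAL)"
        )
        self.store = WidgetStore(self.conn)

    def tearDown(self):
        # Runs after every test: discard the database so no state leaks.
        self.conn.close()

    def test_counts_only_variants_of_named_widget(self):
        # The test states its own data, so nothing depends on remembered
        # facts about a shared fixture.
        self.conn.executemany(
            "INSERT INTO variant VALUES (?, ?)",
            [("pantoffler", 0.0), ("pantoffler", -1.5), ("other", 2.0)],
        )
        self.assertEqual(self.store.variant_count("pantoffler"), 2)

    def test_unknown_widget_has_no_variants(self):
        # Starts from an empty table because setUp rebuilt the schema.
        self.assertEqual(self.store.variant_count("pantoffler"), 0)
```

Saved as a module, this can be run with `python -m unittest`. Each test reads on its own: the data it relies on is declared a few lines above the assertion.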
Continue reading "How (not) to approach persistence testing in Java and Groovy"