R’s tidytext turns messy text into valuable insight

Authors Julia Silge and David Robinson discuss the power of tidy data principles, sentiment lexicons, and what they're up to at Stack Overflow.
“Many of us who work in analytical fields are not trained in even simple interpretation of natural language,” write Julia Silge, Ph.D., and David Robinson, Ph.D., in their newly released book Text Mining with R: A Tidy Approach. The applications of text mining are numerous and varied, though: sentiment analysis can assess the emotional content of text, frequency measurements can identify a document’s most important terms, word co-occurrence analysis can explore relationships and connections between words, and topic modeling can classify and cluster similar documents. I recently caught up with Silge and Robinson to discuss how they’re using text mining on job postings at Stack Overflow, some of the challenges and best practices they’ve experienced when mining text, and how their tidytext package for R
Continue reading "R’s tidytext turns messy text into valuable insight"

Four short links: 26 July 2017

Blockchain and Securities Law, Low-Energy Sensing, Robotics Deep Learning, and Chipped Employees
  1. Report of Investigation ... DAO (SEC) -- This Report reiterates these fundamental principles of the U.S. federal securities laws and describes their applicability to a new paradigm—virtual organizations or capital raising entities that use distributed ledger or blockchain technology to facilitate capital raising and/or investment and the related offer and sale of securities.
  2. How to Make a Wireless Sensor Live for a Year on One Tiny Coin Cell Battery -- We see that making the right choices in software can make a stunning 400x difference in power consumption – more than two orders of magnitude – even with already extremely power-efficient hardware.
  3. Deep Learning in Robotics: A Review of Recent Research -- what it says on the cover.
  4. A Wisconsin Company Will Let Employees Use Microchip Implants to Buy Snacks and Open Doors -- Participating employees will have the
    Continue reading "Four short links: 26 July 2017"

Four short links: 25 July 2017

AI Sentencing, AI Vocabulary, Soft U2F, and Encrypted Email
  1. Opening the Lid on Criminal Sentencing -- Duke researchers building a (socially) better algorithm. CORELS makes it possible for judges and defendants to scrutinize why the algorithm classifies a particular person as high or low risk. [...] None of the research team’s models rely on race or socioeconomic status.
  2. Agents that Imagine and Plan -- the relentless appropriation of terms from cognition and psychology bugs me. But I can't figure out whether I'm long-term wrong, i.e., whether the future will look on the distinction between software and wetware (AI and human intelligence) as irrelevant.
  3. Soft U2F -- Authenticators are normally USB devices that communicate over the HID protocol. By emulating a HID device, Soft U2F is able to communicate with your U2F-enabled browser, and by extension, any websites implementing U2F. Improves site security by preventing phishing. (The magic numbers
    Continue reading "Four short links: 25 July 2017"

Data science startups focus on AI-enabled efficiency

Recapping winners of the Strata San Jose Startup Showcase.
Every five years, we invent a new technology that, when sprinkled atop existing business problems, acts as a panacea for managers. In the ’90s it was the Web, followed quickly by SaaS, mobility, clouds, data, and now AI. But there's a bigger underlying pattern. Web and SaaS gave us interfaces anyone could use. Mobility made them ubiquitous, extending computing beyond the workday—we check our phones dozens of times a day, and, often, they're the last thing we look at before sleep and the first thing we grab upon waking. Clouds gave us elastic, on-demand computing. Big data gave clouds something to do. And AI is a set of algorithms that make sense of that big data, teasing threads of gold from the digital hay. Take, for example, the winners of the Strata San Jose Startup Showcase. From a field of
Continue reading "Data science startups focus on AI-enabled efficiency"

How (not) to approach persistence testing in Java and Groovy

Manage resources and fixtures with Spock's lifecycle hooks.
Testing persistence is one of the most frequently encountered types of integration test. If done incorrectly, it can mean death to a suite of tests because they will run slowly and be incredibly brittle. One of the most common antipatterns encountered is to test everything by reference to a single monolithic fixture containing an ever-growing volume of data that attempts to cater to every corner case. The fixture will soon become the victim of combinatorial explosion—there are so many combinations of entities in various states required that the sheer amount of data becomes overwhelming and impossible to monitor. Tests based on monolithic fixtures tend to be replete with false moniker testing—"I know the pantoffler widget is the one set up with three variants, one of which has a negative price modifier and is zero-rated for tax."
Continue reading "How (not) to approach persistence testing in Java and Groovy"
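As a rough illustration of the lifecycle hooks mentioned in the subtitle above, here is a minimal Spock sketch: setup() and cleanup() run around every feature method, so each test builds only the small fixture it needs instead of leaning on one shared monolithic data set. The in-memory H2 database, the widget table, and the pantoffler row are hypothetical stand-ins rather than anything taken from the article, and the snippet assumes spock-core, Groovy's groovy.sql.Sql, and the H2 driver are on the classpath.

    import groovy.sql.Sql
    import spock.lang.Specification

    class WidgetPersistenceSpec extends Specification {

        Sql sql

        // Spock lifecycle hook: runs before every feature method.
        // Opens a throwaway in-memory database and creates only the schema this spec needs.
        def setup() {
            sql = Sql.newInstance("jdbc:h2:mem:widgets", "org.h2.Driver")
            sql.execute("create table widget (name varchar(64), price decimal(8,2))")
        }

        // Spock lifecycle hook: runs after every feature method, keeping tests isolated.
        def cleanup() {
            sql.close()
        }

        def "persists and reloads a widget"() {
            given: "a small, test-local fixture instead of a shared monolithic one"
            sql.execute("insert into widget (name, price) values ('pantoffler', 9.99)")

            when: "the row is read back"
            def row = sql.firstRow("select name, price from widget where name = 'pantoffler'")

            then: "the stored values come back unchanged"
            row.name == 'pantoffler'
            row.price == 9.99
        }
    }

For resources that are expensive to create but safe to share read-only, Spock also provides setupSpec() and cleanupSpec(), which run once per specification rather than once per feature method.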