It only takes a few minutes of cable news to get the feeling that the world is heading into a tailspin.
Endless images of homicide investigations, natural disasters, car crashes, and drug busts fill the airwaves on a daily basis. It’s upsetting, but also undeniably captivating for the average viewer.
In fact, the news cycle thrives on fear and violence, so mainstream networks fill the vast majority of their programming with these isolated events. It’s addictive and sometimes anger-inducing, but is it representative of what’s really going on in the world?
Good News Happens Slowly
Today’s infographic comes to us from economist Max Roser of Our World in Data, and it highlights six megatrends that show that, in many ways, the world is actually getting better.
A look at the stages of designing an API and service.
The growing popularity of microservices and APIs has led developers to automate the repetitive aspects of these technologies, such as defining all the RESTful requests, generating templates for the target code, and scaling out new instances. In this article, I’ll explore the tasks involved in designing a microservice and identify the parts that lend themselves to automation.
Microservices are just a cloud-friendly and Agile-friendly implementation of the decades-old concept of modularity. I suggested a radical modular breakdown of applications in a 2002 article, writing that “the application, once the central fixture of computer use, would dissolve into a soup of servers, application fragments, and user interfaces.” By encapsulating tasks into separate services, parts of an application can be developed independently, deployed efficiently as containers, and scaled out automatically with new instances as traffic grows.
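To make that “soup of servers and application fragments” concrete, here is a minimal sketch in Python using only the standard library. The pricing task, the `/price/<quantity>` route, and the $9.99 unit price are all hypothetical, invented for illustration; a real microservice would sit behind a proper framework and deployment pipeline.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import urlopen

# One encapsulated task -- a hypothetical pricing rule -- exposed as its
# own tiny service rather than baked into a monolithic application.
class PricingHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Route: /price/<quantity> returns a JSON body with the total.
        try:
            quantity = int(self.path.rsplit("/", 1)[-1])
        except ValueError:
            self.send_response(400)
            self.end_headers()
            return
        body = json.dumps({"quantity": quantity,
                           "total": quantity * 9.99}).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the demo quiet

# Port 0 asks the OS for any free port; the service runs in its own thread.
server = HTTPServer(("127.0.0.1", 0), PricingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# A separate "application fragment" consumes the service over plain HTTP.
url = f"http://127.0.0.1:{server.server_port}/price/3"
result = json.loads(urlopen(url).read())
server.shutdown()
```

Because the consumer only depends on the HTTP contract, the pricing fragment could be rewritten, containerized, or scaled out to many instances without the caller changing at all.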
The Gotham Gal was an early seed investor in Ring’s predecessor Doorbot, which came out of an incubator called Edison Jr that she was also an investor in. So it was a nice win for her (and me since we are partners in everything).
But the story I want to tell is about Simulscribe.
Ring’s founder and CEO, Jamie Siminoff, started a company called Simulscribe (now called PhoneTag) back in the early 2000s. I met him around the time USV got started, in 2003/2004, and the value proposition for Simulscribe was immediately compelling to me:
stop checking voice mails; get them transcribed and sent to you by email
I loved it and the Gotham Gal and I became early users. We still use the PhoneTag service!!
A comparison of the accuracy and performance of Spark-NLP vs. spaCy, and some use case recommendations.
This is the third and final installment in this blog series comparing two leading open source natural language processing software libraries: John Snow Labs’ NLP for Apache Spark and Explosion AI’s spaCy. In the previous two parts, we walked through the code for training tokenization and part-of-speech models, running them on a benchmark data set, and evaluating the results. In this part, we compare the accuracy and performance of both libraries on this and additional benchmarks, and provide recommendations on which use cases fit each library best.
Plugging in our data is a challenging step, since it consists of raw, heterogeneous text with no formatting or sentence boundaries. I am working with a folder full of .txt files and need to save the results in word-tag format, keyed by filename, so I can compare them to the correct answers later. Let’s work it out:
import os
import time

start = time.time()
path = "./target/testing/"
files = sorted([path + f for f in os.listdir(path)
                if os.path.isfile(os.path.join(path, f))])
A step-by-step guide to initialize the libraries, load the data, and train a tokenizer model using Spark-NLP and spaCy.
The goal of this blog series is to run a realistic natural language processing (NLP) scenario by utilizing and comparing the leading production-grade linguistic programming libraries: John Snow Labs’ NLP for Apache Spark and Explosion AI’s spaCy. Both libraries are open source with commercially permissive licenses (Apache 2.0 and MIT, respectively). Both are under active development with frequent releases and a growing community.
The intention is to analyze and identify the strengths of each library, how they compare for data scientists and developers, and the situations in which it may be preferable to use one or the other. This analysis aims to be an objective run-through, though (as in every natural language understanding application) it involves a good amount of subjective decision-making at several stages.
A look at the new streaming SQL engine for Apache Kafka.
Modern businesses have data at their core, and this data is changing continuously at a rapid pace, with increasing volumes. Stream processing allows businesses to harness this torrent of information in real time, and tens of thousands of companies like Netflix, Uber, Airbnb, PayPal, and The New York Times use Apache Kafka as the streaming platform of choice to reshape their industries. Whether you are booking a hotel or a flight, taking a cab, playing a video game, reading a newspaper, shopping online, or wiring money, many of these daily activities are powered by Kafka behind the scenes.
However, the world of stream processing still has a very high barrier to entry. Today’s most popular stream processing technologies, including Apache Kafka’s Streams API, still require the user to write code in programming languages such as Java or Scala.
Attend a day-long exploration of Jupyter’s best practices and practical use cases in business and industry.
O’Reilly and NumFOCUS will present Jupyter Pop-up Boston on March 21 at District Hall, in Boston’s Seaport neighborhood.
The event is a day-long exploration of Project Jupyter in a casual setting, focused on the local community. We’ll have a dozen talks, a panel discussion, an “Ask Me Anything” with experts on the project, plus lots of time to meet and talk with people who share common interests and concerns.
The timing is quite interesting for Jupyter. Success stories from 2016-17, such as the data science program at UC Berkeley, illustrate the power of JupyterHub deployments at scale, in both education and industry. As universities and enterprise firms learn to handle the technical challenges of rolling out hands-on, interactive computing at scale (one of the core themes we’ll explore), the organizational challenges come to the fore.
Winning Atari, Explaining Algorithms, Code for America Summit, and Historic AI Book
Benchmarking Canonical Evolution Strategies for Playing Atari — the AI found two clever strategies for succeeding at Qbert. In the first, it plays a level endlessly: jumping off a platform lures an enemy to follow, and killing the enemy yields enough points to earn back the lost life. In the second, the agent discovers an in-game bug: after completing the first level, it jumps from platform to platform in a seemingly random manner; for reasons unknown, the game does not advance to the second round, the platforms start to blink, and the agent quickly racks up a huge number of points. There’s video too. (via Brendan Dolan-Gavitt)
In the new podcast, Chris and I talk about Dropbox: its amazing story and its challenges ahead as a public company. The company filed for an IPO recently. We discuss how not all storage is equal, and how the best way to extract premium dollars from a storage operation is to combine it with more useful applications. So far, Dropbox hasn’t succeeded in getting its 500+ million registered users to buy into its app attempts.
Some previous posts that I refer to in this podcast:
You can scroll down to the bottom of that CB Insights post to see how they came up with this data.
There are two things about this chart that I would like to talk about:
1/ Getting VC/angel funding is hard, but even if you do secure it once (a 1st round), the probability that you will secure it again is only 50-70%, and the probability that you will secure it five more times is between 0% and 5%. That is what CB Insights calls the “VC Funnel.” Before you jump to conclusions, there are many reasons why a company would not raise a second round, a third round, etc. It could close down, which is probably the main reason companies don’t raise a 2nd round, but it could also sell.
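As a back-of-envelope illustration (not CB Insights’ methodology, which is empirical), if each follow-on round closed with an independent 50-70% probability, stringing five more rounds together would compound like this:

```python
# Hypothetical independence assumption: each follow-on round closes
# with the same per-round probability, so five rounds multiply out.
low, high = 0.5, 0.7
p_five_low = low ** 5    # ~3.1%
p_five_high = high ** 5  # ~16.8%
print(f"five follow-on rounds: {p_five_low:.1%} to {p_five_high:.1%}")
```

The fact that the observed funnel (0-5%) sits at or below even the pessimistic end of this range suggests the rounds are not independent: companies that struggle once tend to struggle again, or exit the funnel entirely.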
The O’Reilly Podcast: Modern day DNS for hybrid cloud, intelligent traffic steering, and DevOps.
Modern-day DNS goes beyond a simple internet “phone book” service to provide dynamic traffic management, flexibility, and performance. In this episode of the O’Reilly podcast, I had a chance to discuss modern-day DNS and its role in building resilient infrastructure with Gary Sloper, VP of global sales engineering at Oracle Dyn.
Here are some highlights:
Greater flexibility and resilience in hybrid cloud environments using DNS
As organizations try to deliver their content closer to the edge (meaning their end users), they need to intelligently route traffic, and that can be achieved with DNS. In a hybrid cloud environment, DNS provides flexibility by allowing you to route between fixed assets, like a data center, and movable assets, such as the cloud. Within that hybrid strategy, you can utilize DNS not only for the traditional aspects of what most people think of as DNS, but also for intelligent traffic steering.
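To make the traffic-steering idea concrete, here is a hypothetical sketch of weighted DNS answers in Python. The hostnames and the 70/30 split are invented for illustration; a managed DNS provider implements this server-side, typically combining weights with health checks and geolocation data.

```python
import random

# Hypothetical weighted-answer table: each endpoint (a fixed data center
# and a cloud region) gets a traffic weight, mimicking how a DNS service
# can steer a share of queries to each asset in a hybrid deployment.
ENDPOINTS = {
    "datacenter.example.com": 70,  # fixed asset
    "cloud.example.com": 30,       # movable asset
}

def resolve(endpoints, rng=random.random):
    """Pick one answer in proportion to its weight."""
    total = sum(endpoints.values())
    roll = rng() * total
    for host, weight in endpoints.items():
        roll -= weight
        if roll < 0:
            return host
    return host  # fallback for floating-point edge cases

# Over many queries, traffic converges to the configured split.
picks = [resolve(ENDPOINTS) for _ in range(10_000)]
share = picks.count("datacenter.example.com") / len(picks)
```

Shifting the weights (say, during a data-center migration) gradually moves traffic to the cloud asset without clients changing anything, which is exactly the flexibility the excerpt describes.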
Lawyering AIs, Ways of Books, State Machines, and Futures Tools
AI vs. Lawyer — 20 experienced U.S.-trained lawyers were pitted against the LawGeex artificial intelligence algorithm. The 40-page study details how AI has overtaken top lawyers for the first time in accurately spotting risks in everyday business contracts. (via Mashable)
Bestsellers vs. Most Read (John Birmingham) — Books live in To-Be-Read piles. Some of those books are digital, and some are paper. But many, many, many readers wait until they’re in the mood for a particular kind of book before plucking it off their TBR pile. (Quote is actually by Kathryn Rausch)