What’s in a transport layer?

Understanding gRPC in the dawn of microservices.
Microservices are small programs, each with a specific and narrow scope, that are glued together to produce what appears from the outside to be one coherent web application. This architectural style contrasts with a traditional "monolith," where every component and subroutine of the application is bundled into one codebase and not separated by a network boundary. In recent years microservices have enjoyed increased popularity, concurrent with (but not necessarily requiring the use of) new enabling technologies such as Amazon Web Services and Docker. In this article, we will take a look at the "what" and "why" of microservices and at gRPC, an open source framework released by Google that organizations are increasingly reaching for in their migration towards microservices.

Why Use Microservices?

To understand the general history and structure of microservices emerging as an architectural pattern, this
Continue reading "What’s in a transport layer?"

Query the planet: Geospatial big data analytics at Uber

A deep dive into Uber's engineering effort to optimize geospatial queries in Presto.
From determining the most convenient rider pickup points to predicting the fastest routes, Uber aims to use data-driven analytics to create seamless trip experiences. Within engineering, analytics inform decision-making processes across the board. One of the distinct challenges for Uber is analyzing geospatial big data. City locations, trips, and event information, for instance, provide insights that can improve business decisions and better serve users. Geospatial data analysis is particularly challenging at big data scale, for example when computing how many rides start at a transit location or how many drivers are crossing state lines. For these analytical requests, we must achieve efficiency, usability, and scalability in order to meet user needs and business requirements. To accomplish this, we use Presto in our production environment to process the big data powering our interactive SQL engine.
Figures: Uber’s Presto architecture; Uber’s Hadoop infrastructure; example geofence; QuadTree indexes; using QuadTree to index San Francisco; Presto optimizes a query using QuadTree.
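To make the geofence and QuadTree ideas named above concrete, here is a minimal sketch in plain Python. It is not Uber's or Presto's implementation: the class `QuadTree`, the helper names, the toy coordinates, and the `capacity` and `max_depth` settings are all assumptions made for illustration. It indexes each geofence's bounding box in a quadtree, uses the tree to shortlist candidate geofences for a point, and only then runs an exact ray-casting point-in-polygon test.

```python
def bounding_box(polygon):
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (min(xs), min(ys), max(xs), max(ys))  # (min_x, min_y, max_x, max_y)

def boxes_overlap(a, b):
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def box_contains(box, p):
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]

def point_in_polygon(p, polygon):
    """Ray casting: count polygon edges crossed by a ray going right from p."""
    x, y = p
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

class QuadTree:
    """Hypothetical quadtree over geofence bounding boxes (illustration only)."""

    def __init__(self, box, depth=0, max_depth=4, capacity=2):
        self.box, self.depth = box, depth
        self.max_depth, self.capacity = max_depth, capacity
        self.items = []      # (name, bbox) pairs stored at a leaf
        self.children = []   # four sub-quadrants once the leaf is split

    def insert(self, name, bbox):
        if self.children:
            for child in self.children:
                if boxes_overlap(child.box, bbox):
                    child.insert(name, bbox)
            return
        self.items.append((name, bbox))
        if len(self.items) > self.capacity and self.depth < self.max_depth:
            self._split()

    def _split(self):
        x0, y0, x1, y1 = self.box
        mx, my = (x0 + x1) / 2, (y0 + y1) / 2
        quadrants = ((x0, y0, mx, my), (mx, y0, x1, my),
                     (x0, my, mx, y1), (mx, my, x1, y1))
        self.children = [
            QuadTree(q, self.depth + 1, self.max_depth, self.capacity)
            for q in quadrants
        ]
        items, self.items = self.items, []
        for name, bbox in items:
            self.insert(name, bbox)  # redistribute into the new children

    def candidates(self, p):
        """Names of geofences whose bounding box contains p."""
        if not box_contains(self.box, p):
            return set()
        if not self.children:
            return {name for name, bbox in self.items if box_contains(bbox, p)}
        found = set()
        for child in self.children:
            found |= child.candidates(p)
        return found

def count_points_in_geofences(points, geofences, world):
    tree = QuadTree(world)
    for name, polygon in geofences.items():
        tree.insert(name, bounding_box(polygon))
    counts = {name: 0 for name in geofences}
    for p in points:
        for name in tree.candidates(p):          # cheap shortlist
            if point_in_polygon(p, geofences[name]):  # exact test
                counts[name] += 1
    return counts

# Toy data on a 16x16 grid, not real coordinates.
geofences = {
    "station": [(1, 1), (3, 1), (3, 3), (1, 3)],
    "airport": [(10, 10), (14, 10), (10, 14)],
    "park": [(5, 5), (7, 5), (7, 7), (5, 7)],
}
points = [(2, 2), (11, 11), (13, 13), (6, 6), (8.5, 0.5)]
counts = count_points_in_geofences(points, geofences, world=(0, 0, 16, 16))
print(counts)
```

The point (13, 13) shows why the two-stage check matters: it falls inside the airport's bounding box, so the tree returns it as a candidate, but the exact polygon test rejects it.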
Continue reading "Query the planet: Geospatial big data analytics at Uber"

Four short links: 19 September 2017

BMI, Govt Apps Threatened, Geospatial Jupyter, and W3C Adds DRM to HTML (*spit*).
  1. Brain Machine Interface Isn't SF Any More (Wired) -- the demo is typing without a keyboard, the article is really about the CEO (started Internet Explorer, got a classics degree at 30, then got a PhD in neuroscience).
  2. Is Apple About to Accidentally Kill Government as a Platform? (Jen Pahlka) -- In an effort to reduce the proliferation of spam apps, Apple changed its App Store review guidelines to ban “apps created from a commercialized template or app generation service.” In what appears to be a misguided interpretation of an otherwise reasonable rule, Apple has decided to include white-labeled government apps in this category.
  3. geonotebook -- A Jupyter notebook extension for geospatial visualization and analysis.
  4. World Wide Web Consortium Abandons Consensus, Standardizes DRM, EFF resigns (Cory Doctorow) -- EFF no longer believes that the W3C process
    Continue reading "Four short links: 19 September 2017"

Four short links: 18 September 2017

AI Journos, AI Hype, Faces from Photos, and Regulating Online Advertising
  1. AI-Produced Journalism -- In its first year, the Post has produced around 850 articles using Heliograf. That included 500 articles around the election that generated more than 500,000 clicks — not a ton in the scheme of things, but most of these were stories the Post wasn’t going to dedicate staff to anyway. [...] It’s unclear how that approach can be scaled to cover local communities, where the digital news model has fallen short. Heliograf can be used to digest data like standardized test scores and crime stats; covering a zoning board meeting is another matter. And AI isn’t being used beyond big news organizations, Lewis pointed out. “There’s such a huge gap between the AI haves and have-nots. We are many years away from these things being implemented at the local level.”
  2. Deep Learning Hype in
    Continue reading "Four short links: 18 September 2017"

Four short links: 15 September 2017

Hardware Life Tetris, VR-64, Face Average, and LoRa Backscatter
  1. Tetris From the Ground Up -- quixotic brilliance. Hardware to Game of Life to Tetris.
  2. VR Goggles For C64 -- I built the VR64 using three components: a $10 plastic VR goggle, a $26 LCD, and a cheap power transformer (plus lots of glue gun fun!). I split the screen into two sections, one for the left eye and one for the right. Each section is 19 columns by 25 rows, and the center two rows are not used. Each eye has 152x200 pixels in high resolution and only 76x200 in multi-color mode! (via Vice)
  3. The Average Face of a UK Member of Parliament -- the idea of a facial mean disconcerts me still.
  4. LoRa Backscatter -- they reverse-engineered the proprietary LoRa physical layer to do this! (Readable article about the tech also available, explaining why this is
    Continue reading "Four short links: 15 September 2017"

Visualizing convolutional neural networks

Building convnets from scratch with TensorFlow and TensorBoard.
Given all of the higher-level tools that you can use with TensorFlow, such as tf.contrib.learn and Keras, one can very easily build a convolutional neural network with a very small amount of code. But often with these higher-level applications, you cannot access the little in-between bits of the code, and some of the understanding of what’s happening under the surface is lost. In this tutorial, I’ll walk you through how to build a convolutional neural network from scratch, using just low-level TensorFlow and visualizing our graph and network performance using TensorBoard. If you don't understand some of the basics of a fully connected neural network, I highly recommend you first check out Not another MNIST tutorial with TensorFlow. Throughout this article, I will also break down each step of the convolutional neural network to its absolute
Figures: four convolutional network; visualized convolutional filters 1 and 2; a 2x2 kernel with a stride of 2; max pool with a 3x3 kernel, with a stride of 1x1; visualized graph model; training data accuracy and loss; adding dropout to the training; visualize evolving filters 1 and 2.
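The kernel and stride settings named in the figure captions above can be worked through by hand. The sketch below uses plain Python lists rather than TensorFlow so that the arithmetic stays visible; it is not the article's code. The function names, the 4x4 sample image, and the assumption of no padding (what TensorFlow calls "VALID") are all illustrative choices, since the excerpt does not show which padding the tutorial uses.

```python
def output_size(input_size, kernel, stride):
    """Output length along one axis, assuming no padding."""
    return (input_size - kernel) // stride + 1

def conv_2d(image, weights, stride):
    """Slide a square kernel over the image and sum elementwise products."""
    k = len(weights)
    rows = output_size(len(image), k, stride)
    cols = output_size(len(image[0]), k, stride)
    return [
        [
            sum(
                image[r * stride + i][c * stride + j] * weights[i][j]
                for i in range(k)
                for j in range(k)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]

def max_pool_2d(image, kernel, stride):
    """Keep the largest value in each kernel x kernel window."""
    rows = output_size(len(image), kernel, stride)
    cols = output_size(len(image[0]), kernel, stride)
    return [
        [
            max(
                image[r * stride + i][c * stride + j]
                for i in range(kernel)
                for j in range(kernel)
            )
            for c in range(cols)
        ]
        for r in range(rows)
    ]

# A made-up 4x4 "image" whose values make the windows easy to follow.
image = [
    [1, 2, 3, 4],
    [5, 6, 7, 8],
    [9, 10, 11, 12],
    [13, 14, 15, 16],
]

# 2x2 kernel, stride 2: four non-overlapping windows.
diagonal = [[1, 0], [0, 1]]
print(conv_2d(image, diagonal, stride=2))      # [[7, 11], [23, 27]]

# 3x3 max pool, stride 1: four overlapping windows.
print(max_pool_2d(image, kernel=3, stride=1))  # [[11, 12], [15, 16]]
```

With a stride of 2 the 2x2 windows tile the image without overlap, so a 4x4 input shrinks to 2x2. With a stride of 1 the 3x3 windows overlap heavily, and the output is 2x2 only because the window cannot slide past the unpadded edge.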
Continue reading "Visualizing convolutional neural networks"

Load, search, and secure data in multiple formats

The O'Reilly Podcast: Dave Cassel on building a unified enterprise database to store and query any type of data.
In this podcast episode, I speak with Dave Cassel, technical community manager at MarkLogic, creator of a multi-model NoSQL database that aims to integrate data silos for a unified view. We talk about integration patterns for loading and exporting data with ease, an architecture that enables efficient search and queries, and layers of security that follow the data from its original source throughout its lifecycle.

Work on applications, as soon as you load the data

The idea of 'load as-is' is that your data already exists in some form, and that form can vary dramatically. It can be Word documents or XML or JSON data. It can also be stuff that you've already got in relational databases. The idea here is that if we can take that data in whatever form
Continue reading "Load, search, and secure data in multiple formats"