Analog Deep Learning, Low-Trust Internet, Media Literacy, and Psych Experiments
The Next Generation of Deep Learning: Analog Computing (IEEE) — Further progress in compute efficiency for deep learning training can be made by exploiting the more random and approximate nature of deep learning workflows. In the digital space, that means trading numerical precision against accuracy for the benefit of compute efficiency. It also opens the possibility of revisiting analog computing, which is intrinsically noisy, to execute the matrix operations for deep learning in constant time on arrays of nonvolatile memories. (Paywalled paper)
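The precision-for-efficiency trade-off is easy to see in software before any analog hardware enters the picture. A toy NumPy sketch (mine, not the paper's): run the same matrix multiplication at float64 and float16 and compare the results.

```python
import numpy as np

# Toy illustration of trading numerical precision for compute efficiency:
# the same matrix multiplication at float64 and at float16.
rng = np.random.default_rng(0)
a = rng.standard_normal((512, 512))
b = rng.standard_normal((512, 512))

exact = a @ b
approx = (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float64)

# The relative error stays small, which is why deep learning workloads can
# tolerate reduced precision (and, by extension, analog noise).
rel_err = np.linalg.norm(exact - approx) / np.linalg.norm(exact)
print(f"relative error at float16: {rel_err:.2e}")
```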
The Internet is Increasingly a Low-Trust Society (Wired) — Zeynep Tufekci nails it. Social scientists distinguish high-trust societies (ones where you can expect most interactions to work) from low-trust societies (ones where you have to be on your guard at all times). People break rules in high-trust societies, of course, but laws, regulations, and norms…
Wacky Timestamps, Computers and Spies, Surveillance Capitalism, and Twitter Adventures
NTFS Timestamps — a 64-bit value representing the number of 100-nanosecond intervals since January 1, 1601 (UTC). WTAF?
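For the curious, converting one of these values to a familiar date is a one-liner once you know the epoch offset; a minimal Python sketch:

```python
from datetime import datetime, timezone, timedelta

# NTFS FILETIME: 100-nanosecond ticks since 1601-01-01 00:00:00 UTC
# (1601 being the start of the Gregorian 400-year cycle current when
# Windows NT was designed).
NTFS_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)

def ntfs_to_datetime(filetime: int) -> datetime:
    """Convert a 64-bit NTFS FILETIME value to a timezone-aware datetime."""
    return NTFS_EPOCH + timedelta(microseconds=filetime // 10)

print(ntfs_to_datetime(132_000_000_000_000_000))  # a date in April 2019 (UTC)
```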
Computers Changed Spycraft (Foreign Policy) — so much has changed—e.g., dead letter drops: It is easy for Russian counterintelligence to track the movements of every mobile phone in Moscow, so if the Canadian is carrying her device, observers can match her movements with any location that looks like a potential site for a dead drop. They could then look at any other phone signal that pings in the same location in the same time window. If the visitor turns out to be a Russian government official, he or she will have some explaining to do.
Private Computation, Robot Framework, 3D Objects, and Self-Supervised Learning
Private Join and Compute (Google) — This functionality allows two users, each holding an input file, to privately compute the sum of associated values for records that have common identifiers. (via Wired)
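To make the functionality concrete, here is a deliberately non-private Python sketch of what the two parties jointly compute; the point of Google's protocol (private set intersection with summation) is to produce only this final sum, without either side revealing its records.

```python
# NOT private: a toy version showing only the functionality being computed.
def join_and_sum(ids_a: set[str], records_b: dict[str, int]) -> int:
    """Sum the values in B whose identifiers also appear in A."""
    return sum(v for ident, v in records_b.items() if ident in ids_a)

party_a = {"alice", "bob", "carol"}            # party A holds identifiers
party_b = {"bob": 12, "carol": 30, "dave": 7}  # party B holds identifiers + values

print(join_and_sum(party_a, party_b))  # 42: the only output either party learns
```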
PyRobot — from CMU and Facebook. PyRobot is a framework and ecosystem that enables AI researchers and students to get up and running with a robot in just a few hours, without specialized knowledge of the hardware or of details such as device drivers, control, and planning.
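The pitch is the API surface: a few high-level calls instead of drivers and planners. A sketch of what that looks like, with call names based on the project's published examples (treat the exact signatures as assumptions):

```python
from pyrobot import Robot

# Bring up a LoCoBot without touching device drivers or control code.
robot = Robot('locobot')

robot.arm.go_home()  # motion planning and control happen under the hood
robot.arm.set_joint_positions([0.4, 0.7, -0.5, -1.4, 0.9], plan=True)
```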
PartNet — a consistent, large-scale data set of 3D objects annotated with fine-grained, instance-level, and hierarchical 3D part information. Our data set consists of 573,585 part instances over 26,671 3D models covering 24 object categories. This data set enables and serves as a catalyst for many tasks such as shape analysis, dynamic 3D scene modeling and simulation, affordance analysis, and others.
Bodies in Seats — the story of Facebook’s 30,000 content moderators: contractors, low pay (as little as $28,800 a year), and a lot of PTSD for everyone. “Nobody’s prepared to see a little girl have her organs taken out while she’s still alive and screaming.” Moderators were told they had to watch at least 15 to 30 seconds of each video.
Dialog — a domain-specific language for creating works of interactive fiction. Inspired by Inform and Prolog, they say.
End-User Probabilistic Programming — We examine the sources of uncertainty actually encountered by spreadsheet users, and their coping mechanisms, via an interview study. We examine spreadsheet-based interfaces and technology to help…
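The core idea, uncertain cell values that downstream formulas propagate, is easy to sketch with plain Monte Carlo (a generic illustration, not the paper's system):

```python
import random

N = 10_000
# "=NORMAL(100, 15)": an uncertain revenue estimate as samples, not a point value
revenue = [random.gauss(100, 15) for _ in range(N)]
# "=UNIFORM(60, 80)": uncertain costs
costs = [random.uniform(60, 80) for _ in range(N)]
# "=A1-A2": ordinary formulas propagate the uncertainty sample-wise
profit = [r - c for r, c in zip(revenue, costs)]

print(f"mean profit: {sum(profit) / N:.1f}")
print(f"P(loss): {sum(p < 0 for p in profit) / N:.3f}")
```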
The O’Reilly Data Show Podcast: Nick Pentreath on overcoming challenges in productionizing machine learning models.
In this episode of the Data Show, I spoke with Nick Pentreath, principal engineer at IBM. Pentreath was an early and avid user of Apache Spark, and he subsequently became a Spark committer and PMC member. Most recently his focus has been on machine learning, particularly deep learning, and he is part of a group within IBM focused on building open source tools that enable end-to-end machine learning pipelines.
Speech2Face, DIY Minivac, Cloud Metrics, and Envoy for Mobile
Speech2Face: Learning the Face Behind a Voice — complete with an interesting ethics discussion up front. I wonder where this was intended to go: after all, it can’t perfectly reconstruct faces, so what you get is a stereotype based on the voice. Meh.
Minivac 601 Replica (Instructables) — Created by information theory pioneer Claude Shannon as an educational toy for teaching digital circuits, the Minivac 601 Digital Computer Kit was billed as an electromechanical digital computer system.
Nines Are Not Enough: Meaningful Metrics for Clouds — We show that this problem shares some similarities with the challenges of applying statistics to make decisions based on sampled data. We also suggest that defining guarantees in terms of defense against threats, rather than guarantees for application-visible outcomes, can reduce the complexity of these problems.
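A back-of-the-envelope sketch of the sampling problem (mine, not from the paper): even a flawless 10,000-probe sample cannot establish "four nines," because the confidence interval is wider than the claim.

```python
import math

def availability_ci(successes: int, total: int, z: float = 1.96):
    """Rough 95% normal-approximation confidence interval on availability."""
    p = successes / total
    half = z * math.sqrt(p * (1 - p) / total)
    return p - half, p + half

lo, hi = availability_ci(successes=9_999, total=10_000)
print(f"observed 99.99%, 95% CI: [{lo:.5f}, {hi:.5f}]")
# The lower bound falls well below 99.99%, so this sample can't confirm four nines.
```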
A look at the landscape of tools for building and deploying robust, production-ready machine learning models.
Our surveys over the past couple of years have shown growing interest in machine learning (ML) among organizations from diverse industries. A few factors are contributing to this strong interest in implementing ML in products and services. First, the machine learning community has conducted groundbreaking research in many areas of interest to companies, and much of this research has been conducted out in the open via preprints and conference presentations. We are also beginning to see researchers share sample code written in popular open source libraries, and some even share pre-trained models. Organizations now also have more use cases and case studies from which to draw inspiration—no matter what industry or domain you are interested in, chances are there are many interesting ML applications you can learn from. Finally, modeling tools are improving, and…
Why Are We So Pessimistic? (Brookings) — The belief or perception that things are much worse than they really are is widespread, and I believe it comes with significant detrimental impacts on societies.
Machine learning solutions for data integration, cleaning, and data generation are beginning to emerge.
“AI starts with ‘good’ data” is a statement that receives wide agreement from data scientists, analysts, and business owners. There has been a significant increase in our ability to build complex AI models for predictions, classifications, and various analytics tasks, and there’s an abundance of (fairly easy-to-use) tools that allow data scientists and analysts to provision complex models within days. As model building becomes easier, the problem of high-quality data becomes more evident than ever. A recent O’Reilly survey found that those with mature AI practices (as measured by how long they’ve had models in production) cited “Lack of data or data quality issues” as the main bottleneck holding back further adoption of AI technologies.
Even with advances in building robust models, the reality is that noisy data and incomplete data remain the biggest hurdles to…
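The flavor of work involved is mundane but relentless; a minimal pandas sketch with hypothetical data (the emerging tools aim to learn rules like these rather than have engineers hand-code them):

```python
import pandas as pd

df = pd.DataFrame({
    "customer_id": [1, 2, 2, 3, 4],
    "state": ["CA", "ca", "ca", None, "NY"],
    "revenue": [120.0, 80.0, 80.0, None, 200.0],
})

df = df.drop_duplicates()              # remove exact duplicate records
df["state"] = df["state"].str.upper()  # normalize inconsistent encodings
df["revenue"] = df["revenue"].fillna(df["revenue"].median())  # impute missing values
df = df.dropna(subset=["state"])       # route unresolvable rows out of the pipeline

print(df)
```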
Multiverse Databases, Detecting Photoshopping, Simulation Platform, and Tail-Call Optimization: The Musical
Towards Multiverse Databases (Morning Paper) — The central idea behind multiverse databases is to push the data access and privacy rules into the database itself. The database takes on responsibility for authorization and transformation, and the application retains responsibility only for authentication and correct delegation of the authenticated principal on a database call. Such a design rules out an entire class of application errors, protecting private data from accidentally leaking.
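A toy sketch of the access model (not the paper's dataflow implementation): every principal queries through its own "universe," and the privacy policy lives with the data rather than in application code.

```python
MESSAGES = [
    {"id": 1, "from": "alice", "to": "bob",  "body": "hi"},
    {"id": 2, "from": "carol", "to": "dave", "body": "secret"},
]

def policy(row: dict, principal: str) -> bool:
    # Privacy rule enforced in the data layer: you may see only
    # messages you sent or received.
    return principal in (row["from"], row["to"])

class Universe:
    def __init__(self, principal: str):
        self.principal = principal  # authenticated identity delegated by the app

    def query(self) -> list[dict]:
        return [row for row in MESSAGES if policy(row, self.principal)]

print(Universe("bob").query())    # sees message 1 only: bob's universe
print(Universe("carol").query())  # sees message 2 only: carol's universe
```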
Open Sourcing AI Habitat (Facebook) — a new simulation platform created by Facebook AI that’s designed to train embodied agents (such as virtual robots) in photo-realistic 3D environments. […] To illustrate the benefits of this new platform, we’re also sharing Replica, a data set of…
Information Operations, Game Creator, History Lessons, and Physical Pen Testing
Information Operations on Twitter: Principles, Process, and Disclosure (Twitter) — We believe that people and organizations with the advantages of institutional power and which consciously abuse our service are not advancing healthy discourse but are actively working to undermine it. By making this data open and accessible, we seek to empower researchers, journalists, governments, and members of the public to deepen their understanding of critical issues impacting the integrity of public conversation online, particularly around elections. This transparency is core to our mission. Twitter is leading in this area; it’s great to see. I hope this makes others lift their game.