Artificial intelligence as we know it is hilariously, gravely flawed. Because machines require human input to infer and learn from, and because the resources (be it the actual people, the training assets, or otherwise) being put towards AI at our largest technological institutions tend to be characteristically male and white, the resulting algorithms tend to carry certain intrinsic biases. At I/O 2019, AI monolith Google showcased a few ways it's trying to correct those slants.
One of those ways is something called Testing with Concept Activation Vectors, or TCAV, which has been in the works for years. Google CEO Sundar Pichai gave a brief mention of TCAV in his I/O keynote yesterday, which you can see in the video here. You can also check out Google Brain researcher Been Kim give a talk about it here.
In short, TCAV is an analysis tool that lets developers inspect their algorithms and find which concepts or big-picture aspects the AI is associating with their samples. Neural networks peer into clusters of pixels at a time to make sense of a picture, whereas we conceptualize zebras as black-and-white striped horses. TCAV is able to go in after the fact and report which human-comprehensible concepts the network is prioritizing.
That said, TCAV is not in itself a means to end bias: developers will still have to go back and tweak their AI themselves to fix those problems. But it is a big step in the right direction if there's enough good intention and effort behind it.
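To make the idea of a concept score a little more concrete, here is a toy sketch in Python. It is not Google's implementation. The published method trains a linear classifier on a layer's activations to separate concept examples from random ones and uses the direction orthogonal to the decision boundary as the concept vector; this sketch swaps in a simpler difference of means as a stand-in. The function names, the two-dimensional "activations" and the gradients are all made up for illustration.

```python
def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(col) for col in zip(*rows)]


def concept_direction(concept_acts, random_acts):
    """Simplified stand-in for a concept activation vector.

    Assumption: uses the unit vector from the mean of random activations
    to the mean of concept activations, instead of a trained linear classifier.
    """
    concept_mean = mean_vector(concept_acts)
    random_mean = mean_vector(random_acts)
    diff = [c - r for c, r in zip(concept_mean, random_mean)]
    norm = sum(d * d for d in diff) ** 0.5
    return [d / norm for d in diff]


def tcav_score(class_gradients, direction):
    """Fraction of examples whose class score increases along the concept direction."""
    positive = sum(
        1
        for grad in class_gradients
        if sum(g * d for g, d in zip(grad, direction)) > 0
    )
    return positive / len(class_gradients)


# Hypothetical layer activations for "striped" images and for random images.
striped_acts = [[2.0, 0.1], [1.8, -0.1], [2.2, 0.0]]
random_acts = [[0.0, 0.1], [0.2, -0.1], [-0.2, 0.0]]

# Hypothetical gradients of the "zebra" score with respect to that layer,
# one per zebra photo.
zebra_gradients = [[0.9, 0.3], [0.5, -0.2], [0.7, 0.1], [-0.1, 0.4]]

direction = concept_direction(striped_acts, random_acts)
score = tcav_score(zebra_gradients, direction)
print(f"Share of zebra photos sensitive to 'striped': {score:.2f}")
```

A score near 1 would suggest the network leans on the concept for that class, while a score near 0 would suggest the concept pushes against it. Swap "striped" for a concept like gender or skin tone and the same number becomes a way to spot a bias worth fixing.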
A separate but somewhat related development with regard to reducing bias is federated learning. Google has applied it to Gboard in order to quickly suggest user-generated words in common parlance to other users. It's able to do that by running machine learning right on the end users' phones.
The phones get a master word database model from Google. Users then start typing new words like "Okurrr" into Gboard, which then get inserted into the device's training data. The devices later report only the new word data back to Google. No contextual data (the words surrounding the new word) is collected, meaning that only the occurrence of the new word is reported.
Being able to retrieve the "photo negatives" from users could better anonymize mass data gathering and could open up potential sample sizes for more developers.
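The Gboard flow described above can be sketched roughly as follows. This is a simplified illustration of the flow as this article describes it, not Google's actual pipeline. The function names and the `min_devices` threshold are assumptions added for the example.

```python
from collections import Counter


def local_new_words(typed_words, shared_vocabulary):
    """Runs on the phone: count only words the shared model doesn't know.

    The surrounding words are never included in the report.
    """
    return Counter(
        word.lower()
        for word in typed_words
        if word.lower() not in shared_vocabulary
    )


def aggregate_reports(reports, min_devices=2):
    """Runs on the server: keep words reported by enough separate devices.

    Assumption: min_devices is a hypothetical threshold, not a documented one.
    """
    devices_per_word = Counter()
    for report in reports:
        devices_per_word.update(set(report))  # one vote per device per word
    return {word for word, n in devices_per_word.items() if n >= min_devices}


# The master vocabulary every phone starts with (toy example).
shared_vocabulary = {"okay", "hello", "the"}

# What three hypothetical users typed; this text stays on each device.
device_a = ["hello", "Okurrr", "the"]
device_b = ["okurrr", "okay"]
device_c = ["hello", "zzyx"]

reports = [
    local_new_words(words, shared_vocabulary)
    for words in (device_a, device_b, device_c)
]
popular = aggregate_reports(reports)
print(popular)
```

Only the per-device counts of unknown words leave the phone in this sketch. The threshold is one simple way to keep a word typed by a single user from being suggested to everyone else.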
There's plenty more to work on when it comes to reducing potential bias in artificial intelligence. Given how quickly it's being rolled out by entities like police departments and the Chinese government, that work can't be done fast enough.