The human ears of Alexa, Google and Siri
by Volker Weber
It does not really matter whether it is Alexa or Google or Siri - there are humans who grade the results of voice recognition. And yes, sometimes, you may have triggered your voice assistant when you did not mean to.
We know that Amazon and Google keep your voice recordings, at least until you delete them. And yes, you can find them in your account. What I am missing here is a transparent disclosure from Apple. The only thing I can find in their privacy statement is this:
For research and development purposes, we may use datasets such as those that contain images, voices or other data that could be associated with an identifiable person. When acquiring such datasets, we do so in accordance with applicable law in the jurisdiction in which the dataset is hosted. When using such datasets for research and development, we do not attempt to re-identify individuals who may appear therein.
As far as I know Apple does not store Siri recordings in your account. Whatever they have is anonymized and not linked to your Apple ID.
Comments
As for the human intervention, I am increasingly puzzled why it is part of the training process at all. I would guess it is there to train the service so it gets this information on its own the next time.
Dropping the review by human ears would only make the results a little bit fuzzier.
I would really like to see Apple (or someone else) play the security role more consistently. For example, by making the complete voice service architecture transparent and encrypted. I would also like to see a switch to deny internet access for apps in iOS.
Humans listening to training examples would be no problem if it’s guaranteed that everything is already recorded anonymously.
It will be impossible, especially for the purpose of human review, to make the recordings/data truly anonymous. While you can remove the name/account data from the recording, there needs to be some other data to allow humans to review and analyse the responses.
To judge if the assistant gave a useful/correct answer to a geographical question (e.g. travel directions, distances or local information) you need to know the location of the person asking the question (e.g. to determine if the assistant gave information about the most logical/correct place if there are several places of the same name).
To judge if the assistant correctly identifies names you again need to hear the names, another data point to identify someone.
And I'm sure there will be more examples. Real anonymity is pretty much impossible.
Benjamin, it’s quite simple: data quality. To get the best results they have to constantly check and improve, and intervene in case the learning goes off the rails.
Consider my Netatmo security camera and its image recognition: it always suggests that my robot mower is a car. The model has no concept of a four-wheel vehicle of that size not being a car; it seems to have only the concept of a four-wheel vehicle. To improve the quality I have to give the feedback that it's wrong there, otherwise it couldn’t improve.
"Siri and Dictation do not associate this information with your Apple ID, but rather with your device through a random identifier." and "You can reset that identifier at any time by turning Siri and Dictation off and back on, effectively restarting your relationship with Siri and Dictation. When you turn Siri and Dictation off, Apple will delete the User Data associated with your Siri identifier, and the learning process will start all over again." from
https://www.apple.com/privacy/approach-to-privacy/
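The scheme described in that quote can be sketched roughly as follows. This is only an illustration of the idea, not Apple's implementation: the classes `AssistantServer` and `Device`, their methods, and the in-memory store are hypothetical names made up for this example.

```python
import uuid


class AssistantServer:
    """Hypothetical server that keys user data by a random identifier only."""

    def __init__(self):
        self._data = {}  # random identifier -> list of stored requests

    def store(self, identifier, request):
        self._data.setdefault(identifier, []).append(request)

    def delete(self, identifier):
        self._data.pop(identifier, None)

    def history(self, identifier):
        return list(self._data.get(identifier, []))


class Device:
    """Hypothetical device that holds a random identifier, not an account ID."""

    def __init__(self, server):
        self.server = server
        self.identifier = None

    def turn_on(self):
        # A fresh random identifier, unrelated to any account or earlier identifier.
        self.identifier = uuid.uuid4().hex

    def turn_off(self):
        # Turning the feature off deletes the data tied to the current identifier.
        if self.identifier is not None:
            self.server.delete(self.identifier)
        self.identifier = None

    def ask(self, request):
        if self.identifier is None:
            raise RuntimeError("assistant is turned off")
        self.server.store(self.identifier, request)
```

Turning the assistant off and on again leaves the server with nothing under the old identifier and an empty history under the new one. The content of each request still reaches the server, so the identifier alone says nothing about what the requests themselves reveal.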
The point is - why not let the user rate the quality of the answer and/or flag the answer for review? There was never a feature to give feedback.
You can't. There is no record of what you said that would be recognized as yours.
Martin, yes, I know the voice recognition service is trained this way, but that was not my point. This whole discussion came up because real people listen to recordings to understand what was said. The software trained by these people is already able to do the same in most cases. And this is mandatory for the voice service to work. But, as with a Google search, the provider will most likely also analyze recordings (hits/false positives) and gain some insights. The processed data does not have to be stored; it is needed only for the processing itself.
Armin, some company could put effort into transparent and secure services and hardware. Especially for voice recognition training, it's not required to connect recordings with user identities at all. I think there is a market for a company that follows a more user-centric strategy. And I bet this will not be based on Android.
Benjamin, I think you're massively missing the point.
For good voice recognition (or any search results, assistants etc) context is everything. To provide the best possible answers knowing a lot about the person asking a question makes a huge difference. Location, interests, previous questions, all those kinds of things. Without context the quality of the answer will simply be inferior.
If I ask you "can you recommend me a good restaurant" and either
a) you know nothing about me
b) you know what kind of food I like, you know where I live, you know where I am, you know how much I'm able/willing to spend
In which scenario are you more likely to provide a good, useful answer that will satisfy me, a or b?
This doesn't have to be connected to a name/identity. The name can be removed. It's largely irrelevant. But you still need the data.
However...
With all that data and information it's a doddle to de-anonymise the data and figure out who is the name/identity behind the data. Plenty of reports out there on the interwebs how people have been identified simply by their location data. Or other combinations of data.
Exactly the data you need to create a useful voice recognition / assistant type of service.
Anonymity is hard. Very hard. Extremely hard.
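A toy sketch of why removing the name is not enough. The records and the helper `unique_on` are made up for illustration; the point is only that combining a few coarse attributes can single out one pseudonym even when each attribute alone does not.

```python
from collections import Counter


def unique_on(records, keys):
    """Return the pseudonyms whose combination of `keys` occurs exactly once."""
    combos = Counter(tuple(r[k] for k in keys) for r in records)
    return [
        r["pseudonym"]
        for r in records
        if combos[tuple(r[k] for k in keys)] == 1
    ]


# Invented example data: names removed, only coarse areas kept.
records = [
    {"pseudonym": "u1", "home": "A", "work": "X"},
    {"pseudonym": "u2", "home": "A", "work": "Y"},
    {"pseudonym": "u3", "home": "B", "work": "X"},
    {"pseudonym": "u4", "home": "B", "work": "X"},
]

# Home area alone singles out nobody ...
alone = unique_on(records, ["home"])
# ... but home plus work area already singles out u1 and u2.
combined = unique_on(records, ["home", "work"])
```

Anyone who knows where a person lives and works could match them to `u1` or `u2` here, without ever seeing a name.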
Well, well, well. And why exactly does Google need to know all that data, instead of my device?


