How Does Siri Work: Technology and Algorithm
Ever since we first saw sci-fi movies such as Star Trek or 2001: A Space Odyssey, we have been enamored with the idea of a computer that we can converse with. Then finally, in 2010, our dreams came true and Siri leaped out of our imaginations and into our cell phones and tablets.
Most iPhone users love Siri, the personal virtual assistant that comes built into the phone. You can use Siri for almost anything: getting GPS directions, organizing your schedule, making a call, sending a text, and many other tasks that have become much easier thanks to Siri’s speech recognition software. But what exactly is the technology behind Siri? Let’s take a closer look.
How Siri Works Technically
Siri uses two main technologies: speech recognition and natural language processing (NLP). The first takes the words that a person says and converts them into text. In practice, when you begin a sentence with the words “Hey, Siri,” you activate Apple’s speech recognition software, which changes your spoken words into written form. This is not as simple as it sounds, because every person has a unique voice timbre and accent, and accents vary from state to state and country to country.
To handle this variety, Apple trains Siri’s speech recognition model on huge datasets made up of voice samples from many different people, which allows Siri to recognize all sorts of accents, inflections, and speaking paces.
Over the past couple of years, there have been many developments in deep learning, and the error rate of such speech recognition software has dropped below 10%. When you give Siri a command or ask a question, your speech is converted to text, and that text is sent to Apple for additional processing. Apple’s servers run additional NLP algorithms to get the gist of your question or request. For example, there are many ways of asking Siri to remind you about a meeting. One option is “Hey Siri, can you remind me about the meeting tomorrow at 11?” Another is “Can you give me a heads up about the meeting tomorrow at 11?” From all of the various ways of formulating a request or question, Siri first needs to figure out that you would like it to remind you about a meeting tomorrow at 11.
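To make the reminder example concrete, here is a minimal sketch in Python of how two different phrasings can be reduced to the same structured request. This is not how Siri actually works: real assistants rely on trained NLP models rather than hand-written patterns, and every name here (REMINDER_PATTERNS, Intent, parse_intent) is made up for illustration.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical, hand-written phrasings that all signal a "set reminder" request.
# A real assistant would learn these variations from data instead.
REMINDER_PATTERNS = [
    r"remind me about (?P<subject>.+?) (?P<day>today|tomorrow) at (?P<hour>\d{1,2})",
    r"give me a heads up about (?P<subject>.+?) (?P<day>today|tomorrow) at (?P<hour>\d{1,2})",
]


@dataclass
class Intent:
    """Structured form of a request, independent of how it was phrased."""
    name: str
    subject: str
    day: str
    hour: int


def parse_intent(utterance: str) -> Optional[Intent]:
    """Return the reminder intent found in the text, or None if nothing matches."""
    text = utterance.lower().strip()
    # Drop the wake phrase if the transcript still contains it.
    text = re.sub(r"^hey,? siri,?\s*", "", text)
    for pattern in REMINDER_PATTERNS:
        match = re.search(pattern, text)
        if match:
            return Intent(
                name="set_reminder",
                subject=match["subject"],
                day=match["day"],
                hour=int(match["hour"]),
            )
    return None


first = parse_intent("Hey Siri, can you remind me about the meeting tomorrow at 11?")
second = parse_intent("Can you give me a heads up about the meeting tomorrow at 11?")
print(first == second)  # both phrasings yield the same structured intent
```

Both sentences end up as the same Intent value, which is the point of this step: the wording differs, but the request is identical. A pattern list like this breaks down quickly as phrasings multiply, which is one reason large training datasets matter.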
If your phone is not connected to the internet, this could be a problem, since Siri does not process your speech on your phone. On the bright side, there are also a couple of benefits. First, offloading the lion’s share of the work to powerful servers saves valuable resources on your phone. Second, the data that is collected is used to continuously improve Siri’s performance.
Analyzing your intent in this way requires an extraordinary amount of data to train the NLP algorithms. This is why Apple hires many engineers who have previously worked with the technologies mentioned above to train Siri’s algorithms. Also, let’s not forget that when Siri receives a response from Apple’s servers, it must then convert that text into speech. This is not as difficult as processing the user’s command, but it still takes work on Siri’s part.
Science fiction has long joked about the limits of machine comprehension. In The Hitchhiker’s Guide to the Galaxy, a supercomputer named Deep Thought is asked for the answer to the ultimate question of life, the universe, and everything, and it comes back with a humorous answer: “42.” Just like that supercomputer, technology today cannot fully grasp user intent or detect sarcasm, humor, wit, and the many other forms of expression we use every day. Developing software that can detect all of our intentions will require even more NLP training and larger data sets, of the kind used to train IBM’s Watson. To take Siri to the next level, there need to be additional layers of information retrieval and automated reasoning that can detect the subtleties of our speech, which exist even though we barely notice them.
Although, as we mentioned, there has been a lot of progress in NLP and deep learning, users are sometimes frustrated by Siri’s lack of comprehension and often disable it. As new iPhone models come out, pay close attention to developments in Siri’s technology. In time, it may be able to detect not just your words but also your tone and voice patterns, so you will not have to say things like “Put an exclamation point at the end of that sentence.” That would all be done automatically. So the next time you take out your iPhone and ask Siri a question, keep in mind the complex processes that take place within seconds, because they mark a new age in smart technology.