AI's 99 percent is not good enough

An AI that is 99% accurate is not good enough. Think of having to digitize handwritten documents containing important end-customer data. Does it make sense to use machine learning if you still need human operators on top of it? Think of how many times Alexa gets you wrong, even when it comes to playing the same song you have requested many times before. But Alexa is not a mission-critical system. You don’t care if it’s 90% accurate, let alone 99%. For critical systems, though, it seems that AI isn’t delivering the value it is supposed to (how far are we from the promised self-driving dream?).

How much effort is saved when a human quality controls every document, compared with a human operator digitizing each one from scratch? I remember a project I was tangentially involved in years ago, in the pre-AI era (in fact over 20 years ago). The project manager decided to have three people digitize the same data and then use text comparison, so that if two of the transcriptions matched, an algorithm would drop the third. The work was outsourced and the wages were low. Today, for sensitive data in the West, we only have the luxury of hiring one operator for the job. When an AI system is 99% accurate, that operator must quality control every document, especially when every record needs to be captured correctly. Think of medical data, a criminal background check, or even hand-filled government forms that contain your PII, including your social insurance number.
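The two-out-of-three comparison described above can be sketched in a few lines. This is only an illustration of the idea, not the code that project used: the function name `consensus` and the whitespace normalization are my own assumptions.

```python
from collections import Counter
from typing import Optional


def consensus(transcriptions: list[str]) -> Optional[str]:
    """Return the text at least two operators agree on, or None if nobody agrees.

    Hypothetical helper: whitespace is collapsed before comparing so that
    stray spaces do not count as a disagreement (an assumption, not a rule
    taken from the original project).
    """
    if not transcriptions:
        return None
    normalized = [" ".join(t.split()) for t in transcriptions]
    text, votes = Counter(normalized).most_common(1)[0]
    return text if votes >= 2 else None


# Two operators agree, so the third (mistyped) version is dropped.
agreed = consensus(["John Smith", "John  Smith", "Jon Smith"])

# No two operators agree, so the record needs another look.
unresolved = consensus(["Jon Smith", "John Smyth", "John Smith"])
```

A record that comes back as `None` is the one that still needs a human decision.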

Every computer scientist knows that today the proper solution to this problem is still to move data entry to the end-user. If they input all their information through some webform you don’t need quality control. But good luck having your doctor use a keyboard to write a prescription (doctor’s handwriting is a sign of pride – “I made it”).

Today, it seems most places where AI can add value are in aggregate data capture, where errors can be tolerated or automatically mitigated. For instance, if you digitize doctors' notes for statistical analysis (say, a survey of how many patients are prescribed some medication), a data scientist could remove the effect of the 1% error by classifying it as noise. A simple technique would just discard the 10,000 outliers (non-existent drug names) among the 1,000,000 documents.
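As a rough sketch of that outlier removal, assuming you have a list of valid drug names to check against: the `KNOWN_DRUGS` set and the function name below are placeholders I made up for illustration. Note that this only catches misreads that produce a non-existent name; a misread that lands on another real drug would slip through.

```python
from collections import Counter

# Hypothetical formulary; a real one would come from a reference database.
KNOWN_DRUGS = {"amoxicillin", "ibuprofen", "metformin"}


def count_prescriptions(extracted_names):
    """Count prescriptions per known drug and tally the names treated as noise."""
    counts = Counter()
    dropped = 0
    for name in extracted_names:
        key = name.strip().lower()
        if key in KNOWN_DRUGS:
            counts[key] += 1
        else:
            # Not a real drug name: treat it as a recognition error and drop it.
            dropped += 1
    return counts, dropped


counts, dropped = count_prescriptions(
    ["Ibuprofen", "ibuprofem", "metformin ", "ibuprofen"]
)
```

Tracking `dropped` alongside the counts lets you check that the discarded share stays near the expected 1%.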

Yet when 100% accuracy is paramount, it seems that AI is a distraction. To take another application: if you have used auto-translate on whole documents, you will have been surprised by the accuracy of the result, yet you will also have noticed that many of the mistakes escape your reviewing eye, so you end up with more rounds of corrections after the document gets proofread by others.

When it’s important to be right the first time, you will opt for doing the manual translation yourself and only referring to Google’s or Microsoft’s translation services for those technical terms you might not remember or know. The only place for that 99% accurate AI might turn out to be to quality check your work.

The accuracy of English language correction software like Grammarly and Microsoft Office’s Editor is high, and these products continue to improve. Yet even with them you have to be careful, as a New Yorker writer found out. They are only good to some degree and sometimes add more errors to your writing than they fix (this writer used to pass his blog posts through a round of Grammarly, only to realize that the result was worse than the starting text).

So, going back to our problem of digitizing sensitive documents with 100% accuracy: maybe, at the end of the day, it makes sense to keep using human operators for the task but have AI quality check their work. The documents that fail automated QA then get another round of checks by humans. This is how you reach 100% accuracy using 1.01 humans instead of three.
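The 1.01 figure is simple arithmetic: one human pass for every document, plus a second pass for the share that fails automated QA. The sketch below assumes that share is about 1%, my reading of the 99% figure used throughout, and it also assumes the automated check flags every document that actually contains an error, which is the condition the 100% claim rests on. The names are illustrative.

```python
def operators_per_document(flag_rate: float) -> float:
    """One human pass per document, plus a second pass for flagged documents."""
    if not 0.0 <= flag_rate <= 1.0:
        raise ValueError("flag_rate must be between 0 and 1")
    return 1.0 + flag_rate


ASSUMED_FLAG_RATE = 0.01  # assumption: 1% of documents fail automated QA
documents = 1_000_000

# Human passes needed with AI-assisted QA versus the old triple-entry setup.
human_passes = documents * operators_per_document(ASSUMED_FLAG_RATE)
triple_entry_passes = documents * 3

print(f"AI-assisted QA: {human_passes:,.0f} human passes")
print(f"Triple entry:   {triple_entry_passes:,} human passes")
```

For a million documents that works out to roughly 1.01 million human passes instead of 3 million.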


