Saturday, July 6, 2019

Bad data!!!

As data keeps growing, it comes from many sources: sensors (IoT), forms filled in by customers (the digitization of paper forms), social media posts and comments, and your own feedback. Some of this data is not collected properly, which leads to 'bad data'. Bad data can skew any analysis done on it.

Bad data can be classified into two types. The first is keyed in incorrectly due to fat-fingering, otherwise called a typo. The second is faux, as in faux leather: false information meant to deceive. The first type is easy to correct, but the second is tough because spotting it depends on context.

An example is the contact form that many sites require you to fill in before you can download an article or read it in full. Users can key in false information to get access to the content. It is tough to validate, and one just hopes that people type in the correct information. Names and email IDs are the tough ones, as there can be many variations, like Kari, Cari, Carey, Cary, Cory, Corey, etc.
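To see why name variations are hard to validate, here is a minimal sketch that compares a typed name against names already on file using string similarity. It relies on Python's standard difflib module; the function names and the 0.75 threshold are assumptions made for illustration, not a recommended setting.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def find_similar_names(name, known_names, threshold=0.75):
    """Return the known names whose similarity to `name` meets the threshold.

    The threshold is an illustrative assumption; a real form would need
    to tune it against its own data.
    """
    return [k for k in known_names if similarity(name, k) >= threshold]


if __name__ == "__main__":
    on_file = ["Carey", "Corey", "Smith"]
    for candidate in ["Cary", "Kari"]:
        print(candidate, "->", find_similar_names(candidate, on_file))
```

A high score only says that two spellings are close. It cannot say which spelling is the right one, and that is exactly what makes names tough.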

One of the most common solutions to fat-fingering is input suggestions, such as address suggestions, so that people do not key in incorrect information. This builds on the field validation of the past, which placed boundary conditions on what the user could enter. We are getting better at suggesting inputs so that users avoid the mistakes that lead to bad data, the topic of this article.
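Both ideas can be sketched in a few lines: a boundary check on a numeric field, and a suggestion for a mistyped value drawn from a known list. The field names, the 0 to 120 age range, and the city list are hypothetical; a real address form would query an address service rather than a hard-coded list.

```python
import difflib

# Hypothetical reference list, standing in for a real address lookup.
KNOWN_CITIES = ["Springfield", "Portland", "Austin"]


def validate_age(raw):
    """Boundary-condition check: return (ok, message) for an age field."""
    try:
        age = int(raw.strip())
    except ValueError:
        return False, "Age must be a whole number."
    if not 0 <= age <= 120:  # assumed bounds, for illustration only
        return False, "Age must be between 0 and 120."
    return True, ""


def suggest_city(raw, known=KNOWN_CITIES):
    """Suggest up to three known cities that closely match the typed text."""
    typed = raw.strip().title()
    return difflib.get_close_matches(typed, known, n=3, cutoff=0.6)


if __name__ == "__main__":
    print(validate_age("340"))
    print(suggest_city("sprngfield"))
```

The boundary check rejects impossible values outright, while the suggestion nudges the user toward a valid value before the typo is ever stored.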

Machine learning and artificial intelligence could address the second category, where people provide false information, by using patterns to identify it and reject it right at the point of entry. The user experience will be frustrating if the solution has a high false-positive rate. And since people can get really creative, the solution would more likely use unsupervised learning rather than supervised learning, which requires a labeled training data set.
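As a rough illustration of working without labeled training data, the sketch below flags entries that look unlike the rest of the batch. It is a deliberately simple stand-in for a real model: it uses a single feature (the share of vowels among the letters) and a median-based outlier test. The feature, the function names, and the threshold of 3.0 are all assumptions.

```python
from statistics import median

VOWELS = set("aeiou")


def vowel_ratio(text):
    """Fraction of the alphabetic characters in `text` that are vowels."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return 0.0
    return sum(ch in VOWELS for ch in letters) / len(letters)


def flag_outliers(entries, threshold=3.0):
    """Return entries whose vowel ratio is far from the batch median.

    Distance is measured in units of the median absolute deviation (MAD),
    so no labeled examples are needed. The threshold is an assumed value.
    """
    ratios = [vowel_ratio(e) for e in entries]
    center = median(ratios)
    mad = median([abs(r - center) for r in ratios])
    if mad == 0:
        return []  # no spread in the batch, so nothing stands out
    return [e for e, r in zip(entries, ratios)
            if abs(r - center) / mad > threshold]


if __name__ == "__main__":
    names = ["Maria", "Daniel", "Sophie", "Carlos",
             "Elena", "Robert", "qwrtpsdf"]
    print(flag_outliers(names))
```

A check like this only catches keyboard mashing. A plausible but false name passes untouched, and an unusual real name may be flagged, which is the false-positive problem mentioned above.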

This indeed is an interesting problem with Bad Data!
