I’ve been involved in building a system that reads structured data from a special form of contracts from a specific industry. Prices, clauses, pick up, delivery, etc. A couple hundred datapoints per contract. We had many discussions around how to present and sell an imperfect system. The thing is, the potential customers are today transcribing the contracts manually and we quickly realized that people make a ton of mistakes doing that. It became obvious when we were working on assertion datasets ourself. It’s not a perfect system and you have to consider how you use the data (aggregating for price indexing for instance), but we’re actually doing better than what people are achieving when they have to transcribe data for hours a day.