Matching Data is a clustered search technology applied to the TAUS Data Cloud repository and to web-crawled data. It takes an example data set as its query and returns segment-level matches, ranked by relevance, across files and domains. With this method, developers of machine translation (MT) engines can create data sets tuned to their own domains.

How it works:

  1. Query corpus submission: The user provides a query corpus and a profile of the data they are looking for (domain name, languages, domain description).
  2. Data matching: The best-matching data in the TAUS Data Cloud is identified segment by segment, based on the query corpus.
  3. Selection creation: Data selections are created at different match rates (Compact, Medium, Large).
  4. Selection review and choice: The user chooses the most suitable match rate(s) and languages.
  5. Payment and download: After payment, the data is ready for download.
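TAUS does not describe the internals of its clustered search here, so the following is only a rough sketch of steps 2 and 3, under stated assumptions. It swaps in a simple technique: each candidate segment is scored by bag-of-words cosine similarity against the whole query corpus, and selections are then cut at score thresholds. The function names (`match_segments`, `build_selections`) and the threshold values attached to the Compact, Medium and Large tiers are hypothetical and are not part of any TAUS API.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive lowercase whitespace tokenizer (assumption: real systems use language-aware tokenization).
    return text.lower().split()


def vectorize(segments: list[str]) -> Counter:
    # Aggregate term counts over all query segments into one bag-of-words profile vector.
    counts: Counter = Counter()
    for seg in segments:
        counts.update(tokenize(seg))
    return counts


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse term-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def match_segments(query_corpus: list[str], candidates: list[str]) -> list[tuple[float, str]]:
    # Step 2 (sketch): score every candidate segment against the query profile, best first.
    profile = vectorize(query_corpus)
    scored = [(cosine(profile, Counter(tokenize(seg))), seg) for seg in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# Hypothetical minimum scores per tier: stricter threshold -> smaller, more relevant selection.
TIERS = {"Compact": 0.5, "Medium": 0.2, "Large": 0.1}


def build_selections(scored: list[tuple[float, str]], tiers: dict[str, float] = TIERS) -> dict[str, list[str]]:
    # Step 3 (sketch): each tier keeps every segment scoring at or above its threshold.
    return {name: [seg for score, seg in scored if score >= threshold] for name, threshold in tiers.items()}


if __name__ == "__main__":
    query = ["the patient received a dose of insulin", "insulin dose adjustment for patients"]
    candidates = ["adjust the insulin dose", "the weather is sunny today", "the patient was discharged"]
    for tier, segments in build_selections(match_segments(query, candidates)).items():
        print(tier, segments)
```

Because the tiers are plain thresholds on a single ranked list, they are nested: every Compact segment also appears in Medium, and every Medium segment in Large. This matches the idea of trading relevance for volume when choosing a match rate in step 4.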

