Matching Data is a clustered search technology applied to the TAUS Data Cloud repository and to web-crawled data. Given an example data set, it returns matches by relevance at the segment level, across files and domains. With this methodology, developers of MT engines can create data sets tuned to their own domains.
How it works:
Query corpus submission: The user provides a query corpus and a profile of the data they are looking for (domain name, languages, domain description)
Data matching: Based on the query corpus, the best-matching data in the TAUS Data Cloud is identified on a segment-level basis
Selection creation: Data selections are created, with different matching rates (Compact, Medium, Large)
Selection review and choice: The user chooses the most fitting match rate(s) and languages
Payment and download: After payment, the data is ready for download
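
The data matching and selection creation steps above can be illustrated with a minimal sketch. It is not the TAUS implementation, which is not described here. It assumes a simple token-overlap (Jaccard) score as a stand-in for the real relevance measure, and the threshold values attached to Compact, Medium and Large are hypothetical. The function names are also illustrative only.

```python
from __future__ import annotations

import re

# Hypothetical match-rate thresholds; the real cut-offs are not documented here.
THRESHOLDS = {"Compact": 0.5, "Medium": 0.25, "Large": 0.1}


def tokens(segment: str) -> set[str]:
    """Return the set of lowercase word tokens in a segment."""
    return set(re.findall(r"\w+", segment.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Token-overlap similarity in [0, 1]; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def score_segments(query_corpus: list[str], repository: list[str]) -> list[tuple[str, float]]:
    """Score each repository segment by its best match against any query segment."""
    query_sets = [tokens(q) for q in query_corpus]
    scored = []
    for segment in repository:
        seg_tokens = tokens(segment)
        best = max((jaccard(seg_tokens, q) for q in query_sets), default=0.0)
        scored.append((segment, best))
    return scored


def build_selections(scored: list[tuple[str, float]]) -> dict[str, list[str]]:
    """Create one selection per match rate; lower thresholds give larger selections."""
    return {
        name: [segment for segment, score in scored if score >= threshold]
        for name, threshold in THRESHOLDS.items()
    }
```

Calling `build_selections(score_segments(query_corpus, repository))` returns three nested selections: Compact holds only the closest matches, while Medium and Large admit progressively looser ones. This mirrors the trade-off the user faces when choosing a match rate, with a smaller, more domain-specific data set on one side and a larger, noisier one on the other.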