We are introducing OpenAI Data Partnerships, where we’ll work together with organizations to produce public and private datasets for training AI models.
Modern AI technology learns skills and aspects of our world—of people, our motivations, interactions, and the way we communicate—by making sense of the data on which it’s trained. To ultimately make AGI that is safe and beneficial to all of humanity, we’d like AI models to deeply understand all subject matters, industries, cultures, and languages, which requires as broad a training dataset as possible.
Including your content can make AI models more helpful to you by increasing their understanding of your domain. We’re already working with many partners who are eager to represent data from their country or industry. For example, we recently partnered with the Icelandic Government and Miðeind ehf to improve GPT-4’s ability to speak Icelandic by integrating their curated datasets. We also partnered with non-profit organization Free Law Project, which aims to democratize access to legal understanding by including their large collection of legal documents in AI training. We know there may be many more who also want to contribute to the future of AI research while discovering the potential of their unique data.
Data Partnerships are intended to enable more organizations to help steer the future of AI and benefit from models that are more useful to them, by including content they care about.