TechCrunch, 2024-02-01

The models are called OLMo, an acronym for "Open Language Models." Together with Dolma, the dataset used to train them and one of the largest public datasets of its kind, they were designed to study the high-level science behind text-generating AI, according to AI2 senior software engineer Dirk Groeneveld. "We expect researchers and practitioners will seize the OLMo framework as an opportunity to analyze a model trained on one of the largest public data sets released to date, along with all the components necessary for building the models." The OLMo models, which were created with the help of partners including Harvard, AMD and Databricks, ship with the code that was used to produce their training data as well as training and evaluation metrics and logs.