Here we provide the various datasets used in our Text to Scene Generation projects.

Stanford Text2Scene Spatial Learning Dataset

This is the dataset associated with the paper
Learning Spatial Knowledge for Text to 3D Scene Generation. Angel Chang, Manolis Savva, and Christopher D. Manning. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014). [pdf, bib]
This dataset includes the 3D scene and 3D model data, extracted statistics, useful precomputed attributes of the objects and their relations, and a dataset of spatial relation descriptions collected from people, which we used to ground spatial relation terms. Download the archive below and see the included README files for more details. Please cite the above paper if you use this data in your research. (2016-05-30, 78M)

Scenes and Descriptions for Text to Scene Generation


As part of the Text to Scene Generation project, we collected a dataset of over a thousand 3D scenes and several thousand descriptions of these scenes.

We use this dataset in the paper

Text to 3D Scene Generation with Rich Lexical Grounding. Angel Chang, Will Monroe, Manolis Savva, Christopher Potts, and Christopher D. Manning. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference of the Asian Federation of Natural Language Processing (ACL-IJCNLP 2015). [pdf, bib, slides]
to improve object selection in our system for 3D scene generation from text. We anticipate that this dataset will also find use in related problems involving grounded natural language understanding and generation.

Note: this dataset is still a beta release. Please report bugs to . Anonymized worker and task IDs are likely to change.


The data zipfile (2015-05-21, 1.2M) contains the dataset itself, in JSON format. The images zipfile (2015-05-21, 252M) is large and contains images of the models and scenes; these can be useful for understanding and visualizing the data. The precomputed zipfile (2016-05-06, 5.9M) contains precomputed information about the scenes, including bounding boxes of objects, relative positions, and contact points.
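As a minimal sketch of working with the precomputed JSON, the snippet below parses a small record and computes an axis-aligned bounding-box center. The field names here ("objects", "bbox", "min", "max") are hypothetical placeholders, not the dataset's actual schema; consult the included README files for the real keys.

```python
import json

# Hypothetical record mimicking a precomputed bounding-box entry;
# the real field names are documented in the dataset's README.
sample = json.loads("""
{
  "objects": [
    {"id": "chair_1", "bbox": {"min": [0.0, 0.0, 0.0], "max": [0.5, 1.0, 0.5]}}
  ]
}
""")

def bbox_center(bbox):
    """Axis-aligned bounding-box center: midpoint of the min and max corners."""
    return [(lo + hi) / 2.0 for lo, hi in zip(bbox["min"], bbox["max"])]

for obj in sample["objects"]:
    print(obj["id"], bbox_center(obj["bbox"]))
# prints: chair_1 [0.25, 0.5, 0.25]
```

The same midpoint computation applies to any axis-aligned box given as a pair of corner points, regardless of the exact key names used in the files.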

For a preview of the data, the scenes and descriptions can also be browsed online.

For the actual 3D model dataset used in the scenes, please contact . The 3D model data used in the scenes is a subset of the ShapeNetSem dataset from ShapeNet, an effort to provide semantic annotations for 3D models. To download the models, you will need to agree to the ShapeNet terms of use.