The Social Science Variables Database (SSVD) enables ICPSR users to examine and compare variables and questions across studies or series. The SSVD currently includes over 5 million variables, representing about 76% of ICPSR's holdings that have quantitative data described in statistical syntax.
The SSVD builds on a pilot project funded by the National Science Foundation. It demonstrates the benefits of using structured variable-level documentation in XML, tagged according to the Data Documentation Initiative standard.
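To make "structured variable-level documentation" concrete, here is a minimal sketch of reading variable names, labels, and question text from XML. The fragment is a simplified, hypothetical example written in the style of DDI Codebook markup (elements such as var, labl, and qstnLit). Real ICPSR files declare an XML namespace and contain far more detail, and the variable names HLTH1 and AGE and the helper list_variables are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Simplified, hypothetical fragment in the style of DDI Codebook markup.
# Real files declare an XML namespace and carry many more elements.
DDI_FRAGMENT = """
<codeBook>
  <dataDscr>
    <var name="HLTH1">
      <labl>Self-rated health</labl>
      <qstn>
        <qstnLit>In general, how would you rate your health?</qstnLit>
      </qstn>
    </var>
    <var name="AGE">
      <labl>Age of respondent</labl>
    </var>
  </dataDscr>
</codeBook>
"""


def list_variables(xml_text):
    """Return one dict per <var> element with its name, label, and question."""
    root = ET.fromstring(xml_text)
    variables = []
    for var in root.iter("var"):
        variables.append({
            "name": var.get("name"),
            "label": var.findtext("labl", default=""),
            # Not every variable comes from a survey question.
            "question": var.findtext("qstn/qstnLit", default=""),
        })
    return variables


variables = list_variables(DDI_FRAGMENT)

# Because the question text is tagged, it can be searched directly.
health_items = [v["name"] for v in variables if "health" in v["question"].lower()]
print(health_items)
```

Because each variable's label and question wording sit in their own tagged elements, a tool can search and compare them across studies without anyone reading through each codebook.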
The sheer volume of data and associated materials can be a barrier to use, and users who are new to data can be especially put off by the thousands of datasets, variables, and related publications. Giving them tools for navigating ICPSR that resemble a typical literature database often helps. Searching by topic, discipline, year(s), variable or survey question, and keyword makes ICPSR feel closer to search experiences they already know.
The toolbar’s “FIND DATA” dropdown offers options to find data using keywords, variables, publications, and additional resources. The “SEARCH/COMPARE VARIABLES” option offers a robust way to search variable types or names and survey questions, which is more systematic than going page by page through survey documentation. It also supports methodological studies of how measures develop over time and helps in assessing the reliability and validity of similar variables.
Use the “ICPSR Bibliography of Data-related Literature” search as a starting point to link literature to specific datasets. Databases we use to search for literature often don’t link to the dataset used, but doing a similar search in ICPSR will link users directly to the data and relevant documentation for that article.
Data-related keywords can look similar yet complicate a search strategy. For example, longitudinal, cross-sectional, and ethnographic each indicate a different method and a different kind of resulting dataset. Providing a glossary of common ICPSR terms or a controlled-vocabulary thesaurus can be helpful.
After finding a dataset worth looking into, users may find the documentation difficult to navigate, so ICPSR offers a great resource on Codebooks. Because data are offered in multiple formats, I recommend downloading only the files users plan to use. Clicking “Quick Download” retrieves all the documentation and every file type, so the download folder will be unnecessarily large.
Health Data Example: Many health datasets are longitudinal studies whose waves are listed separately in ICPSR, which can be misleading because each wave looks like cross-sectional data. For example, the Midlife in the United States (MIDUS) Series, a useful dataset for health research, has four waves that are typically analyzed together in an aggregated file, but in ICPSR they are listed separately along with many other MIDUS-related studies. This can make it hard to identify the correct waves to merge into a comprehensive file for over-time analyses. It’s nice when datasets are already merged, like the General Social Survey in ICPSR, which offers cumulative files for each additional year. There is also a "Health and Facilities" topic that returns over 700 datasets.
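Merging separately listed waves into one file comes down to matching rows on a respondent identifier. The sketch below shows the idea in plain Python under stated assumptions: the column names (resp_id, self_rated_health), the wave labels, and the helper merge_waves are all hypothetical and are not actual MIDUS variable names. In practice this step is usually done in a statistical package, following the ID variables named in each study's codebook.

```python
from collections import defaultdict

# Hypothetical wave extracts: one row per respondent per wave.
# Column names are illustrative, not real MIDUS variables.
wave1 = [
    {"resp_id": 101, "self_rated_health": 3},
    {"resp_id": 102, "self_rated_health": 4},
]
wave2 = [
    {"resp_id": 101, "self_rated_health": 2},
    {"resp_id": 103, "self_rated_health": 5},
]


def merge_waves(waves, id_key="resp_id"):
    """Combine per-wave rows into one wide record per respondent.

    `waves` maps a wave label to a list of row dicts. Every non-ID column
    is suffixed with the wave label. Respondents absent from a wave simply
    lack that wave's columns.
    """
    merged = defaultdict(dict)
    for label, rows in waves.items():
        for row in rows:
            record = merged[row[id_key]]
            record[id_key] = row[id_key]
            for column, value in row.items():
                if column != id_key:
                    record[f"{column}_{label}"] = value
    # Return records in respondent-ID order.
    return [merged[key] for key in sorted(merged)]


combined = merge_waves({"w1": wave1, "w2": wave2})
for record in combined:
    print(record)
```

Respondents who appear in only one wave are kept rather than dropped, which preserves information about attrition and late entry. Whether to keep them depends on the analysis planned.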
ICPSR partners with several federal statistical agencies and foundations to create collections organized around specific topics. These thematic collections bring a dynamism to ICPSR from which the broader social science research community benefits. The sponsors provide new data (in most cases free to everyone), which stimulate more research. The funded collections and ICPSR work together to build additional infrastructure for data discovery and use.
Many collections are available on ICPSR's website; here are a few of the most popular:
Florida State University Libraries | 116 Honors Way | Tallahassee, FL 32306 | (850) 644-2706