Exploring the role of datasets sourced from people with disabilities
This research project is funded by Inclusive Information and Communications Technology RERC (90REGE0008) from the National Institute on Disability, Independent Living, and Rehabilitation Research (NIDILRR), Administration for Community Living (ACL), Department of Health and Human Services (HHS). Learn more about the work of the Inclusive ICT RERC.
Project Team: Hernisa Kacorri (Project Investigator); Rie Kamikubo, Utkarsh Dwivedi, Lining Wang, Crystal Marte, Amnah Mahmood (Students)
Screenshot of IncluSet, a data surfacing repository that allows researchers to discover and link accessibility datasets.
Potential impact of this project on the lives of people with disabilities
The purpose of this project is to promote inclusive artificial intelligence (AI).
Artificial intelligence is part of many technologies we use every day, such as spell check, intelligent voice assistants like Alexa or Siri, Face ID to unlock your phone, and Netflix movie recommendations. Artificial intelligence tools can only learn from the data made available to them.
If the datasets used to build and train these tools do not include the faces, voices, or preferences of some groups of users, the technology will be limited in its ability to identify the faces, understand the voices, or predict the preferences of new users from those groups. Too often, people with disabilities, older adults, and other groups of potential users are not represented in the data used to train machine learning models, which makes it hard to ensure that the technology "understands" the full range of possible users.
Datasets generated by people with disabilities are scarce, and even when they exist, they can be hard to find.
This project will help make the following outcomes possible:
- Allowing researchers to find and link existing accessibility datasets, so that these datasets can be included in new research and used to train and test AI.
- Examining the complex questions around the ethical use of data and machine learning bias.
- Beginning to address the problems of machine learning bias and ensuring that AI tools are accessible and effective for all users.
On the IncluSet site, you can learn more about the accessibility data surfacing repository, find and download any of the publicly shared datasets, or link your own dataset to the repository.
Definitions
- Artificial intelligence, or AI: “the theory and development of computer systems able to perform tasks that normally require human intelligence, such as visual perception, speech recognition, decision-making, and translation between languages.” (From Oxford Languages via Google)
- Dataset: “a collection of related sets of information that is composed of separate elements but can be manipulated as a unit by a computer.” For example, “all hospitals must provide a standard data set of each patient’s details.” (From Oxford Languages via Google)
- Machine learning: “a type of artificial intelligence in which computers use huge amounts of data to learn how to do tasks rather than being programmed to do them. Machine learning makes it possible for computing systems to become smarter as they encounter additional data.” (From Oxford Learner’s Dictionaries)
Publications
- Kamikubo, R., Wang, L., Marte, C., Mahmood, A., & Kacorri, H. (2022). Data representativeness in accessibility datasets: A meta-analysis. In J. Froehlich, K. Shinohara, & S. Ludi (Eds.), ASSETS ’22: The 24th International ACM SIGACCESS Conference on Computers and Accessibility (pp. 1-15, No. 8). New York: ACM. DOI: https://doi.org/10.1145/3517428.3544826
- Read the paper (pdf).
- Read a summary of the paper in this Trace news story: Inclusive AI: Representation of Age, Gender, and Race in Accessibility Datasets
- Kamikubo, R. (2022). Facilitating sharing and re-use of accessibility datasets: Benefits and risks. SIGACCESS Newsletter, 132, Article No. 4.
- Kamikubo, R., Dwivedi, U., & Kacorri, H. (2021). Sharing practices for datasets related to accessibility and aging. In J. Lazar, J. H. Feng, & F. Hwang (Eds.), ASSETS ’21: The 23rd International ACM SIGACCESS Conference on Computers and Accessibility (pp. 1-16, No. 28). New York: ACM. DOI: https://doi.org/10.1145/3441852.3471208 PMCID: PMC8855358
- Kacorri, H., Dwivedi, U., & Kamikubo, R. (2020). Data sharing in wellness, accessibility, and aging. Paper presented at NeurIPS 2020 Workshop on Dataset Curation and Security.
- Read the paper (pdf).
- Kacorri, H., Dwivedi, U., Amancherla, S., Jha, M., & Chanduka, R. (2020). IncluSet: A data surfacing repository for accessibility datasets. In T. Guerreiro, H. Nicolau, & K. Moffatt (Eds.), ASSETS ’20: The 22nd International ACM SIGACCESS Conference on Computers and Accessibility (pp. 1-4, No. 72). New York: ACM. DOI: https://doi.org/10.1145/3373625.3418026 PMCID: PMC8375514
IncluSet is a data surfacing repository enabling researchers and the disability community to discover and link accessibility datasets. Inside IncluSet you can find more than 190 existing accessibility datasets. Help the research and disability communities by linking your accessibility project/dataset to IncluSet.