The role of data, particularly curated data sets, has become critical for supporting research activities in the information field. As many academic programs in information now offer courses, certificates, and degrees in data science, data sets are increasingly important in pedagogical contexts as well. Systematic collection and curation of data sets is particularly complex in biomedical and health domains due to the associated privacy concerns. At the University of Toronto, a project is underway for collecting, managing, and provisioning data sets with the primary purpose of supporting scholarship and technology development activities. The project’s current domain of focus is healthcare data, hence its name, Health Data Exchange (HDX). However, the principles behind the development of the HDX architecture and the critical functions HDX supports are likely to be useful in many other domains where long-term and reliable access to curated data sets is essential. In this talk, the three main layers of HDX, namely data, software, and community, will be presented and described. The automated processes used to support efficient backend operations, such as curation of data sets by leveraging state-of-the-art topic discovery and labelling methods, will also be demonstrated. In addition, the search and discovery services that undergird the research and social functions of HDX will be presented. Finally, to provide a long-term vision and framing of the project, the relationship of the primary HDX functions to the FAIR principles, namely the findability, accessibility, interoperability, and reusability dimensions of data use, will be demonstrated.
Javed Mostafa is a Professor and the Dean of the Faculty of Information at the University of Toronto. His research focuses on multimedia information retrieval, personalization and user modeling, as well as cyberinfrastructure for research and learning. Professor Mostafa came to the University of Toronto in September 2023 from the University of North Carolina at Chapel Hill, where he served as a Professor and the founding Director of the Carolina Health Informatics Program (CHIP), an interdisciplinary informatics training program that oversaw collaboration among seven UNC academic units. Previously, at Indiana University, Bloomington, Professor Mostafa held the Victor H. Yngve Endowed Professorship and also served as an Associate Dean of Academics and an Associate Dean of Research. Professor Mostafa completed his PhD in information science at the University of Texas at Austin in 1994, with a focus on developing information query models and search interfaces for video information. He has authored more than 105 peer-reviewed publications and has served in editorial roles for several prestigious journals in the field. He was the editor-in-chief of the Journal of the Association for Information Science and Technology and an associate editor for the journal ACM Transactions on Information Systems, and he currently serves as an associate editor for the journal ACM Transactions on Internet Technology. At the University of Toronto, he directs the Laboratory of Applied Informatics Research (https://lairhub.com/). He is also the co-founder of two U.S.-based companies: KeonaHealth and Cymantix.