The South Big Data Hub and Georgia Tech bring missing voices to the conversation on data science education

This month, participants from universities across the nation, community colleges, tribal colleges, minority-serving institutions, nonprofits, and industry joined forces with the South Big Data Hub and Georgia Tech to confront the challenges of building data science capacity through traditional and alternative educational practices. Organized by Dr. Renata Rawlings-Goss, co-executive director of the South Big Data Hub, the two-day workshop, sponsored by multiple directorates within the National Science Foundation, brought together a diverse mix of participants to navigate the complex issues of reforming data science education to prepare for the data-driven workforce of the future.

“An entirely new type of workforce is needed for the 21st century, one that will require data-enabled talent for jobs across industries and government, as well as for future scientific discovery,” said Rawlings-Goss. “That is why we are partnering with sponsors and people spanning many disciplines and roles to make sure that the discussion of data science remains as broad as it needs to be. To achieve the talent pool needed for continued U.S. growth and competitiveness in this new data economy, we must break open and structure education with access for all. Programs must include minorities, reach low-income future students, and pair different institution types like two-year teaching institutions with universities.”

Rawlings-Goss also heads the Big Data Hub’s education working group, a nationwide network that tackles tough problems like these. And she wants everyone to join the conversation. “Data science is something that can be brought into nearly every discipline; the idea that data science resides in only a few traditional math and computer science disciplines is part of the challenge we are trying to overcome in education,” she said.

National Science Foundation Program Manager Stephanie August, who attended the event, said, “NSF is involved because we want to get at the missing voices. We need these participants who are essential but mostly absent and unheard in the movement to develop data science programs.”

Many research universities are developing comprehensive data science programs at the undergraduate and graduate levels. As the movement to develop data science programs grows, a gap is forming that separates research institutions from primarily undergraduate-focused institutions, including community colleges and several minority-serving institutions.

Specific issues discussed included access to data, critical thinking, designing curriculum and assessment, data literacy, diversity, ethics, resources and staffing, building collaborations, and the pipeline to higher education from K-12. Participants also addressed training data science practitioners, and translational data science, or the application of data science principles, techniques, and technologies to scientific problems impacting human or societal welfare.

Recent education-enabling projects were showcased at the event. Aleksandr Blekh of Georgia Tech’s College of Engineering introduced participants to his work using JupyterHub to build executable textbooks that support data science education. The South Hub recently partnered with U.C. Berkeley and completed an installation and demo that included 60 faculty members and deans from across the country interested in using the tool to expand their data science capacity.

The workshop agenda moved from awareness of existing programs, practices, and challenges, through specific topics and stakeholders, to the actions needed to create the vision, curricula, programs, and opportunities of the future. Discussions covered what matters for the future of technology and society in the United States and how to improve access to essential, economically viable jobs, especially for minority and low-income students and workers.

Tasha Inniss, director of education and industry outreach at the nonprofit professional society INFORMS, said, “We have all gone to conferences that attempt to discuss these issues and policy without having the full representation of the necessary stakeholders. This workshop, fortunately, has been different, because it includes participation of faculty from minority-serving institutions and community colleges as well as representatives from industry and non-profit organizations. Since INFORMS has members who are analytics and operations research professionals and students, I am particularly interested because it fits nicely with our analytics education goals.”

Mary Rudis, a mathematics instructor from Bates College, agreed. “This is the first event I have been to that is constructed from the ground up with the right mix, capable of producing a real outcome. The blend of participants in terms of diversity, institutional types, roles, and expertise reflects the very nature of data science. It is multidisciplinary and transdisciplinary, with complex issues that require input from many points of view.”

Rawlings-Goss, Inniss, Rudis, and more than a dozen other authors will continue working together after the event to develop a report, including plans and schedules, to turn their vision into reality. Findings will be presented at the National Academies of Sciences, Engineering, and Medicine in Washington, D.C., in early December.

This workshop, “Negotiating the Digital and Data Divide,” is part of the “Keeping Data Science Broad” series created by Rawlings-Goss. Other activities include webinars and presentations to gather community input on pathways for keeping data science a broadly inclusive discipline. The growing community seeks input from data science programs, traditional or alternative, in any region across the nation and from a range of institution types. Two webinars leading up to the workshop explored the future of data science education and workforce at institutions of higher learning that are primarily teaching-focused. A follow-up webinar reporting on the workshop’s outcomes and next steps will be announced.

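The Keeping Data Science Broad series is co-sponsored by the National Science Foundation’s Directorates for Computer and Information Science and Engineering (CISE), Mathematical and Physical Sciences (MPS), Education and Human Resources (EHR), and Social, Behavioral and Economic Sciences (SBE), with participation from the National Academies of Sciences, Engineering, and Medicine. It is also sponsored by the South Big Data Innovation Hub and Georgia Tech’s Institute for Data Engineering and Science.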

Visit this link to share your thoughts on the future of data science education with the South Big Data Hub.

Resources:

Previous events in the Keeping Data Science Broad series include:

“Data Science Education in Traditional Contexts,” recorded August 31, 2017, highlighted universities, teaching institutions, community colleges, and minority-serving institutions that have implemented undergraduate data science programs, as case studies for workshop participants to consider and compare to their own contexts.

“Alternative Avenues for Development of Data Science Education Capacity,” recorded September 22, 2017, explored efforts that build data science education capacity outside the context of traditional curricular program development. Examples include integrating data science into courses and curricula outside the traditional computer science, math, or statistics context (e.g., arts and humanities), expanding capacity by integrating third-party or shared resources (e.g., MOOCs and open source educational resources) into curricula, and offering educational options outside traditional courses (e.g., faculty training, “Data Science for Social Good” programs, and bootcamps).

View recordings of these events on the South Hub website.

Engage in upcoming 2018 South Big Data Hub activities. Join the Education Working Group or participate in a South Big Data Hub Data Carpentry Event. Increase your impact by attending a “train the trainer” session, or learn about the Hub’s recently completed installation and demo of the JupyterHub executable textbooks in support of data science education. Contact Renata Rawlings-Goss of the South Big Data Hub at rrawlings.goss@gatech.edu.

The Institute for Data Engineering and Science (IDEaS) at Georgia Tech unifies data science researchers and resources spanning all disciplines throughout Georgia Tech to take on grand challenges in data science. It strategically builds collaborations and supporting resources to stimulate foundational research in areas such as machine learning, high-performance computing, and algorithms and optimization. It identifies and unites researchers to pursue collaborative and ambitious funding opportunities, to drive research, and to evolve and promote data science education. IDEaS provides an accessible and stable means of navigating the vast landscape of data science research and opportunities, both internally and externally through its connections to industry and other partners.

The South Big Data Hub is part of a network of four regional Big Data Hubs, launched by the National Science Foundation and funded in part by host universities and other partners. Managed jointly by co-executive directors at the Georgia Institute of Technology and the University of North Carolina at Chapel Hill, the South Hub serves 16 states and the District of Columbia—from Texas to Delaware—with more than 800 members from universities, corporations, foundations, and cities committing their support. The Hubs accelerate partnerships, grow R&D communities, facilitate resource and data sharing, and build data science capacity for education and workforce development.

Group photo of participants of the Negotiating the Digital and Data Divide Workshop