Georgia Tech, Tufts University, and Wisconsin Researchers Awarded $2.7M to Make Data Science More Accessible
Researchers at the Georgia Institute of Technology, Tufts University, and University of Wisconsin will develop new techniques to make machine learning in data science more accessible to non-data scientists under a $2.7 million grant from the Defense Advanced Research Projects Agency (DARPA) Data-Driven Discovery of Models (D3M) program.
Over the years, advances in machine learning have resulted in more complex, and more powerful, applications in information visualization. As a consequence, machine learning techniques to achieve specific insights from data have also gotten more complicated. Most require data science degrees or some formal data science training in order to use the tools that are being built.
Thus, the gap between subject matter experts – international politics majors, historians, biology experts, or climatologists, for example – and the complexity of the machine learning tools used to contextualize data will continue to grow.
“Often, these experts have a wealth of knowledge about things like international affairs or cybersecurity, but they don’t have a wealth of knowledge of what it means to use machine learning model X, Y, or Z,” said Alex Endert, an assistant professor in the School of Interactive Computing at Georgia Tech, one of the four collaborators on the project.
Currently, tools to adjust parameters on the data consist of buttons, control panels, dropdown menus and sliders, knobs and fields to adjust values, direct manipulations to define a machine learning model and letting it achieve the desired data.
This is less intuitive for non-data scientists, so the aim for the researchers is to move the user interaction into the visual space. Users could adjust the data within a scatter plot, for example, by zooming or panning, coloring items or generally demonstrating areas of interest inside the data. Then, they could infer how those parameters should change as a result of the exploration of the data.
“If we are successful, we have the chance to bring data analysis to the public,” said primary investigator Remco Chang, an associate professor in the Tufts University Department of Computer Science. “But to get there, we will need to allow the end users to be able to intuitively ask questions about their data that can be formalized and executed in machine learning. We need to allow the user to make sense of the complex results from machine learning and help contextualize the results in the user’s domain.”
The grant, which took effect earlier this year, will fund four years of research. Other participants are Georgia Tech School of Interactive Computing Professor John Stasko, and University of Wisconsin Department of Computer Science Professor Michael Gleicher.
DARPA’s D3M program aims to develop automated model discovery systems that enable users with subject matter expertise but no data science background to create empirical models of real, complex processes. Automated model discovery systems developed by the D3M program will be tested on real-world problems that will progressively get harder during the course of the program. Toward the end of the program, D3M will target problems that are both unsolved and underspecified in terms of data and instances of outcomes available for modeling.