Supporting User-Centered Analytical Interfaces at Scale

Talk
Dixin Tang
Time: 03.01.2023 11:00 to 12:00

Many data analytical tools, such as spreadsheets, visual analytical tools, and many Python data analysis libraries, are built to help the general public make sense of data and gain insights. These tools are widely adopted by people with little or no programming experience. Their popularity is mainly attributed to their intuitive and easy-to-use interfaces, referred to as user-centered analytical interfaces. Unfortunately, in the face of large datasets, the modern data analytical stack that supports these interfaces suffers from significant problems with interactivity, scalability, and resource utilization.

In this talk, I will present my research on transforming the modern data analytical stack to efficiently support user-centered analytical interfaces at scale. I will focus on two projects that address the interactivity and scalability problems. First, I will present transactional panorama, a formal framework that enables end users to consume results in progressively updating visualizations while preserving desirable properties (e.g., coherence) and performance. Transactional panorama extends database transactions to model the user’s interaction with progressively updating visualizations and opens a research direction that brings transactions into end-user analytics. Next, I will discuss decomposition rules for parallelizing the execution of pandas, a popular Python data analysis library. These decomposition rules adapt traditional parallel execution techniques to account for the new data model, API, and access patterns in pandas. Finally, I will discuss future projects that bring large-scale data analysis to the masses.