The paradox of IVM is that the concept has been around for a long time; there are hundreds of papers on this topic; hardly anyone would disagree that it's a useful feature for a database, yet no modern database has a satisfactory implementation.
I wonder what the authors mean by this:
> DBSP does make some tradeoffs when compared to differential dataflow. It simplifies the programming model by constraining how time and state management occur. This simplification limits some of the concurrency gains we see in timely and differential dataflow.
FWIW, there are no fundamental differences between DBSP and DD in terms of concurrency. Both models can process data concurrently on many threads or machines, and both do it in similar ways (by sharding the data).
DD supports lattices that allow it to compute at multiple points in time simultaneously. As I understand it, DBSP limits time to one diff at a time. Lalith can correct me if I’m off base on this. :)
I'd say the difference is in the type of transaction isolation guarantees each system provides. DBSP can process multiple diffs in parallel, and when it's done it outputs a single diff that captures the effects of all the input diffs. DD can additionally attribute each output diff to a specific input diff by assigning each input diff and its matching output diff a logical timestamp. This has a cost in terms of complexity and runtime overhead, but it allows strong isolation of concurrent transactions.
But as gz09 said, both DD and DBSP are data-parallel architectures that can evaluate queries concurrently on multiple threads or multiple machines.
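To make the distinction concrete, here is a toy sketch in plain Python. It is not the DBSP or DD API, and it does not compute anything incrementally (it simply recomputes the view and subtracts). It only models what the outputs look like in the two styles described above. The names `batched_step` and `timestamped_step`, and the per-key count view, are made up for illustration.

```python
from collections import Counter

# A diff is modeled as a mapping from row to signed weight
# (+1 for an insert, -1 for a delete). Rows here are (key, value) tuples.

def apply_diff(state, diff):
    """Fold one input diff into the accumulated input table."""
    for row, weight in diff.items():
        state[row] += weight
        if state[row] == 0:
            del state[row]

def view(state):
    """The maintained query: count rows per key, as a set of (key, count) rows."""
    counts = Counter()
    for (key, _value), weight in state.items():
        counts[key] += weight
    return Counter({(key, n): 1 for key, n in counts.items() if n != 0})

def subtract(new, old):
    """Difference of two views, again as a mapping from row to signed weight."""
    result = Counter(new)
    result.subtract(old)
    return {row: w for row, w in result.items() if w != 0}

def batched_step(state, diffs):
    """One combined output diff for the whole batch of input diffs."""
    before = view(state)
    for diff in diffs:
        apply_diff(state, diff)
    return subtract(view(state), before)

def timestamped_step(state, diffs):
    """One output diff per input diff, keyed by a logical timestamp."""
    out = {}
    for t, diff in enumerate(diffs):
        before = view(state)
        apply_diff(state, diff)
        out[t] = subtract(view(state), before)
    return out

# Three hypothetical transactions: two inserts, then a delete.
diffs = [{("a", 1): 1}, {("a", 2): 1}, {("a", 1): -1}]

batched = batched_step(Counter(), diffs)
per_time = timestamped_step(Counter(), diffs)

print(batched)   # the net effect only
print(per_time)  # the effect of each transaction separately
```

In this toy model the batched output only says that the count for `a` ended up at 1, while the timestamped output also shows that it was briefly 2 after the second transaction. Summing the per-timestamp diffs gives back the batched diff, which is the sense in which the second style carries strictly more information at extra cost.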
(lalith here) -- Whatever ryzyk and gz09 said above. :)