Friday Afternoon Seminar

This is a (virtual) seminar on topics related to automata and logic happening every two months. You can subscribe to the mailing list here.
Here are the Google Calendar and the iCal link.

The next seminar is on June 3rd at 14:00 (Warsaw time zone) given by Antoine Amarilli.

Zoom link
Meeting ID: 869 5026 6380 Passcode: 402904

The (in)tractability of query evaluation over probabilistic data

Program (Warsaw time zone)

14:00 — 14:50Part 1
14:50 — 15:10— Discussion and break —
15:10 — 16:00Part 2
16:00 — 16:20— Discussion and break —
16:20 — 17:15Part 3
17:15 — 17:30— Discussion —


A central problem in database theory is query evaluation (aka model checking): what is the complexity of evaluating a fixed query on an input relational structure? We focus on the probabilistic version of this problem: the input facts are annotated with independent probabilities, and we wish to compute the total probability of the outcomes where the query holds. Equivalently, this is the problem of counting how many substructures satisfy the query, weighted by their probability.

The probabilistic query evaluation problem is in #P, and is known to be #P-hard for many queries, but there are also tractable cases – admittedly quite rare. To understand them, the goal is to show dichotomy results: characterize the queries for which the problem is in PTIME, and prove hardness for all other queries. Another important angle of attack is to characterize when the probability can be computed via so-called tractable lineage expressions. These are Boolean circuits that concisely represent the set of substructures that satisfy the query, expressed in tractable formalisms from knowledge compilation (e.g., d-DNNFs).

This talk will present our current understanding of the complexity of probabilistic query evaluation. We will first focus on queries of low expressiveness, namely, self-join-free conjunctive queries. In this case, the tractable queries have a syntactic characterization (i.e., they are hierarchical), and their lineages can be represented in tractable circuit formalisms. Further, the same tractability boundary holds in the unweighted case where all probabilities are 0 or 1/2, i.e., we count how many substructures satisfy the query.

We will then extend to unions of conjunctive queries (aka positive existential first-order logic). These queries also admit a dichotomy between tractable and intractable queries, but it is far more complex. Further, the dichotomy does not extend yet to the unweighted case, and the connection with circuits is more complicated and not definitive. In particular, a central open problem is whether we can express the lineage of tractable queries in an expressive but tractable circuit formalism (namely, d-Ds).

We will last present the case of more expressive queries. Specifically, on graph signatures, a dichotomy is known on all homomorphism-closed queries: the only tractable queries are the ones equivalent to a tractable union of conjunctive queries. We will also present directions for future research, e.g., finer dichotomies on queries when we also take into account structural parameters of input instances (such as treewidth).


Wojciech Czerwiński
Filip Mazowiecki