Add CIFF export command
This MR adds a ciff_export command. It can read the DuckDB data in two ways: by converting it to native Python objects (which is slow), or by using Arrow as an intermediate representation (which requires protarrow).
I've also fixed an issue with the ciff_import command. The previous version of the query that expands the postings table into the terms table had a maximum tf of 200, which meant information would be lost if an imported CIFF had postings with higher tfs. The new version uses range(tf) to generate the right number of rows for each posting.
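
To illustrate the fix, here is a minimal pure-Python model of the expansion logic. It is not the SQL from this MR: the `Posting` fields, the function names, and the output row shape are hypothetical, and treating the old behaviour as a hard cap at 200 rows per posting is an assumption based on the description above.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Posting(NamedTuple):
    # Hypothetical column names; the real postings table may differ.
    term_id: int
    doc_id: int
    tf: int


def expand_postings(postings: Iterable[Posting]) -> Iterator[Tuple[int, int]]:
    """Model of the new behaviour: emit exactly tf rows per posting."""
    for p in postings:
        for _ in range(p.tf):
            yield (p.term_id, p.doc_id)


def expand_postings_capped(
    postings: Iterable[Posting], cap: int = 200
) -> Iterator[Tuple[int, int]]:
    """Assumed model of the old behaviour: at most `cap` rows per posting."""
    for p in postings:
        for _ in range(min(p.tf, cap)):
            yield (p.term_id, p.doc_id)


if __name__ == "__main__":
    postings = [Posting(1, 10, 3), Posting(2, 10, 250)]
    print(len(list(expand_postings(postings))))         # all occurrences kept
    print(len(list(expand_postings_capped(postings))))  # occurrences past the cap dropped
```

With a posting whose tf exceeds 200, the capped variant emits fewer rows than the posting's tf, which is the kind of information loss the fix removes.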