Add CIFF export command

Gijs Hendriksen requested to merge ciff-export into main

This MR adds a ciff_export command. It can read the DuckDB data in two ways: either by converting it to native Python objects (which is slow), or by using Arrow as an intermediate representation (which requires protarrow).

I've also fixed an issue with the ciff_import command. The previous version of the query that expands the postings table into the terms table had a maximum tf of 200, which meant information would be lost if an imported CIFF had postings with higher tfs. The new version uses range(tf) to generate the right number of rows for each posting.
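
To illustrate the difference, here is a small pure-Python model of the two expansion strategies. It is only a sketch and not the actual SQL in this MR: the `Posting` tuple, the function names, and the assumption that the old cap came from joining against a fixed-length sequence are all hypothetical. The point is just that any posting with tf above the cap is silently truncated, while a per-posting `range(tf)` is not.

```python
from typing import Iterable, Iterator, NamedTuple, Tuple


class Posting(NamedTuple):
    # Hypothetical column names, chosen for illustration only.
    term_id: int
    doc_id: int
    tf: int


def expand_capped(postings: Iterable[Posting], max_tf: int = 200) -> Iterator[Tuple[int, int]]:
    """Model of the old behaviour: every posting is matched against one
    fixed sequence of length max_tf, so at most max_tf rows come out."""
    sequence = range(max_tf)
    for p in postings:
        for i in sequence:
            if i < p.tf:
                yield (p.term_id, p.doc_id)


def expand_exact(postings: Iterable[Posting]) -> Iterator[Tuple[int, int]]:
    """Model of the new behaviour: each posting gets its own range(tf),
    so exactly tf rows come out regardless of how large tf is."""
    for p in postings:
        for _ in range(p.tf):
            yield (p.term_id, p.doc_id)


postings = [Posting(term_id=1, doc_id=10, tf=3), Posting(term_id=2, doc_id=10, tf=250)]
capped = list(expand_capped(postings))
exact = list(expand_exact(postings))

# The second posting loses 50 occurrences under the capped expansion.
print(len(capped), len(exact))
```

For postings with tf at or below the cap, both strategies produce identical rows, which would explain why the problem only shows up on CIFF files containing high-tf postings.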
