Command-line tool to split CSV files
This small tool grew out of a need we had during the Nigeria MDGs Info System data mop-up process. We had to process CSV files with millions of lines under a memory constraint, and a good approach was to split the CSV on one column and process each resulting file separately in R.
The tool uses streams to read the input one line at a time and write the result to the output directory.
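The streaming approach described above can be sketched roughly as follows. This is an illustrative Python sketch, not the tool's actual implementation: the function name `split_csv_by_column`, the `<value>.csv` output naming, and the sample data are all assumptions made for the example. It also assumes the number of distinct groups is small enough to keep one open file per group, and that group values are safe to use as file names.

```python
import csv
import io
import os
import tempfile


def split_csv_by_column(lines, group_by, output_dir):
    """Stream rows from `lines` and write each row to <output_dir>/<value>.csv.

    Hypothetical helper: only one input row is held in memory at a time.
    """
    os.makedirs(output_dir, exist_ok=True)
    reader = csv.reader(lines)
    header = next(reader)            # first row holds the column names
    col = header.index(group_by)     # position of the group-by column
    handles = {}                     # group value -> open file handle
    writers = {}                     # group value -> csv writer
    try:
        for row in reader:
            key = row[col]
            if key not in writers:
                # First time this group is seen: create its file and write the header.
                path = os.path.join(output_dir, f"{key}.csv")
                handles[key] = open(path, "w", newline="")
                writers[key] = csv.writer(handles[key])
                writers[key].writerow(header)
            writers[key].writerow(row)
    finally:
        for handle in handles.values():
            handle.close()
    return sorted(handles)


if __name__ == "__main__":
    # Made-up sample data, for illustration only.
    sample = io.StringIO("state,lga,value\nKano,A,1\nLagos,B,2\nKano,C,3\n")
    with tempfile.TemporaryDirectory() as out_dir:
        print(split_csv_by_column(sample, "state", out_dir))
```

Because the reader is an iterator over lines, the same function would work with `sys.stdin` or an open file object, which mirrors the idea of processing the input as a stream rather than loading it all at once.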
To use it, run:
csv-split -i [file_name] -b [group_by_column] [-o [output_directory]]
If output_directory is not specified, it defaults to [file_name]_by_[group].
You can also pipe input from stdin:
cat data.csv | csv-split -b [group_by_column]
Usage: csv-split [options]
Options:
  -h, --help                output usage information
  -V, --version             output the version number
  -i, --input <file>        select an input csv
  -o, --output <directory>  select an output directory
  -b, --groupby <group>     the column you want to group by