Changelog

0.9 (2019-01-16)

Support for loading CSVs directly from URLs, thanks @betatim - #38
New -pk/--primary-key options, closes #22
Create FTS index for extracted column values
Added --no-fulltext-fks option, closes #32
Now using black for code formatting
Bumped versions of dependencies

0.8.1 (2018-04-24)

Updated README and CHANGELOG, tweaked --help output

0.8 (2018-04-24)

-d and -df options for specifying date/datetime columns, closes #33
Maintain lookup tables in SQLite, refs #17
--index option to specify which columns to index, closes #24
Test confirming --shape and --filename-column and -c work together #25
Use usecols when loading CSV if shape specified
--filename-column is now compatible with --shape, closes #10
--no-index-fks option

By default, csvs-to-sqlite creates an index for every foreign key column that is added using the --extract-column option.

For large tables, this can dramatically increase the size of the resulting database file on disk. The new --no-index-fks option allows you to disable this feature to save on file size.

Refs #24, which will allow you to explicitly list which columns SHOULD have an index created.
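To make the trade-off concrete, here is a minimal sketch of what "an index for every foreign key column" means at the SQLite level. It uses the stdlib `sqlite3` module. The helper `create_fk_indexes`, its `index_fks` flag (standing in for `--no-index-fks`) and the `<table>_<column>` index naming are all hypothetical. This is not csvs-to-sqlite's actual code.

```python
import sqlite3


def create_fk_indexes(conn, table, fk_columns, index_fks=True):
    """Create one index per foreign key column unless index_fks is False.

    Hypothetical helper: index_fks=False mimics the effect of --no-index-fks.
    Returns the names of the indexes it created.
    """
    if not index_fks:
        return []
    created = []
    for column in fk_columns:
        # Assumed naming scheme: "<table>_<column>"
        index_name = f"{table}_{column}"
        conn.execute(
            f'CREATE INDEX IF NOT EXISTS "{index_name}" ON "{table}" ("{column}")'
        )
        created.append(index_name)
    return created


conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE "votes" ("county" INTEGER, "party" INTEGER, "votes" INTEGER)')
print(create_fk_indexes(conn, "votes", ["county", "party"]))
# ['votes_county', 'votes_party']
```

Every index stores an extra copy of its column plus a rowid for each row. That is why skipping them saves space on large tables, at the cost of slower lookups and joins on those columns.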
Added --filename-column option, refs #10
Fixes for Python 2, refs #25
Implemented new --shape option - refs #25
--table option for specifying table to write to, refs #10
Updated README to cover --skip-errors, refs #20
Add --skip-errors option (#20) [Jani Monoses]
Less verbosity (#19) [Jani Monoses]

Only log extract_columns info when that option is passed.
Add option for field quoting behaviour (#15) [Jani Monoses]

0.7 (2017-11-25)

Add -s option to specify input field separator (#13) [Jani Monoses]

0.6.1 (2017-11-24)

-f and -c now work for single table multiple columns.

Fixes #12

0.6 (2017-11-24)

--fts and --extract-column now cooperate.

If you extract a column and then specify that same column in the --fts list, csvs-to-sqlite now uses the original value of that column in the index.

Example using CSV from https://data.sfgov.org/City-Infrastructure/Street-Tree-List/tkzw-k3nq
```
csvs-to-sqlite Street_Tree_List.csv trees-fts.db \
    -c qLegalStatus -c qSpecies -c qSiteInfo \
    -c PlantType -c qCaretaker -c qCareAssistant \
    -f qLegalStatus -f qSpecies -f qAddress \
    -f qSiteInfo -f PlantType -f qCaretaker \
    -f qCareAssistant -f PermitNotes
```
Closes #9
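One way to picture this is the following sketch using the stdlib `sqlite3` module. The FTS table is filled by joining back to the lookup table, so the species text is indexed rather than its integer ID. The table names, the sample row and the fts5-then-fts4 probing order are illustrative assumptions, not csvs-to-sqlite's implementation.

```python
import sqlite3


def best_fts_module():
    """Return "fts5" if this SQLite build supports it, else "fts4" (assumed order)."""
    probe = sqlite3.connect(":memory:")
    try:
        for module in ("fts5", "fts4"):
            try:
                probe.execute(f"CREATE VIRTUAL TABLE probe USING {module}(x)")
            except sqlite3.OperationalError:
                continue  # module not compiled in; try the next one
            return module
    finally:
        probe.close()
    raise RuntimeError("SQLite was built without FTS4 or FTS5")


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE qSpecies (id INTEGER PRIMARY KEY, value TEXT);
    CREATE TABLE trees (qSpecies INTEGER REFERENCES qSpecies(id), qAddress TEXT);
    INSERT INTO qSpecies VALUES (1, 'London Plane');
    INSERT INTO trees VALUES (1, '100 Grove St');
    """
)
module = best_fts_module()
conn.execute(f"CREATE VIRTUAL TABLE trees_fts USING {module}(qSpecies, qAddress)")
# Index the original text of the extracted column (looked up via qSpecies),
# not the integer ID stored in trees.qSpecies.
conn.execute(
    """
    INSERT INTO trees_fts (rowid, qSpecies, qAddress)
    SELECT trees.rowid, qSpecies.value, trees.qAddress
    FROM trees LEFT JOIN qSpecies ON trees.qSpecies = qSpecies.id
    """
)
print(conn.execute("SELECT rowid FROM trees_fts WHERE trees_fts MATCH 'plane'").fetchall())
# [(1,)]
```

A search for `1` finds nothing here, because the integer ID never reaches the index.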
Added --fts option for setting up SQLite full-text search.

The --fts option will create a corresponding SQLite FTS virtual table, using the best available version of the FTS module.

https://sqlite.org/fts5.html https://www.sqlite.org/fts3.html

Usage:
```
csvs-to-sqlite my-csv.csv output.db -f column1 -f column2
```
Example generated with this option: https://sf-trees-search.now.sh/

Example search: https://sf-trees-search.now.sh/sf-trees-search-a899b92?sql=select+*+from+Street_Tree_List+where+rowid+in+%28select+rowid+from+Street_Tree_List_fts+where+Street_Tree_List_fts+match+%27grove+london+dpw%27%29%0D%0A

Will be used in simonw/datasette#131
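The sketch below reproduces the pattern behind the example search URL using only the stdlib `sqlite3` module. It picks an FTS module, creates a `<table>_fts` virtual table whose rowids match the source table, then filters the source table with `rowid in (select rowid ... match ...)`. The sample rows are made up. Trying fts5 before fts4 is an assumption about what "best available" means. The tool itself may build the table differently.

```python
import sqlite3


def best_fts_module():
    """Return "fts5" if this SQLite build supports it, else "fts4" (assumed order)."""
    probe = sqlite3.connect(":memory:")
    try:
        for module in ("fts5", "fts4"):
            try:
                probe.execute(f"CREATE VIRTUAL TABLE probe USING {module}(x)")
            except sqlite3.OperationalError:
                continue  # module not compiled in; try the next one
            return module
    finally:
        probe.close()
    raise RuntimeError("SQLite was built without FTS4 or FTS5")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Street_Tree_List (qSpecies TEXT, qAddress TEXT)")
conn.executemany(
    "INSERT INTO Street_Tree_List VALUES (?, ?)",
    [("London Plane", "Grove St"), ("Coast Live Oak", "Market St")],
)
module = best_fts_module()
conn.execute(
    f"CREATE VIRTUAL TABLE Street_Tree_List_fts USING {module}(qSpecies, qAddress)"
)
# Keep rowids aligned so FTS hits can be mapped back to the source rows.
conn.execute(
    "INSERT INTO Street_Tree_List_fts (rowid, qSpecies, qAddress) "
    "SELECT rowid, qSpecies, qAddress FROM Street_Tree_List"
)
# Same query shape as the example search URL.
matches = conn.execute(
    """
    SELECT * FROM Street_Tree_List
    WHERE rowid IN (
        SELECT rowid FROM Street_Tree_List_fts
        WHERE Street_Tree_List_fts MATCH ?
    )
    """,
    ("grove london",),
).fetchall()
print(matches)  # [('London Plane', 'Grove St')]
```

Multiple search terms are ANDed together, and each term may match in any indexed column. That is why "grove london" matches a row where the two words sit in different columns.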
Handle column names with spaces in them.
Added csvs-to-sqlite --version option.

Using http://click.pocoo.org/5/api/#click.version_option

0.5 (2017-11-19)

Release 0.5.
Foreign key extraction for mix of integer and NaN now works.

Similar issue to a8ab5248f4a: when we extracted a column that included a mixture of both integers and NaNs, things went a bit weird.
Added test for column extraction.
Fixed bug with accidentally hard-coded column.

0.4 (2017-11-19)

Release 0.4.
Automatically deploy tags as PyPI releases.

https://docs.travis-ci.com/user/deployment/pypi/
Fixed tests for Python 2.
Ensure columns of ints + NaNs map to SQLite INTEGER.

Pandas does a good job of figuring out which SQLite column types should be used for a DataFrame - with one exception: due to a limitation of NumPy it treats columns containing a mixture of integers and NaN (blank values) as being of type float64, which means they end up as REAL columns in SQLite.

http://pandas.pydata.org/pandas-docs/stable/gotchas.html#support-for-integer-na

To fix this, we now check to see if a float64 column actually consists solely of NaN and integer-valued floats (checked using v.is_integer() in Python). If that is the case, we override the column type to be INTEGER instead.
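The check can be sketched in plain Python. The function name is hypothetical and this is not the project's actual code. It assumes the column's values arrive as Python floats. NumPy's `float64` subclasses `float`, so `.is_integer()` works on either.

```python
import math


def sqlite_type_for_floats(values):
    """Pick a SQLite type for a column that pandas read as float64.

    NaN values (blank cells) are skipped; if every other value is a whole
    number, the column is mapped to INTEGER instead of REAL.
    """
    if all(math.isnan(v) or v.is_integer() for v in values):
        return "INTEGER"
    return "REAL"


nan = float("nan")
print(sqlite_type_for_floats([1.0, nan, 3.0]))  # INTEGER
print(sqlite_type_for_floats([1.5, nan, 3.0]))  # REAL
```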
Use miniconda to speed up Travis CI builds (#8)

Using Travis CI configuration code copied from https://github.com/EducationalTestingService/skll/blob/87b071743ba7cf0b1063c7265005d43b172b5d91/.travis.yml

Which is itself an updated version of the pattern described in http://dan-blanchard.roughdraft.io/7045057-quicker-travis-builds-that-rely-on-numpy-and-scipy-using-miniconda

I had to switch to running pytest directly, because python setup.py test was still trying to install a pandas package that involved compiling everything from scratch (which is why Travis CI builds were taking around 15 minutes).
Don't include an index column - rely on SQLite rowid instead.

0.3 (2017-11-17)

Added --extract-column to README.

Also updated the --help output and added a Travis CI badge.
Configure Travis CI.

Also made it so python setup.py test runs the tests.

Mechanism for converting columns into separate tables.

Let's say you have a CSV file that looks like this:

```
county,precinct,office,district,party,candidate,votes
Clark,1,President,,REP,John R. Kasich,5
Clark,2,President,,REP,John R. Kasich,0
Clark,3,President,,REP,John R. Kasich,7
```

(Real example from https://github.com/openelections/openelections-data-sd/blob/master/2016/20160607__sd__primary__clark__precinct.csv )

You can now convert selected columns into separate lookup tables using the new --extract-column option (shortname: -c) - for example:

```
csvs-to-sqlite openelections-data-*/*.csv \
    -c county:County:name \
    -c precinct:Precinct:name \
    -c office -c district -c party -c candidate \
    openelections.db
```

The format is as follows:

```
column_name:optional_table_name:optional_table_value_column_name
```

If you just specify the column name, e.g. -c party, the following table will be created:

```
CREATE TABLE "party" (
    "id" INTEGER PRIMARY KEY,
    "value" TEXT
);
```

If you specify all three options, e.g. -c precinct:Precinct:name, the table will look like this:

```
CREATE TABLE "Precinct" (
    "id" INTEGER PRIMARY KEY,
    "name" TEXT
);
```
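The `column_name:optional_table_name:optional_table_value_column_name` format maps onto those defaults roughly as follows. The table name falls back to the column name, and the value column falls back to `value`. The function name `parse_extract_spec` is hypothetical. This is a sketch of the documented format, not the tool's own parser.

```python
def parse_extract_spec(spec):
    """Split "column[:table[:value_column]]" into its three parts.

    Defaults follow the examples: the lookup table is named after the
    column, and the value column is called "value".
    """
    parts = spec.split(":")
    if not 1 <= len(parts) <= 3:
        raise ValueError(f"Invalid --extract-column spec: {spec!r}")
    column = parts[0]
    table = parts[1] if len(parts) > 1 else column
    value_column = parts[2] if len(parts) > 2 else "value"
    return column, table, value_column


print(parse_extract_spec("party"))                   # ('party', 'party', 'value')
print(parse_extract_spec("precinct:Precinct:name"))  # ('precinct', 'Precinct', 'name')
```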

The original tables will be created like this:

```
CREATE TABLE "ca__primary__san_francisco__precinct" (
    "county" INTEGER,
    "precinct" INTEGER,
    "office" INTEGER,
    "district" INTEGER,
    "party" INTEGER,
    "candidate" INTEGER,
    "votes" INTEGER,
    FOREIGN KEY (county) REFERENCES County(id),
    FOREIGN KEY (party) REFERENCES party(id),
    FOREIGN KEY (precinct) REFERENCES Precinct(id),
    FOREIGN KEY (office) REFERENCES office(id),
    FOREIGN KEY (candidate) REFERENCES candidate(id)
);
```

They will be populated with IDs that reference the new derived tables.

Closes #2
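The extraction mechanism can be sketched with the stdlib `csv` and `sqlite3` modules in place of pandas. Each distinct value gets an integer ID in a lookup table, and the original column is rewritten to hold that ID. The helper `extract_lookup` and the `results` table name are hypothetical. Only `county` and `party` are extracted, to keep it short. The tool's real implementation may assign IDs differently.

```python
import csv
import io
import sqlite3

# Sample rows from the openelections CSV shown earlier.
CSV_TEXT = """county,precinct,office,district,party,candidate,votes
Clark,1,President,,REP,John R. Kasich,5
Clark,2,President,,REP,John R. Kasich,0
Clark,3,President,,REP,John R. Kasich,7
"""


def extract_lookup(rows, column):
    """Replace row[column] in place with an integer ID; return {value: id}."""
    ids = {}
    for row in rows:
        value = row[column]
        if value == "":
            row[column] = None  # blank cells become NULL, not a lookup row
        else:
            # setdefault assigns 1, 2, 3... in order of first appearance
            row[column] = ids.setdefault(value, len(ids) + 1)
    return ids


rows = list(csv.DictReader(io.StringIO(CSV_TEXT)))
conn = sqlite3.connect(":memory:")

# (column, lookup table, value column), as in -c county:County:name -c party
for column, table, value_column in [("county", "County", "name"), ("party", "party", "value")]:
    ids = extract_lookup(rows, column)
    conn.execute(f'CREATE TABLE "{table}" ("id" INTEGER PRIMARY KEY, "{value_column}" TEXT)')
    conn.executemany(f'INSERT INTO "{table}" VALUES (?, ?)', [(i, v) for v, i in ids.items()])

conn.execute(
    'CREATE TABLE "results" ("county" INTEGER, "precinct" INTEGER, "party" INTEGER, '
    '"votes" INTEGER, FOREIGN KEY (county) REFERENCES County(id), '
    "FOREIGN KEY (party) REFERENCES party(id))"
)
conn.executemany('INSERT INTO "results" VALUES (:county, :precinct, :party, :votes)', rows)

print(conn.execute(
    "SELECT County.name, party.value, results.votes FROM results "
    "JOIN County ON results.county = County.id "
    "JOIN party ON results.party = party.id ORDER BY results.votes"
).fetchall())
# [('Clark', 'REP', 0), ('Clark', 'REP', 5), ('Clark', 'REP', 7)]
```

Joining back through the lookup tables recovers the original text, so no information is lost by storing IDs.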

Can now add new tables to existing database.

And the new --replace-tables option allows you to tell it to replace existing tables rather than quitting with an error.

Closes #1
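The add-or-replace behaviour can be pictured with this stdlib `sqlite3` sketch. Writing into an existing database is fine, and a name clash is an error unless replacing is requested, in which case the old table is dropped first. `write_table` and its `replace_tables` parameter are hypothetical stand-ins for `--replace-tables`. A `ValueError` represents the tool quitting with an error.

```python
import sqlite3


def write_table(conn, name, columns, rows, replace_tables=False):
    """Create table `name` and insert `rows` (hypothetical helper).

    An existing table with the same name is an error unless replace_tables
    is True, in which case it is dropped and recreated.
    """
    exists = conn.execute(
        "SELECT 1 FROM sqlite_master WHERE type = 'table' AND name = ?", (name,)
    ).fetchone()
    if exists:
        if not replace_tables:
            raise ValueError(f"Table {name!r} already exists")
        conn.execute(f'DROP TABLE "{name}"')
    column_sql = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f'CREATE TABLE "{name}" ({column_sql})')
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(f'INSERT INTO "{name}" VALUES ({placeholders})', rows)


conn = sqlite3.connect(":memory:")
write_table(conn, "trees", ["species"], [("Oak",)])
write_table(conn, "tree_sites", ["site"], [("Grove St",)])  # new table: fine
try:
    write_table(conn, "trees", ["species"], [("Elm",)])
except ValueError as e:
    print(e)  # Table 'trees' already exists
write_table(conn, "trees", ["species"], [("Elm",)], replace_tables=True)
print(conn.execute("SELECT species FROM trees").fetchall())  # [('Elm',)]
```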
Fixed compatibility with Python 3.
Badge links to PyPI.
Create LICENSE.
Create README.md.
Initial release.
