Using CPA to take CSV files to SQLite database


#1

To try an circumvent needing to set up a dedicated MySQL server, I’m exploring the various routes of taking CSV data and creating a database after the CP run.

From the CPA documentation, it looks like there is a mode using CPA to follow the SQL_SETUP.sql script generated by CP (ExportoDatabase SQL/CSV) to write an SQLite database in my home directory. This seems like a good solution.

However, when I follow the directions specified in the documentation:

db_type = sqlite
db_sql_file = path and filename of .SQL file from ExportToDatabase

I get the following error message:
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: Query was: "CREATE TABLE IF NOT EXISTS Experiment (
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: experiment_id integer primary key AUTO_INCREMENT,
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: name text);
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: "
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: First exception was: near “AUTO_INCREMENT”: syntax error
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: Second exception was: ERROR: Database query failed for connection "MainThread"
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: Query was: "CREATE TABLE IF NOT EXISTS Experiment (
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: experiment_id integer primary key AUTO_INCREMENT,
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: name text);
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: "
7/27/16 10:31:43.527 AM CellProfiler Analyst[44138]: Exception was: near “AUTO_INCREMENT”: syntax error
7/27/16 10:31:43.566 AM CellProfiler Analyst[44138]: CellProfiler Analyst Error

The relevant block from the SQL file is:

CREATE TABLE IF NOT EXISTS Experiment (
experiment_id integer primary key AUTO_INCREMENT,
name text);

Any ideas how to troubleshoot this?


#2

I don’t recommend this way of generating an SQLite table. I’d just run ExportToDatabase and ask it to generate just an SQLite- if you need CSVs in the future or for another purpose you can easily export them from the database.


#3

Ok, fair enough. :slight_smile:

However, we’re encountering the limitations of the SQLite format- namely the concurrent writing to the database when running CP on a cluster and I’m trying to find a solution.

The feedback from CP/CPA and our knowledge of SQL is making this tough to troubleshoot. None of us here are dedicated MySQL people but we’re trying the following:

  1. Direct writing to an online SQL server (ExportToDatabase module error)

  2. writing to CSV and offline creation of a SQL database (MySQL database: failed to connect to database)

  3. anything else (for instance, the strategy described here)

And as an aside, if this feature is not supported, perhaps it should be removed from the documentation / code?

Thanks,
Brian


#4

It’s supported in general, but it’s not recommended by me. :slight_smile:

I definitely sympathize as I’m not a SQL person either and have been dealing with a bunch of SQL issues recently. FWIW, depending on your number of output CSVs and your comfort level with scripting in some language, taking the CSV files and either a) importing a few of them into a database with something like DB Browser for SQLite or b) using a script to grab a large number of them and combine them into a couple of tables are reasonably trivial. Unless your experiment is enormously complex, usually just importing the _Image and _Object tables (and relationship tables if necessary) is sufficient to get going in CPA without bothering with the Experiment tables (which can be cumbersome).


#5

Hmmm, that’s an interesting suggestion. Do you have an example script of approach B? We might be able to make that work.

Thanks!
-B


#6


I do it using the panda method described there.


#7

This is something I’ve been wrestling with recently for the same reasons of knowing zero SQL. I’ve made some fairly generic python scripts here that might be useful.