Importing CSV file data into sqlite3

by Mandar Vaze on July 13, 2009
in Code, Hack, Linux, Open Source, tips

The :en:SQLite logo as of 2007-12-15
Image via Wikipedia

I was trying to import the data from CSV file into sqlite3 database. Ideally this should be very simple task, with following the steps given in the sqlite tutorial. It is a matter of calling the sqlite command with separator argument, followed by an import operation, as listed below.

sqlite3 test.db  "create table t1 (t1key INTEGER PRIMARY KEY,data TEXT);"
sqlite3 -separator , test.db ".import some.csv t1"

Except that main attribute of my CSV file was that it could contain single records with embedded comma. I was hoping that sqlite3 would be smart enough to detect that the fields were enclosed within double quotes and then separate by comma. But I soon realized that only a code specifically dealing with CSV would know about this.  As we can see in the example above, the import is a generic code and as a user I listed comma as a separator.

My Data looked something like this :

"1","data1"
"2","data2,data3"

So like any *nix geek would do, I tried providing double quote and comma as a separator. To my surprise it worked very well. I though separator would take only single character, and I had provided two (three?). Anyway, important thing to remember is to escape the single quote with a backslash (I didn’t try it without the backslash, may be that would work too)

So here is the syntax that worked :

sqlite3 -separator \", test.db ".import mydata.csv mytbl"

Update : Turns out SQLite Manager is much better solution after all.  It is an Extension for Firefox and other apps to manage any sqlite database. Not only it took care of above situation, it also handled empty cells as well where the command line failed with following error message :

line 4: expected 3 columns of data but found 2

Data with missing cells : Notice two successive commas :

"1","data1","data2"
"2","data3,data4","data5"
"3",,"data6"
Reblog this post [with Zemanta]

Reading CSV files in IronPython

stylized depiction of a csv text file
Image via Wikipedia

This is in continuation with my previous blog post :

To get IronPython to use Standard Python Modules,  one needs to add the following two lines to C:\IronPython-2.0.1\Lib\site.py :

import sys
sys.path.append(r"C:\Python25\Lib")

While this works for most part, it doesn’t help if you are using Python extensions written in C. More about my specific problems in another post. But there is an open source project IronClad to deal specifically with this issue. In the meantime, you can check the differences between IronPython and CPython

Reading (and writing to) CSV file is critical part of my program, while in stadard python it was as easy as “import csv”, the same thing took some efforts to get it working in IronPython. I got the following error for my import statement

Error on line 7 in csv.py
from functools import reduce

I also tried using ActiveState Python 2.5.2.2 (which I already had from few months ago, didn’t feel like downloading the latest version till the problem was fixed) But that didn’t help either. With ActiveState, I got the same error on the same line, except this time it was for _csv.

To Quote from IronPython Cookbook :

For some reason the Python standard library csv module is written in C, which means that it isn’t available to IronPython.

The cookbook points to a third party library called A Fast Csv Reader . The cookbook has a nice example of how to use the said DLL with your IronPython Program.

It wasn’t clear to me as to why I had to register at Code Project to download this binary since it is provided under MIT Open Source License. But whom am I gonna complain to ? Beggars can’t be choosers :(

Reblog this post [with Zemanta]

Python : Fun with subprocess.stdin

by Mandar Vaze on January 13, 2009
in Code, Python

As part of some automation work,  I needed to execute a program which requires each command to be explictely validated by the user by expliciltely selecting “y” or “n” , on the command line. Since I was processing a large number of files, I decided to use python script.

I used the subprocess module of python to invoke the external program. It was easy to capture the output sent on stdout. In order to send ‘y‘ on stdin for each iteration,  I tried sending ‘y‘ on stdin, but that would not work.  The script would hang. After discussing this, with more experienced python programmer, it was suggested that one possible reason why the script hangs is may be because the stdin buffer wasn’t flushed.  Both of us were not sure how to do that. Then it was discussed that when we run the external program from command line, we not only type ‘y‘ as response, but we also hit Enter there after, which results in flushing the stdin buffer. So may be that is what I ought to try.

To my surprise, it worked. So the solution was to pass ‘y/n‘ instead of single ‘y

Here is how my code looked like :

import subprocess
cmd = "...." # The command you wish to execute
proc = subprocess.Popen(cmd,
                       stdin=subprocess.PIPE,
                       stdout=subprocess.PIPE)
print proc.communicate('Y\n')[0]

Next Page »