I’m working on a very lightweight ETL program. I’d like to add CSV support via ODBC. That’s no problem on Windows, but I’m trying to figure out if this can be done on Linux and Unix systems without having to resort to commercial drivers. I’ve used unixODBC and iODBC with my program and both seem to work. I read about a text driver at the unixODBC site called SQI which sounds like what I’d need. However, I could not find the code for it in the current version of unixODBC. Going back to a version that’s over 20 years old, I did see source code in unixODBC for SQI. It appears to be LGPL licensed. Does anyone know anything further about SQI or its history? I’m wondering if it would be possible to get it working with the latest version of unixODBC. Would also like to find out if there are other Open Source ODBC drivers for Linux and Unix that provide support for reading and writing CSV text files. Thanks.
There does seem to be a current package called unixodbc.
It is mentioned here
and here
It says it is open source.
Does not say anything about reading CSV files… They are easy enough to read… you could write that bit yourself. It is dealing with SQL that is more demanding.
All that I remember about SQL is from the days when it was standalone database software and you entered the SQL commands straight from a terminal.
Today there are various languages that will interface to an SQL database… for example R can read a CSV file and put its contents in an SQL database… so what you want exists in the open source world.
It might be hard to drag out the bits you need.
Reading or writing to a CSV file isn’t too difficult of a task. However, with ODBC, you have to write a driver which is much more complicated than a simple CSV library. The unixODBC project is here: https://www.unixodbc.org/ The iODBC project is here: https://www.iodbc.org/dataspace/doc/iodbc/wiki/iodbcWiki/WelcomeVisitors Both are Free and Open Source and provide a way to make the ODBC standard work on systems other than Windows. The main advantage I see to using ODBC is that you can write one set of code for accessing any database or format with a ODBC driver. The programs you write aren’t locked into using a specific database and vendor.
I tried to build the older version of unixODBC that still had text support. Needed some patching for changes to C standards, but I was able to get it to build. Unfortunately, I can’t find any documentation on setup to test if it works properly. If it does, I’m thinking of trying to update the driver to work with a later version of unixODBC. I wish I could test properly to determine whether this is worth pursuing or if this is a dead end.
OK you have it running.
You must be able to make your own test just by getting it to read some CSV files. Use small files that you can check against the database by hand.
For ODBC to work right, you have to set up the connections in the odbc.ini and odbcinst.ini files. It’s not easy. I had trouble getting the current version of unixODBC to work with modern database drivers on a current Debian system. It’s even harder dealing with an older version of unixODBC. The csv driver also requires additional set up beyond what a standard driver like Postgres needs.
Would it be easier to convert the CSV files to some other format first?
A FLOSS ODBC solution on Linux would be nice especially since there’s already a way to do it on Windows. One of the advantages of using ODBC rather than a CSV specific library is that you can use the same code for different data formats and databases. You don’t need a library to read each type of database and different calls/code to work with each. As to using a different format rather than CSV, that’s not by choice. We receive data from vendors in that format and other vendors ask us to supply data in that format.
I agree, it is a worthwhile goal.
I cant really help much.