PseuDB

Get the software: pseudb-0.1.tar.gz. Requires Python 2.2, due to lambda scoping trickery.


PseuDB grew out of my fond memories of the Perl Sprite library. That provides a very simple subset of SQL using a flat-file as the data store. So, you could easily edit the data file by hand if you want.

I spent some time thinking about how I would approach writing this in Python. A direct translation of the code didn't appeal to me. My first firm design decision was to keep the actual database functionality well separate from the parsing of the SQL. This turns out to have had some interesting consequences. In fact, I've never gotten around to writing the SQL part, the library turned out to be so useful on its own. I might eventually write the SQL part, though, the next time I have a lot of free time.

Very often we keep data in arrays of classes, or sometimes arrays of dictionaries. Not infrequently I want to say things like "give me every member of this array for which the attribute 'spam' is 'true'." Of course, you can roll your own loops for this sort of thing, but as the requirements get more sophisiticated, this gets trickier. PseuDB makes this all easier. Here is an example from the test code:

# Return every element of the array 'd2' for which the
# 'age' attribute is greater than 22 and the 'rank'
# attribute is equal to 'Moron'.

GenXMorons = pdb_select(d2, ('age', PDB_GT, 22),
                            ('rank', PDB_EQ, 'Moron'))

This sort of selection can currently be done on lists of class instances, dictionaries and lists.

The Comparison Tuple

For all the PseuDB functions which select particular data (_select, _delete, _update) can take as many requirement clauses as necessary. All of these are logically ANDed together. (See pdb_merge for logical OR.)

The comparison tuple itself will have three elements. The first element is the name of the field you're testing, which will be an index for lists, a key for dictionaries and a data attribute for class instances:

(field, operator, value)

The operator is one of:

All the operators will work according to Python's type rules, so you can compare strings or numbers, etc., as you see fit.

The PDB_MATCH operator actually uses the re module's search operation. The PDB_IN acts like the Python in operator.

Comparison Function

In addition to the comparison tuple, you can pass a function in as an argument. The function will only take one argument, the data item to be tested. So, here's a quick example that simply mimics the comparison tuple ('SN', PDB_EQ, '512-333-443'):

  pdb_select(data, lambda x: x.SN == '512-333-443')

Of course, if you need to do something much trickier than that, you should probably write a function and pass that in rather than a lambda. Also, this example is a little silly, since you could get the same results with the built in filter. However, having pdb_select work with functions this way is convenient for more complex selections.

The Helper Library Interface

The PseuDB library offers these database-like calls:

pdb_select(data, where-clause*)
This will return a list of all the elements in the list data that match the where-clause requirements.
pdb_delete(data, where-clause*)
This will return a list of all elements which do not meet the where-clause requirements. Note that this does not make any changes to the list data itself.
pdb_update(data, field, nvalue, where-clause*)
This finds all elements in the data which match the where-clauses, and then updates their field properties to the new value nvalue. So, unlike pdb_delete, this does change your original data list. This also does not check that the field you're updating already exists. So, you can do what is effectively an 'ALTER TABLE' if you're not careful.
pdb_merge(data*)
This merges data lists, avoiding duplicated records.
pdb_sort(data, by)
Sorts the data on the field given in 'by'. This changes the list, since it uses the built in Python list sort method, which sorts in place.

Note that since the data these functions operates on is just a list there is no special insert function. Just add to the list using the append method.

Since all the where clauses of all these functions are logically ANDed together, you have no immediately obvious way to OR where clauses. So there is another function, pdb_merge which will merge together the results of as many pdb_select calls as you need. The result is effectively an OR operation.

  # select * from d2 where age > 22 or age == 11
  new_d2 = pdb_merge(pdb_select(d2, ('age', PDB_GT, 22)),
                     pdb_select(d2, ('age', PDB_EQ, 11)))

The pdb_merge function will make sure there are no duplicate records if several of your selects have overlapping data.

Examples

A lot of these examples are either exact copies or slightly modified versions of the test code in test.py that comes with the PseuDB package.

I'll start off with a simple awk-like example. If for some reason I want to know everyone who is using the Korn shell on our systems, listed in alphabetical order, I could do this:

from pseudb import *
from pprint import pprint
import pwd

d = pwd.getpwall()
ksh_users = pdb_select(d, (6, PDB_MATCH, 'ksh'))
#                          ^^ the sixth column is the shell field
pprint(pdb_sort(ksh, 0))
#                    ^^ sort on the first field, the login name

Using just lists of lists as a data structure is a little unfriendly, though. Keeping track of what each index means is error prone. So, let's try something a little different, assuming a small database of people:

d1 = [{'name': 'Fred', 'rank': 'Moron', 'SN': '112-333-443', 'age': 33},
      {'name': 'Bob', 'rank': 'Twit', 'SN': '212-333-443', 'age': 23},
      {'name': 'Jane', 'rank': 'Twit', 'SN': '312-333-443', 'age': 11},
      {'name': 'Toad', 'rank': 'Moron', 'SN': '412-333-443', 'age': 23},
      {'name': 'William', 'rank': 'Loony', 'SN': '421-333-443', 'age': 32},
      {'name': 'Fred', 'rank': 'Sage', 'SN': '512-333-443', 'age': 25},
      {'name': 'Fred', 'rank': 'Moron', 'SN': '612-333-443', 'age': 32},
      {'name': 'Goober', 'rank': 'Sage', 'SN': '712-333-443', 'age': 16}
    ]

In this case, with named fields, the resulting queries are a little easier to understand:

# Prints all the people who have the rank of Moron.
pprint(pdb_select(d1, ('rank', PDB_EQ, 'Moron')))

# Get everyone who is younger than 22 and whose rank is Twit.
pprint(pdb_select(d1, ('age', PDB_LT, 22),
                      ('rank', PDB_EQ, 'Twit')))

# Forces a rank change on everyone older than 22.
pdb_update(d1, 'rank', 'MotleyFool', ('age', PDB_GT, 22))

These two examples are using hash tables as records. If instead you had a Person class with all these fields as atrributes, the code would be exactly the same. The PseuDB library figures out the type of data you're working on. The library will get horribly confused, though, if you mix types in a single list.

In the functions that take where clauses, all those clauses are logically ANDed together. If you want to use an OR relation, you need do several selects and merge the results. It will avoid duplicates:

# These give the same answers:
pprint(pdb_merge(pdb_select(d2, ('age', PDB_GT, 22)),
                 pdb_select(d2, ('age', PDB_EQ, 11)) ))

pprint(pdb_merge(pdb_select(d2, ('age', PDB_GT, 22)),
                 pdb_select(d2, ('age', PDB_GT, 22)),
                 pdb_select(d2, ('age', PDB_EQ, 11)) ))

Plans

I may eventually use this library to write something like Perl's Sprite library. For now, this simple interface works for me.

I have no plans ever to make this do things like a join. If you really need the full power of a relational database, use a relational database. There are several good free ones, even one written entirely in Python, Gadfly.