PseuDB
Get the software: pseudb-0.1.tar.gz. Requires Python 2.2, due to
lambdascoping trickery.
PseuDB grew out of my fond memories of the Perl Sprite library. That provides a very simple subset of SQL using a flat-file as the data store. So, you could easily edit the data file by hand if you want.
I spent some time thinking about how I would approach writing this in Python. A direct translation of the code didn't appeal to me. My first firm design decision was to keep the actual database functionality well separate from the parsing of the SQL. This turns out to have had some interesting consequences. In fact, I've never gotten around to writing the SQL part, the library turned out to be so useful on its own. I might eventually write the SQL part, though, the next time I have a lot of free time.
Very often we keep data in arrays of classes, or sometimes arrays of dictionaries. Not infrequently I want to say things like "give me every member of this array for which the attribute 'spam' is 'true'." Of course, you can roll your own loops for this sort of thing, but as the requirements get more sophisiticated, this gets trickier. PseuDB makes this all easier. Here is an example from the test code:
# Return every element of the array 'd2' for which the # 'age' attribute is greater than 22 and the 'rank' # attribute is equal to 'Moron'. GenXMorons = pdb_select(d2, ('age', PDB_GT, 22), ('rank', PDB_EQ, 'Moron'))This sort of selection can currently be done on lists of class instances, dictionaries and lists.
The Comparison Tuple
For all the PseuDB functions which select particular data (
_select,_delete,_update) can take as many requirement clauses as necessary. All of these are logically ANDed together. (Seepdb_mergefor logical OR.)The comparison tuple itself will have three elements. The first element is the name of the field you're testing, which will be an index for lists, a key for dictionaries and a data attribute for class instances:
(field, operator, value)The operator is one of:
PDB_EQequalityPDB_LTless thanPDB_LEless than or equalPDB_GTgreater thanPDB_GEgreater than or equalPDB_NEnot equalPDB_MATCHregular expression matchPDB_NMATCHregular expression doesn't matchPDB_INlist membership testPDB_NINlist non-membership testAll the operators will work according to Python's type rules, so you can compare strings or numbers, etc., as you see fit.
The
PDB_MATCHoperator actually uses theremodule'ssearchoperation. ThePDB_INacts like the Pythoninoperator.Comparison Function
In addition to the comparison tuple, you can pass a function in as an argument. The function will only take one argument, the data item to be tested. So, here's a quick example that simply mimics the comparison tuple
('SN', PDB_EQ, '512-333-443'):pdb_select(data, lambda x: x.SN == '512-333-443')Of course, if you need to do something much trickier than that, you should probably write a function and pass that in rather than a
lambda.Also, this example is a little silly, since you could get the same results with the built infilter. However, havingpdb_selectwork with functions this way is convenient for more complex selections.The Helper Library Interface
The PseuDB library offers these database-like calls:
pdb_select(data, where-clause*)- This will return a list of all the elements in the list
datathat match the where-clause requirements.pdb_delete(data, where-clause*)- This will return a list of all elements which do not meet the where-clause requirements. Note that this does not make any changes to the list
dataitself.pdb_update(data, field, nvalue, where-clause*)- This finds all elements in the data which match the where-clauses, and then updates their
fieldproperties to the new valuenvalue. So, unlikepdb_delete, this does change your original data list. This also does not check that the field you're updating already exists. So, you can do what is effectively an 'ALTER TABLE' if you're not careful.pdb_merge(data*)- This merges data lists, avoiding duplicated records.
pdb_sort(data, by)- Sorts the data on the field given in 'by'. This changes the list, since it uses the built in Python list
sortmethod, which sorts in place.Note that since the data these functions operates on is just a list there is no special
insertfunction. Just add to the list using theappendmethod.Since all the where clauses of all these functions are logically ANDed together, you have no immediately obvious way to OR where clauses. So there is another function,
pdb_mergewhich will merge together the results of as manypdb_selectcalls as you need. The result is effectively an OR operation.# select * from d2 where age > 22 or age == 11 new_d2 = pdb_merge(pdb_select(d2, ('age', PDB_GT, 22)), pdb_select(d2, ('age', PDB_EQ, 11)))The
pdb_mergefunction will make sure there are no duplicate records if several of your selects have overlapping data.Examples
A lot of these examples are either exact copies or slightly modified versions of the test code in
test.pythat comes with the PseuDB package.I'll start off with a simple awk-like example. If for some reason I want to know everyone who is using the Korn shell on our systems, listed in alphabetical order, I could do this:
from pseudb import * from pprint import pprint import pwd d = pwd.getpwall() ksh_users = pdb_select(d, (6, PDB_MATCH, 'ksh')) # ^^ the sixth column is the shell field pprint(pdb_sort(ksh, 0)) # ^^ sort on the first field, the login nameUsing just lists of lists as a data structure is a little unfriendly, though. Keeping track of what each index means is error prone. So, let's try something a little different, assuming a small database of people:
d1 = [{'name': 'Fred', 'rank': 'Moron', 'SN': '112-333-443', 'age': 33}, {'name': 'Bob', 'rank': 'Twit', 'SN': '212-333-443', 'age': 23}, {'name': 'Jane', 'rank': 'Twit', 'SN': '312-333-443', 'age': 11}, {'name': 'Toad', 'rank': 'Moron', 'SN': '412-333-443', 'age': 23}, {'name': 'William', 'rank': 'Loony', 'SN': '421-333-443', 'age': 32}, {'name': 'Fred', 'rank': 'Sage', 'SN': '512-333-443', 'age': 25}, {'name': 'Fred', 'rank': 'Moron', 'SN': '612-333-443', 'age': 32}, {'name': 'Goober', 'rank': 'Sage', 'SN': '712-333-443', 'age': 16} ]In this case, with named fields, the resulting queries are a little easier to understand:
# Prints all the people who have the rank of Moron. pprint(pdb_select(d1, ('rank', PDB_EQ, 'Moron'))) # Get everyone who is younger than 22 and whose rank is Twit. pprint(pdb_select(d1, ('age', PDB_LT, 22), ('rank', PDB_EQ, 'Twit'))) # Forces a rank change on everyone older than 22. pdb_update(d1, 'rank', 'MotleyFool', ('age', PDB_GT, 22))These two examples are using hash tables as records. If instead you had a
Personclass with all these fields as atrributes, the code would be exactly the same. The PseuDB library figures out the type of data you're working on. The library will get horribly confused, though, if you mix types in a single list.In the functions that take where clauses, all those clauses are logically ANDed together. If you want to use an OR relation, you need do several selects and merge the results. It will avoid duplicates:
# These give the same answers: pprint(pdb_merge(pdb_select(d2, ('age', PDB_GT, 22)), pdb_select(d2, ('age', PDB_EQ, 11)) )) pprint(pdb_merge(pdb_select(d2, ('age', PDB_GT, 22)), pdb_select(d2, ('age', PDB_GT, 22)), pdb_select(d2, ('age', PDB_EQ, 11)) ))Plans
I may eventually use this library to write something like Perl's Sprite library. For now, this simple interface works for me.
I have no plans ever to make this do things like a join. If you really need the full power of a relational database, use a relational database. There are several good free ones, even one written entirely in Python, Gadfly.