Add a term to the Python
Glossary!
Python is a language for agile development that has gained an
enthusiastic following. You can read all about it at the Python.org web site.
I've written or collected little bits and pieces of quasi-useful Python
stuff. What I've announced to the public is available here.
More or Less Current Stuff
- Simple Logging Wrapper for prstat
- prstat is Sun's version of top. Where I work we used
to use top as a crude logging tool, just letting it run with output redirected
to a rotating set of logfiles. prstat can almost
substitute for top in this context; however, it doesn't timestamp its
output. prstat-t.py solves that
shortcoming. (last updated 2009-09-14)
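The timestamping idea can be sketched as a small line filter. This is a minimal Python 3 illustration, not the contents of prstat-t.py; the function name timestamp_lines and the timestamp format are assumptions made for the example.

```python
import time


def timestamp_lines(lines, clock=time.time):
    """Yield each input line prefixed with a local-time timestamp.

    `clock` is injectable only so the behaviour can be tested
    (an assumption of this sketch).
    """
    for line in lines:
        stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.localtime(clock()))
        yield f"{stamp} {line.rstrip()}"


# Assumed usage as a filter: pipe prstat's output through a script that
# does the following and redirect the result to a logfile.
#
#     import sys
#     for stamped in timestamp_lines(sys.stdin):
#         print(stamped, flush=True)
```

Flushing after every line keeps the logfile current even when the output is redirected.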
- Minimalist Mailman Review Page
- If you manage a popular mailing list with Mailman these days, you know
how hard it can be to review the messages that get held for your review. mmfold.py fetches the review page for a mailing list
and presents a more condensed version of the review page in your web
browser. The new version accepts password info in the URL. (last updated
2008-06-30)
- zipargs function for shell scripts
- A colleague wanted to perform a zip operation (Python's
zip not the compression program of the same name) in a
shell script. So I wrote something for him. Of course, it uses
Python's zip() function under the covers.
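For readers who have not met zip(), here is a rough sketch of the kind of pairing involved. The function below is hypothetical: it assumes each argument is a whitespace-separated list of words, which may not match how the actual script takes its input.

```python
def zipargs(*arg_lists):
    """Pair up the Nth words of each argument, one output line per pair."""
    columns = [arg.split() for arg in arg_lists]
    # zip() stops at the shortest input, so extra words are dropped.
    return [" ".join(row) for row in zip(*columns)]


pairs = zipargs("a b c", "1 2 3")
```

A shell script could then loop over the resulting lines with read.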
- Introspective
dir() function
- If you use the
dir() function as a cheap instrospection
tool, you've probably noticed that it doesn't work very well for exploring
package hierarchies. Here's a replacement which roots
around in package directories and eggs and lets you know what submodules and
packages it contains. (last updated 2008-03-18)
- bsddb185 module
- Python 3.0 will no longer come with the bsddb185 module. While it's
rarely used, it does have some use on systems which still use the Berkeley
DB 1.85 library, mostly BSD-derived Unix systems (including Macs). I
extracted the module from the current trunk (2.6a0) and stuck it on PyPI.
- Lock resources
- Python has a couple different file locking APIs. None are portable.
The lockfile package (currently alpha - version 0.2) implements a
cross-platform API and three different classes which use that API:
LinkFileLock (relies on the atomic nature of the
link(2) system call)
MkdirFileLock (relies on the atomic nature of the
mkdir(2) system call)
SQLiteFileLock (uses an SQLite database to lock
files)
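To show what "relies on the atomic nature of mkdir(2)" means in practice, here is a minimal sketch. It is not the lockfile package's code: the class name MkdirLock, the ".lock" suffix and the polling interval are all assumptions of this example.

```python
import os
import time


class MkdirLock:
    """Hypothetical lock: the lock is held while the directory exists."""

    def __init__(self, path):
        self.lock_dir = path + ".lock"

    def acquire(self, timeout=None, poll=0.05):
        deadline = None if timeout is None else time.monotonic() + timeout
        while True:
            try:
                # mkdir either creates the directory or fails; two
                # processes cannot both succeed for the same name.
                os.mkdir(self.lock_dir)
                return True
            except FileExistsError:
                if deadline is not None and time.monotonic() >= deadline:
                    return False
                time.sleep(poll)

    def release(self):
        os.rmdir(self.lock_dir)
```

A crashed process leaves its directory behind, so a real implementation needs some way to break stale locks; this sketch does not attempt that.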
- Add or print iCal events or todos from the
command line
- I use a Powerbook but rarely take it to work. This makes it difficult
to manage events and todos with iCal. The appscript
module makes it fairly easy to script many Mac OSX applications from
Python. ical.py is a fairly simple example of
appscript usage. It also relies on the dateutil package to support
flexible date/time parsing.
- Queue based on sockets
- A thread on
comp.lang.python got into a discussion of
communication between multiple processes. I suggested creation of a
class like Python's threaded Queue class. SocketQueue.py is a trivial implementation of
the idea. (last updated 2005-09-28)
- Mmencode in Python
- Way back in the early days of MIME there was mmencode. It was a
classical Unix filter. It was small and did one thing well. Somewhere
along the way it got replaced by other tools and on my latest web
server I found it's not available (at least not without grubbing around
for the proper RPM). Here's a simple
replacement in Python. It only implements the
-q and
-u flags and only writes to stdout, but that probably
accounts for 99% of the usage. (last updated 2005-08-12)
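In current Python the two flags map onto the standard base64 and quopri modules. The sketch below is a Python 3 illustration of that mapping, not the replacement script itself; the function name and keyword arguments are made up for the example.

```python
import base64
import quopri


def mmencode(data, quoted_printable=False, decode=False):
    """Encode or decode bytes.

    quoted_printable plays the role of -q, decode the role of -u;
    with neither set the data is base64-encoded.
    """
    if quoted_printable:
        if decode:
            return quopri.decodestring(data)
        return quopri.encodestring(data)
    if decode:
        return base64.decodebytes(data)
    return base64.encodebytes(data)


encoded = mmencode(b"hello")
```

A filter version would read sys.stdin.buffer and write the result to sys.stdout.buffer.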
- Autoload modules
- Someone on
comp.lang.python whose name I didn't record
(who are you?) came up with this nifty module
autoloader. I modified it slightly. (last updated 2005-03-16)
- Config file reader/writer
- In response to a ConfigParser
Shootout I wrote one such little beastie.
Its main features are: indentation-based file format, nesting to
arbitrary depth, read/write round trip (sans comments at the moment)
and attribute-style or dict-style access. (last updated
2004-10-22)
- Rebind global variables during
reload()
- The subject of the behavior
of the
reload() function came up recently in
comp.lang.python. This trivial
implementation may cover most of the perceived shortcomings of the
builtin reload(). (last updated 2004-03-14)
- Decode strings heuristically
- When dealing with Unicode inputs from various sources you may or may
not know how the input is encoded. If you don't know you probably have
to guess. This little module demonstrates one
set of guesses. You will almost certainly want to modify it for your
needs. (last updated 2004-03-01)
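One common set of guesses is simply to try encodings from strictest to most permissive. The sketch below assumes that approach and that candidate list; the module described above may guess differently.

```python
def guess_decode(raw, encodings=("ascii", "utf-8", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes raw.

    The candidate order is an assumption: latin-1 accepts any byte
    string, so it has to come last or it would hide the others.
    """
    for encoding in encodings:
        try:
            return raw.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings fit")
```

A successful decode only means the bytes were valid in that encoding, not that the guess was right.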
- Session save/restore
- Gerrit Holl suggested save() and load() builtins on python-dev. He was
thinking about using pickles, but I implemented a simpleminded version using the readline
module. Unfortunately, the readline requirement means it won't work on
Windows. Feel free to fix that shortcoming and send me a patch. (last updated
2003-12-01)
- Simple progress meter
- For long-running calculations, it's nice to have a simple way to
display progress. progress.py provides a couple
classes to support this. (last updated 2004-01-24)
- Latin-1-to-ASCII codec
- From time-to-time you really, really, really just want ASCII, as when
some spammer sends you a message with the subject, "We cän makë it lönger
now" or "keep up th¯e strugglê, get out ¨of that mess" (whatever that
means). latscii.py is a simple codec which makes a
reasonable attempt to strip accents from Latin-1 letters and map other
characters to reasonable ASCII equivalents (such as mapping '¡' to
'!'). (last updated 2003-11-11)
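A similar effect can be approximated today with the unicodedata module. This is a different technique from a hand-written codec table, shown only as a sketch; the FALLBACKS table and the to_ascii name are assumptions of the example.

```python
import unicodedata

# Hypothetical fallbacks for characters with no accent to strip.
FALLBACKS = {"¡": "!", "¿": "?", "ß": "ss", "æ": "ae", "ø": "o"}


def to_ascii(text, unknown="?"):
    """Strip accents and map a few other characters to ASCII."""
    out = []
    for ch in text:
        if ch in FALLBACKS:
            out.append(FALLBACKS[ch])
            continue
        # NFKD splits an accented letter into base letter + combining mark.
        decomposed = unicodedata.normalize("NFKD", ch)
        stripped = "".join(
            c for c in decomposed if not unicodedata.combining(c)
        )
        if stripped and stripped.isascii():
            out.append(stripped)
        else:
            out.append(unknown)
    return "".join(out)
```

Anything that neither decomposes to ASCII nor appears in the table comes out as the unknown marker.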
- Regular expressions as dictionary keys
- The topic of using regular expressions as dictionary keys recently
came up on
comp.lang.python. (It's also come up in
the past.) I had a need for this, but with dictionaries containing
hundreds of keys, all the regular expression matching makes the
straightforward implementation a dog. REDict.REDict uses a
binary search of the keys to speed things up. has_key() is
O(log len(d)) instead of O(len(d)). Using the
REDict.FastREDict class, matching is more like O(1). More
could probably be done (caching compiled regular expressions or optimizing
the large generated regular expressions), but this suffices for the time
being. (last updated 2003-10-22)
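The "large generated regular expression" idea can be sketched as follows: join every key into one alternation, wrap each key in a named group, and use the group that matched to find the value. This is an illustration under those assumptions, not the REDict code; the class name RegexDict is made up.

```python
import re


class RegexDict:
    """Hypothetical read-only mapping keyed by regular expressions."""

    def __init__(self, items):
        self._values = []
        parts = []
        for index, (pattern, value) in enumerate(items):
            # Each key gets its own named group: k0, k1, ...
            parts.append(f"(?P<k{index}>{pattern})")
            self._values.append(value)
        self._combined = re.compile("|".join(parts))

    def __getitem__(self, text):
        match = self._combined.fullmatch(text)
        if match is None or match.lastgroup is None:
            raise KeyError(text)
        # lastgroup names the outermost group that matched, e.g. "k1".
        return self._values[int(match.lastgroup[1:])]


d = RegexDict([(r"\d+", "number"), (r"[a-z]+", "word")])
```

The lookup is one match call rather than a loop over patterns. The sketch breaks if a key uses numbered backreferences or its own named groups.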
- Bulk Discard of Queued Mailman Messages
- A recent virus attack left me trying to manually discard a thousand
or so messages per day for a Mailman-2.1 list I help administer.
I wrote
mmdiscard.py to deal with that from the
command line. (last updated 2003-10-15)
- Speeding up Python programs
- (Moved to Python wiki.)
- Date-parsing module
- I wrote this module several years ago to recognize dates in many
different formats on-the-fly. The most useful bit is the
parse_date function. You probably won't want to
use it as-is. Just nick the regular expressions. (last updated
2003-06-28)
- Persistent Sets
- 2003-06-05. I just stumbled upon this. I thought it was so cool how
easily Python's new Set objects (new in 2.3) could be made persistent that
I thought it worth mentioning here.
- Firewall-1 Logfile Summarizer
- 2003-05-29. This script summarizes the csv log file which can be
dumped by Firewall-1 NG, at least as it exists on Solaris. It requires
Python 2.3 or later, as it uses the csv module introduced in that version.
It also requires Volker Tanger's fw1rules package, which
is used to dump a csv file containing your rules.
- CGI Environment Printer
- 2003-05-08. This has come in handy on a number of occasions since I
wrote it several years ago. It simply displays information about the CGI
environment. Compare that with your shell environment to help figure out
why your CGI scripts don't work as expected. (More could be done, like
displaying the path to the executable. For some reason, I never needed
that.)
- Dynamic Instruction Frequency Collector
- 2003-01-13. Every once in a while, someone on
comp.lang.python wonders about optimizing some bit of Python
bytecode. The discussion usually boils down to:
- You need to generate a dynamic execution profile (DXP) to decide
if the optimization is worthwhile.
- Does someone already have some DXPs lying around?
- Utter silence.
DXPserver is an XML-RPC server which is meant to collect and distribute
dynamic execution profiles. If you think it might be useful, let me know. I don't currently run it,
but would be happy to if there was some demand.
- Readline & command history
- 2002-11-08. I refer to this during interactive startup (one of the
files which gets imported via PYTHONSTARTUP). I send it to people from
time-to-time. It's a useful file and also demonstrates how to use the
atexit module.
- Marshal written in Python
- 2002-10-03. Guido sent me a version of the marshal module written in
Python a few years ago. (I no longer remember why.) Once when I
encountered a corrupted marshal file I modified it to not raise an
exception when encountering an error during load(). Instead it returns
what it has accumulated up to that point. Warning: Do not install
this as marshal.py! If you do, you will almost certainly live to regret
that mistake!
- Category deletion for ifile
- 2002-09-21. I've been experimenting with ifile recently and
flubbed some Emacs macros which I was using to categorize incoming
messages. I thus wound up with some bogus categories in my .idata file.
This script allows you to delete arbitrary categories from .idata files.
- Alarms for asyncore
- 2002-01-23. I recently had a reason to start using asyncore. It's a
marvelous package for doing I/O with several network sockets. One of the
first things I wanted to do after getting it working was implement alarms.
signal.alarm is ugly and may not work everywhere anyway, so I took
advantage of the fact that asyncore uses the timeout feature of select()
and poll().
- Weekend Edition Sunday Puzzle
- 2001-10-14. I listen on occasion to NPR's Sunday Weekend Edition.
Perhaps the best segment of the show is the Puzzle run by Will
Shortz. On October 7th, 2001, this challenge was posted:
Draw a 4 by 3 box. The object is to fill it with letters
spelling 3 four-letter words across and 4 three-letter words reading
down. The conditions: your box can not repeat any letters, and it
must use all six vowels (a, e, i, o, u, y) once. All words must be
uncapitalized, common English words.
The code in nprpuzzle.py solves this problem
using a straightforward O(N**3) algorithm. I don't claim it's the best
way to approach the problem, but it was a fun diversion for a Sunday. It
uses my little progress module to track
progress.
- Locate Division Operators
- 2001-08-13. With the coming change to the semantics of
integer division you'll probably want to run something like finddiv.py
over your code to identify potential trouble spots. It does nothing more
than identify lines containing a "/" operator. It doesn't perform any
analysis to try and prune the possible list of lines it displays. It does
display lines in a format that Emacs's next-error command understands.
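A rough equivalent can be written for current Python. The sketch below swaps in the standard tokenize module, which the description above does not mention, so that a "/" inside a string or comment is not reported. The function name and the "file:line: text" output format are assumptions.

```python
import io
import tokenize


def find_divisions(source, filename="<string>"):
    """Return 'file:line: text' entries for lines using / or /=."""
    hits = []
    seen = set()
    reader = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(reader):
        if tok.type == tokenize.OP and tok.string in ("/", "/="):
            lineno = tok.start[0]
            if lineno not in seen:      # report each line once
                seen.add(lineno)
                hits.append(f"{filename}:{lineno}: {tok.line.rstrip()}")
    return hits


sample = "a = 7 / 2\nb = 'x/y'\nc = 7 // 2\nd /= 3\n"
report = find_divisions(sample, "demo.py")
```

Like the original, this makes no attempt to work out whether the operands are integers.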
- Editor Support for Python
- 2001-09-10. This is no longer maintained by me. You will be
redirected to the new editors
page on the main Python website.
- ConstantMap.py - map numeric constants to their
names
- The ConstantMap.ConstantMap class can be instantiated
from modules of constants to map "magic numbers" back to their names.
This is useful when debugging code that returns such numbers. For
example, the numeric constant modules generated by the h2py script all map
semi-meaningful names to mostly meaningless numbers. ConstantMap allows
you to map them back. (last updated 2004-03-07)
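The reverse mapping is easy to sketch. The function below is a hypothetical stand-in for the class, written for Python 3; it assumes that constants are the upper-case integer names in a namespace.

```python
import types


def constant_names(namespace):
    """Map each integer constant back to the names bound to it."""
    names = {}
    for name, value in vars(namespace).items():
        is_int = isinstance(value, int) and not isinstance(value, bool)
        if name.isupper() and is_int:
            # Several names can share one number, so keep a list.
            names.setdefault(value, []).append(name)
    return names


# A made-up namespace standing in for an h2py-generated module.
demo = types.SimpleNamespace(EPERM=1, ENOENT=2, EAGAIN=11, EWOULDBLOCK=11)
lookup = constant_names(demo)
```

Passing a real module works the same way, since vars() accepts modules too.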
- Watch - keyboard/mouse
monitor
- This Python script (hosted at SourceForge) monitors keyboard and mouse
activity and enforces work and rest times. It currently only runs on
Linux, but it has run on Windows in the past (only directly monitoring
mouse activity) and could probably run on the Mac without a lot of effort.
- Soundex module
- 2000-12-22. This module is a Python replacement for the now defunct
soundex.c. This module is a merging of separate
ones written by Tim Peters and Fred Drake.
- SYLK file reader
- 2000-10-10. This module reads SYLK files and generates CSV files.
Note that it currently has only been tested with files generated from
AppleWorks 5.0 on a Mac.
- Rough Size Calculator
- 2000-09-27. There are three general sources of memory leaks in
long-running Python programs: cyclical objects that reference counting
can't reclaim, botches at the low-level malloc interface, and growth of
container objects that are reachable, but whose growth you're unaware of.
Neil Schemenauer's garbage collector in Python 2.0 does a good job
identifying cyclical garbage. This module attacks the third case. The
test case uses the Cache module below.
- Simple Caching Dictionary
- 2000-09-27. Sometimes you need to cache results of long computations
or database queries, but don't want your memory consumption to grow
without bound. The Cache class subclasses UserDict.UserDict to provide a
cache that discards values based on access time.
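One way to bound such a cache is to evict the least recently accessed entry once a size limit is reached. The sketch below assumes that policy and uses collections.OrderedDict; the Cache class described above may instead work from timestamps, and the name LRUCache and the max_items parameter are made up.

```python
from collections import OrderedDict


class LRUCache:
    """Hypothetical size-bounded cache; oldest access is evicted first."""

    def __init__(self, max_items=128):
        self.max_items = max_items
        self._data = OrderedDict()

    def __getitem__(self, key):
        value = self._data[key]
        self._data.move_to_end(key)         # mark as most recently used
        return value

    def __setitem__(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        while len(self._data) > self.max_items:
            self._data.popitem(last=False)  # drop least recently used

    def __contains__(self, key):
        return key in self._data

    def __len__(self):
        return len(self._data)
```

Membership tests deliberately do not count as an access here; only reads and writes refresh an entry.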
- XML-RPC validation suite
- 2000-06-05. This server passes the XML-RPC validation suite as
implemented at validator.xmlrpc.com as of June
5th, 2000.
- Adding gzip encoding capability to XML-RPC
clients and servers
- 2000-04-19. The instructions in gzip-xmlrpc.txt describe simple mods to XML-RPC
servers and clients to allow responses to be encoded using gzip when
possible. This can help performance significantly when using XML-RPC over
wide area networks. You can also download the version of xmlrpclib.py that I use which includes one or two
other mods. It is based on version 0.9.8 of Fredrik Lundh's xmlrpclib package.
- Manipulating recurring dates
- 1999-08-09. Recurring dates occur frequently in my business, e.g.,
"The Bill Baldwin Trio appears every Thursday at 10pm". recur.py is a crack at the problem. It allows you to
intersect two recurring dates or generate a finite subset of dates that
fit the recurrence pattern. There's a long comment at the start that's
not much more than me thinking out loud, followed by a fairly small amount
of code. Feedback is much appreciated. Experimental
(last updated 1999-08-09)
- Validation of CGI script parameters
- 1999-09-30. The cgi
module that comes with Python eliminates the tedium of marshalling
script parameters from the HTTP input stream. This module enhances that
with type checking. You can indicate which parameters are required, which
are optional and what their types must be. Type information can be given
in the input form, in an auxiliary file on the server or in the CGI script
itself. The latest version also handles multi-valued parameters such as
would be encountered with
<select> tags having the
multiple attribute.
- Finite State Machine
- 2001-09-06. I recently added the ability for the states to be regular
expression objects. This makes it easier to match some inputs in a
case-insensitive manner. I'm sure creative folks will find other weird
uses for the capability. (Note that when testing for re matches, it
simply loops through the possible inputs for that state. The first re
that matches the input is considered to match. If you have multiple re's
that match a particular input, which one gets picked is
non-deterministic.)
- Browsable Python sources
- I find it convenient to use this to refer to individual files from
web pages. I used to tar it all up and use a Python CGI script
called
tgzextr.py to pull out
individual files, but I now have enough disk space to keep the sources
laying about... :-) This directory is just a snapshot of
the CVS repository. (last updated 1999-09-30)
- Simple-minded TCP client and server
- I wrote a small client and server to test TCP connection and
transmission speeds using either AF_INET or AF_UNIX sockets.
Some implementations of AF_INET sockets degrade as you perform more
and more connections in a short period of time. This is presumably
due to linear search for an available port and the fact that TCP
requires sockets to hang around for a while after closing to catch
late-arriving packets. (last updated 1996-06-07)
- Parser for robots.txt files
- Writing a Web wanderer in Python? Here's a little piece of code to
help you along.
Delivered with the Python distribution now.
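In Python 3 the standard library's parser lives in urllib.robotparser. A short sketch of using it follows; the rules, the "MyWanderer" agent name and the example.com URLs are invented for the illustration.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt, supplied as a list of lines.
rules = """\
User-agent: *
Disallow: /private/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

allowed = parser.can_fetch("MyWanderer", "http://example.com/index.html")
blocked = parser.can_fetch("MyWanderer", "http://example.com/private/x.html")
```

To work from a live site, call set_url() and read() instead of parse().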
Demo Scripts
The same questions seem to pop up from time-to-time. Here are some
short scripts that demonstrate various Python modules or features.
- Submitting URLs through a proxy
- The Mapblast script takes two
arguments, a city and a state or Canadian Province, and tries to
extract the city's lat/long information from the Mapblast web
server. If it succeeds, it displays a line with the city,
state, latitude and longitude separated by colons. If the city
name is slightly misspelled (e.g. Pittsburg instead of
Pittsburgh), the Mapblast server does a fair job of correcting
your spelling. This script displays what Mapblast said, not
what you wrote... (last updated 1998-05-12)
- Submitting a web form using the POST method
- The sony.py script accepts a key representing a
single Sony Music artist on stdin and submits it to Sony's artist search
engine by mimicking the POST method submission that a user would trigger
by searching for their favorite artist. It's one stage in a pipeline that
looks something like:
httpget http://www2.music.sony.com/musicdb/TourInfo | \
egrep -i 'DJ HONDA' | \
sed -e 's:<OPTION VALUE="\([0-9]\+\)">.*:\1:' | \
sony.py | \
....
(last updated 1998-05-17)
More or Less Obsolete Stuff
- Getting interactive help about objects
- I include the two functions in help.py in my
.pythonrc file. During interactive sessions I can then execute
help(foo) for an arbitrary object foo. If it's a
Python function or method, it will display its declaration. If it
has a doc string it will display that as well.
- Warning about return statement usage
- This patch to
.../Python/compile.c causes the Python byte
code compiler to print warnings if it encounters inconsistent use of the
return statement within a function. This is experimental
code. Use at your own risk! (last updated 1999-09-10)
- Warning about hiding builtin names
- This script takes a list of module names and tells you which top-level
functions have local variables that hide builtin objects with the same
name. (last updated 1999-09-30)
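The same check is short in current Python, because every function's code object lists its local names. This is a Python 3 sketch, not the script above; the function name is made up, and it takes a module object rather than a list of module names.

```python
import builtins
import inspect


def shadowed_builtins(module):
    """Map function name -> sorted local names that hide a builtin."""
    builtin_names = set(dir(builtins))
    report = {}
    for name, func in inspect.getmembers(module, inspect.isfunction):
        # co_varnames holds the function's arguments and local variables.
        hidden = sorted(set(func.__code__.co_varnames) & builtin_names)
        if hidden:
            report[name] = hidden
    return report
```

Functions imported into the module from elsewhere are reported too, since getmembers() sees them as module attributes.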
- Paper for SPAM-7
- This paper describes some initial work I did with a peephole optimizer
for Python byte code. (last updated 1998-11-22)
- Faster stack manipulation
- This patch against the 1.5.1 source distribution reduces the number of
temporary variables and PUSH/POP operations performed by the Python
interpreter. It also cleans up a few other miscellaneous nits. See
README.stack file for a summary of the
changes. Experimental (last updated 1998-07-05)
- Peephole optimizer for Python byte code
- This patch against the 1.5.1 source distribution adds a peephole
optimizer to Python. See README.peep file
for an introduction. Experimental (last updated
1998-07-02)
- DOS-ification of file names
- 1996-12-19. I wrote this short script to map long filenames to DOS
8.3 names. I imagine there are some errors in the assumptions about what
can go in a DOS filename. Feedback is welcome. (last updated
1996-12-19)
- Test coverage tool
- This little knock-off of the profile
module provides test coverage of Python scripts in the spirit of
Sun's now ancient (and probably defunct)
tcov tool. This is
superseded by the version incorporated into the standard Python
library.
- Experimental threaded web server
- This server avoids forking by mapping URLs to modules and then
calls the appropriate function in the handler thread. This was an
interesting exercise, but you should probably use Medusa instead.
- Quick Index of Python Library Reference
- I got tired of trying to find the names of modules in the Python Library
Reference table of contents, so I wrote a little script to create an alphabetically
organized table that makes it easier (for me at least) to find what
I'm looking for.
- Python support for VTK 2.0
- The above gzip'd diffs for VTK 2.0 support integration of VTK with
Python (85839 bytes). It was obsoleted by a rewrite of the VTK
wrapper code generator, so if you are running a more recent
version of VTK than what was released in mid-May 1998, this
won't work.
- Enhanced urllib module
- CGI scripts tend to do lots of quoting and unquoting. I wrote a
small enhancement to Python's urllib module
that migrates urllib.quote and urllib.unquote into the urlop C module. The test1 function at the
bottom of the urllib module runs 50-70 times faster using the urlop
module on my P100/BSDI system.
- Improved (?) allocation of short strings (Feedback welcome)
- I wrote a special-purpose allocator for short string objects (short
being <= 128 bytes, including the object header). A modified version
of stringobject.c contains the whole
thing. A modified version of stropmodule.c defines a "counts" function
that can be called to retrieve a list of counters that track creation
and deletion of, respectively, large, <=32-byte, <=64-byte, <=128-byte
strings.
Note that this is not yet ready for prime time! This
is just experimental code at this point. It is not well enough
packaged yet.
- Partial C implementation of the regsub
module
- This is another not-quite-ready-for-prime-time module. It only
implements regsub.sub and regsub.gsub at this point. I just checked
my version of regsub.py and noticed I'm not even using it... I
suspect there's a bug lurking in there but punted on trying to find
it. Whoever wants to take on the challenge, here it is.
- SPAM-1 presentation on Python/C++
- Ages and ages ago (1994! That's fourteen Internet years as I write
this in September 1996) it seems I was doing some Python/C++ integration. It's been
superseded by lots of other work. Still, some may find the
concepts presented useful.