50th Post – Finding Errors in a Log File

This post is just a bump to reach fifty posts, so here’s a quick code snippet as a treat for the 50th. It’s another of DaBeaz’s Python samples from his talks. This one is from his Mastering File I/O talk, and it’s designed to find the URLs that produced 404 errors in a log file. There are two versions: one reads the file in text mode and the other reads it in binary mode.
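
Both versions of the script pick their fields by counting from the end of each line. Here is a rough sketch of why that works, using a made-up line in Common Log Format. The sample line and the format are my assumptions; the actual access-log from the talk isn't shown here.

```python
# Made-up sample line in Common Log Format (an assumption for illustration).
sample = '127.0.0.1 - - [24/Feb/2008:00:08:59 -0600] "GET /missing.html HTTP/1.1" 404 133'

fields = sample.split()
status = fields[-2]   # second-to-last field: the HTTP status code
url = fields[-4]      # fourth-from-last field: the requested path

print(status, url)

# Read in binary mode, the same line splits into bytes objects,
# so the status has to be compared against b'404' instead of '404'.
bfields = sample.encode('latin-1').split()
b_status = bfields[-2]
b_url = bfields[-4].decode('latin-1')   # decode only the value that is kept
```

Counting from the end avoids the timestamp, which itself contains a space and would shift any index counted from the front past that point.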

# find404.py
#
# Find set of all URLs with a 404 error

from timethis import timethis

with timethis("Find 404 urls - text"):
    error_404_urls = set()
    for line in open("access-log"):
        fields = line.split()
        if fields[-2] == '404':
            error_404_urls.add(fields[-4])

    for name in error_404_urls:
        print(name)

with timethis("Find 404 urls - binary"):
    error_404_urls = set()
    for line in open("access-log","rb"):
        fields = line.split()
        if fields[-2] == b'404':
            error_404_urls.add(fields[-4])

    error_404_urls = { n.decode('latin-1')
                       for n in error_404_urls }

    for name in error_404_urls:
        print(name)
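
The timethis module comes with the talk's materials and isn't reproduced in this post. If you don't have it, a minimal stand-in might look something like the sketch below. This is my guess at a compatible context manager based on how the script calls it, not necessarily how the original is written.

```python
import time
from contextlib import contextmanager

@contextmanager
def timethis(label):
    """Hypothetical stand-in: print how long the with-block took."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Runs even if the block raises, so a timing is always reported.
        elapsed = time.perf_counter() - start
        print(f"{label}: {elapsed:.3f} seconds")

# Usage, same shape as in find404.py
with timethis("Sum a million integers"):
    total = sum(range(1_000_000))
```

Save it as timethis.py next to find404.py and the import at the top of the script should pick it up.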