Writing a text filter

From EditPlus Wiki
Latest revision as of 05:31, 10 December 2010

This is an outline of how to write your own text filter user tool.

Why?

I worked out how to do this because I wanted to filter error messages from a huge SQL script output file, but you can use this technique to manipulate a file in any way. Imagine if search and replace had even more power than regular expressions. The only limits are your programming ability and imagination.

How?

You set up a user tool to run your filter: select the "Run as text filter" option. You don't need any of the special arguments like $(FileName), because text filters always run on the content of the current EditPlus window. The command and argument settings will vary according to how your filter must be called.

Of course, you also have to write the filter. The first example below is in Java, but any language that can read the standard input stream and write to the standard output stream is fine. If you're familiar with writing a utility that runs in a command-line pipe, this is very similar.

The general approach is that your filter is fed the content of the current file on its standard input. Your code decides what to do with this input: it can output some or all of it, add or replace sections, or generate something entirely new. Whatever it writes to standard output replaces the text in the window. Meanwhile, you can also do anything else you fancy with the text, like e-mail the juicy bits to your granny.
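The read, transform, write shape described above can be sketched in Python 3. This is a minimal sketch rather than one of the page's examples. `transform` is a hypothetical placeholder for your own logic. The sketch also assumes that the user tool's command runs a Python 3 interpreter with the script's path as its argument.

```python
import sys


def transform(text: str) -> str:
    """Hypothetical example transformation: upper-case everything."""
    return text.upper()


def main() -> None:
    # Read the whole of the window's text from standard input...
    text = sys.stdin.read()
    # ...and write the filtered result to standard output.
    sys.stdout.write(transform(text))


if __name__ == "__main__":
    main()
```

Because the script only touches standard input and output, you can also try it from a command prompt, for example `python filter.py < input.txt` (both file names are hypothetical).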

What if it goes wrong?

Just like using search and replace, if you don't like what the filter has done to your text, you can undo.

Examples

Java

This Java code removes the messages in SQL script output that indicate things have worked correctly, leaving only the error messages.

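import java.io.*;
import java.util.HashSet;
import java.util.Set;
public class SPOutStripper
{
   // Lines to drop (success messages and blank lines); any other line is kept
   static final Set<String> strippers = new HashSet<String> ();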
   static
   {
      strippers.add ( ""                  ); // a blank line
      strippers.add ( "Table dropped."    );
      strippers.add ( "Table created."    );
      strippers.add ( "1 row created."    );
      strippers.add ( "Commit complete."  );
      strippers.add ( "Table altered."    );
      strippers.add ( "1 row updated."    );
      // ...and many others
   }
   public static void main ( String [] args )
         throws Exception // Lazy programmer hopes IOException won't bite him
   {
      BufferedReader in    = new BufferedReader ( new InputStreamReader ( System.in ) );
      PrintWriter    out   = new PrintWriter ( new BufferedWriter ( new OutputStreamWriter ( System.out ) ) );
      String         line;
      // Loop through lines of input
      while ( null != ( line = in.readLine () ) )
      {
         // Check whether line should be stripped out
         if ( ! strippers.contains ( line ) )
         {
            // If it shouldn't, send it back out again
            out.println ( line );
         }
      }
      out.flush (); // Important!
      // Finished - tidy up
      out.close ();
      in.close ();
   }
}

Perl

This Perl code removes leading and trailing whitespace (spaces and tabs) from each line.

#!/usr/bin/perl
use warnings;
use strict;
while (my $text = <STDIN>) {
	chomp $text;
	$text =~ s/^[ \t]+|[ \t]+$//g;
	print "$text\n";
}

JavaScript or VBScript

This example is in JavaScript. It works in much the same way in VBScript. Run it as:

cscript //NoLogo "c:\path to tool\tool.js"

var stdin   = WScript.StdIn;
var stdout  = WScript.StdOut;
var input = stdin.ReadAll();
/*
Here you do something with the input.
But since this is a demo, we're just going to write it back out.
*/
stdout.Write(input);

Python

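This example attempts to tidy XML. It can be run as an EditPlus text filter tool, or from the command line with a file name or URL as its only argument. It is written for Python 2 (it uses the print statement and the urllib and StringIO modules), so it will not run unchanged under Python 3.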

import os,sys,re

def openAnything(source):
	"""Cribbed from diveintopython.org"""
	if source == "-":
		return sys.stdin

	# try to open with urllib (if source is http, ftp, or file URL)
	import urllib
	try:
		return urllib.urlopen(source)
	except (IOError, OSError):
		pass

	# try to open with native open function (if source is pathname)
	try:
		return open(source, 'r')
	except (IOError, OSError):
		pass

	# treat source as string
	import StringIO
	return StringIO.StringIO(str(source))

def prettyUp ( xml ):
	""" Based on http://www.faqts.com/knowledge_base/view.phtml/aid/4334/fid/538 """
	parts = re.split ( '(<.*?>)', xml )
	level = 0
	wasText = False
	out = ""
	for part in parts:
		# ignore empty part
		if part.strip ( ) == '':
			continue
		# opening tags
		if part [ 0 ] == '<' and part [ 1 ] != '/' and part [ 1 ] != '?' and part [ 1 ] != '!':
			print
			sys.stdout.write ( '\t' * ( level ) + part )
			# short-cut empty tag
			if part [ -2 : ] != '/>':
				level += 1
			wasText = False
		# closing tags
		elif part [ : 2 ]  == '</':
			level -= 1
			if not wasText:
				print
				sys.stdout.write ( '\t' * ( level ) )
			sys.stdout.write ( part )
			wasText = False
		# text
		else:
			sys.stdout.write ( part )
			wasText = True

if len ( sys.argv ) == 1:
	xml = openAnything ( "-" ).read ()
elif len ( sys.argv ) == 2:
	xml = openAnything ( sys.argv [ 1 ] ).read ()
else:
	xml = None
	sys.stderr.write ( "Wrong number of arguments.\n" )

if None != xml:
	prettyUp ( xml )
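Because the script above is Python 2, here is a hedged Python 3 sketch of the same `prettyUp` idea. It is a port, not the original author's code. It returns a string instead of writing as it goes, which makes it easier to test. It only reads standard input, which is all a text filter needs. Like the original, it starts its output with a newline. Because it is a regex-based tidy, tags that span lines, or `>` characters inside attribute values, will confuse it.

```python
import re
import sys


def pretty_up(xml: str) -> str:
    """Indent opening tags by nesting depth (Python 3 port of prettyUp)."""
    out = []
    level = 0
    was_text = False
    # Split into tags and the text between them; the capture group keeps tags
    for part in re.split(r"(<.*?>)", xml):
        if part.strip() == "":
            continue  # ignore whitespace between tags
        if part.startswith("<") and part[1:2] not in ("/", "?", "!"):
            # Opening or self-closing tag: new line, indented by depth
            out.append("\n" + "\t" * level + part)
            if not part.endswith("/>"):
                level += 1
            was_text = False
        elif part.startswith("</"):
            level -= 1
            if not was_text:
                # Closing tag after child elements gets its own line
                out.append("\n" + "\t" * level)
            out.append(part)
            was_text = False
        else:
            # Text, comments and <?...?> declarations are copied as-is
            out.append(part)
            was_text = True
    return "".join(out)


if __name__ == "__main__":
    sys.stdout.write(pretty_up(sys.stdin.read()))
```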

Python again

This is surprisingly useful. It lines up text into columns by inserting spaces, for example from:

9 whatever
999 whatever
99 whatever
9999 whatever

to:

9    whatever
999  whatever
99   whatever
9999 whatever

Note: This code is written for Python 2 and has some quirks (for example, it drops empty lines), but you can hit Undo if you don't like the result.

You'll need to put "Prompt for arguments" ("$(Prompt)") in the argument after the script name to get a dialog where you can specify the text to be lined up. For a regular expression match, start it with / (so /c.t will line up cat, cot, etc.).

import os,sys,re

def openAnything ( source ):
   """Cribbed from diveintopython.org"""
   if source == "-":
      return sys.stdin

   # try to open with urllib (if source is http, ftp, or file URL)
   import urllib
   try:
      return urllib.urlopen ( source )
   except ( IOError, OSError ):
      pass

   # try to open with native open function (if source is pathname)
   try:
      return open ( source, 'r' )
   except ( IOError, OSError ):
      pass

   # treat source as string
   import StringIO
   return StringIO.StringIO ( str ( source ) )

def findMarker ( line, marker ):
   if "/" == marker [ : 1 ]:
      match = re.search ( marker [ 1 : ], line )
      if None == match:
         return -1
      return match.start ()
   return line.find ( marker )

def lineUp ( text, marker ):
   lines = re.split ( '\n', text )
   maxStartLen = max ( findMarker ( line, marker ) for line in lines )
   for line in lines:
      if 0 < len ( line ):
         pos = findMarker ( line, marker )
         if pos < 0:
            # Marker not on this line: pass it through unchanged
            print line
            continue
         start = line [ : pos ]
         end = line [ pos : ]
         print start + ( ' ' * ( maxStartLen - len ( start ) ) ) + end

if len ( sys.argv ) == 2:
   text = openAnything ( "-" ).read ()
   marker = sys.argv [ 1 ]
elif len ( sys.argv ) == 3:
   text = openAnything ( sys.argv [ 1 ] ).read ()
   marker = sys.argv [ 2 ]
else:
   text = None
   sys.stderr.write ( "Wrong number of arguments.\n" )

if None != text:
   lineUp ( text, marker )
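The script above is also Python 2. A hedged Python 3 sketch of the same `lineUp` logic follows. It is a port rather than the author's code, and it reads only standard input. The marker is the single command-line argument, which is what `$(Prompt)` supplies. It deliberately differs from the original in one respect: lines without the marker, including empty lines, are passed through unchanged.

```python
import re
import sys


def find_marker(line: str, marker: str) -> int:
    """Return where marker starts in line, or -1; a leading '/' means regex."""
    if marker.startswith("/"):
        match = re.search(marker[1:], line)
        return match.start() if match else -1
    return line.find(marker)


def line_up(text: str, marker: str) -> str:
    """Insert spaces before each marker so all markers share one column."""
    lines = text.splitlines()
    positions = [find_marker(line, marker) for line in lines]
    column = max(positions, default=-1)  # furthest-right marker
    out = []
    for line, pos in zip(lines, positions):
        if pos < 0:
            out.append(line)  # no marker (or empty line): leave unchanged
        else:
            out.append(line[:pos] + " " * (column - pos) + line[pos:])
    return "".join(line + "\n" for line in out)


if __name__ == "__main__":
    if len(sys.argv) == 2:
        sys.stdout.write(line_up(sys.stdin.read(), sys.argv[1]))
    else:
        sys.stderr.write("Wrong number of arguments.\n")
```

With the marker `whatever`, this turns the four example lines at the start of this section into the lined-up version shown there.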