Tuesday Feb 16, 2010

ElasticSearch

Shay Banon (@kimchy) of Compass fame has done it again and created a cool search project: ElasticSearch. It's a server component on top of Lucene that you interact with via JSON over REST. Unlike Solr, it is schema-less and focuses on making multi-node setups easy and transparent. It will be interesting to see how it compares to Solr in terms of search features such as faceting and range searches. Here is what Shay has to say on the ElasticSearch vs. Solr topic. It's still early days, but I will keep an eye on it.

Tuesday Feb 09, 2010

Some git resources

I've been using git a bit more recently and have stumbled across a couple of nice tips and tools.

Installation

I use MacPorts to get a recent version of git on my Mac.

Setup

You should at least specify your name and email on a global level, like so:

$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com
If you are a GitHub user, you will also want to set up your GitHub identity, like so:
$ git config --global github.user username
$ git config --global github.token xyzxyzxyzxyzxyzxyzxyzxyz
(you can find the values to set here: https://github.com/account)

Documentation

There are at least 3 online books

I've also found this article on setting up a git 'server' helpful: http://toolmantim.com/articles/setting_up_a_new_remote_git_repository.
And there is always the 'official', slightly nerdy documentation at http://kernel.org/pub/software/scm/git-core/docs/

(GUI-)Tools

If you installed git with MacPorts you might as well install tig, which is quite a nice CLI git browser. I also like GitX, and there is SmartGit, which is free for non-commercial use. When using git you obviously want to have a look at GitHub; if you are not developing open source projects, Trac with the git plugin might be a 'poor man's GitHub alternative'. I also really quite like the Git Bundle for TextMate. EGit/JGit are clearly worth looking out for for those of you who earn their money in Java land. You can find information on how to set up FileMerge.app for diffing and merging here and here.

why I like groovy

I use Groovy more and more often. See this little gem for transforming the charset of a batch of text files. It couldn't get a lot simpler. The actual conversion is just a handful of lines:

if(args.length != 4) {
	println "Usage: specify these arguments: input dir, input encoding, output dir, output encoding"
	return
}

def inputDir = new File(args[0])
def inputEncoding = args[1]
def outputDir = new File(args[2])
def outputEncoding = args[3]

assert inputDir.exists(), "${inputDir} must exist"
assert inputDir.canRead(), "${inputDir} must be readable"
assert outputDir.exists(), "${outputDir} must exist"
assert outputDir.canWrite(), "${outputDir} must be writable"

inputDir.eachFileMatch(~/.+\.txt/) { file ->
	def outputFile = new File(outputDir, file.name)
	// withReader/withWriter close both streams, even if the copy fails
	file.withReader(inputEncoding) { reader ->
		outputFile.withWriter(outputEncoding) { writer ->
			writer << reader
		}
	}
}
Love it!

Wednesday Nov 11, 2009

A sample of Spring AOP usage

In the last couple of years I've come across this problem several times: you've created a business application for a client with various business / domain objects, all is well, and then the client decides they need to group some of those objects together and to set global filters so that users only see and work with objects belonging to a certain grouping. There are two problems here:

  • The grouping isn't restricted to one of your domain objects. You need to be able to throw all your various domain objects into those groups, i.e. you can create a group that consists of people and things and emails and boxes and …
    Unfortunately that isn't a trivial thing to model in a classical RDBMS/Hibernate persistence application. Relational databases aren't particularly well suited to storing relations from one thing (table) to all kinds of other things (tables) while keeping those relationships easy to query and maintaining referential integrity.
    Well, I haven't really solved this problem. I recently implemented a simple solution that is good enough for my project, albeit not very elegant, I think. If someone has solved this in a better way, please let me know or point me to a good resource. Anyway, I have solved it by creating link objects for each thing that needs to be grouped. The link objects form a class hierarchy that is mapped onto a single table. That allows me to query those relationships quite easily, at the cost of the link table having quite a few nullable columns, which C. J. Date wouldn't approve of, I guess.
  • The second problem is that, depending on some contextual information in the user session of your application, you need to filter all listings of your domain objects. You could of course have some if or switch statements in your controller logic, but that smells. It also means that you potentially need to touch lots and lots of places in your code if anything about your grouping changes.
What I'd like to talk about here is how I have tried to solve the second problem in a sensible way. In my current project things in the app are grouped by some sort of 'scope'. A user can choose the scope they are currently working on or leave it as the global scope. This setting is attached to the current user session, and from then on all views showing listings of domain objects need to filter these by the selected scope. Here is how I've done it, but first some background on the app: it's a webapp built with Spring 3, Spring MVC, Hibernate and PostgreSQL. I have a package containing all my DAOs called 'dao'. I have configured Spring to autoproxy by specifying this in my application context:
<aop:aspectj-autoproxy />
(You need to configure the aop schema, obviously.) I have then created an aspect class like this:
…

import java.lang.reflect.Method;

import org.aspectj.lang.ProceedingJoinPoint;
import org.aspectj.lang.annotation.Around;
import org.aspectj.lang.annotation.Aspect;
import org.aspectj.lang.annotation.Pointcut;
import org.aspectj.lang.reflect.CodeSignature;
// assumption: Logger and LoggerFactory below are the SLF4J classes
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

@Aspect
public class ScopeAspect {
	final static Logger logger = LoggerFactory.getLogger(ScopeAspect.class);
	
	@Pointcut("within(dao..*)") 
	public void inDataAccessLayer() {}
	
	@Pointcut("execution(public * scoped*(..))")
	public void scoped(){}
	
	@Around(value = "inDataAccessLayer() && scoped() && target(target)")
	public Object addScopeIfNeeded(ProceedingJoinPoint joinPoint, GenericDao target) throws Throwable
	{
		Scope scope = ScopeUtils.getCurrentScope();
		if(scope != null) {
			logger.debug("rewriting dao method to use scope!");
			
			// gather the argument values and types
			Object[] originalArgs = joinPoint.getArgs();
			Object[] modifiedArgs = new Object[originalArgs.length + 1];
			modifiedArgs[0] = scope;
			
			// prepend the 'scope' arg
			System.arraycopy(originalArgs, 0, modifiedArgs, 1, originalArgs.length);
			
			// calculate the new argument type list
			Class<?>[] modifiedArgTypes = new Class<?>[modifiedArgs.length];
			// the partner method takes the scope as its first parameter
			modifiedArgTypes[0] = Scope.class;
			Class<?>[] originalArgTypes = ((CodeSignature)joinPoint.getSignature()).getParameterTypes();
			System.arraycopy(originalArgTypes, 0, modifiedArgTypes, 1, originalArgTypes.length);
			
			// find the method that has the same signature apart from the preceding scope type argument
			Method m = target.getClass().getMethod(joinPoint.getSignature().getName(), modifiedArgTypes);
			
			try {
				return m.invoke(target, modifiedArgs);
			} catch (java.lang.reflect.InvocationTargetException e) {
				// rethrow the DAO's own exception instead of the reflection wrapper
				throw e.getCause();
			}
		}
		else
		{
			return joinPoint.proceed();
		}
		
		
	}
	
}
So what does that do? This aspect is now called for all methods whose names start with 'scoped' in classes living in the 'dao' package branch. When such a method is called, the aspect wraps itself around the method call and determines whether it needs to do anything (if scope != null). If so, it uses reflection to find a method with the same name and signature apart from an additional first argument of type 'Scope', and calls that one instead. What I like about this:
  • It really is an aspect of my app and therefore it is nice that this lives in exactly one place.
  • It means I can achieve the desired effect by convention rather than configuration: I just need to code multiple methods with the same name in my DAOs.
  • It also means that the DAO methods still fully encapsulate the query specifics, and if different entities have different requirements for the queries used to list them by scope, that is fine.
  • Since only certain methods of my DAOs are affected, I can still have lots of methods left that won't ever get touched by the aspect, so I can have getAll() or list() or whatever methods I might need for an admin backend.
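The convention can be tried out without Spring. The following is a minimal sketch, not the project's code: `PersonDao`, its two `scopedList` overloads and the `String` that stands in for the scope type are all hypothetical. It only shows the reflective step the aspect performs, i.e. prepending the scope to the argument list and looking up the overload with the matching signature.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Hypothetical DAO following the convention: each scoped* method has a
// partner overload that takes the scope as an additional first argument.
class PersonDao {
    public List<String> scopedList(Integer limit) {
        return Arrays.asList("all people, limit " + limit);
    }

    public List<String> scopedList(String scope, Integer limit) {
        return Arrays.asList("people in " + scope + ", limit " + limit);
    }
}

class ScopedDispatch {
    // Mirrors what the aspect does once a scope is set: prepend the scope to
    // the arguments and invoke the overload whose signature matches.
    static Object invokeScoped(Object target, String methodName, String scope,
                               Class<?>[] argTypes, Object[] args) {
        Object[] newArgs = new Object[args.length + 1];
        newArgs[0] = scope;
        System.arraycopy(args, 0, newArgs, 1, args.length);

        Class<?>[] newTypes = new Class<?>[argTypes.length + 1];
        newTypes[0] = String.class; // stand-in for the real scope type
        System.arraycopy(argTypes, 0, newTypes, 1, argTypes.length);

        try {
            Method m = target.getClass().getMethod(methodName, newTypes);
            return m.invoke(target, newArgs);
        } catch (ReflectiveOperationException e) {
            // a NoSuchMethodException here means the partner method is missing
            throw new IllegalStateException("no scoped partner for " + methodName, e);
        }
    }

    public static void main(String[] args) {
        Object result = invokeScoped(new PersonDao(), "scopedList", "project-a",
                new Class<?>[] { Integer.class }, new Object[] { 10 });
        System.out.println(result); // [people in project-a, limit 10]
    }
}
```

Note that the lookup is by exact parameter types, so a partner method whose signature differs in any other parameter will not be found.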
Why it's not perfect
  • If you forget about the aspect and forget to code the partner 'scoped' method in a DAO, you'll get a NoSuchMethodException.
  • I doubt this, but it could be that it affects performance.
  • You still have to remember to use your 'scoped' methods in your controller.
  • It might actually drive you nuts if you come back to this code in two years' time, having forgotten about this AOP business, and put a breakpoint in the method called in your controller, only to wait endlessly for that method to ever be called …
If you have found a particularly nice way to solve this kind of problem, especially the first part of it, please write a comment, send me a pointer, whatever. It's really bugging me that I'm not sure whether there is 'the' solution out there.

Tuesday Nov 10, 2009

trouble with spring and maven assembly plugin

I've built a couple of small Java tools recently that do some background processing and get run from cron. These tools are too complex for a maintainable Groovy or Python script (at least for my taste), so I implement them in Java, using Spring for its wonderful JDBC template, database connection handling and some other niceties. To make for a simple deploy I pack them up using Maven's assembly plugin, and here is where the fun starts. If you do this you will get an error like this:

Configuration problem: Unable to locate Spring NamespaceHandler for XML schema namespace
The reason is that the assembly plugin copies stuff from the META-INF dirs of the dependency jars into the META-INF dir of the resulting jar, and quite a few of the Spring jars have spring.handlers and spring.schemas files in their META-INF directory. When the assembly plugin packages all these together, only the files of the last jar it deals with end up in the resulting jar, which causes the above error when you try to run the jar. After some googling I found a couple of hints on the web, e.g. http://forum.springsource.org/showthread.php?p=190246#post190246 and http://forum.springsource.org/showthread.php?p=195655#post195655, and cooked up a fix that works OK for me. I first concatenated the contents of all spring.handlers files into a new spring.handlers file and all spring.schemas files into a new spring.schemas file, and put the resulting files into my src/main/resources directory. I then created the file src/main/assembly/my-jar-with-dependencies.xml with the following contents:
<assembly>
  <id>my-jar-with-dependencies</id>
  <formats>
    <format>jar</format>
  </formats>
  <includeBaseDirectory>false</includeBaseDirectory>
  <dependencySets>
    <dependencySet>
      <unpack>true</unpack>
      <scope>runtime</scope>
      <unpackOptions>
        <excludes>
          <exclude>**/spring.handlers</exclude>
          <exclude>**/spring.schemas</exclude>
        </excludes>
      </unpackOptions>
    </dependencySet>
  </dependencySets>
  <fileSets>
    <fileSet>
      <directory>${project.build.outputDirectory}</directory>
    </fileSet>
    <fileSet>
      <directory>target/classes</directory>
      <outputDirectory>META-INF</outputDirectory>
      <includes>
        <include>spring.handlers</include>
        <include>spring.schemas</include>
      </includes>
    </fileSet>
  </fileSets>
</assembly>
and added this to the plugins section in my pom.xml
<plugin>
  <artifactId>maven-assembly-plugin</artifactId>
  <configuration>
    <descriptors>
      <descriptor>src/main/assembly/my-jar-with-dependencies.xml</descriptor>
    </descriptors>
    <archive>
      <manifest>
        <mainClass>my.main.Class</mainClass>
      </manifest>
    </archive>
  </configuration>
</plugin>
If I now run 'mvn assembly:assembly', it creates the file 'target/myProject-my-jar-with-dependencies.jar', which I can run without getting the 'Unable to locate …' message any longer.
P.S. There is also a JIRA issue on this: MASSEMBLY-360
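The manual concatenation step could also be scripted. The following is a minimal sketch, assuming the individual spring.handlers (or spring.schemas) files have already been extracted from the Spring jars to known locations; the class and method names are made up for illustration, and line continuations in the property files are not handled.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class SpringMetaInfMerger {
    // Merges the lines of several property-style files: blank lines and exact
    // duplicates are dropped, the order of first appearance is kept.
    static List<String> mergeLines(List<List<String>> sources) {
        Set<String> merged = new LinkedHashSet<>();
        for (List<String> source : sources) {
            for (String line : source) {
                String trimmed = line.trim();
                if (!trimmed.isEmpty()) {
                    merged.add(trimmed);
                }
            }
        }
        return new ArrayList<>(merged);
    }

    // Reads the given files and writes the merged result to the target file.
    static void mergeFiles(List<Path> sourceFiles, Path target) {
        try {
            List<List<String>> sources = new ArrayList<>();
            for (Path file : sourceFiles) {
                sources.add(Files.readAllLines(file, StandardCharsets.ISO_8859_1));
            }
            Files.write(target, mergeLines(sources), StandardCharsets.ISO_8859_1);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `mergeFiles` once for the spring.handlers files and once for the spring.schemas files, with targets under src/main/resources, would produce the two merged files that the assembly descriptor above picks up.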

Monday Sep 21, 2009

Nuclear power? No thanks!

Since Thursday we have been electricity producers (just in time for the end of summer :-( ). A 5.6 kWp system of 28 modules on our east-west facing roof generates between 12 and 20 kWh of electricity a day in the friendly late-summer weather we are having at the moment. And if the installer has done his sums right, we should generate about 1500 kWh more per year than we use ourselves.

[Photos: pv_dach_ost.jpg (roof, east side), pv_dach_west.jpg (roof, west side), pv_wechselrichter.jpg (inverter)]

Wednesday Sep 16, 2009

Image upload from the clipboard

A client recently asked me whether it was possible to modify a webapp that we've been building for them to allow uploading graphics from the clipboard. I hesitated and said that it wasn't possible with HTML / JavaScript, but then did a bit of research. It quickly became clear that one would need to use a Java applet, an ActiveX control (do these still exist?) or a Flash app to achieve the effect. So I set out to create something.

I am a decent backend Java programmer but I've never much liked the GUI side of Java and applets are GUI to some extent, so I searched the web and nicked ideas all over the place and this is what I came up with:


	package de.woerd.applet;

	import de.woerd.io.MultiPartFormOutputStream;
	import java.awt.Color;
	import java.awt.Image;
	import java.awt.Toolkit;
	import java.awt.datatransfer.Clipboard;
	import java.awt.datatransfer.DataFlavor;
	import java.awt.datatransfer.Transferable;
	import java.awt.image.BufferedImage;
	import java.awt.image.ImageObserver;
	import java.awt.image.RenderedImage;
	import java.io.BufferedReader;
	import java.io.ByteArrayOutputStream;
	import java.io.IOException;
	import java.io.InputStreamReader;
	import java.net.URL;
	import java.net.URLConnection;
	import javax.imageio.ImageIO;
	import javax.swing.ImageIcon;
	import javax.swing.JApplet;
	import javax.swing.JLabel;

	/**
	 *
	 * @author joerg
	 */
	public class ImagePaster extends JApplet {

	    Clipboard clipboard;
	    Toolkit toolkit;
	    JLabel status;


	    @Override
	    public void init() {
	        super.init();

	        toolkit = Toolkit.getDefaultToolkit();
	        clipboard = toolkit.getSystemClipboard();


	    }

	    /**
	     * Reads an image from the system clipboard and uploads it to the server.
	     *
	     * @param targetUrl upload URL, relative to the location of the page this applet is embedded in
	     * @param format format of the resulting image bytes, currently only 'jpeg' and 'png' are supported
	     * @return the outcome; on success getResult() holds the server response
	     */
	    public AppletResult pasteImage(String targetUrl, String format) {

	        //get image data from clipboard
	        Image image = getImageFromClipboard();

	        if(image == null)
	            return new AppletResult(false, "No image!", null);

	        if(!("jpeg".equals(format) || "png".equals(format)))
	            return new AppletResult(false, "Format not supported. Please choose either 'jpeg' or 'png'", null);

	        // create a byte stream of that image, encoded in the correct image format
	        ByteArrayOutputStream imageBytes = null;
	        try{
	            imageBytes = getImageBytes(image, format);
	        }
	        catch(IOException e)
	        {
	            return new AppletResult(false, "Image could not be read", null);
	        }

	        // upload bytes to targetUrl as a multipart form post
	        String result = null;
	        try{
	            result = uploadImage(targetUrl, imageBytes, format);
	        }
	        catch(Exception e)
	        {
	            return new AppletResult(false, "Image data could not be uploaded", null);
	        }

	        return new AppletResult(true, null, result);
	    }


	    private Image getImageFromClipboard() {
	        Transferable transferable = clipboard.getContents(null);
	        // getContents() may return null, e.g. when the clipboard is empty
	        if(transferable == null || !transferable.isDataFlavorSupported(DataFlavor.imageFlavor))
	            return null;
	        try {
	            Image img = (Image) transferable.getTransferData(DataFlavor.imageFlavor);

	            int w = img.getWidth(null);
	            int h = img.getHeight(null);
	            BufferedImage newImg = new BufferedImage(w,h,BufferedImage.TYPE_INT_RGB);

	            ImageIcon ii = new ImageIcon(img);
	            ImageObserver is = ii.getImageObserver();

	            // use one Graphics object for all calls: every getGraphics() call
	            // returns a new one, so a colour set on one does not carry over
	            java.awt.Graphics g = newImg.getGraphics();
	            g.setColor(new Color(255, 255, 255));
	            g.fillRect(0, 0, w, h);
	            g.drawImage(ii.getImage(), 0, 0, is);
	            g.dispose();

	            return newImg;
	        } catch (Exception e) {
	            return null;
	        }
	    }

	    private ByteArrayOutputStream getImageBytes(Image image, String format) throws IOException {
	        ByteArrayOutputStream baos = new ByteArrayOutputStream();
	        if(image instanceof RenderedImage)
	        {
	            ImageIO.write((RenderedImage)image, format, baos);
	        }

	        if(baos.size() == 0)
	            throw new IOException("No image data found");

	        return baos;
	    }

	    private String uploadImage(String targetUrl, ByteArrayOutputStream imageBytes, String format) throws Exception  {

	        URL url = new URL(getDocumentBase(), targetUrl);
	        // create a boundary string
	        String boundary = MultiPartFormOutputStream.createBoundary();
	        URLConnection urlConn = MultiPartFormOutputStream.createConnection(url);

	        urlConn.setRequestProperty("Accept", "*/*");
	        urlConn.setRequestProperty("Content-Type", MultiPartFormOutputStream.getContentType(boundary));

	        // set some other request headers...
	        urlConn.setRequestProperty("Connection", "Keep-Alive");
	        urlConn.setRequestProperty("Cache-Control", "no-cache");

	        // no need to connect because getOutputStream() does it
	        MultiPartFormOutputStream out = new MultiPartFormOutputStream(urlConn.getOutputStream(), boundary);

	        // write bytes 
	        out.writeFile("cbUpload", "image/" + format, "clipboardImageUpload." + format, imageBytes.toByteArray());
	        out.close();

	        // read response from server
	        StringBuffer buf = new StringBuffer();
	        BufferedReader in = new BufferedReader(new InputStreamReader(urlConn.getInputStream()));

	        String line = "";
	        while ((line = in.readLine()) != null) {
	            buf.append(line);
	        }
	        in.close();
	        return buf.toString();
	    }



	}
	
	package de.woerd.applet;

	public class AppletResult {
		private boolean success;
		private String message;
		private String result;

		public AppletResult(boolean success, String message, String result) {
			this.success = success;
			this.message = message;
			this.result = result;
		}

		public boolean isSuccess() {
			return this.success;
		}

		public void setSuccess(boolean success) {
			this.success = success;
		}

		public String getMessage() {
			return this.message;
		}

		public void setMessage(String message) {
			this.message = message;
		}

		public String getResult() {
			return this.result;
		}

		public void setResult(String result) {
			this.result = result;
		}
	}

Now for this to work as an applet you also need the accompanying class MultiPartFormOutputStream, which does the actual multipart posting. I nicked that as well, and you can download it here. The applet also needs to be signed.

I used NetBeans 6.7, following these instructions.
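For orientation, this is roughly what such a multipart writer has to produce. The following is a minimal sketch of the multipart/form-data wire format for a single file part; it is not the code of the MultiPartFormOutputStream class mentioned above, and the class name, method name and boundary value are made up.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

class MultipartSketch {
    // Builds a request body containing exactly one file part.
    static byte[] buildBody(String boundary, String fieldName, String fileName,
                            String contentType, byte[] data) {
        // each part starts with "--boundary" and its own headers,
        // followed by an empty line
        String head = "--" + boundary + "\r\n"
                + "Content-Disposition: form-data; name=\"" + fieldName
                + "\"; filename=\"" + fileName + "\"\r\n"
                + "Content-Type: " + contentType + "\r\n"
                + "\r\n";
        // the body ends with "--boundary--"
        String tail = "\r\n--" + boundary + "--\r\n";

        byte[] headBytes = head.getBytes(StandardCharsets.US_ASCII);
        byte[] tailBytes = tail.getBytes(StandardCharsets.US_ASCII);

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(headBytes, 0, headBytes.length);
        out.write(data, 0, data.length); // file bytes are written unencoded
        out.write(tailBytes, 0, tailBytes.length);
        return out.toByteArray();
    }
}
```

The request's Content-Type header then has to be `multipart/form-data; boundary=` followed by the same boundary string, and the boundary must not occur inside the file data. The image bytes are written as-is; over HTTP, multipart/form-data parts can generally carry binary data without Base64 encoding.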

You can then embed the applet into an html page like this:

	<body>
		<script type="text/javascript">
			function upload() {
				var obj = document.getElementById('paste-image');
				var result = obj.pasteImage('/uploadscript', "jpeg");
				
				if(result.isSuccess()) {
					// do something with result.getResult()
					// result.getResult() will contain whatever the uploadscript returns. So it'd make some sense to make it return the URI of the new upload ;-)
				}
				else { // error
					// display result.getMessage() in some way
				}
			}
		</script>

		<object id="paste-image" classid="java:de/woerd/applet/ImagePaster.class" type="application/x-java-applet" archive="/path/to/signed/jarfile.jar" width="1" height="1"></object>
		
		<input type="button" value="Paste it!" onclick="upload();">
		
	</body>

When you then copy an image to the clipboard and press the 'Paste it!' button, the image content is obtained from the clipboard, transformed into a jpeg or png and uploaded to your server script.

Notes

  • What I don't really understand is what happens in the 'getImageFromClipboard()' method, which (you guessed it) is also nicked. But if I take the image that I obtain with clipboard.getContents(null).getTransferData(DataFlavor.imageFlavor) directly and pass it to ImageIO, I get a black, empty graphic.

  • What I haven't fully grasped yet is whether in certain circumstances I need to do a Base64 transfer encoding and how I would do that without additional libraries. (I have implemented the upload with nicked code rather than with e.g. Commons HttpClient because in a signed applet context all libs need to be signed as well, and I couldn't face the extra deployment steps for that.)

  • This is code that I'm currently integrating into a client project. It is only being tested for the (controlled) environment at this client and might not work in all OS/browser combinations.

  • My next step will be a drag'n'drop file upload, which should be easy now that I have a basic understanding of how applets work.
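The repainting step from 'getImageFromClipboard()' can be isolated into a small helper. The following sketch is a guess at why it helps, not a confirmed diagnosis: it assumes that the clipboard image carries transparency, or is not a BufferedImage at all, and that ImageIO cannot write it as a JPEG in that form. The class and method names are made up, and the helper assumes the source image is already fully loaded.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.Image;
import java.awt.image.BufferedImage;

class ImageFlattener {
    // Paints the source image onto an opaque white RGB canvas, so the result
    // has no alpha channel and is always a BufferedImage.
    static BufferedImage flattenOntoWhite(Image source) {
        int w = source.getWidth(null);
        int h = source.getHeight(null);
        BufferedImage canvas = new BufferedImage(w, h, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = canvas.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, w, h);
            g.drawImage(source, 0, 0, null);
        } finally {
            g.dispose();
        }
        return canvas;
    }

    public static void main(String[] args) {
        // fully transparent 2x2 image with one opaque red pixel
        BufferedImage src = new BufferedImage(2, 2, BufferedImage.TYPE_INT_ARGB);
        src.setRGB(0, 0, 0xFFFF0000);
        BufferedImage flat = flattenOntoWhite(src);
        System.out.println(Integer.toHexString(flat.getRGB(0, 0))); // ffff0000
        System.out.println(Integer.toHexString(flat.getRGB(1, 1))); // ffffffff
    }
}
```

The transparent pixels come out white rather than black, and the result can be handed to ImageIO.write as a RenderedImage.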

References

My thanks once more goes to all the people on the net sharing their findings.

Thursday Aug 13, 2009

Bash Quoting Trouble

I've just been tearing my hair out again over bash's quoting and escaping rules. In a script I want to copy a file via smbclient (why is smbclient so unwieldy, anyway?), with the directories on both sides and the file name set through variables. Finding the right quoting cost me a lot of nerves, but here is a solution that works:

#!/bin/bash
#set -x  # this has been very helpful in debugging the quoting

umask 022

PATH=/sbin:/bin:/usr/sbin:/usr/bin
export PATH

# copy target settings
REMOTE_BACKUP_DIR=backup
REMOTE_NAS_ADDRESS=1.2.3.4
REMOTE_NAS_SHARE=BACKUP

# source dir and file
BACKUP_DIR=/srv/backup/project
TODAY=`date +'%Y%m%d%H%M%S'`
BACKUP_FILE=db_bak_$TODAY.bz2

# create the file to copy
...

SMB_CMD="cd $REMOTE_BACKUP_DIR; lcd $BACKUP_DIR; put $BACKUP_FILE"

smbclient -A /root/smbauth.conf -c "$SMB_CMD" "//$REMOTE_NAS_ADDRESS/$REMOTE_NAS_SHARE"

exit 0

Monday Jun 29, 2009

A trip to charset/encoding hell

I spent far too much time over the last couple of days trying to solve an encoding problem in a Spring MVC/jQuery based web application, and here is a potential solution for those who run into the same or similar issues. I have a webapp that is mostly pure HTML but has some ajax comfort features in the backend. Everything worked for a long time, and then the client started testing and immediately turned up a problem: text entered into ajax forms turned into gibberish when it contained umlauts and the like.

I started to investigate and went to hell. I quickly discovered that the same form would screw up the text when submitted via the jQuery form plugin, whereas it would work when submitted classically without any JavaScript interference. I thought that somewhere I must have forgotten to be explicit about the content type or charset encoding, and made sure that each page would have the content-type header set to "text/html; charset=UTF-8". Now the ajax-submitted form worked, but the classical non-ajax submit would screw up the content. It seemed like there was also a difference between POSTing the form and submitting it via GET, with one generally working and the other not. I tried all sorts of combinations of headers and meta tags, alas no luck: I could have either one or the other.

I searched the web and didn't find anything helpful for quite a while, until I stumbled across this post, which basically introduced me to Spring's org.springframework.web.filter.CharacterEncodingFilter, which can be used to enforce a character encoding for every request. You just specify a filter like so:


<filter>
	<filter-name>characterEncodingFilter</filter-name>
	<filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
	<init-param>
		<param-name>encoding</param-name>
		<param-value>UTF-8</param-value>
	</init-param>
	<init-param>
		<param-name>forceEncoding</param-name>
		<param-value>true</param-value>
	</init-param>
</filter>
<filter-mapping>
	<filter-name>characterEncodingFilter</filter-name>
	<url-pattern>/*</url-pattern>
</filter-mapping>
(in your web.xml).
I then also made sure that all my template-based responses send a content-type header by setting
<property name="contentType" value="text/html;charset=UTF-8" />
for my view resolver.
Then I double-checked that the URIEncoding parameter of my Tomcat connector is set to "UTF-8" and that my main layout template sends the
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
header, and now everything seems to work. I hope this helps somebody as desperate as I was.
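The kind of gibberish described here is what you get when bytes written in one charset are read back in another. The following is a minimal sketch of that effect in plain Java, independent of Spring, jQuery or Tomcat (the class name is made up). It cannot show which component decoded with the wrong charset in the setup above; it only illustrates the mechanism, using ISO-8859-1 as the wrong charset because that is the usual servlet default when a request does not declare an encoding.

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Encodes text as UTF-8 and decodes the bytes as ISO-8859-1, which is
    // what happens when a receiver assumes the wrong request encoding.
    static String misread(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Reverses the damage; this works because ISO-8859-1 maps every byte
    // to exactly one character, so no information was lost.
    static String repair(String garbled) {
        byte[] original = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(original, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String umlaut = "\u00e4";            // the letter a with diaeresis
        String garbled = misread(umlaut);    // two characters: U+00C3 U+00A4
        System.out.println(garbled.length());               // 2
        System.out.println(repair(garbled).equals(umlaut)); // true
    }
}
```

Forcing UTF-8 on every request, as the filter above does, avoids the mismatch in the first place rather than repairing it afterwards.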

Friday Jun 05, 2009

soundtrack

For reasons unknown I have just stumbled across the Rainbirds and their singer Katharina Franck; it is still good music. In the process I noticed that the Rainbirds' debut album can probably count as the soundtrack to my first love, together with U2's "Joshua Tree", Udo Lindenberg's "Phönix", Men at Work's "Cargo" and a few other albums (among others The Communards).
Maybe it is down to the recent 20-year school reunion that I am having such moments of reminiscence …

Tuesday Jun 02, 2009

google wave

It is not exactly original to comment on Google's Wave; many, too many clever and less clever people have already done so (most of them solely on the basis of a YouTube video). I am doing it anyway, because this sort of technology simply interests me a great deal and I have tried out quite a few things in this area over the years, since I work in geographically more or less widely dispersed teams most of the time. My favourite so far was Groove (which was bought by Microsoft in 2005 and was unfortunately Windows only). Wave could be a worthy successor and, thanks to the open protocol and the open sourcing of the code, gain more momentum than Groove ever did. The barrier to entry is of course also far lower if all you need is a current browser. If it all works as shown in the demo, I can imagine Wave developing into a very useful tool. The only remaining drawback is scepticism towards the data kraken Google, but supposedly you will be able to run your own Wave server. I also do wonder how the technology scales when a Wave server has to stream hundreds or thousands of simultaneous edits to all users.

Thursday May 28, 2009

Sanitizing HTML content

Today I had to find a way to make sure that content which is created through publicly accessible forms and then becomes publicly visible cannot contain any nasty HTML hacks. At first I wanted to solve this 'by hand' with a few regexes, but then I figured that would be inefficient and did a bit of research. I came across two tools:

I then settled on NekoHTML because I was familiar with it and created the following utility method (careful: it has not really been tested properly yet)
	import java.io.StringReader;
	import java.io.StringWriter;

	import javax.xml.transform.OutputKeys;
	import javax.xml.transform.Transformer;
	import javax.xml.transform.TransformerFactory;
	import javax.xml.transform.dom.DOMSource;
	import javax.xml.transform.stream.StreamResult;

	import org.apache.html.dom.HTMLDocumentImpl;
	import org.apache.xerces.xni.parser.XMLDocumentFilter;
	import org.cyberneko.html.filters.ElementRemover;
	import org.cyberneko.html.parsers.DOMFragmentParser;
	
	import org.w3c.dom.DocumentFragment;	
	import org.w3c.dom.html.HTMLDocument;
	import org.xml.sax.InputSource;

	import static org.apache.commons.lang.StringUtils.isBlank;

	…

	/**
	 * Cleans up (user provided) string input and makes sure no dangerous markup is left.
	 * 
	 * Allows b, p, br, i, ol, ul, li, a (with href, title) and img (with src, width, height, title) tags.
	 * All others are removed with their text content left intact.
	 * 
	 * script and iframe tags are removed entirely (including textual content).
	 * 
	 * @param input the HTML fragment to clean, may be null or blank
	 * @return the cleaned fragment, or an empty string if the input is blank or cannot be parsed
	 */
	public static String cleanupHtmlFragment(String input) {
		if(isBlank(input))
			return "";
		
		try {
			// create element remover filter
			ElementRemover remover = new ElementRemover();

			// set which elements to accept
			remover.acceptElement("b", null);
			remover.acceptElement("p", null);
			remover.acceptElement("br", null);
			remover.acceptElement("i", null);
			remover.acceptElement("ol", null);
			remover.acceptElement("ul", null);
			remover.acceptElement("li", null);
			remover.acceptElement("a", new String[] { "href", "title" });
			remover.acceptElement("img", new String[] { "src", "width",	"height", "title" });

			// completely remove script and iframe elements
			remover.removeElement("iframe");
			remover.removeElement("script");

			// create writer filter
			org.cyberneko.html.filters.Writer writer = new org.cyberneko.html.filters.Writer();

			// setup filter chain
			XMLDocumentFilter[] filters = { remover, writer, };

			DOMFragmentParser parser = new DOMFragmentParser();
			parser.setProperty("http://cyberneko.org/html/properties/filters",filters);
			parser.setProperty("http://cyberneko.org/html/properties/default-encoding","UTF-8");
			parser.setProperty("http://cyberneko.org/html/properties/names/elems","lower");
			parser.setProperty("http://cyberneko.org/html/properties/doctype/pubid","-//W3C//DTD XHTML 1.0 Transitional//EN");

			HTMLDocument document = new HTMLDocumentImpl();

			DocumentFragment fragment = document.createDocumentFragment();
			parser.parse(new InputSource(new StringReader(input)), fragment);

			Transformer transformer = TransformerFactory.newInstance().newTransformer();
			transformer.setOutputProperty(OutputKeys.INDENT, "yes");
			transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,"yes");

			StreamResult result = new StreamResult(new StringWriter());
			DOMSource source = new DOMSource(fragment);
			transformer.transform(source, result);
			return result.getWriter().toString();
		} catch (Exception e) {
			log.warn("Couldn't parse string fragment with nekohtml", e);
			return "";
		}
	}

Maven rocks

I confess: I am a big Maven fan. Maven is one of those tools that have really made my (developer) life easier and better. At the beginning I struggled to come to terms with Maven's dynamic nature (at first it is hard to understand why there can be no equivalent to 'ant -projecthelp'), but over time you don't want to be without Maven any more: the dependency management is just too good (especially in combination with the excellent Nexus repository), as is the comfort of fully automatic releases with the release plugin. The other day I built a command line tool and wondered how to get all dependencies packaged up conveniently, and lo and behold, there are several options:

  • The Assembly Plugin generates an uber-jar that contains all classes of all dependencies and can then be started very simply via java -jar (provided you add the corresponding manifest entries). Drawback: if I have many dependencies, deployment over thin uplinks becomes tedious.
  • The Dependency Plugin copies all dependency jars to target/dependency for me. I can then conveniently copy these into a lib directory or similar.
Day saved. And I was about to start laboriously copying the jars by hand from ~/.m2/repository/…

Monday May 25, 2009

Google Streetview car

A Google Streetview camera car has just parked in front of my office window (licence plate GG-HH 2257). It still feels somehow strange when the virtual world pushes into the real one. Map

Thursday Apr 02, 2009

Friction with the executive

This morning, on the way to kindergarten, my daughter and I were reprimanded by a police officer, who turned his car around and drove back 200 metres specifically to do so, because we were riding on the wrong pavement. The gentleman accused me of bringing up my child to behave dangerously in traffic. Well, we always ride that way because it lets us avoid a dangerous road crossing. I was pretty worked up and told the officer that I found the reprimand completely inappropriate and that I could think of many other things he had better take care of. My anger evaporated quickly, though, because he was very friendly and then also had a very nice chat with my daughter. Besides, sometimes you have to be glad when third parties get involved and don't look away: better to intervene once too often than to look away once too often.
What remains is, on the one hand, the question of whether in future I will cross the dangerous road in order to ride on the correct side (as I do anyway when I am out without my daughter), and on the other a certain feeling of having been treated unfairly: almost every day I see drivers breaking the rules (sometimes in very dangerous ways), and I have hardly ever seen one of them being reprimanded by the police. Somehow I am missing a sense of proportion there …