Roolastic - Part Four
Since the last part I've added Spring Security to Roolastic. It's not really used yet, but adding it helped me understand a bit more about Roo. So while it is still early days, I'd like to share some general notes and opinions on Roo and ElasticSearch.
Notes
Database
I've chosen H2 because it comes with a handy little web console. If you start that (as I do here), you can point your browser to http://localhost:8082/ and look at and modify what's in your db - neat!
IDE
I've been using STS and it's nice enough, even though it is a bit lacking in the responsiveness department. If you use 'plain' Eclipse, make sure you have at least m2eclipse installed (and you probably want the AspectJ plugin as well). Also, if you use any of those, do yourself a favor: right-click on the 'target' folder in the Navigator view and check the Derived box in Properties. That prevents files in target from popping up when you do an 'Open Resource...'
Serialization
I've been using Jackson to serialise my entities to JSON, which I then feed into ElasticSearch, and I really like it. What's a bit annoying is that Jackson also serialises some fields named 'ajc$interField$...', which AspectJ is responsible for and which are not really needed. The only way I found to turn that off was to list every single one of these fields in the @JsonIgnoreProperties annotation, which is a bit painful - some pattern matching would be helpful here.
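To show what I mean by pattern matching, here is a minimal sketch using only the JDK. It is not what Roolastic does and it is not a Jackson feature: it assumes the entity has already been turned into a Map of property names to values by whatever means, and it simply drops every key starting with 'ajc$'. The class name, the method name and the example key are made up for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Roolastic or Jackson.
final class AjcFieldFilter {

    // Prefix of the AspectJ-generated field names mentioned above.
    static final String AJC_PREFIX = "ajc$";

    // Returns a copy of the given properties without the AspectJ-generated entries.
    static Map<String, Object> withoutAjcFields(Map<String, Object> properties) {
        Map<String, Object> cleaned = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : properties.entrySet()) {
            if (!entry.getKey().startsWith(AJC_PREFIX)) {
                cleaned.put(entry.getKey(), entry.getValue());
            }
        }
        return cleaned;
    }

    public static void main(String[] args) {
        Map<String, Object> image = new LinkedHashMap<>();
        image.put("title", "Sunset");
        // Made-up key standing in for one of the generated field names.
        image.put("ajc$interField$example", 1);
        System.out.println(withoutAjcFields(image).keySet());
    }
}
```

The filtered map could then be handed to the JSON serialiser instead of the entity itself, at the cost of one extra conversion step per object.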
Opinion
Spring Roo
What I like
- You have to love the speed with which you can set up a fully functional webapp and start toying with your ideas. And once you accept the AspectJ stuff it is actually all fairly transparent
- The scaffolding is cool. I think it was one of the most envied features of Ruby on Rails in the Java community. While I think it isn't really that important in a project in the long run (because you will deviate from the generated stuff), it is really useful at the beginning of a new project.
- I really like the whole 'shell/console' idea. It comes naturally (to me at least) and feels very lightweight
- I know that Maven has had to take quite a lot of criticism recently, but I like it and I like the fact that it standardises the project layout, so I'm quite happy that Roo makes use of Maven as well (even though there is talk of switching to Ivy)
- I'm also very happy with Tiles being used in the view layer. I think it's quite a clean and clever way to keep your views mean and lean
What I'm not sure about
- My main gripe is with JSP actually. I never really got into JSPs and have always preferred Velocity and Freemarker - so getting Freemarker as an alternative view technology would be perfection. I'm also at odds with the way JavaScript is woven into the scaffolded views. The fact that the script tags get sprinkled all across the file rather than sitting in one place in some sort of document.ready listener is not to my liking (even though I can imagine the generation would get more complicated that way)
- I also have the feeling that the AspectJ weaving adds a couple of seconds to the app startup time (having an embedded database, JMS and ElasticSearch doesn't help there either), and a couple of seconds for every restart quickly adds up over a day's work - but I guess that's the price you pay (and it can serve as an excuse to get a new MacBook)
Overall Roo is growing on me quickly and I think it will become a permanent part of my toolbox. It gives you part of that RAD feeling that Rails and Grails are all about while letting you stay in beloved Java land and also feeling more transparent (to me at least - I was frightened by Grails' stacktraces)
ElasticSearch
While I haven't spent as many words on ES as on Roo, I really think ES rocks! Even though it's such a new project, it already feels very useful. The flexibility it gives you is great, the effortless clustering is a boon and the JSON-driven interface makes it easy to integrate with almost anything (modern) you care to think of. The fact that you don't need to define a schema makes it really easy to get started with (unlike Solr - which I nevertheless also really like). And it's getting new features almost on a daily basis. I will keep an eye on it and hopefully find the time to advance my little project over the coming months while blogging about my findings. (I especially need to research the faceting capabilities)
Posted at 05:06PM Mar 25, 2010 by joerg in General
Roolastic - Part Three
Now that we have our little webapp running and we can add images to it and they get indexed, we need a way to query the index.
So we quickly create a SearchController by running
controller class --class ~.web.SearchController
Then we add a little search form to the index view, create a new view for the results (don't forget to adapt the Tiles config in views.xml) and implement the controller logic for stuffing the result into our model like here.
I've done a couple of things here:
- I put a JSON expression into the search field to give you a head start
- I implemented two different controller methods linked to the two different search buttons. One just stuffs the JSON coming back from ElasticSearch into the model, and the other deserialises the returned results back into entity instances and stuffs those into the model
Posted at 01:35PM Mar 25, 2010 by joerg in General
Roolastic - Part Two
Now that we have the basics running, let's have a look at some interesting bits of code (have a look at the roo.log to see how I built what is there so far using Roo). The only domain object we have so far is Image. This is the object that holds the data we want to persist for each uploaded image, and it's simple enough to understand. Two things are notable about this class's source. The first is that there are no bean-style getters or other boilerplate code, as Roo has generated that and 'outsourced' it into AspectJ .aj files (like this). That is one of Roo's main design decisions. The other is the @JsonIgnoreProperties annotation - but we'll get to that later
CRUD operations for this object are handled by the ImageController, which I created by first running
controller scaffold --class ~.web.ImageController --entity ~.model.Image
in the Roo console and then copying the methods that Roo generated in the .aj file directly into my Java class (which required a bit of search and replace to get it working). Here is my first gripe with Roo: if you want to override a generated controller method, Roo gets rid of the whole .aj file, which is inconvenient - I might have overlooked something here but that's what it looked like to me.
Now my controller still basically does what the autogenerated controller did; it just adds a bit of code for the file upload handling (for that to work I obviously had to change the form definition in the form view and add 'enctype="multipart/form-data"'). That code also extracts the image metadata using the metadata extractor library.
OK, so where does elasticsearch come in now? The answer is: asynchronously, but almost in realtime ;-)
In a recent project I had some problems (which I still don't fully understand) with Compass (another project by the creator of ElasticSearch), whereby the in-transaction indexing of domain objects was causing issues which in turn caused my database transactions to roll back - pants. Anyway, hence my ventures into asynchronicity
So how do we do this here? First we register a listener with Hibernate (our JPA provider); to do this we add the following to our persistence.xml:
<property name="hibernate.ejb.interceptor" value="de.woerd.blogs.roolastic.persistence.HibernateInterceptor"/>
Then in that listener we override the postFlush method, which is the point in the persistence lifecycle at which our model object already has its primary key set (which we need when we add it to the ElasticSearch index). We then take our model objects/entities and pass them on to IndexService (to the static method 'index', as I can't think of a good way to have an IndexService bean instance injected into the HibernateInterceptor).
Now another technology comes into play. To really be asynchronous we use JMS to decouple the indexing from the persisting. So IndexService just stuffs the content to be indexed into a JMS queue, which we set up by simply running
jms setup --provider ACTIVEMQ_IN_MEMORY
in the Roo shell. We then create a listener that listens on that queue by doing this (Roo shell again)
jms listener class --class ~.search.IndexerSlave
That class, IndexerSlave, then uses ElasticSearchTemplate to stuff the content into the index. ElasticSearchTemplate is supposed to be the equivalent of Spring's JdbcTemplate, JmsTemplate etc. It should provide a simple API to interact with ElasticSearch. It currently uses ElasticSearch's TransportClient, but Shay Banon advises using the embedded server client instead (which I'll try soon).
OK, that's indexing done. In the next part we look into running searches against the index.
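For readers who want to see the shape of this decoupling without setting up JMS, here is a minimal JDK-only sketch. It is not the Roolastic code: a BlockingQueue stands in for the JMS queue, a background thread stands in for the IndexerSlave listener and a plain map stands in for the ElasticSearch index. All class and method names are assumptions made up for the illustration.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical JDK-only stand-in for the JMS-based flow described above.
final class AsyncIndexSketch {

    // Stands in for the JMS queue; each message is {id, json}.
    private static final BlockingQueue<String[]> QUEUE = new LinkedBlockingQueue<>();

    // Stands in for the search index: id -> JSON document.
    static final Map<String, String> INDEX = new ConcurrentHashMap<>();

    // Called from the persistence side; only enqueues, does no indexing itself.
    static void index(String id, String json) {
        QUEUE.add(new String[] {id, json});
    }

    // Stands in for the queue listener: consumes 'expected' messages, then stops.
    static void startListener(int expected, CountDownLatch done) {
        Thread listener = new Thread(() -> {
            try {
                for (int i = 0; i < expected; i++) {
                    String[] message = QUEUE.take();
                    INDEX.put(message[0], message[1]);
                    done.countDown();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        listener.setDaemon(true);
        listener.start();
    }

    // Waits for the listener; returns false on timeout or interruption.
    static boolean waitFor(CountDownLatch done, long seconds) {
        try {
            return done.await(seconds, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        CountDownLatch done = new CountDownLatch(2);
        startListener(2, done);
        index("1", "{\"title\":\"Sunset\"}");
        index("2", "{\"title\":\"Harbour\"}");
        if (waitFor(done, 5)) {
            System.out.println(INDEX.size() + " documents indexed");
        }
    }
}
```

The point of the static 'index' method is the same as in the post: the caller only drops a message and carries on, so a failure on the indexing side can no longer roll back the database transaction.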
Posted at 05:31PM Mar 24, 2010 by joerg in General | Comments[5]
Roolastic - Part One
Roolastic is a little research project of mine. It uses Spring Roo and integrates ElasticSearch for full text search. It is motivated by my interest both in web frameworks and in everything that is happening in full text search, especially in the Lucene world. While it is interesting to read about emerging technologies, nothing beats hands-on experience with a framework or project, so I came up with Roolastic. I hope it will serve me as a playground for both Roo experiments and getting to grips with ElasticSearch.
Objectives
A webapp that lets users upload images and search for images by their user provided properties and extracted metadata. This is admittedly rather primitive but I think it's useful enough to try out quite a lot of things and get a feeling for the quality, usefulness and 'taste' of the tools
Prerequisites
You need
- a recent version of Java obviously
- a working version of Maven
- a Roo installation or SpringSource Tool Suite (which includes Roo)
- ElasticSearch
Get up and running
It would probably be useful to read up on Roo as I don't intend to make this yet another Roo tutorial - my aim is more to review it.
You should then get a recent copy of ElasticSearch (I'm working with the latest development sources, checking them out with git). Once you have got the sources, done your first successful build with Gradle and verified that it's working, do a
./gradlew elasticsearch:install
to get a snapshot into your local Maven cache (0.6.0-SNAPSHOT at the time of writing). You need to do this to satisfy Roolastic's elasticsearch dependency.
Then start up elasticsearch by doing
cd 'directory where your elasticsearch sources are'
cd build/distributions/exploded
bin/elasticsearch -f
(UPDATE: that's not required anymore as I changed to using an embedded ElasticSearch node)
Once elasticsearch is up and running you can then go to your Roolastic source directory and do
mvn compile
mvn tomcat:run (OR) mvn jetty:run
Now if you point your browser to http://localhost:8080/roolastic/ you should see something like this: Voilà
Posted at 04:31PM Mar 24, 2010 by joerg in General | Comments[4]