It's all connected

Somehow

Testing Symfony With Sqlite::memory:

One thing I love about Java is that there are a number of databases that you can load into memory and run functional tests against. This is also an option in PHP thanks to Sqlite.

To do this in Symfony, you must configure the test db like this:

test:
  doctrine:
    param:
      dsn: "sqlite::memory:"

 

And set up the test to rebuild the db each time it runs:

<?php
require_once dirname(__FILE__).'/../bootstrap/functional.php';

$configuration = ProjectConfiguration::getApplicationConfiguration('admin', 'test', true);
new sfDatabaseManager($configuration);
Doctrine::createTablesFromModels(dirname(__FILE__).'/../../lib/model');
Doctrine::loadData(sfConfig::get('sf_test_dir').'/fixtures');

$t = new lime_test(2);

Update:

I had some trouble loading fixtures where the ids could not be autogenerated. In the end I defined a separate function for loading files like these. Here's the complete bootstrap file:

<?php

/*
 * This file is part of the symfony package.
 * (c) Fabien Potencier <fabien.potencier@symfony-project.com>
 *
 * For the full copyright and license information, please view the LICENSE
 * file that was distributed with this source code.
 */

$_test_dir = realpath(dirname(__FILE__).'/..');

// configuration
require_once dirname(__FILE__).'/../../config/ProjectConfiguration.class.php';
$configuration = ProjectConfiguration::hasActive() ? ProjectConfiguration::getActive() : new ProjectConfiguration(realpath($_test_dir.'/..'));

// autoloader
$autoload = sfSimpleAutoload::getInstance(sfConfig::get('sf_cache_dir').'/project_autoload.cache');
$autoload->loadConfiguration(sfFinder::type('file')->name('autoload.yml')->in(array(
  sfConfig::get('sf_symfony_lib_dir').'/config/config',
  sfConfig::get('sf_config_dir'),
)));
$autoload->register();

// lime
include $configuration->getSymfonyLibDir().'/vendor/lime/lime.php';

//$configuration = ProjectConfiguration::getApplicationConfiguration('admin','test', true);
new sfDatabaseManager(ProjectConfiguration::getApplicationConfiguration('admin', 'test', true));
Doctrine::createTablesFromModels(sfConfig::get('sf_root_dir').'/lib/model');
Doctrine::loadData(sfConfig::get('sf_test_dir').'/fixtures');

/**
 * Load testdata from an sqldump. This is used when running tests where the
 * id's are significant (i.e. fylker, kommuner and postnummer).
 * This function expects a connection to exist.
 *
 * @param $file string path to the file to load
 */
function loadSqlDump($file) {
  $sql = file_get_contents($file);
  if ($sql == '') throw new Exception("No sql found in $file");

  $conn2 = Doctrine_Manager::connection();
  $sql = trim($sql);
  $dbo = $conn2->getDbh();

  // sqlite doesn't like multiple statements
  $statements = explode(";", $sql);
  foreach ($statements as $stmt) {
    try {
      $n = $dbo->exec($stmt);
    } catch (Exception $e) {
      print "Error loading statement: \n$stmt\n:". $e->getMessage();
    }
  }
}

Note that I place these DB tests in the unittest directory.
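The statement-by-statement loading used by loadSqlDump() is not specific to PHP. As a rough illustration only, here is the same idea sketched in Python with the standard sqlite3 module against an in-memory database. The table name, the rows and the function name load_sql_dump are made up for the example and are not part of the Symfony setup above.

```python
import sqlite3


def load_sql_dump(conn, sql):
    """Run each ';'-separated statement in sql; return how many were executed."""
    executed = 0
    for stmt in sql.split(";"):
        stmt = stmt.strip()
        if not stmt:
            continue  # skip the empty chunk after the final ';'
        conn.execute(stmt)
        executed += 1
    conn.commit()
    return executed


# Hypothetical dump where the ids are significant and must not be autogenerated.
DUMP = """
CREATE TABLE fylke (id INTEGER PRIMARY KEY, name TEXT);
INSERT INTO fylke (id, name) VALUES (3, 'Oslo');
INSERT INTO fylke (id, name) VALUES (12, 'Hordaland');
"""

conn = sqlite3.connect(":memory:")  # the database disappears when conn is closed
count = load_sql_dump(conn, DUMP)
rows = conn.execute("SELECT id, name FROM fylke ORDER BY id").fetchall()
```

Like the PHP version, this naive split on ";" breaks if a string literal in the dump contains a semicolon, so it only suits simple dumps.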

Converting From DBCP to C3P0 With Spring

Just a few quick notes on things to do when converting from the DBCP connection pool implementation to C3P0.

You start by changing the dataSource in your Spring config from:


        class="org.springframework.jdbc.datasource.DriverManagerDataSource">

To:
 

          class="com.mchange.v2.c3p0.ComboPooledDataSource"
          destroy-method="close">
       com.mysql.jdbc.Driver
       3
       200
       0
       3

   


1. Name differences:
There are some variable name differences in the xml above. Note:

  • url becomes jdbcUrl
  • driverClassName becomes driverClass
  • username becomes user

2. Some variables must be moved into the sessionFactory configuration
According to this, you must configure some of the variables in the Hibernate config. The article is from 2006, so this might not be true anymore.

I add the properties like this so that they can be in the same config file:
    

       

       

           

               true
               create
               1
               3
               50
               org.hibernate.dialect.MySQLDialect

           

       

Some other notes regarding connection pooling with the above:

 

  1. If you use DBUnit, remember to close the connection so it gets returned to the pool after you have set up the db.

  2. If you use SessionFactoryUtils to get a session, remember to call SessionFactoryUtils.releaseSession() when done.

Using MySQL for Storing Images

You do not always have the luxury to decide how things are done. This weekend the image service at work decided it was its turn to wreck itself.

The whole story starts off with me rewriting a part of the image service - the component that inserts an image into the service.

The service has two parts: One that saves images to the datastore and one that retrieves images from the datastore and delivers them to some client, usually as a thumbnail. This should be fairly simple stuff that should never go wrong and usually just work. The problem was in the chosen datastore: MySQL.

MySQL is good for many things, but far too often have I seen it used as a hammer where a screwdriver should have been used.

The schema for the image database looked like this:

  CREATE TABLE IF NOT EXISTS `image` (
      `objectid` int(11) NOT NULL default '0',
      `imagedata` longblob,
      `imagewidth` int(4) default NULL,
      `imageheight` int(4) default NULL,
      `imagemime` varchar(4) default NULL,
      `imageurl` text,
      PRIMARY KEY (`objectid`)
  ) ENGINE=MyISAM;

I'll comment on a few of the fields above. The worst field here is probably the imagemime field. Notice that it is a varchar of max 4 characters. The idea is to store only the specific part of the mimetype, i.e. pjpeg from image/pjpeg. The author assumed that image types only come in names of at most 4 characters (jpeg, png and gif), so the size of the field would never be a problem. It was: pjpeg was silently cut down to pjpe.

The error caused by the service trying to thumbnail pjpe images was what made me look at the image service.

To fix this bug we issued

 ALTER TABLE `image` CHANGE `imagemime` `imagemime` VARCHAR(10)

thinking that this should work OK.

Bad idea.

The problem is that because this command should work atomically, MySQL will do all the work in a separate file and then mv the file over the original file when done. What do you think happens when there is too little space left on the disk? MySQL hangs, crashes and burns.

Lesson learned: Always have more space left on the MySQL data partition than the size of the largest table (.MYD) file.

If not, you cannot perform recovery!

Or in my case, change the table.
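The rule of thumb above is easy to check before running an ALTER TABLE or a repair. The following is only a sketch of such a check in Python: the function names are made up, and it looks at .MYD files only, exactly as the lesson is worded, so treat the result as a lower bound rather than a guarantee.

```python
import os
import shutil


def largest_myd_size(data_dir):
    """Return the size in bytes of the largest .MYD file below data_dir (0 if none)."""
    largest = 0
    for root, _dirs, files in os.walk(data_dir):
        for name in files:
            if name.endswith(".MYD"):
                largest = max(largest, os.path.getsize(os.path.join(root, name)))
    return largest


def has_room_for_rebuild(data_dir):
    """True if the free space on data_dir's partition exceeds the largest .MYD file."""
    free = shutil.disk_usage(data_dir).free
    return free > largest_myd_size(data_dir)
```

Calling has_room_for_rebuild() with the MySQL data directory (its location depends on the installation) before touching a large table would have flagged the problem described here.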

The next thing I tried was deleting a large number of the images in the store, as they were old and had no value. This again blew up because of too little space.

The solution to this mess was to move the whole database over to another partition, run myisamchk and then move it back.

How should this service have been designed?

I am not an expert on designing online storage facilities for images, but one thing that I would like to note is the storing of metadata together with the file - but not in the file itself.

Most image formats for the web store image height and width at the start of the image, and you can use metadata formats like IPTC to store even more information in the image. Storing metadata in the file makes it easier to retrieve the file since you do not need to fetch additional metadata that is already in the file.

This would have made it possible to simplify the storage schema. You may want to keep a copy of the metadata in a store to be able to query it, but for most operations it will not be needed since you want the file at the same time anyway.

Storing the files in the db is also a bad idea (IMHO). It may well be faster to serve binary objects from a database, but this comes at a price. The service is harder to back up, dependent on an extra component (the db) and thus much more complicated.

There are two designs I've considered for a simple image service: either build a tiny layer around MogileFS or use GlusterFS to replicate the filesystem where the files are stored.

Note: I would probably have made the same mistake myself if I had been asked to create a system like this a year ago. I just hope someone reads this in time to not repeat the mistakes.

Addendum about ActiveMQ: Sometimes you do not have the luxury of keeping every service separated on different machines. When the disk filled up due to the MySQL problems above I also learned that if you run ActiveMQ with the Kaha persistent storage db, then you should keep it on a separate partition - or else lose all messages when you run out of disk space.

Using Python's Logging Module With Unittest

Problem: setUp() is called for every test method, but I only want to start logging once.

Solution: Add a check on logger.handlers to see if any handlers have been registered with the logger:

import logging
import unittest


def startLogging():
    logger = logging.getLogger()
    # Handlers already registered: logging was started by an earlier setUp()
    if len(logger.handlers):
        return logger
    stdout = logging.StreamHandler()
    stdout.setLevel(logging.DEBUG)
    logger.addHandler(stdout)
    return logger


class MyTest(unittest.TestCase):
    def setUp(self):
        startLogging()

    def test_something(self):
        some_function_that_uses_logging()

TransactionManagerLookupFactory - No TransactionManagerLookup Configured

I spent the best part of yesterday afternoon fighting the message above. I post this so that the next person fighting it may save some time. One of the problems was that none of the solutions suggested helped. After some beer, sleep and a fresh start I found this message that talks about replacing

<tx:annotation-driven transaction-manager="transactionManager" />

With

<bean class="org.springframework.transaction.aspectj.AnnotationTransactionAspect"
      factory-method="aspectOf">
    <property name="transactionManager" ref="transactionManager"></property>
</bean>

And suddenly it all worked. Another example of GDD - Google Driven Development. If anyone knows why this works - please explain.

 

Update: There seems to be an issue with combining @Configurable and @Transactional in a project that relates to the comment above. Also, I later found that calling sessionFactory.getCurrentSession() is not a good idea when using Spring transactions. Instead, use

Update 2: None of the examples of transactional tests using Spring show it, but you'll need the TransactionalTestExecutionListener to get @AfterTransaction and @BeforeTransaction to work. Set it up like this:

@TestExecutionListeners( { TransactionalTestExecutionListener.class })
 
 

Umount: Device Is Busy

# umount /var
umount: device is busy

Who hasn’t experienced this one?

Anyhow, I found this shell script that does the important checks:

#!/bin/sh
# Attempts to umount a mountpoint or device. If it fails, give info on why it might be busy.

/bin/umount $* || (
    if [ -z "$*" ]
    then
        exit
    fi
    echo `pwd`/"$1" | sed 's/.*\/\//\//g' | ( read abspath

    mtab_entry=`grep "$abspath" < /etc/mtab`
    echo $mtab_entry | if grep "/" > /dev/null
    then
        device=`echo $mtab_entry | sed "s/ .*$//g"`
        mountpoint=`echo $mtab_entry | sed "s/[^ ]* //" | sed "s/ .*$//g"`
        #echo $device $mountpoint

        ( echo The mountpoint $mountpoint is in use by:
          lsof | grep " $mountpoint" | head     #list files open
          cat /etc/mtab | grep " $mountpoint/"  #list sub mount points
        ) 1>&2
    fi
))

 

Securing the Web: PHP CMSs Should Autodestruct

I've been using Drupal a bit lately for some simple sites and just happened to come across this debate on the relative security of Drupal and Plone. I think we need to go beyond looking at the number of security advisories and start discussing how we protect vulnerable versions of software. The problem with the larger PHP CMSs (Drupal, Joomla etc.) isn't that they do not care about security - but that their users do not.

One thing I've noticed is that there are a lot of simple sites out there that get hacked because they do not upgrade their CMS to the latest version. Often the hacker keeps the site up so that it is impossible to notice that the account is being used for all kinds of spamming - for example by adding a bunch of "invisible" links at the bottom of the site. I've tried to contact some ISPs that host sites like this (Servetheworld amongst others), but their reply is that they cannot disable a customer's account or clean up the customer's site just because it is hacked.

IMHO we need a way to protect users against themselves here: a simple protocol that checks the version of the running CMS against a registry somewhere and then shuts down the CMS and/or notifies the user. At least 50% of the people running small sites do not know enough about security to keep their site up to date. We need to provide protection for them.
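To make the idea of such a version check a little more concrete, here is a purely hypothetical sketch in Python of the decision step. No such registry exists as far as I know; the registry layout, the field names latest_secure and shutdown_below, and the version numbers are all invented for illustration.

```python
def parse_version(text):
    """Turn a dotted version string such as '6.14' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Hypothetical registry contents; a real registry would be fetched over HTTPS.
REGISTRY = {
    "drupal": {"latest_secure": "6.14", "shutdown_below": "6.0"},
}


def decide(cms, installed, registry=REGISTRY):
    """Return 'shutdown', 'notify' or 'ok' for an installed CMS version."""
    entry = registry[cms]
    version = parse_version(installed)
    if version < parse_version(entry["shutdown_below"]):
        return "shutdown"  # too old to be left running
    if version < parse_version(entry["latest_secure"]):
        return "notify"    # upgrade available, tell the site owner
    return "ok"
```

Comparing tuples of integers rather than raw strings matters here: as strings, "6.9" would sort after "6.14".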

MySQL Data Truncation Errors

Few things can provide as much frustration as refactoring a legacy system bit by bit. Today I have been wrestling with a field defined in MySQL as "timestamp default '0000-00-00 00:00:00'". In my (reverse-engineered) POJO the field is defined like this:

@Temporal(TemporalType.TIMESTAMP)
@Column(name = "httpstatus_time", nullable = false, length = 19,
    columnDefinition = "timestamp default '0000-00-00 00:00:00'")
public Date getHttpstatusTime() {
    return this.httpstatusTime;
}

Now, it turns out that the MySQL JDBC driver cannot deal with the all-zero date 0000-00-00, which is not a valid calendar date and so has no Java Date equivalent. So, what does the driver do in these situations? By default it throws an exception. To get around this, you can add 'zeroDateTimeBehavior=convertToNull' to your JDBC URL. After setting zeroDateTimeBehavior, the value will be set to a Java null. This leaves you with another issue: when you get an object from the db and modify it, you may also have to set these values to something other than null before saving.

Note: I tried setting the dates to "new Date(1)" but that didn't work - it is too close to 0 so JDBC still throws a fit. I ended up with "new Date(10000)".

If you do not check your nulls, you'll get this (I've omitted parts of the stacktrace for brevity):

org.springframework.dao.DataIntegrityViolationException: not-null property references a null or transient value: com.scanmine.scheduler.server.model.Source.httpstatusTime; nested exception is org.hibernate.PropertyValueException: not-null property references a null or transient value: com.example.Myclass.httpstatusTime
    at org.springframework.orm.hibernate3.SessionFactoryUtils.convertHibernateAccessException
    ...
Caused by: org.hibernate.PropertyValueException: not-null property references a null or transient value: com.example.Myclass.httpstatusTime
    ...
Caused by: org.hibernate.PropertyValueException: not-null property references a null or transient value: com.example.Myclass.httpstatusTime

Not fun.

Bonus tip

If you use DBUnit to test your legacy db and want to add values like '0000-00-00 00:00:00', you will hit the same bug as described above. To get around it, add jdbcCompliantTruncation=false to the connection URL.

 

Links

http://bugs.mysql.com/bug.php?id=3331

 


Monitoring Twisted Applications With Nagios

This post is inspired by a question on the Twisted mailing list by Michele:

I'm currently working on a PoC with twisted, Python, to prove the technology as an alternative to more established enterprise choices (java app servers, etc..). the question is: if I have N number of processes running in a M number of machines, given that there are no network restriction, and that at least http and https are always available, how these services would be efficiently monitored?

What I do is have the service listen for HTTP connections on a URL that returns some monitoring data. The service returns a simple XML message containing some datapoints.

Then I use a custom Nagios plugin to request this URL and Nagios-PNP to make graphs of the datapoints the monitor produces. If it doesn't return anything, I know the service is down.

The plugin setup is done by providing a URL (say http://myservice:893/monitor) to monitor. The plugin will then both monitor the application and graph it.

I have also added an extra field for "errors" that I use to report odd exceptions or other types of failures that are non-fatal but should be investigated. If the error count exceeds a set level they may also go into the reporting.

The twisted page
The page looks like this:


from cStringIO import StringIO
from nevow import loaders, rend, static, inevow, guard, url, tags
from xml.etree import cElementTree as ET


class Monitor(rend.Page):
    """
    Basic monitoring interface
    XML format:

        min="0" max="300"/>
            Exception
            …

    UOM (unit of measurement) is one of:
      no unit specified - assume a number (int or float) of things (eg, users, processes, load averages)
      s - seconds (also us, ms)
      % - percentage
      B - bytes (also KB, MB, TB)
      c - a continuous counter (such as bytes transmitted on an interface)

    """
    def __init__(self, config):
        self.isLeaf = True
        self.config = config

    def renderHTTP(self, ctx):
        inevow.IRequest(ctx).setHeader('Content-Type', 'text/xml; charset=UTF-8')

        #que_length = str(len(getter.get_ids(0, -1)))
        #num_updated = str(get_nr_updated_last_day(self.config.get_db()))
        num_errors = str(len(self.config.getErrors()))

        _root = ET.Element('status', {'service': 'PDFIndexer'})
        if 'total' in self.config.stats:
            _doc = ET.SubElement(_root, 'total', {'value': str(self.config.stats['total'])})

        _doc = ET.SubElement(_root, 'runnerStatus', {'value': str(self.config.checksStatus())})
        #doc = ET.SubElement(root, 'itemsAddedlast24', {'value': num_updated})
        _errors = ET.SubElement(_root, 'errors', {'value': num_errors, 'critical': "3", 'warning': "2"})

        for error in self.config.getErrors():
            t = ET.SubElement(_errors, 'error', text = error)

        _xmlcontainer = StringIO()
        ET.ElementTree(_root).write(_xmlcontainer, encoding="UTF-8")
        return _xmlcontainer.getvalue()


The Nagios plugin is found here and should be fairly self-explanatory.

Note: I haven't managed to get all the bugs out of it yet with regard to graphing different datapoints. Also, it is probably not the most efficient Nagios plugin around.
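For readers who just want the gist of the check side, here is a minimal sketch of how a plugin could interpret the XML produced by the page above. It is not the plugin linked above. It is written for Python 3 (the page itself is Python 2), leaves out the HTTP request, and assumes the errors element always carries the value, warning and critical attributes that renderHTTP() sets. It follows the usual Nagios conventions of exit codes 0, 1 and 2 for OK, WARNING and CRITICAL, with performance data after a "|".

```python
import xml.etree.ElementTree as ET

OK, WARNING, CRITICAL = 0, 1, 2


def evaluate(xml_text):
    """Return (exit_code, status_line) for the monitor page's XML output."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError:
        # Nothing usable came back: treat the service as down.
        return CRITICAL, "CRITICAL - no valid response from monitor page"

    errors = root.find("errors")
    if errors is None:
        return CRITICAL, "CRITICAL - no errors element in monitor output"

    count = int(errors.get("value"))
    warn = int(errors.get("warning"))
    crit = int(errors.get("critical"))

    if count >= crit:
        code, label = CRITICAL, "CRITICAL"
    elif count >= warn:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"

    # Perfdata format: label=value;warn;crit (used for graphing)
    perfdata = "errors=%d;%d;%d" % (count, warn, crit)
    return code, "%s - %d errors | %s" % (label, count, perfdata)


# Hypothetical response, shaped like the output of renderHTTP() above.
SAMPLE = ('<status service="PDFIndexer"><runnerStatus value="True"/>'
          '<errors value="2" critical="3" warning="2"/></status>')
code, line = evaluate(SAMPLE)
```

A real plugin would fetch the URL, print the status line and exit with the returned code, so that Nagios picks up the state and the graphing add-on picks up the perfdata.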