Introducing FLV4PHP

This weekend I found some time to work on a new project. In my last post, about streaming video to Flash players, I went through several options for the task; one of them was HTTP progressive download of FLV files.

There is an interesting idea for making seeking possible over a plain HTTP download (I think Google Video does just this), and it has already been mentioned in a few places.

Some tools can already inject the needed metadata into an FLV file, namely FLVTool2 and flvmdi. The ffmpeg project can create FLV files, and there already seem to be patches that add support for creating metadata entries too.
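
To make the seeking idea concrete, here is a minimal sketch of the server side, assuming the player asks for a byte offset that it took from the keyframe list in the injected metadata. The function name flv_send_from() is hypothetical and is not part of FLV4PHP; the only format detail it relies on is the 9-byte FLV header followed by the 4-byte PreviousTagSize0 field.

```php
<?php
// Hypothetical helper (not part of FLV4PHP): copy an FLV file to the stream
// $out, starting at $offset, which is assumed to be the byte position of a
// keyframe tag. Returns the number of bytes written.
function flv_send_from(string $path, int $offset, $out): int
{
    $in = fopen($path, 'rb');
    if ($in === false) {
        throw new RuntimeException("cannot open $path");
    }
    $written = 0;
    if ($offset > 0) {
        // A stream that starts mid-file needs a fresh header:
        // "FLV", version 1, flags 0x05 (audio + video), header size 9,
        // followed by the 4-byte PreviousTagSize0 field (always 0).
        $written += fwrite($out, "FLV\x01\x05" . pack('N', 9) . pack('N', 0));
        fseek($in, $offset);
    }
    // Copy the rest of the file unchanged.
    $copied = stream_copy_to_stream($in, $out);
    fclose($in);
    return $written + (int) $copied;
}
```

In a real script $out would be something like fopen('php://output', 'wb'), and the offset would come from a request argument that is validated against the keyframe positions before use.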

Over the years I’ve come to love developing readers for obscure (neither standard nor well documented) file formats. It’s quite a challenge, but since the file format spec is not public you have a very good excuse when bugs are found :)

So I’ve started to work on a system to serve video (and also audio-only) files to Flash-based players with support for seeking. The idea is to limit ourselves to PHP and Flash technologies, with no shell scripts or compiled programs, since those are difficult to get running on most shared hosts.

Right now the FLV analyzing library is finished, and the sample metadata extraction tool can process a 140 MB FLV in just 1.3 seconds on my cheap DreamHost test system.

So take a look at the project page. Right now I’m looking for an ActionScript guru to help out with the development of the player. I’ve tried to get my head around Flash but I haven’t had much success :(

By the way, all this work will be used for an upcoming project I’m planning. Am I the only one who thinks that current web photo gallery systems suck?

URL argument separator

Quite often a web developer starts wondering what’s wrong with the people who make the rules of how the Internet works. If we dig into the problem a bit, we often find that the ‘nonsense’ is there not because the specs were flawed but because the popular implementations didn’t fully follow the spec.

The ‘nonsense’ I’ve been thinking about lately is why the hell we use the ampersand & to separate the arguments in a URL. My problem with & as a separator is that in today’s world of XML craziness the ampersand must be handled with care. It seems the specs allow any character after the initial query mark ? except the #, which marks the start of the fragment. Actually, the current specs for URI/URL syntax recommend the semicolon ;
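
To show what "handled with care" means in practice, here is a small sketch that escapes two URLs for use inside an XML attribute. The URLs and the helper name url_for_xml() are made up for illustration; htmlspecialchars() with ENT_XML1 is the standard PHP call.

```php
<?php
// Escape a URL so it can be placed inside an XML attribute value.
function url_for_xml(string $url): string
{
    return htmlspecialchars($url, ENT_QUOTES | ENT_XML1, 'UTF-8');
}

// Hypothetical URLs, identical except for the argument separator.
$with_amp  = 'gallery.php?album=12&page=3';
$with_semi = 'gallery.php?album=12;page=3';

echo url_for_xml($with_amp), "\n";   // gallery.php?album=12&amp;page=3
echo url_for_xml($with_semi), "\n";  // gallery.php?album=12;page=3
```

The semicolon version survives unchanged, so forgetting the escaping step no longer produces invalid XML.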

So the decision to make the ampersand the de facto standard was taken by the people who took the lead with CGI scripts and dynamic web content generation.

Since the specs recommend the semicolon I thought “ok, no problem, I’ll just use it from now on” with a grin on my face. If you are thinking the same, keep reading, because it’s not that easy. My server-side language of choice right now is PHP, so I wrote a simple test script to check whether the semicolon worked, and it didn’t. After consulting php.net for a bit more information I found the solution: there are two ini directives that define the argument separators, one for input and the other for output (I guess the latter is used to carry the session id without cookies). The directives are named arg_separator.input and arg_separator.output. The first one takes a string, and every character in that string becomes an argument separator; the second one defines the separator PHP uses when it builds URLs.

The bad news is that the default value is just the ampersand &. The good news is that both directives can be set per directory, so on a shared host we can put them in our .htaccess. If you control the host, just edit the php.ini file. Logically, we can’t change the input separator with ini_set(), because the URL has already been processed when our script starts executing (see below for a workaround).

php_value arg_separator.input "&;"
php_value arg_separator.output ";"

Note that I left both the ampersand & and the semicolon ; as valid argument separators. I’m far too used to the ampersand, and I don’t want to spend my whole life debugging an application just because I forgot that I had decided the semicolon was a better fit :)
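
For URLs that our own code builds, the output separator can also be chosen per call. A minimal sketch using http_build_query(), whose third parameter overrides arg_separator.output; the parameter names are made up for illustration.

```php
<?php
// Hypothetical parameters for a gallery page.
$params = ['album' => 12, 'page' => 3];

// With no separator given, http_build_query() falls back to arg_separator.output.
$default = http_build_query($params);

// Passing the separator explicitly does not depend on the server configuration.
$semi = http_build_query($params, '', ';');

echo $semi, "\n";   // album=12;page=3
```

This only covers the output side; the server still has to accept the semicolon on input.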

Even if we can’t use ini_set() to change PHP’s behaviour here, we can still use a different argument separator by processing the $_GET and $_REQUEST arrays and splitting their contents on our preferred argument separator. The following code snippet shows how to do it, but please don’t use it in production environments, it’s probably flawed!

<?php
// first rejoin the arguments that PHP has already split
$get = '';
foreach ($_GET as $k => $v) {
    $get .= $k . '=' . $v . ';';
    // remove from the arrays, we'll add the correct ones later
    unset($_GET[$k]);
    unset($_REQUEST[$k]);
}

// split the query string using our desired char
// (split() was removed in PHP 7; explode() is enough for a literal separator)
$args = explode(';', $get);
// process the query string and rebuild the _GET and _REQUEST arrays
foreach ($args as $arg) {
    // separate the key from the value; a missing '=' yields an empty value
    list($k, $v) = array_pad(explode('=', $arg, 2), 2, '');
    if ($k !== '') {
        $_GET[$k] = $v;
        $_REQUEST[$k] = $v;
    }
}
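
A possibly cleaner variant of the same idea is to ignore what PHP already parsed and split the raw query string ourselves. The sketch below assumes a hypothetical helper named parse_query() and a non-empty separator list; unlike PHP's own parser it does not understand array arguments such as a[]=1.

```php
<?php
// Hypothetical helper: parse a raw query string, treating every character
// in $separators (assumed non-empty) as an argument separator.
function parse_query(string $query, string $separators = '&;'): array
{
    $result = [];
    $pattern = '/[' . preg_quote($separators, '/') . ']/';
    foreach (preg_split($pattern, $query, -1, PREG_SPLIT_NO_EMPTY) as $pair) {
        // Split on the first "=" only; a missing value becomes ''.
        [$key, $value] = array_pad(explode('=', $pair, 2), 2, '');
        if ($key !== '') {
            $result[urldecode($key)] = urldecode($value);
        }
    }
    return $result;
}

// The raw query string is available even when PHP split it "wrongly".
$args = parse_query($_SERVER['QUERY_STRING'] ?? '');
```

Because it works on the undecoded string, a literal semicolon inside a value can still be sent as %3B without being mistaken for a separator.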

Changing the argument separator not only helps when working with XML-based technologies, it also gives your site’s URLs a geeky look ;)

Doing Ajax? You'll need transactions!

In an embarrassing attempt to bring more visits to this blog I’ll start posting about buzzwords. After some months of fighting it I’ve finally understood that there is no point in teaching people that AJAX, Web 2.0 and the like are just silly buzzwords for quite old technologies. So from now on I’ll jump on the bandwagon and start using those buzzwords.

Getting to the stuff that matters, let’s start with the basics. As you should already know, the web works on top of HTTP, which by design is based on a request-response paradigm. It’s a proven design that works like a charm for the typical web, where static pages are loaded from a URL. However, the Web is evolving towards a new kind of page. The current trend is to make web pages that act like a typical thin client, where the bulk of the operations and domain logic (model) is performed on the server but the application flow (controller) and display (view) are handled on the client side with the help of JavaScript and the DOM.

Looking at the whole picture (you’ll need perspective here), AJAX web pages are Web Service Brokers. This exposes a big problem: we’re still using server-side software focused on (read: optimized for) the traditional web, while the client side is moving to support the thin client paradigm. The main trouble is the design concept of the web in which each page is an atomic unit, so we need to take some precautions when developing ajaxified web pages if we don’t want to seriously hurt performance.

Most AJAX frameworks ship some simple examples to demonstrate their functionality. They work ok and they do quite cool things, but looking at them a bit closer there are some grey areas. Besides the popular live search, update on checkbox click or the plain spellchecker, the AJAX hype promises a lot more functionality. Take for example a datagrid widget, like an Excel spreadsheet: there are some complex processes running in the background when working with it. When editing a cell we need to send the new value to the server, check that it worked and fetch any updates to calculated cells that could have been affected by the change. So we have several weak points (mainly due to the inherent unreliability of the network) that need checks. In the datagrid example, imagine that the fetch of the affected cells fails: the application state on the client side and on the server side will differ, and this is a major concern when working with thin clients.

To solve this problem we have to use transactions (already popular in the RDBMS world). Transactions allow us to emulate the web principle that each page/action is atomic. If something goes wrong we can roll back the changes, and our application state stays the same on both sides.
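
As a rough illustration of the all-or-nothing behaviour, here is a sketch that uses an in-memory array in place of a real spreadsheet backend. The function apply_atomically(), the cell names and the validation rule are all assumptions made for the example; with a real database the same role would be played by PDO's beginTransaction(), commit() and rollBack().

```php
<?php
// Hypothetical in-memory "spreadsheet": apply all changes or none of them.
function apply_atomically(array $cells, array $changes): array
{
    $working = $cells;   // PHP copies arrays by value; $cells is the rollback point
    foreach ($changes as $name => $value) {
        if (!is_numeric($value)) {
            // Abort: the caller keeps the original state untouched.
            throw new InvalidArgumentException("invalid value for $name");
        }
        $working[$name] = $value + 0;
    }
    // A calculated cell, recomputed only when every change was valid.
    $working['total'] = $working['a1'] + $working['a2'];
    return $working;     // "commit"
}

$cells = ['a1' => 1, 'a2' => 2, 'total' => 3];
try {
    $cells = apply_atomically($cells, ['a1' => 10, 'a2' => 'oops']);
} catch (InvalidArgumentException $e) {
    // $cells still holds the state from before the failed transaction.
}
```

The client would then either receive the complete new state or an error, never a half-applied change.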

Extending this idea, we can even optimize the request-response cycle by using, for example, a Unit of Work pattern. We compose a list of actions and send it to the server when it is finished, instead of sending individual actions. Network latency is always an issue, and the asynchronous nature of AJAX is not used that often in practice: when we make a change we expect an immediate update. By packing the actions into Units of Work we can optimize the client-server interaction and support transactions quite easily.
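
The batching part can be sketched as a small queue. In a real application this structure would live in JavaScript on the client; it is written in PHP here only to keep the examples in one language, and the class name, the action name set_cell and the JSON layout are all hypothetical.

```php
<?php
// Hypothetical Unit of Work: queue actions locally, send them in one request.
final class UnitOfWork
{
    private array $actions = [];

    // Remember an action instead of sending it right away.
    public function register(string $action, array $args): void
    {
        $this->actions[] = ['action' => $action, 'args' => $args];
    }

    // Serialise the whole batch as the body of a single request and reset the queue.
    public function flush(): string
    {
        $payload = json_encode(['actions' => $this->actions]);
        $this->actions = [];
        return $payload;
    }
}

$uow = new UnitOfWork();
$uow->register('set_cell', ['cell' => 'a1', 'value' => 10]);
$uow->register('set_cell', ['cell' => 'a2', 'value' => 20]);
$body = $uow->flush();   // one request body carrying both actions
```

On the server, the decoded list can be applied inside a single transaction, so a batch either succeeds as a whole or is rolled back as a whole.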

Remember that for the typical RPC the web server needs to spawn (or fork) a new process, load the dynamic language interpreter, parse thousands of lines of code, open a DB connection and do the actual work. Even with a properly set up backend (opcode cache, FastCGI), the overhead of an RPC is huge. Packing actions together seems like a great solution for performance problems.

How to write numbers

Let me say up front that this topic is not very standardized internationally, which is a real problem when interpreting numbers in a globalized world like today’s.

One thing is common, though: according to the International System of Units, thousands should not be separated by dots or commas; if digits have to be grouped to make a number easier to read, a space is used. The SI calls for a thin space; in practice the non-breaking space, which also keeps the number from being split across lines, is the easiest one to type. It can be entered in a word processor such as Microsoft Word with Ctrl+Shift+Space (Ctrl+Space in OpenOffice). Its Unicode code point is U+00A0, and in HTML it is the well-known &nbsp; entity.

Decimals are somewhat more complicated. There are two large geographic areas with two different styles: English-speaking notation uses a point, while most other countries use the comma.

So all of the following forms are correct:

  • 123456.78
  • 123 456.78
  • 123456,78
  • 123 456,78

When implementing an application in which numbers have to be entered, the best approach is to accept all of these formats on input and normalize them before operating on them. If our application will only ever run in one region, we can restrict ourselves to that region’s format, although nowadays, and especially in web applications, it is never wise to assume that an application will only be used in a limited region.
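
A minimal sketch of such a normalization step, assuming that thousands are only ever grouped with spaces (as in the four forms listed above) and that a comma is always a decimal mark. The function name parse_number() is hypothetical.

```php
<?php
// Hypothetical helper: accept "123456.78", "123 456.78", "123456,78" and
// "123 456,78" and return a float, or null when the input is not a number.
function parse_number(string $input): ?float
{
    // Drop regular, non-breaking (U+00A0) and narrow non-breaking (U+202F) spaces.
    $clean = preg_replace('/[ \x{00A0}\x{202F}]/u', '', trim($input));
    if ($clean === null) {
        return null;   // the input was not valid UTF-8
    }
    // Treat the comma as a decimal mark.
    $clean = str_replace(',', '.', $clean);
    if (!preg_match('/^[+-]?\d+(\.\d+)?$/', $clean)) {
        return null;
    }
    return (float) $clean;
}
```

Under these assumptions an input such as 1,234.56 is rejected rather than guessed at, because the comma and the point cannot both be decimal marks.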