A thought on fast parsing


The usual parsing process simply deserializes the entire message into an object. This object is then passed to the upper layers for processing…

The problem with this approach is that we have to parse every field in the message up front, even though we don’t use most of them. Apart from displaying the message, any given layer in back-end message processing looks at only a few fields in order to filter, re-route, or discard it…

In a high-performance system, we should parse only what is needed. How do we do this? To explain, I’ll use the following analogy.

Assume each incoming message is printed on a sheet of 8×11 paper. The fields a layer needs to parse are marked on an 8×11 transparent film (one film is enough per layer, because the fields needed at any particular layer don’t change from message to message). We then superimpose the film on the paper and can easily extract the needed fields while bypassing all the others.
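To make the analogy concrete, here is a minimal sketch in Python. It assumes a simple text message of key=value pairs separated by "|", which is my assumption for illustration only; the names make_film and extract are hypothetical as well. The "film" is just the set of field names one layer cares about, and extraction scans the raw text once, keeps only those fields, and stops as soon as all of them are found.

```python
def make_film(needed_fields):
    """Build a reusable 'film': the set of field names one layer needs."""
    return frozenset(needed_fields)


def extract(raw, film, field_sep="|", kv_sep="="):
    """Scan raw once and keep only the fields named in film.

    The message format (key=value pairs joined by '|') is an assumption
    made for this sketch. No full message object is ever built.
    """
    found = {}
    pos = 0
    n = len(raw)
    # Stop early once every field on the film has been found.
    while pos < n and len(found) < len(film):
        end = raw.find(field_sep, pos)
        if end == -1:
            end = n
        eq = raw.find(kv_sep, pos, end)
        if eq != -1:
            key = raw[pos:eq]
            # Only fields on the film have their value sliced out.
            if key in film and key not in found:
                found[key] = raw[eq + len(kv_sep):end]
        pos = end + len(field_sep)
    return found


# The same film is reused for every incoming message at this layer.
routing_film = make_film(["type", "symbol"])
message = "id=42|type=order|symbol=ABC|qty=100|note=hello"
print(extract(message, routing_film))  # {'type': 'order', 'symbol': 'ABC'}
```

In this sketch the scan never looks at qty or note, because both fields on the film have already been found by then.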

Different layers need to parse different things, so we need multiple transparent films. Upper layers may need results from lower layers, so the films must be designed carefully to avoid parsing the same field twice. Results should be stored in a hash table so that upper layers can look them up quickly later.
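One possible way to share results between layers is sketched below in Python. The LazyMessage class, its apply method, and the key=value "|"-separated message format are all assumptions made for illustration. Each layer applies its own film to the same message; parsed values go into one hash table (a dict), the scan resumes where the previous layer stopped, and fields that were passed over are remembered by position only, so nothing is parsed twice.

```python
class LazyMessage:
    """Wraps a raw message and parses fields only when a layer asks for them.

    Hypothetical sketch: assumes key=value pairs separated by '|'.
    """

    def __init__(self, raw, field_sep="|", kv_sep="="):
        self.raw = raw
        self.field_sep = field_sep
        self.kv_sep = kv_sep
        self.cache = {}    # hash table: field name -> value, shared by layers
        self.skipped = {}  # field name -> (start, end) of values passed over
        self.pos = 0       # where the next scan resumes

    def apply(self, film):
        """Return the fields named in film, parsing only what is missing."""
        missing = {k for k in film if k not in self.cache}
        # Fields an earlier scan passed over: slice them out, no rescan.
        for key in [k for k in missing if k in self.skipped]:
            start, end = self.skipped.pop(key)
            self.cache[key] = self.raw[start:end]
            missing.discard(key)
        n = len(self.raw)
        while missing and self.pos < n:
            end = self.raw.find(self.field_sep, self.pos)
            if end == -1:
                end = n
            eq = self.raw.find(self.kv_sep, self.pos, end)
            if eq != -1:
                key = self.raw[self.pos:eq]
                value_start = eq + len(self.kv_sep)
                if key in missing:
                    self.cache[key] = self.raw[value_start:end]
                    missing.discard(key)
                elif key not in self.cache:
                    # Remember where the value is, but do not extract it yet.
                    self.skipped.setdefault(key, (value_start, end))
            self.pos = end + len(self.field_sep)
        return {k: self.cache[k] for k in film if k in self.cache}


msg = LazyMessage("id=42|type=order|symbol=ABC|qty=100|note=hello")
lower = msg.apply({"type"})               # lower layer: filter on type
upper = msg.apply({"id", "qty", "type"})  # upper layer reuses cached type
```

The lower layer stops scanning right after type. The upper layer gets type from the hash table, picks up id from its remembered position, and resumes the scan only to find qty; note is never touched.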
