Shayan's Software & Technology

My adventure as a Software Engineer continues...

Software Innovation Strategy: Identify Weak Business Systems


From Uber and Airbnb to Shopify, software companies have opened up access to key services and business opportunities that were otherwise constrained and difficult to manage.

This trend is not purely about technological innovation. The apps above are among many built on established technology: the client-server model, REST APIs, a database, and some GPS location services.

The innovation is less technological than it is a fundamental redesign of old, cumbersome business systems.



Consider Uber. Aside from the GPS technology and the obvious matchmaking algorithms that link riders with drivers, the underlying business system this app effectively overtakes is the conventional taxi system.

Under the old system it is very time-consuming and expensive to become, and remain, a taxi driver. A taxi license in Toronto, at the time of writing, can cost more than $100k. This barrier alone adds stress to the taxi driver, who is a key actor in the business; it is exploitative and prevents the underlying business from being efficient. The cost must be passed on to the consumer, and both parties end up dissatisfied.

Imagine having to pay an amount close to the price of a low-end condo unit just to share your vehicle and make money on the side. It is absurd. The driver has to charge more to cover the cost of the license, and breaking even becomes the priority.

Uber resolves this by making the ride-sharing system more efficient than the taxi system: it is accessible to both the service provider (the driver) and the consumer in a cost-effective manner.

The consumer gains accessibility through the app relative to their location. This component can easily be copied by the taxi companies, and it certainly has been. What the taxi companies cannot copy is the quality of the ride and the pricing structure. Uber supports the driver through how it charges consumers for rides, and it supports the consumer on quality by ensuring the vehicle meets a minimum standard and by letting riders rate drivers.

These practices allow both parties to be satisfied, and the overall system can be used with greater access and efficiency. Again, the technology isn’t very new, but access to the system is greatly increased at an affordable price.


Hotels and resorts are, for the most part, very large buildings with limited space and a high cost to build and maintain. Houses and condos usually do not take as much capital for individuals to maintain. However, both can provide similar experiences in terms of location.

In real estate, location is the central component. Big, clunky hotels cannot be everywhere, and being surrounded by other tourists may not yield an authentic experience. A full hotel or resort can be a nightmare for tourists!

Therefore, a system that pleases both parties would offer real estate that is uncrowded, affordable for the guest, and financially profitable for the host. Enter Airbnb.

Airbnb solves this problem by allowing a user to rent out their property to other users. Although the location could be in a crowded area, the price is usually lower than the equivalent at a hotel. And because the app has a wider reach, it can access more locations and give anyone with a unique property a way to rent it out.

This access does not limit the guest to a huge resort or hotel; it allows a degree of privacy, depending on the location, at a much lower price. The host gets to list their unique property to tourists profitably, without charging excess fees or dealing with tour operators and other unnecessary subsystems.

The fact that you can simply access, or sell access to, a location is the main business driver for this innovation.


Starting a business can be cumbersome. If you don’t know web development, it can be a nightmare. Even if you do know software development, why would you waste your time coding a shopping cart, authentication, inventory management, and payment-gateway integration yourself? Not to mention testing for quality and following good development practices.

Shopify provides access to the online market. That simple. You can set up a website and choose which payment channels you accept. The overhead of building your own site, or hiring an intern to do a sloppy job, doesn’t exist. The fast access this service provides, merchant to consumer and consumer to merchant, is what makes Shopify successful at a fundamental level.

The best way for a business to succeed early on is to be accessible online as fast as possible. Shopify gets the merchant online and makes them available. This effectively reduces the break-even time and lets consumers access a potentially innovative product faster than they could in the past. Another scenario where everybody wins!

What do all of these software companies have in common?

Aside from technological innovation, these firms have found a vulnerability in an old, inefficient business system and provided more efficient access to that system via their own software, whether that is a ride, a rental property, or selling knick-knacks online.

Innovation opportunities

When deciding what to build or where to innovate, look at existing business systems and ask yourself: What’s annoying to use? What should be available faster, in terms of product or service? Is there unnecessary red tape in the way? Is the middleman truly justified?

Seller’s perspective: If I want to sell a product or provide a service, what types of unnecessary constraints or obstacles are placed in front of me? Can they be removed, replaced, or ignored if out of date?

Consumer’s perspective: Why can’t I purchase something on time? Why is it so expensive? Does the service have everything I need? Can I get help or information efficiently? Can I use the product efficiently? Is it available when I want it?

The benefit of identifying a weak system is that you can exploit its vulnerability. Once you have a target, building software to cover the problem lets you get ahead, and the weak system effectively funds the new, more innovative system by granting more efficient access.

This approach, in my opinion, steers developers clear of gimmicky apps, and it provides a strong business baseline in terms of user roles: customer and seller. The end goal is more efficient access. The domain-specific implementation details and business rules can be hammered out quite easily during development, since an inefficient solution already exists; most of that logic can be reused and tweaked.

Each of the apps above has done this to great success, and it leads to rapid growth. Technological and business innovation in tandem can be very disruptive and drive high growth.

Most software developers need grounding (myself included) when it comes to coming up with ideas to work on, and I think this strategy can provide some of that.

After such a process has occurred, innovation can focus on the technological side: machine learning, AI, the Internet of Things, drones, virtual reality, augmented reality, and more. But those are among the many things I hope to learn in the future!


What little habits made you a better software engineer?

I remember when I used to do this, to a lesser extent. I need to start it up again!

Answer by Jeff Nelson:

One habit I've clung to is writing small prototypes when I'm trying to learn new concepts.

For example, I'll sit down with a book or a web page and, over the course of a few hours, write 30 or 40 programs, each only a few dozen lines long and intended to demonstrate one simple concept.

This prototyping makes it very easy to try out many concepts in a short period of time.

View Answer on Quora

Web Development: Safely Loading the jQuery Library from a Content Delivery Network (CDN) using JavaScript


Content Delivery Networks, or CDNs, are quite popular for loading third-party resources. Oftentimes firms are reluctant to trust these CDNs because they may be unreliable or go down, leading to a poor user experience. Although there is potential for this to be true, we will discuss ways to mitigate that circumstance via some JavaScript code.

Loading jQuery

You may be thinking: I know how to load jQuery; it’s quite simple using Google’s CDN:

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>

Yes, this is the correct way of using a CDN. But if you take this to your manager, he will say something along the lines of:

Why should we trust this CDN? What if it goes down somehow? How will the user get the client-side experience they truly deserve?!

Your Reply

As programmers, we don’t have all of the answers as to why something is beneficial. We know from reading about CDNs that they are beneficial: the library is minified, and the library data is served externally. These are the basic benefits. For a more comprehensive answer, basic research leads us to the following Stack Overflow discussion:

One of the most important considerations is the fact that the library, among other CDN-hosted libraries, may already be cached in the browser of the user visiting your site, because the user visited another website that uses the same CDN.

This approach also cuts down on the bandwidth your site serves. If someone else is hosting a library for you, take advantage of it.

In all honesty, it is unlikely that a CDN hosted by Google will be down for long. But it is possible to mitigate the small chance of this situation occurring.

Still Not Convinced

Well, what do we do if we want to run the application in our local environment and we are somehow not connected to the internet?

This is when you take your JavaScript skills to the man:

<script src="//ajax.googleapis.com/ajax/libs/jquery/1.9.1/jquery.min.js"></script>

<script>
if (!window.jQuery) {
   var localJS = document.createElement('script');
   localJS.src = '/localpath/jquery.min.js';
   document.getElementsByTagName('head')[0].appendChild(localJS);
}
</script>

How it works:

If jQuery was loaded successfully, it will have populated the window.jQuery property. This property has a variety of other uses, but we are using it to check whether jQuery is loaded. If it isn’t, we load the local minified copy by creating a new script tag and handing it to the browser. To load the library dynamically, we need to insert the <script> element within the <head> element of the web page. This is done through the Document Object Model (DOM) by appending the built <script> element as a child of the <head> element.

Notice that I don’t need to add type="text/javascript" to the <script> element. In HTML5 this is not needed; it applies to all <script> elements by default.

This way, your boss can have the satisfaction of knowing that even when all systems are down, the user gets the best experience possible in terms of access to libraries!

Parallel Programming: CImg Open-Source Library using nVidia CUDA 5.0 Toolkit on GPU


In the last parallel programming blog post, I analyzed and profiled the filled-triangles routine and determined that some of its tasks could be done in parallel. This post is a log of the measures I took to parallelize the routine. It was not an easy task, but with a little help from the community I was able to overcome the obstacles. It is now time to give back by publishing my findings, trials, and struggles.


To implement parallel programming concepts today with the nVidia CUDA 5.0 Toolkit, there must be a CUDA-enabled nVidia graphics card installed on your machine, as well as the CUDA Toolkit itself.

Not sure if you have any of these? Find out here:

My GPU has a compute capability of 1.2 (not that good; the current max is 3.0), but good enough!

This means I can run up to 512 threads per block that I allocate on the device itself.

Initializing Parallel Arrays Simultaneously using the GPU

Let’s look at the code that we are going to run concurrently on the GPU:

/* Define random properties (pos, size, colors, ..)
 * for all triangles that will be displayed. */
float posx[100], posy[100],
      rayon[100], angle[100],
      veloc[100], opacity[100];
unsigned char color[100][3];

for (int k = 0; k<100; ++k) {
     posx[k] = (float)(cimg::rand()*img0.width());
     posy[k] = (float)(cimg::rand()*img0.height());
     rayon[k] = (float)(10 + cimg::rand()*50);
     angle[k] = (float)(cimg::rand()*360);
     veloc[k] = (float)(cimg::rand()*20 - 10);
     color[k][0] = (unsigned char)(cimg::rand()*255);
     color[k][1] = (unsigned char)(cimg::rand()*255);
     color[k][2] = (unsigned char)(cimg::rand()*255);
     opacity[k] = (float)(0.3 + 1.5*cimg::rand());
}
What’s wrong with the above code? It is standard practice in the software industry to populate an array this way. The drawback is that each element of each array is populated one at a time, up to 100 times per array in this particular case, because the code executes on a single thread on a CPU (Intel Core i7, 1.6 GHz). What if we could populate all of these arrays concurrently? We would get the data we need faster, without waiting for the computer to process each element in serial (one at a time).

Can we do this on the CPU? That would involve creating 100 threads and executing them simultaneously, and my CPU, with only 4 cores and 8 hardware threads, cannot do that. The GPU on my nVidia GT 230M has far more cores, and in our particular case we only need 100 threads. To run this code concurrently, it must execute on the device (GPU), and the nVidia CUDA Toolkit allows the necessary transfers between host and device memory. Let’s look at this technology in action. Here is my code for the above function, in the form of a CUDA kernel:

/* Set up and initialize CURAND with a seed. */
__global__ void initCurand(curandState* state){
  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  curand_init(100, idx, 0, &state[idx]);
}

/* CUDA kernel that will execute 100 threads in parallel
 * and will populate these parallel arrays with 100 random numbers
 * (array size = 100). */
__global__ void initializeArrays
 (float* posx, float* posy, float* rayon, float* veloc,
  float* opacity, float* angle, unsigned char* color, int height,
  int width, curandState* state, size_t pitch){

  int idx = blockIdx.x * blockDim.x + threadIdx.x;
  curandState localState = state[idx];

  posx[idx] = (float)(curand_normal(&localState)*width);
  posy[idx] = (float)(curand_normal(&localState)*height);
  rayon[idx] = (float)(10 + curand_normal(&localState)*50);
  angle[idx] = (float)(curand_normal(&localState)*360);
  veloc[idx] = (float)(curand_uniform(&localState)*20 - 10);
  color[idx*pitch] = (unsigned char)(curand_normal(&localState)*255);
  color[(idx*pitch)+1] = (unsigned char)(curand_normal(&localState)*255);
  color[(idx*pitch)+2] = (unsigned char)(curand_normal(&localState)*255);
  opacity[idx] = (float)(0.3f + 1.5f * curand_normal(&localState));
}

Upon analyzing the two code examples above, you may wonder why the cimg::rand() function did not carry over to the concurrent version. The answer is that initializeArrays is a kernel function. A kernel function runs on the device and can only access variables and call functions that live in device memory. Therefore, I needed device functions to generate the random numbers. These particular device functions come from the CURAND API, a library for random number generation on the device and the host.

nVidia maintains many libraries that execute on the device, and these can be found on its website. Notable examples include CUBLAS, CURAND, and Thrust; I have had the opportunity to work with all of them. In addition, programmers can write their own device functions by prefixing the function header with __device__, or __global__ for a kernel. A function that executes on the host can be prefixed with __host__. It should be noted that the APIs nVidia maintains are optimized, so it is better to use them where possible.

Another question you may have: how did I get those arrays into device memory from the host? The following code answers that by allocating the device arrays and calling the kernel functions above:

// Check for any errors returned by CUDA API functions.
void errCheck(cudaError_t err, const char* msg){
  if (err != cudaSuccess)
    std::cout << msg << ": " << cudaGetErrorString(err) << std::endl;
}

// Define the same properties, but for the device.
float *d_posx, *d_posy, *d_rayon, *d_angle, *d_veloc, *d_opacity;
unsigned char* d_color;

// CURAND state
curandState* devState;
cudaError_t err;

// Allocate memory on the device for the device arrays,
// checking for errors on each call.
err = cudaMalloc((void**)&d_posx, 100 * sizeof(float));
errCheck(err, "cudaMalloc((void**)&d_posx, 100 * sizeof(float))");
err = cudaMalloc((void**)&d_posy, 100 * sizeof(float));
errCheck(err, "cudaMalloc((void**)&d_posy, 100 * sizeof(float))");
err = cudaMalloc((void**)&d_rayon, 100 * sizeof(float));
errCheck(err, "cudaMalloc((void**)&d_rayon, 100 * sizeof(float))");
err = cudaMalloc((void**)&d_angle, 100 * sizeof(float));
errCheck(err, "cudaMalloc((void**)&d_angle, 100 * sizeof(float))");
err = cudaMalloc((void**)&d_veloc, 100 * sizeof(float));
errCheck(err, "cudaMalloc((void**)&d_veloc, 100 * sizeof(float))");
err = cudaMalloc((void**)&d_opacity, 100 * sizeof(float));
errCheck(err, "cudaMalloc((void**)&d_opacity, 100 * sizeof(float))");
err = cudaMalloc((void**)&devState, 100 * sizeof(curandState));
errCheck(err, "cudaMalloc((void**)&devState, 100*sizeof(curandState))");

size_t pitch;

// Allocate pitched device memory for the color array.
err = cudaMallocPitch(&d_color, &pitch, 3 * sizeof(unsigned char), 100);
errCheck(err, "cudaMallocPitch(&d_color, &pitch, 3 * sizeof(unsigned char),100)");

// Launch 1 grid of 100 threads.
dim3 dimBlock(100);
dim3 dimGrid(1);

/* Kernel call for initializing CURAND */
initCurand<<<dimGrid, dimBlock>>>(devState);

// Synchronize the device and the host.
cudaDeviceSynchronize();

/* Kernel call for initializing the arrays */
initializeArrays<<<dimGrid, dimBlock>>>(d_posx, d_posy, d_rayon, d_veloc, d_opacity,
                                        d_angle, d_color, img0.height(), img0.width(), devState, pitch);

// Synchronize the device and the host.
cudaDeviceSynchronize();

I wrote a tiny error function, errCheck(), to ensure that each call to a CUDA API function was not returning an error code; memory errors are difficult to spot, so this extra measure is necessary. It is included at the very top of the code example above. The above code executes in about 0.150 seconds.


I had many issues getting this code to execute correctly, and after reading code and framework documentation I decided I’d had enough. So I asked some questions online. Some friendly developers decided to help me, and I am very appreciative of the open-source community. Here are the questions, in case you are having similar problems:

Next Steps:

The next step for this code is to optimize it further. How am I going to accomplish that? I don’t know yet, but you can be assured a follow-up blog post is coming in April! See you then!

Which company will invent a better motherboard architecture to support efficient parallel computing: Intel, ATI/AMD, nVidia, Samsung, or someone else?

Mozilla WebVTT 0.6: GitHub Issues, Bug Fixes, Refactoring, and Enhancing the WebVTT Parser


The pressure is on to get this WebVTT parser on the road, and it has been a fun two weeks trying to get there! I have encountered a lot of issues along the way (GitHub Issues). For those of you unfamiliar with GitHub, it is a hosted version-control service for your code base that also supports online collaboration in a ‘social network’-like setting. Our team takes full advantage of this, and this post will detail all of the fun stuff that has happened. I have recently had the honour of being part of the Mozilla Corporation’s mozilla/webvtt repository. It is there that we discuss, probe, change, and enhance the official WebVTT code base.

Here’s a Link to the Mozilla/WebVTT Repository:


So I am tasked with finding, fixing, and enhancing the WebVTT code base. Where do I start? I had moved my development environment from Windows 7 (not hating, still love you) to Mac OS X Lion, and promptly ran a ‘make check’ on our WebVTT parser code. What I found shocked me: none of my unit tests were running! They were not failing, they were not passing; they did not even get a chance to try.

This made my primary focus determining why this was occurring. Our code base had gone through some changes leading up to my discovery, and I quickly found that our unit tests needed to be upgraded to match the current style and functionality of our new WebVTT code base. Here come the issues:


The unit tests Rick and I wrote for escape-character checking used #define directives that have been deprecated in the current version of the code base. The directives referred to UTF-16 code points. The problem: I remembered we had decided to use UTF-8 encoding, which proved that what we had was incorrect. I therefore had to upgrade the existing unit tests to match what should be tested in the code base right now.

The solution, after much debate, was to extend our string library so we could compare UTF-16 values and convert the UTF-8 values in the codebase to UTF-16. Caitlin (caitp), a member of our team, implemented this functionality; it was up to me to use it properly. Before I could, I had to enumerate the UTF-16 code points I needed to test in the upper-level cue_testfixture. I changed that file to support this global enumeration:

enum{ rlm = 0x200F, lrm = 0x200E, nbsp = 0x00A0 };

The enum will definitely grow over time, and as soon as I start upgrading the other unit-test files it will be quite large. The actual unit tests now use the new string library and look like this:

/* Verifies that multiple escape characters on multiple lines are parsed.
 * From (11/27/2012):
 *   Cue text consists of one or more cue text components, optionally
 *   separated by a single line terminator, which can be:
 *     1. CR (U+000D)
 *     2. LF (U+000A)
 *     3. A CRLF pair
 */
TEST_F(PayloadEscapeCharacter, MultilineMultipleEscapeCharacter)
{
  loadVtt( "payload/escape-character/multiline-multiple-escape-character.vtt", 1 );

  const TextNode *node = getHeadOfCue( 0 )->child( 0 )->toTextNode();

  /* verify that it is a TextNode */
  ASSERT_EQ( Node::Text, node->kind() );

  /* compare the expected NBSP code point with the text node's content */
  ASSERT_EQ( nbsp, node->content().utf16At( 0 ) );
}

It should be noted that this is subject to heavy change in the near future. The reason is that ASSERT_EQ only compares pointers, not the content they point to. An alternative approach using ASSERT_STREQ has been proposed; it is also not perfect, and discussion will continue on the best approach for these unit tests.

The discussion can be found here:

The Infinite Loop – file_bytes assumes slen > len

There was an issue outlining the need to check function parameters before continuing with a function’s main code. The code in question is the find_bytes() function, a simple strnstr-style routine located in parser.c. By not checking the parameter ‘slen’, the code implicitly assumes it is greater than the parameter ‘len’, which is not a reliable assumption in every case. The goal was simple: to ensure that the parameters passed to our functions are reliable to a certain degree. I got on this issue early and proposed a solution. Here it is:

/* basic strnstr-ish routine */
int
find_bytes( const webvtt_byte *buffer, webvtt_uint len,
            const webvtt_byte *sbytes, webvtt_uint slen )
{
  // check params for integrity
  if( !buffer || len < 1 || !sbytes || slen < 1 ) {
    return 0;
  }

  webvtt_uint slen2 = slen - 1;
  while( len-- >= slen && *buffer ) {
    if( *buffer == *sbytes && memcmp( buffer + 1, sbytes + 1, slen2 ) == 0 ) {
      return 1;
    }
  }
  return 0;
}

This code solves the improper-assumption issue, but it also introduces an infinite loop, which makes the fix unacceptable in our current WebVTT codebase. That means we will have to solve the infinite loop before we can apply the patch. The infinite loop exposes a vulnerability in other parts of the code base, and this code helped uncover it. We are currently working on the issue, and subsequent posts will report further progress.


I also worked on refactoring some of the code base because it contained awkward pointer notation. A function took a **variable (a pointer to a pointer), and the calling function passed the argument as &(*var). This is correct, but it is much harder to read than plain ‘var’: since &(*var) dereferences the pointer and then takes the address of the result, it is exactly the address that ‘var’ already holds. I went ahead and made these changes in cuetext.c.

To check out all of the other issues I am currently involved with, see our Issues page:

Parallel Programming: Profiling the Open-Source Image Processing Application CImg


The purpose of this experiment is to determine whether the “Filled Triangles” function inside the CImg_demo.cpp file, found in the CImg open-source image processing library, is suitable and worthwhile to optimize. The optimization would offload some tasks to the GPU using parallel programming techniques, implemented with nVidia’s CUDA framework on a CUDA-enabled GPU.

The advantage the GPU has over the CPU is its many cores; modern CPUs have only a handful (the Intel Core i7 comes to mind). The gain comes from utilizing a many-core chip, which may have hundreds of cores, to run a specialized thread on each core, performing tasks in parallel and increasing the overall efficiency of the application. This reduces the time threads spend waiting to execute, which dominates most serial applications. Determining what can be executed in parallel requires some analysis, which is what we will be doing shortly.

Here’s a Link to the Open Source CImg Image Processing Framework:

Here’s a Link to my CImg GitHub Repository:

The Code – Filled Triangles

When executed, this function produces colorful, rotating, moving triangles of various sizes on the screen.

To determine whether an application is eligible for an upgrade, we need to measure its current running time and apply Amdahl’s Law to the results to see whether any speedup is achievable. The function I will be looking at is located in CImg/examples/CImg_demo.cpp and is called ‘Filled Triangles’. I modified this code to generate the triangles for 1000 iterations of the while loop, primarily because the application originally required user input to determine when to start and stop; I made it run automatically when executed and terminate once that condition is met. Here is a version of that code, modified to extract some basic timing data:

// Include static image data, so that the exe does not depend on external image files.
#include "img/CImg_demo.h"

// Include CImg library header.
#include "CImg.h"

#include <iostream>
#include <iomanip>
#include <cmath>
#include <ctime>

using namespace std;
using namespace cimg_library;

int main() {

  // start timing
  time_t ts, te, l1s, l1e, l2s;
  ts = time(NULL);

  // Create a colored 640x480 background image which consists of different color shades.
  CImg<unsigned char> background(640,480,1,3);
  cimg_forXY(background,x,y) background.fillC(x,y,0,
    x*std::cos(6.0*y/background.height()) + y*std::sin(9.0*x/background.width()),
    x*std::sin(8.0*y/background.height()) - y*std::cos(11.0*x/background.width()),
    x*std::cos(13.0*y/background.height()) - y*std::sin(8.0*x/background.width()));

  // Init images and create display window.
  CImg<unsigned char> img0(background), img;
  unsigned char white[] = { 255, 255, 255 }, color[100][3];
  CImgDisplay disp(img0,"[#6] - Filled Triangles (Click to shrink)");

  // Define random properties (pos, size, colors, ..) for all triangles that will be displayed.
  float posx[100], posy[100], rayon[100], angle[100], veloc[100], opacity[100];
  int num = 1;
  std::srand((unsigned int)time(0));

  l1s = time(NULL);
  for (int k = 0; k<100; ++k) {
    posx[k] = (float)(cimg::rand()*img0.width());
    posy[k] = (float)(cimg::rand()*img0.height());
    rayon[k] = (float)(10 + cimg::rand()*50);
    angle[k] = (float)(cimg::rand()*360);
    veloc[k] = (float)(cimg::rand()*20 - 10);
    color[k][0] = (unsigned char)(cimg::rand()*255);
    color[k][1] = (unsigned char)(cimg::rand()*255);
    color[k][2] = (unsigned char)(cimg::rand()*255);
    opacity[k] = (float)(0.3 + 1.5*cimg::rand());
  }
  l1e = time(NULL);

  // elapsed time
  cout << setprecision(3);
  cout << "Elapsed time : " << difftime(l1e, l1s) << endl;

  // measuring time it takes for triangle animations in 1000 iterations
  int i = 0;

  l2s = time(NULL);
  // Start animation loop.
  while (!disp.is_closed() && !disp.is_keyQ() && !disp.is_keyESC() && i < 1000) {
    img = img0;

    // Draw each triangle on the background image.
    for (int k = 0; k<num; ++k) {
      const int
        x0 = (int)(posx[k] + rayon[k]*std::cos(angle[k]*cimg::PI/180)),
        y0 = (int)(posy[k] + rayon[k]*std::sin(angle[k]*cimg::PI/180)),
        x1 = (int)(posx[k] + rayon[k]*std::cos((angle[k] + 120)*cimg::PI/180)),
        y1 = (int)(posy[k] + rayon[k]*std::sin((angle[k] + 120)*cimg::PI/180)),
        x2 = (int)(posx[k] + rayon[k]*std::cos((angle[k] + 240)*cimg::PI/180)),
        y2 = (int)(posy[k] + rayon[k]*std::sin((angle[k] + 240)*cimg::PI/180));
      if (k%10) img.draw_triangle(x0,y0,x1,y1,x2,y2,color[k],opacity[k]);
      else img.draw_triangle(x0,y0,x1,y1,x2,y2,img0,0,0,img0.width()-1,0,0,img.height()-1,opacity[k]);

      // Make the triangles rotate, and check for mouse click event.
      // (to make triangles collapse or join).
      angle[k] += veloc[k];
      if (disp.mouse_x()>0 && disp.mouse_y()>0) {
        float u = disp.mouse_x() - posx[k], v = disp.mouse_y() - posy[k];
        if (disp.button()) { u = -u; v = -v; }
        posx[k]-=0.03f*u, posy[k]-=0.03f*v;
        if (posx[k]<0 || posx[k]>=img.width()) posx[k] = (float)(cimg::rand()*img.width());
        if (posy[k]<0 || posy[k]>=img.height()) posy[k] = (float)(cimg::rand()*img.height());
      }
    }

    // Display current animation framerate, and refresh display window.
    img.draw_text(5,5,"%u frames/s",white,0,0.5f,13,(unsigned int)disp.frames_per_second());
    disp.display(img).wait(20);
    if (++num>100) num = 100;

    // Allow the user to toggle fullscreen mode, by pressing CTRL+F.
    if (disp.is_keyCTRLLEFT() && disp.is_keyF()) disp.resize(640,480,false).toggle_fullscreen(false);
    ++i;
  }
  te = time(NULL);

  // elapsed time
  cout << setprecision(3);
  cout << "Drawing Triangles Loop: " << difftime(te, l2s) << " Elapsed time : " << difftime(te, ts) << endl;

  return 0;
}

Potential Candidates

Upon analyzing this function, I discovered two areas where the code could be optimized with threads sent to the GPU. The first is a for loop that sets the attributes of 100 triangles in serial; this task can be done in parallel using 100 threads on the GPU.

for (int k = 0; k<100; ++k) {
        posx[k] = (float)(cimg::rand()*img0.width());
        posy[k] = (float)(cimg::rand()*img0.height());
        rayon[k] = (float)(10 + cimg::rand()*50);
        angle[k] = (float)(cimg::rand()*360);
        veloc[k] = (float)(cimg::rand()*20 - 10);
        color[k][0] = (unsigned char)(cimg::rand()*255);
        color[k][1] = (unsigned char)(cimg::rand()*255);
        color[k][2] = (unsigned char)(cimg::rand()*255);
        opacity[k] = (float)(0.3 + 1.5*cimg::rand());
}

The second instance is trickier. It involves another serial for loop whose purpose is to draw each triangle to the screen and update it afterwards. I am not certain this can be parallelized in practice, but in theory it should be possible, since the application draws each triangle one by one.

// Draw each triangle on the background image.
for (int k = 0; k<num; ++k) {
    const int
    x0 = (int)(posx[k] + rayon[k]*std::cos(angle[k]*cimg::PI/180)),
    y0 = (int)(posy[k] + rayon[k]*std::sin(angle[k]*cimg::PI/180)),
    x1 = (int)(posx[k] + rayon[k]*std::cos((angle[k] + 120)*cimg::PI/180)),
    y1 = (int)(posy[k] + rayon[k]*std::sin((angle[k] + 120)*cimg::PI/180)),
    x2 = (int)(posx[k] + rayon[k]*std::cos((angle[k] + 240)*cimg::PI/180)),
    y2 = (int)(posy[k] + rayon[k]*std::sin((angle[k] + 240)*cimg::PI/180));
    if (k%10) img.draw_triangle(x0,y0,x1,y1,x2,y2,color[k],opacity[k]);
    else img.draw_triangle(x0,y0,x1,y1,x2,y2,img0,0,0,img0.width()-1,0,0,img.height()-1,opacity[k]);

    // Make the triangles rotate, and check for mouse click event
    // (to make triangles collapse or join).
    if (disp.mouse_x()>0 && disp.mouse_y()>0) {
        float u = disp.mouse_x() - posx[k], v = disp.mouse_y() - posy[k];
        if (disp.button()) { u = -u; v = -v; }
        posx[k]-=0.03f*u, posy[k]-=0.03f*v;
        if (posx[k]<0 || posx[k]>=img.width()) posx[k] = (float)(cimg::rand()*img.width());
        if (posy[k]<0 || posy[k]>=img.height()) posy[k] = (float)(cimg::rand()*img.height());
    }
}

Profiling The Application

To profile, I will be using gprof, the GNU profiler. To profile this application I needed to add '-pg -g' to the g++ compilation flags and '-pg' to the linker (ld) invocation. It should be noted that gprof will not work on a Mac machine with an Intel processor; I found this out the hard and annoying way.

After the application has compiled, it is time to run the program. I named my executable CImg_demo. After running the program, run the following command:

gprof -b -p <name of exe> > <name of new flat file to be generated>.flt

This is my profiling output:

Flat profile:

Each sample counts as 0.01 seconds.

  %   cumulative   self              self     total
 time   seconds   seconds    calls  us/call  us/call  name
 82.26      2.55     2.55  4820368     0.53     0.53  frame_dummy
 12.26      2.93     0.38                             draw_line
  2.58      3.01     0.08    10965     7.30     7.33  draw_image
  1.94      3.07     0.06                             draw_triangle
  0.32      3.08     0.01   298115     0.03     0.03  cimg_library::CImg
  0.32      3.09     0.01                             item_3d_reflection()
  0.32      3.10     0.01                             fillC
  0.00      3.10     0.00    14040     0.00     0.00  assign
  0.00      3.10     0.00    13270     0.00     0.00  assign
  0.00      3.10     0.00     9053     0.00     0.00  cimg_library::cimg::X11_attr()
  0.00      3.10     0.00     7130     0.00     0.00  ~CImg()
  0.00      3.10     0.00     2305     0.00     0.00  move_to(cimg_library::CImg<float>&)
  0.00      3.10     0.00     1793     0.00     0.00  assign
  0.00      3.10     0.00     1024     0.00     0.00  cimg_library::CImg
  0.00      3.10     0.00     1000     0.00     0.53  render
  0.00      3.10     0.00     1000     0.00    80.40  _draw_text
  0.00      3.10     0.00     1000     0.00     0.00  assign
  0.00      3.10     0.00     1000     0.00    81.11  cimg_library::
  0.00      3.10     0.00      769     0.00     0.70  cimg_library::CImg
  0.00      3.10     0.00      769     0.00     0.00  cimg_library::CImg
  0.00      3.10     0.00      769     0.00     0.70  cimg_library::CImg
  0.00      3.10     0.00      768     0.00     0.00  cimg_library::CImg
  0.00      3.10     0.00      702     0.00     0.00  cimg_library::CImg
  0.00      3.10     0.00      513     0.00     0.00  cimg_library::CImg
  0.00      3.10     0.00      512     0.00     0.00  cimg_library::CImg
  0.00      3.10     0.00      189     0.00     0.00  cimg_library::CImg
  0.00      3.10     0.00      189     0.00     0.00  cimg_library::CImg
  0.00      3.10     0.00       67     0.00     0.00  cimg_library::CImg
  0.00      3.10     0.00        6     0.00     0.00  cimg_library::CImgList
  0.00      3.10     0.00        3     0.00     0.35  cimg_library::CImgDisplay
  0.00      3.10     0.00        2     0.00     0.00  cimg_library::CImgList
  0.00      3.10     0.00        2     0.00     0.00  cimg_library::CImgList
  0.00      3.10     0.00        1     0.00     0.00  _GLOBAL__sub_I__Z22item_blurring_gradientv
  0.00      3.10     0.00        1     0.00     0.00  CImgDisplay::_map_window()
  0.00      3.10     0.00        1     0.00     0.00  CImgDisplay::_assign
  0.00      3.10     0.00        1     0.00     0.00  CImg<unsigned char>::~CImg()
  0.00      3.10     0.00        1     0.00   136.13  CImgList<float>::font
  0.00      3.10     0.00        1     0.00   136.13  CImgList<float>::_font
  0.00      3.10     0.00        1     0.00     0.00  CImgList<float>& cimg_library::CImgList
  0.00      3.10     0.00        1     0.00     0.00  CImgList<float>::CImgList
  0.00      3.10     0.00        1     0.00   136.13  CImgList

Summary of Findings

The execution of the program takes roughly 3.10 to 20 seconds, depending on how long the triangle-animation calculations are measured. It should be noted that this application originally relied on user input to run and to terminate. I modified that behavior so that the while loop (which animates the triangles) executes for a maximum of 1000 iterations. The times measured in this assignment are per 1000 iterations of this loop.

Profiling Results

The results of the initial profile show that execution time is consumed most heavily when drawing the triangles to the screen one at a time. It seems this can be optimized by offloading the drawing to n threads for n triangles, but this is subject to change because of any additional complexity introduced by interoperability between the GPU and CPU.

There is another for loop which sets the attributes of each triangle one by one in linear time, O(n). This process can also be offloaded to the GPU as n threads for n triangles. I would need to determine whether this process also involves interoperability between the CPU and GPU.

The program's work is governed by three loops: a for loop for setup, a while loop that accepts user input and runs the animation, and, nested inside it, a for loop that draws the triangles. Per animation frame the drawing work is O(n) in the number of triangles, so i iterations of the while loop cost O(i * n) overall, with the setup loop adding a separate O(n).

The recorded times can also be increased by raising the maximum loop iteration count (e.g. 10,000, 100,000, 1,000,000). This exhibits the same relationship, just with longer task times.

Amdahl's Law Calculations

Amdahl's Law estimates the potential speedup achievable by adding numerous cores to an existing application that currently uses only one CPU core.

Since 100 triangles are generated, we can theoretically create one thread per triangle. The draw_line, draw_triangle, and draw_image functions account for roughly 17 percent of the application's execution time ((0.38 + 0.08 + 0.06) / 3.10 ≈ 0.168). Plugging that into the equation with n = 100 cores:

S_100 = 1 / ((1 − 0.168) + 0.168/100) ≈ 1.20

So a speedup of roughly 1.2 is theoretically achievable per 1000 iterations of the while loop that draws these triangles.

Will I work on this project? If I can optimize this function, or any other function within the CImg library, I will continue with it. If it is not possible to optimize this project within the time given in the course, it will be difficult to continue, and I will have to work on someone else's project. But my initial plan is to continue with this project unless I am told otherwise.

Issues Encountered

The profiling tool gprof does not work on MacBooks with an Intel processor (I have an Intel Core i5). This was verified by numerous internet resources and by annoying personal experience.

WebVTT 0.5 Release: WebVTTLoadListener and WebVTT Parser Issues


We have been hard at work trying to integrate our WebVTT parser with the Mozilla Firefox web browser. According to Chris Pearce, what we needed to do was implement the TextTrackDecoder: a C++ layer that interfaces between our C WebVTT parser and the DOM (Document Object Model) of the Mozilla Firefox web browser.

In parallel with these tasks, we have also been actively working on issues with the current C parser. The issues are listed on the GitHub repository page for the parser, which currently lives in Mozilla's repository tree. I've worked on, and filed, some issues based on observations I made while reading through the code base.

Here’s a Link to our GitHub repository for the WebVTT Parser:

Progress for me personally has been slow, because I need to grasp the components of the browser I have to interface with when implementing the WebVTT LoadListener. There is a lot of reading to do, and while I understand the basic concepts, I still need to fully understand how the smaller components work technically.

WebVTT LoadListener

The first thing we were supposed to do was migrate the existing LoadListener inner class, located in HTMLTrackElement.cpp, to its own file; thus WebVTTLoadListener.cpp was created. We moved the class over, added a new member (webvtt_parser_t) along with some functions to support it, and implemented the existing methods. My partner Rick and I are still trying to clear up some confusion about how to properly implement some of the new functions we think are needed to link the parser with the DOM.

Here is a link to my GitHub repository:

GitHub C-Parser Issues

I also worked on some GitHub C-parser issues, more specifically Issue 21:

I also created two new issues about the general coding structure of our parser.c file. I suggested that we check the arguments of functions to better identify some of the problems we are experiencing in getting the unit tests to pass. In addition, I suggested that we ensure all variable declarations are properly initialized before use, to prevent unexpected behavior in the code.


Integrating The WebVTT HTMLTrackElement into Mozilla Firefox on Windows 7 x64 bit


We have been hard at work trying to get the WebVTT parser integrated into Mozilla Firefox! We have to accomplish this goal in steps. Before the actual parser can be integrated into the browser, we need to apply an old patch that Ralph Giles, a Mozilla developer, wrote for the WebVTT specification. His patch incorporates the HTMLTrackElement into the browser DOM (Document Object Model).

The track element is necessary because it allows subtitle and caption tracks to be specified inside the <audio> and <video> HTML5 elements. This patch was built to support the old WebVTT parser originally authored by Ralph, so we had to import his code after applying the patch to an old version of Mozilla Firefox. Luckily, this part was easy to do using an executable script. Then all we needed to do was build the browser with the integrated code and check for the HTMLTrackElement.

Here is a Link to the patch that we integrated on BugZilla

Here is a link to the original WebVTT Parser Associated with this Patch:


We needed an instance of mozilla-central (the Mozilla Firefox repository) available for our use. We also needed an old commit of mozilla-central from around March 14, 2012, because the patch Ralph wrote targeted that specific version of Mozilla Firefox. Because I would be building the specified version of Mozilla Firefox on Windows, I had to use the MozillaBuild package with the MSVC2010 compiler.

Here is a link to mozilla-central the public GitHub repository for the Mozilla Firefox web browser:

Here is the GitHub commit hash that I used from the public mozilla-central repository:


To apply the patch to that particular commit, I needed to create a new branch containing the old version of the browser code. We will call this new branch 'mozilla-old'. Here is the Git command to do this:

git checkout -b mozilla-old b984fc4495f6a4b2e44417168db4187a44514341


To integrate the track-element patch, I first had to check whether it would merge correctly into the mozilla-old branch; we need to ensure there won't be any errors when the patch is applied. Git's 'apply' command has a check option that lets us make this verification:

git apply --check track-element.patch

If no errors are displayed on the screen, the patch can merge cleanly. I then ran git apply track-element.patch to apply the patch to the repository. The next thing we need to do is add Ralph's source files to the mozilla-old branch. Luckily, Ralph wrote a neat little script that pulls that code in from his repository. His script is located in mozilla-central/media/webvtt; switch into the webvtt directory and run the script. It will create an 'src' folder within the webvtt directory containing the WebVTT source code.

Now that we have everything we need in place, it is time to compile the code and run the web browser. To compile the code, run the following command in the MSVC terminal after navigating to the mozilla-central directory:

python build/pymake/make.py -f client.mk -j7

The '7' after -j is the number of cores used to compile the source code; I have an Intel Core i7 processor on my machine. You can use any number of available cores to compile the Mozilla code.

Immediately after starting the compile, I noticed some errors. These errors were associated with an incompatibility of the WebVTT code between different revisions of the C programming language: MSVC2010 compiles C as C89, while the latest WebVTT code is written in C99. Therefore, I had to make some changes to webvtt.c to ensure that all variables are declared at the top of each function body.
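The kind of change involved looks roughly like this. The function below is a hypothetical illustration, not the actual webvtt.c code: C99 allows declarations mixed in with statements (and inside the for header), while C89 requires every declaration at the top of its block.

```cpp
#include <cstddef>

/* C99 style (rejected by MSVC2010's C compiler):

   int sum_lengths(const char **s, size_t n) {
       int total = 0;
       for (size_t i = 0; i < n; ++i) {   // declaration inside the for header
           size_t len = 0;                // declaration after statements
           while (s[i][len]) ++len;
           total += (int)len;
       }
       return total;
   }
*/

/* C89 style: the same function with all declarations hoisted to the top. */
int sum_lengths(const char **s, size_t n) {
    int total;
    size_t i, len;

    total = 0;
    for (i = 0; i < n; ++i) {
        len = 0;
        while (s[i][len]) ++len;
        total += (int)len;
    }
    return total;
}
```

The transformation is purely mechanical and does not change behavior, which is why it was a safe way to get the C99 code through the older compiler.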

Upon recompiling, I noticed another error, this time with snprintf. I opened the file, replaced snprintf with PR_snprintf, and #included the correct header. The code compiled successfully this time. Now it is time to run it and check whether the HTMLTrackElement exists in the browser's web console:

Here is an image of the HTMLTrackElement in Mozilla Firefox!


Implementing the WebVTT Standard: Development Agenda of 2013


2013 is going to be a big year for us in terms of WebVTT development. We accomplished a lot last year, and it looks like this year will focus on building further on top of that work. By the end of 2012 we had managed to code an entire WebVTT parser, implemented in the C programming language. Further, we had written tests for the parser using Google's GTest unit-testing framework.

Those accomplishments came at a cost, though. The code is riddled with errors and poor development style. The tests are well written, but the vast majority of them do not pass, and the code is very hard to read and understand.

My Solution

Although I want to contribute significantly to the code base this year, I do not think that will happen unless the following is completed to a reasonable degree. The answer to the problems seems obvious to me: fix the code and ensure all of the current tests pass. This, however, is going to be a difficult and tedious task, because most of the parser code was not written by me, and it will require a great deal of collaboration to truly address these issues. For my part, I have been analyzing the code independently and building hypotheses about why errors are being thrown in our code base.

Luckily, I took a free online debugging course over the break, which taught me, and reaffirmed, some of the knowledge it takes to debug code efficiently and intuitively.

Here’s a link to a Free Software Debugging course:

Here’s a link to a Free Software Testing course:

Once I am finished debugging the code, I will focus on ensuring the WebVTT spec is fully implemented. I will also contribute to the WebVTT C parser while debugging; this may involve programming better algorithms and improving styling, so the code performs well and looks like code maintained by a professional software development team.

The Agenda

My agenda is written with a focus on ensuring the parser is robust and has the integrity required for integration into the Mozilla Firefox browser. Most of this testing will involve some programming to fix or re-code parts of the code base. The focus will be on solving the main objectives in the problem areas and ensuring they are coded in the simplest and most efficient manner. Styling will also need to be improved.

Date:                            Goal

Jan 14 --> Build hypotheses about the problem areas in the code base

Jan 28 --> Refine hypotheses by testing the code base

Feb 11 --> Get all of my unit tests to pass in the parser

Mar 04 --> Get more unit tests to pass (10-20 or more)

Mar 18 --> Get more tests to pass (10-20 or more)

Apr 01 --> Hopefully finish testing everything, or continue testing

Apr 15 --> Finish and polish everything up, or work on coding and integrating the parser with Mozilla

This agenda is subject to change, but it outlines what I hope to accomplish by these dates. I want to finish everything as early as possible so I can add more work for myself in terms of actual programming and development.
