Shayan's Software & Technology

My adventure as a Software Engineer continues...

WebVTT 0.2 Release!

Overview

It has been a busy week of reading, writing and discussing code, and I have been trying to write code that is simple, efficient and easy to understand. We haven’t reached our goal yet, but we are definitely getting close and have learned a lot on this journey. We have thrown away or heavily changed large parts of our code base along the way. That is still progress, and we are determined to keep moving forward. Let’s look at what I’ve done:

Here’s a link to my GitHub C-parser branch:

https://github.com/ShayanZafar/webvtt/tree/c-parser/webvtt-parser-new

Converting code into C-Style

Yes, this had to be done. When you code in a particular programming language such as C, you must code in its own style. The C code we wrote previously was not “C code”; it looked more like C# code trying to imitate C. I spent some time enforcing C style code conventions before moving forward with development.

Error Object

There has been some confusion as to what the purpose of this module is. “Error” seems too generic a term for how the parser treats input that it considers invalid. We are deciding how to properly separate the types of errors that are encountered and how to represent them in a way that is user friendly and conforms to standards.

The Error object went through a very thorough upgrade this week. I decided to follow the GNU standard for error handling. This standard can be found here:

http://www.gnu.org/prep/standards/html_node/Errors.html
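As a rough illustration of what that standard asks for, the basic layout it describes for an error message is "sourcefile:lineno: message". The helper below is only a sketch of that layout; format_gnu_error is a hypothetical name and is not part of our parser.

```c
#include <stdio.h>

/*
 * Hypothetical helper (not part of the parser): writes one error into buf
 * using the GNU "sourcefile:lineno: message" layout.
 * Returns what snprintf returns: the length the full message needs.
 */
int format_gnu_error(char *buf, size_t size, const char *file_name,
                     int line_number, const char *message) {
	return snprintf(buf, size, "%s:%d: %s", file_name, line_number, message);
}
```

Calling it with the file name "captions.vtt", line 12 and the message "malformed timestamp" fills the buffer with "captions.vtt:12: malformed timestamp".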

Here is the new Error Object Code:


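#include <stdio.h>   /* FILE, fopen, fprintf, fclose */
#include <stdlib.h>  /* malloc, realloc, free */
#include <string.h>  /* strlen, strcpy */

/* Represents an Error Logging Type and follows GNU style error logging standards:
 * http://www.gnu.org/prep/standards/html_node/Errors.html
 * These error standards are followed for source files and webVTT files.
 */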
typedef struct {
	char *error_code;
	char *error_message;

	int source_line_number;
	int webvtt_line_number;

	char *webvtt_file_name;
	char *source_file_name;

} error;

void print_error_list(webvtt_buffer_info *webvtt) {
	FILE *fp = NULL;
	int i;

	fp = fopen("errorlog.txt", "w");
	for (i = 0; i < webvtt->error_list_size; i++) {
		error *er = &webvtt->error_list[i];

		/* Try to add entry to the log file. */
		if (fp) {
			/* save to file */
			fprintf(fp, "Source File: %s Line: %d  Error Code: %s %s: VTT File: %s Line: %d\n",
				er->source_file_name, er->source_line_number,
				er->error_code, er->error_message,
				er->webvtt_file_name, er->webvtt_line_number);
			/* print to stderr */
			fprintf(stderr, "Source File: %s Line: %d  Error Code: %s %s: VTT File: %s Line: %d\n",
				er->source_file_name, er->source_line_number,
				er->error_code, er->error_message,
				er->webvtt_file_name, er->webvtt_line_number);
		} else {
			fprintf(stderr, "Error could not be logged!\n");
		}

		/* Deallocate the current Error Object in the list. */
		free(er->error_code);
		free(er->error_message);
		free(er->webvtt_file_name);
		free(er->source_file_name);
	}
	/* Every entry has been freed, so mark the list as empty. */
	webvtt->error_list_size = 0;

	/* fclose(NULL) is undefined, so only close a file that was opened. */
	if (fp)
		fclose(fp);
}

/* Declared here because create_error calls it before its definition. */
void add_to_error_list(webvtt_buffer_info *webvtt, error *er);

void create_error(webvtt_buffer_info *webvtt, char *code, char *message, char *source, char *vtt, int source_line, int vtt_line) {
	/*
	 * Create and initialize the error structure.
	 * Error messages are represented with ASCII character strings
	 */
	error er;

	/* Fall back to defaults so that every string field is always initialized. */
	if (!source)
		source = "webvtt.c";
	if (!vtt)
		vtt = "";

	/* strlen() does not count the terminating null byte, so allocate one extra. */
	er.error_code = (char *)malloc(strlen(code) + 1);
	er.error_message = (char *)malloc(strlen(message) + 1);
	er.source_file_name = (char *)malloc(strlen(source) + 1);
	er.webvtt_file_name = (char *)malloc(strlen(vtt) + 1);

	if (!er.error_code || !er.error_message || !er.source_file_name || !er.webvtt_file_name) {
		/* Out of memory: release whatever was allocated and give up. */
		free(er.error_code);
		free(er.error_message);
		free(er.source_file_name);
		free(er.webvtt_file_name);
		return;
	}

	strcpy(er.error_code, code);
	strcpy(er.error_message, message);
	strcpy(er.source_file_name, source);
	strcpy(er.webvtt_file_name, vtt);

	er.source_line_number = source_line;
	er.webvtt_line_number = vtt_line;

	/* Add to the end of the error array (the list stores a copy of er). */
	add_to_error_list(webvtt, &er);
}

void add_to_error_list(webvtt_buffer_info *webvtt, error *er) {
	/*
	 * Grow the array to hold one more element. sizeof() on the pointer only
	 * gives the size of the pointer, so the new size is computed from the count.
	 */
	error *grown = (error *)realloc(webvtt->error_list,
		sizeof(error) * (size_t)(webvtt->error_list_size + 1));

	/* On failure the old list is still valid; the new error is dropped. */
	if (!grown)
		return;

	webvtt->error_list = grown;
	webvtt->error_list[webvtt->error_list_size] = *er;
	webvtt->error_list_size++;
}

Parsing Cue Text

I also wrote a throw-away algorithm for parsing the cue text. The purpose of this algorithm was to learn from mistakes and to learn how the cue text in a vtt file should be properly parsed. Cue text is recursive in nature, so I would assume that an efficient algorithm uses recursion to parse it. I took an iterative approach instead, to gather information on what needed to be done so that I could come up with a recursive solution with my partner. We are currently working on that recursive solution and will refine it so that it works best for parsing the cue text.

My partner Rick is working hard on a recursive counterpart. Once we refine it we will have a good cue text parsing algorithm. Wish us luck!
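To give an idea of why recursion fits, here is a minimal sketch of a recursive nesting check. It is not Rick's code and not what our parser does: parse_nodes and cue_text_is_well_nested are hypothetical names, tag names are not checked against the WebVTT tag list, and treating an unclosed tag as an error is an assumption of this sketch.

```c
#include <string.h>

/*
 * Hypothetical sketch: consume text until the closing tag for `name` is found
 * (or until the end of the string when name is NULL), recursing on every
 * opening tag. Returns a pointer just past what was consumed, or NULL on a
 * nesting error.
 */
static const char *parse_nodes(const char *p, const char *name, size_t name_len) {
	while (*p != '\0') {
		if (*p != '<') {
			p++;  /* plain text */
			continue;
		}
		if (p[1] == '/') {
			/* closing tag: it must match the tag this call was opened for */
			const char *end = strchr(p, '>');
			if (!name || !end || (size_t)(end - (p + 2)) != name_len ||
			    strncmp(p + 2, name, name_len) != 0)
				return NULL;
			return end + 1;
		} else {
			/* opening tag: the name runs up to the first space or '>' */
			const char *tag = p + 1;
			size_t len = strcspn(tag, " >");
			const char *end = strchr(tag, '>');
			if (len == 0 || !end)
				return NULL;
			/* everything up to the matching end tag is handled one level down */
			p = parse_nodes(end + 1, tag, len);
			if (!p)
				return NULL;
		}
	}
	/* end of input is only acceptable when no tag is waiting to be closed */
	return name ? NULL : p;
}

/* Returns 1 when every tag in text is closed in the right order, 0 otherwise. */
int cue_text_is_well_nested(const char *text) {
	return parse_nodes(text, NULL, 0) != NULL;
}
```

Each opening tag starts a new call, and the call stack remembers which end tag is expected next, which is exactly the bookkeeping the iterative version has to do by hand.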

The Test Subject. In order to do this iteratively, I needed a stack. So I made one.

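/* Stack that holds the end tags to find */

typedef struct {
	int top;         /* index of the top element, -1 when the stack is empty */
	char **to_find;  /* heap-allocated copies of the expected end tags */
} tag_stack;

void initialize_tag_stack(tag_stack *end_tags);
void destroy_tag_stack(tag_stack *end_tags);
/* push end tag at top of stack */
void push_endtag(tag_stack *end_tags, const char *end_tag);
/* pop end tag at top of stack; the caller must free() the returned string */
char *pop_endtag(tag_stack *end_tags);

void initialize_tag_stack(tag_stack *end_tags) {
	/* realloc(NULL, n) behaves like malloc(n), so no initial allocation is needed */
	end_tags->to_find = NULL;
	end_tags->top = -1;
}

void destroy_tag_stack(tag_stack *end_tags) {
	/* free any tags still on the stack, then the array itself */
	while (end_tags->top > -1)
		free(end_tags->to_find[end_tags->top--]);
	free(end_tags->to_find);
	end_tags->to_find = NULL;
}

char *pop_endtag(tag_stack *end_tags) {
	/* nothing to pop from an empty stack */
	if (end_tags->top < 0)
		return NULL;
	return end_tags->to_find[end_tags->top--];
}

void push_endtag(tag_stack *end_tags, const char *end_tag) {
	/* copy the tag, including its terminating null byte */
	char *copy = (char *)malloc(strlen(end_tag) + 1);
	char **grown;

	if (!copy)
		return;

	/* create more space for the new end tag: top + 2 elements after the push */
	grown = (char **)realloc(end_tags->to_find, sizeof(char *) * (size_t)(end_tags->top + 2));
	if (!grown) {
		free(copy);
		return;
	}

	strcpy(copy, end_tag);
	end_tags->to_find = grown;
	end_tags->to_find[++end_tags->top] = copy;
}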

The purpose of the stack is to keep track of which tags should be closed first, so that we have some sense of ordering and patterning within the cue text parsing process.

The algorithm to parse the cue text is massive. It tries to find and validate the tags and to store each successfully closed tag in the text track cue. That last part was not implemented, but it would take only a few lines to add the tag to a list of Node objects. The algorithm iterates through an array called line_position, which is pre-loaded with the cue text portion of a vtt caption.

Rick made changes to his load_line function to accommodate this need. We check the character at each position until we find something that looks like a tag. If it does look like a tag, we attempt to find its remaining parts. We store the end tags in a stack to be popped later for end tag validation. The algorithm also has comments marking where errors will be thrown to the error module. We have decided to leave these as comments and add the function calls to the error module at the end of development, so that we have less tedious work to do in the end and can make sure the errors work properly.
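Here is a small worked example of that ordering. It is a simplified sketch rather than the stack above: end_tag_stack, stack_init, stack_push, stack_pop and the MAX_DEPTH limit are hypothetical, and the stack stores pointers to string literals instead of copying them.

```c
#include <stddef.h>

#define MAX_DEPTH 16  /* hypothetical nesting limit for this sketch */

typedef struct {
	int top;                      /* index of the top element, -1 when empty */
	const char *tags[MAX_DEPTH];  /* expected end tags, innermost on top */
} end_tag_stack;

void stack_init(end_tag_stack *s) {
	s->top = -1;
}

/* Returns 1 on success, 0 when the stack is already full. */
int stack_push(end_tag_stack *s, const char *tag) {
	if (s->top + 1 >= MAX_DEPTH)
		return 0;
	s->tags[++s->top] = tag;
	return 1;
}

/* Returns the most recently pushed end tag, or NULL when the stack is empty. */
const char *stack_pop(end_tag_stack *s) {
	return s->top < 0 ? NULL : s->tags[s->top--];
}
```

For a cue such as "<ruby>base<rt>note</rt></ruby>", the parser would push "</ruby>" and then "</rt>". The first pop returns "</rt>", the innermost tag, and the second returns "</ruby>", which is the order in which the end tags have to appear in the text.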

/*
 * This algorithm will parse cue text
 * This algorithm performs steps, some of which repeat depending on the input encountered
 * Step 1: Validate an open tag from beginning to end
 * Step 2: Store the end tags that are expected to be found in a stack
 * Step 3: Find the end tag in correct order (inner most first) using the stack
 * Step 4: Register the tag as a node object into the text_track_cue
 * The algorithm throws errors to the error object if they are encountered.
 * Assumes webvtt->line_position is a char * into a null-terminated line.
 */
int parse_text_track_cue_text(webvtt_buffer_info *webvtt, text_track_cue *cue) {

	tag_stack end_tags;
	int open_tag_found = 0;
	int complete_open_tag = 0;

	initialize_tag_stack(&end_tags);

	/* check the current character first, then advance at the bottom of the loop */
	while (*webvtt->line_position != NULL_BYTE) {
		/* '<' character encountered, look for a valid opening tag */
		if (*webvtt->line_position == LT) {
			if (open_tag_found) {
				/* Throw Error, open tag improperly formed */
			} else if (complete_open_tag && webvtt->line_position[1] == '/' && end_tags.top > -1) {
				/* a '<' followed by a '/' starts a closing tag: it must match the top of the stack */
				char *to_find = pop_endtag(&end_tags);
				size_t length = strlen(to_find);

				if (!strncmp(webvtt->line_position, to_find, length)) {
					/* Tag is ready to be encapsulated as a node in the text_track_cue list */
					/* move onto the closing '>'; the increment below steps past it */
					webvtt->line_position += length - 1;
				} else {
					/* throw error: tag not found in correct order */
				}
				free(to_find);

			} else if (!strncmp(webvtt->line_position + 1, "ruby", 4)) {
				/* move the pointer (not the character it points to) onto the end of the name */
				webvtt->line_position += 4;
				push_endtag(&end_tags, "</ruby>");
				open_tag_found = 1;

			} else if (!strncmp(webvtt->line_position + 1, "rt", 2)) {
				webvtt->line_position += 2;
				push_endtag(&end_tags, "</rt>");
				open_tag_found = 1;
			} else {
				/* look at the next character for tags with one character */
				switch (webvtt->line_position[1])
				{
				case 'b':
					push_endtag(&end_tags, "</b>");
					open_tag_found = 1;
					break;
				case 'i':
					push_endtag(&end_tags, "</i>");
					open_tag_found = 1;
					break;
				case 'u':
					push_endtag(&end_tags, "</u>");
					open_tag_found = 1;
					break;
				case 'c':
					push_endtag(&end_tags, "</c>");
					open_tag_found = 1;
					break;
				case 'v':
					push_endtag(&end_tags, "</v>");
					open_tag_found = 1;
					break;
				default:
					/* Throw error if no known tags were found: dangling < character */
					break;
				}
				/* step onto the tag name only when one was recognised */
				if (open_tag_found)
					webvtt->line_position++;
			}
		} else if (*webvtt->line_position == GT && open_tag_found) {
			/* opening tag is complete */
			complete_open_tag = 1;
			/* open tag is no longer seeking completion */
			open_tag_found = 0;
		}

		/* go to the next character position */
		webvtt->line_position++;
	} /* while position != null byte */

	/* clean up stack */
	destroy_tag_stack(&end_tags);

	/* Step 4 is not implemented yet, so cue is unused and 0 is only a placeholder status. */
	(void)cue;
	return 0;
}
