As time passes, my final product for the Google Summer of Code 2016 is starting to take shape. The user’s translation workflow with TMGMT will be easier than ever with the CKEditor plugins that I am working on for this summer project. Before the midterm evaluation period, my first plugin, segments, divided the content into segments and displayed them in the editor. You can read more about that in my blog post from week 5 - here.

The second plugin, called tags, is currently being developed. With it, we want to mask HTML tags inside segments, so that the user can get a better idea of the structure of the content, see which tags are set and opened, and check whether they are properly closed.

Achievements

The seventh week was marked by blockers - parts of the code that are complex, not yet implemented, not functioning properly or simply in need of discussion. I started the week with some code refactoring.

We have discussed the structure of the masked tags. We do not support masked tag pairs. Instead, the opening and the closing element of an HTML tag are each masked on their own, and every masked tag consists of two parts:

  • element - the name of the masked HTML tag
  • raw - contains the encoded tag, together with attributes

This is our definition of the masked tag’s structure:


<tmgmt-tag element="b" raw="&lt;b&gt;">This is a masked tag<tmgmt-tag element="b" raw="&lt;/b&gt;">
<tmgmt-tag element="img" raw="&lt;img src=... alt=... title=...&gt;">

There are several reasons for this redefinition. Firstly, it is really important that the editor displays the masked tags properly with their respective names. This is why we have the element attribute inside the tag. Secondly, the raw attribute contains the encoded tag. When the tag has attributes like src, alt and title, we can easily get them from here, decode them, and display them to the user when the tag is clicked. The downside is that this makes these attributes untranslatable; they will simply be placed back untranslated. When unmasking, the process will simply replace <tmgmt-tag> with its raw property. Last, but not least, this structure helps us when looking for open and closed tags inside segments.

Also note that the two tags above are different. The first one is a tag that requires a closing pair, while the second one is a single tag (it could also be <br />, <hr /> etc.). If the closing pair of a tag were missing, we’d color the tag’s border red as a warning.
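To illustrate how this structure could help when looking for tags that are missing their closing pair, here is a minimal sketch. It is not the plugin’s code: the function name findUnclosedTags, the list of single tags and the regex-based parsing (which assumes the attribute order element, raw) are all my assumptions for the example.

```javascript
// Hypothetical list of single tags that never need a closing pair.
const SINGLE_TAGS = ['img', 'br', 'hr'];

// Return the names of masked tags that are opened but never closed
// inside one segment. Assumes the attribute order element, raw.
function findUnclosedTags(segment) {
  const pattern = /<tmgmt-tag element="([^"]+)" raw="([^"]*)">/g;
  const open = [];
  let match;
  while ((match = pattern.exec(segment)) !== null) {
    const name = match[1];
    const raw = match[2];
    if (SINGLE_TAGS.includes(name)) {
      continue; // single tags have no closing pair
    }
    if (raw.startsWith('&lt;/')) {
      // Closing element: drop the most recent matching opening element.
      const index = open.lastIndexOf(name);
      if (index !== -1) {
        open.splice(index, 1);
      }
    } else {
      open.push(name);
    }
  }
  return open;
}
```

Any name returned by such a function would be a candidate for the red warning border.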

As mentioned before, I stumbled upon some blockers this week. The first one was an issue related to closing the tags. Since I had created some dummy segments with masked tags inside, the editor’s behaviour was to be expected: the tags didn’t have their closing elements (</tmgmt-tag>), so the editor added them automatically. My mentor pointed out that I should check how the <img> tag is defined in CKEditor. We found out that it is defined in the CKEDITOR.dtd object, which holds the representation of the HTML DTD used by the editor in its internal operations. I need to define our tag in the $empty object, which contains the list of empty (self-closing) elements.
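A minimal sketch of what such a definition might look like is shown below. The helper registerEmptyElement and the stand-in fakeDtd object are hypothetical and only exist so the example runs outside the editor; in the browser the first argument would be CKEDITOR.dtd. I have not verified whether this entry alone is enough, as the editor may need further DTD entries for a custom element.

```javascript
// Register a custom element as empty (self-closing) in a DTD-like object.
// In the browser the first argument would be CKEDITOR.dtd.
function registerEmptyElement(dtd, tagName) {
  dtd.$empty[tagName] = 1;
  return dtd;
}

// Stand-in for CKEDITOR.dtd so that the sketch runs outside the editor.
const fakeDtd = { $empty: { br: 1, hr: 1, img: 1 } };
registerEmptyElement(fakeDtd, 'tmgmt-tag');
```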

I also started working on the topic of perfect matching. This consists of getting the translation of a segment from the translation memory and checking for a full match, meaning that both the text and the HTML structure of the selected segment match the translation in the memory. It would be a fuzzy match if only one of the two matched. The segments in the memory are unmasked, while we can only get the masked segments from the editor and send them through the HTTP request to the service.
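The difference between a perfect and a fuzzy match could be sketched as follows. This is only an illustration under my own assumptions: the function names are hypothetical, both inputs are unmasked HTML strings, and the structure is reduced to the ordered list of tag names.

```javascript
// Strip every tag to get the plain text of a segment.
function textOf(html) {
  return html.replace(/<[^>]*>/g, '').trim();
}

// Ordered list of tag names, e.g. ['b', '/b'] for '<b>text</b>'.
function structureOf(html) {
  const tags = html.match(/<\/?[a-zA-Z][^\s>\/]*/g) || [];
  return tags.map((tag) => tag.slice(1).toLowerCase());
}

// 'perfect' if text and structure both match, 'fuzzy' if only one does.
function classifyMatch(segment, memoryEntry) {
  const sameText = textOf(segment) === textOf(memoryEntry);
  const sameStructure =
    structureOf(segment).join(',') === structureOf(memoryEntry).join(',');
  if (sameText && sameStructure) {
    return 'perfect';
  }
  if (sameText || sameStructure) {
    return 'fuzzy';
  }
  return 'none';
}
```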

The workflow of masking the tags would be the following:

  1. Mask the tags when the editor loads.
  2. Send the masked tag through the HTTP request to the service controller.
  3. Unmask the tag.
  4. Query for the translation in the memory.
  5. If we have a match, mask it.
  6. Return it.
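Steps 3 to 6 of this workflow can be sketched like this. Since the masking and unmasking functions do not exist in tmgmt yet, everything here is a hypothetical stand-in: maskTags, unmaskTags and lookup are names I made up, the regex-based masking is a simplification, and a plain Map plays the role of the translation memory.

```javascript
// Encode a raw HTML tag so that it can live inside an attribute value.
function encode(tag) {
  return tag.replace(/&/g, '&amp;').replace(/"/g, '&quot;')
    .replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

// Reverse of encode(); '&amp;' is decoded last to avoid double decoding.
function decode(raw) {
  return raw.replace(/&lt;/g, '<').replace(/&gt;/g, '>')
    .replace(/&quot;/g, '"').replace(/&amp;/g, '&');
}

// Replace every HTML tag with a masked <tmgmt-tag> element.
function maskTags(html) {
  return html.replace(/<\/?([a-zA-Z][a-zA-Z0-9]*)[^>]*>/g, (tag, name) =>
    `<tmgmt-tag element="${name.toLowerCase()}" raw="${encode(tag)}">`);
}

// Replace every masked tag with its decoded raw attribute.
function unmaskTags(masked) {
  return masked.replace(/<tmgmt-tag element="[^"]*" raw="([^"]*)">/g,
    (whole, raw) => decode(raw));
}

// Steps 3-6: unmask, query the memory, mask the match, return it.
function lookup(maskedSegment, memory) {
  const source = unmaskTags(maskedSegment);
  const translation = memory.get(source);
  return translation === undefined ? null : maskTags(translation);
}
```

With a memory such as new Map([['<b>Hello</b>', '<b>Hallo</b>']]), calling lookup(maskTags('<b>Hello</b>'), memory) would return the masked version of the stored translation, and null when nothing matches.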

The problem is that the masking and unmasking functions are not supported in tmgmt yet. Remember, I wrote the dummy segments with some masked tags in the tmgmt_ckeditor.install file solely for the purpose of testing. Hopefully, this will be done and committed in the following days, so I can continue my work on that part.

I also made some improvements, like fixing the dependency on the segments plugin, displaying the tags properly in the editor pairs and making the tags look nicer by adding some CSS styles.

Goals for next week

The plan for next week is to fix the blockers described above and to fully restructure my code. I created a prototype object last week (described here) containing the relevant information about the editor pair. We want to implement all the relevant functions as prototype methods and make them accessible to both plugins, to prevent code duplication and make the code cleaner.

My GSoC project is available in my Github repository.

Another week of my Google Summer of Code 2016 project has passed. My goal for this summer is to create a revamped user interface for the module called Translation Management Tool, with many new features that will simplify the translation process for both end-users and translators. I divided my goals into two parts:

  • creating segments out of the content, marking them in the editor and performing actions based on the clicked word and segment
  • masking HTML tags inside segments

As we have already passed the midterm evaluation, half of the project was done in the previous weeks. I moved to the second part of my project this week. More details about my work are described below.

Achievements

As every week, I started my weekly development cycle by refactoring the code based on my mentor’s comments. Since I fixed an important blocker last week and added support for multiple editors on the translation and review pages, we agreed that the editors should only work in pairs. We now clearly mark the same segments in editor pairs and toggle the CKEditor plugin buttons accordingly. I did this by implementing a new JavaScript object prototype, which contains the relevant information that we want to store about the editor pair, such as:

  • ID of the editor pair
  • the selected editor name
  • the DOM element of the area below the editor (in which we display selected segments, words and suggestions)
  • the selected word
  • the selected segment’s id
  • the counter of segments that are marked as completed
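A minimal sketch of such a prototype could look like this. The constructor name EditorPair, the property names and the two methods are hypothetical and only mirror the list above; the real object in the plugin may be shaped differently.

```javascript
// Holds the state that we want to keep for one pair of editors.
function EditorPair(id, activeEditorName, areaBelow) {
  this.id = id;                             // ID of the editor pair
  this.activeEditorName = activeEditorName; // the selected editor name
  this.areaBelow = areaBelow;               // DOM element below the editor
  this.activeWord = null;                   // the selected word
  this.activeSegmentId = null;              // the selected segment's id
  this.completedCounter = 0;                // segments marked as completed
}

// Remember which segment and word the user clicked on.
EditorPair.prototype.selectSegment = function (segmentId, word) {
  this.activeSegmentId = segmentId;
  this.activeWord = word;
};

// Increase the counter when a segment is marked as completed.
EditorPair.prototype.markCompleted = function () {
  this.completedCounter += 1;
  return this.completedCounter;
};
```

Keeping the functions on the prototype means every editor pair shares the same methods while holding its own state.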

Other than refactoring and fixing smaller issues, I started working on the new plugin for displaying the masked HTML tags inside the segments. The purpose of masking the tags is to help us better understand the structure of the text being translated and to clearly show which opening/closing tags are missing inside a segment. Firstly, I extended the dummy translatable node in the tmgmt_ckeditor.install file with a masked <b> tag that looks like this:


<tmgmt-tag tag="&lt;b&gt;">text inside a masked tag</tmgmt-tag tag="&lt;/b&gt;">

Once I had a tag to work on, I created a simple plugin that just displays some arrows instead of the opening and closing of the masked tags. It is fully dependent on the plugin that is displaying the segments, which means it is only enabled and available to toggle when the segments are shown, and disabled otherwise.

As a next step, we should discuss the attributes of the masked tags. We should certainly preserve them throughout the translation process. For attributes such as alt and title, which might need translation, we could keep them hidden and display them on mouse hover, so that the user is aware of their presence.

First version of the masked tags plugin.

Goals for next week

The plan for next week is to continue developing a valid definition of the tags and to extend its functionality by handling data attributes properly.

Feel free to check my Github repository to stay updated on my progress over the weeks.

Week 5 is over and I have successfully passed the midterm evaluation for Google Summer of Code 2016. I really challenged myself by choosing a project that required a lot of learning and constant adaptation to changes in code. As building a nice UI like the one Google Translate has wasn’t enough for us, we wanted to make the Translation Management Tool module a better CAT tool by creating a completely new user interface, with a lot of cool new features, in connection with a translation memory - which would hold translations, their usage, quality, source, etc. The main goal I wanted to reach before the midterm was to build the UI that displays segments of text in the CKEditor, their suggestions from the memory and all relevant information in an area below the translation editor. I am proud to say the progress is clearly visible and we are slowly but surely fulfilling all of our project requirements.

Achievements

This week was full of personal issues, but nonetheless I managed to meet all of my goals that I set in our last weekly meeting (described here).

Based on a quick review by my mentor miro_dietiker, I fixed quite a few code issues. Since I implemented the connection between my plugin (the UI part) and tmgmt_memory last week, I got a few comments about the structure of the HTTP request and response. I had to extend the request with the source and target languages; the response, on the other hand, was lacking information about the source of the translation, the stripped text and the quality. I also changed the code to support multiple translations by adding a new level of nesting to the response, with clearly descriptive keys and corresponding values. This means we can have multiple translations in the memory for one segment, so the user can choose the one they think is best and use it as the translation with a single click on a button.
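To give an idea of the shape of such a request and response, here is a small sketch. All names in it are assumptions made for illustration: the query parameter names, the response keys and the sample values are made up and do not come from tmgmt_memory.

```javascript
// Build the query string for a lookup request (parameter names are assumptions).
function buildLookupQuery(segment, sourceLanguage, targetLanguage) {
  const params = new URLSearchParams({
    segment: segment,
    source_language: sourceLanguage,
    target_language: targetLanguage,
  });
  return params.toString();
}

// Made-up response with one extra level of nesting for multiple translations.
const exampleResponse = {
  sourceSegment: 'Hello world',
  translations: [
    { text: '<b>Hallo</b> Welt', strippedText: 'Hallo Welt', quality: 3, source: 'memory' },
    { text: 'Hallo Welt', strippedText: 'Hallo Welt', quality: 5, source: 'user' },
  ],
};

// Order the suggestions so that the best one is offered first.
function sortByQuality(response) {
  return [...response.translations].sort((a, b) => b.quality - a.quality);
}
```

Each entry in the nested list would then get its own button in the area below the editor.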

Another cool feature I implemented is the listener for changes made by the user in the editor. I faced some trouble doing that, but with some helpful input in this stackoverflow thread that I opened, I managed to do it with a timer and the CKEditor onChange event. This might be a bottleneck from a performance perspective, since the timer makes us send the HTTP request many times, but I found this to be the only viable solution for now. Once this was implemented, adding the data attribute for the source of the translation (whether it comes from the user or from the memory) was easy.
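One common way to combine a timer with a change event is debouncing, which waits until the user has stopped typing before doing the work. The sketch below shows that technique; I am not claiming it is exactly what the plugin does. The debounce helper and its injectable timers argument are my own, and sendSegment in the usage comment is a hypothetical function.

```javascript
// Delay the callback until no new call has arrived for `delay` ms.
// The `timers` argument only exists so the helper can be tested.
function debounce(callback, delay, timers) {
  const t = timers || {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (handle) => clearTimeout(handle),
  };
  let handle = null;
  return function (...args) {
    if (handle !== null) {
      t.clear(handle); // a newer change cancels the pending call
    }
    handle = t.set(() => {
      handle = null;
      callback(...args);
    }, delay);
  };
}

// Possible wiring inside the editor (sendSegment is hypothetical):
// editor.on('change', debounce(() => sendSegment(editor.getData()), 500));
```

This reduces the number of HTTP requests to one per pause in typing, at the cost of a short delay before the suggestion is refreshed.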

The markup to do real pairing of the editors was my main focus this week. We need to support more than just two editors on the page; for example, when translating multiple fields like paragraphs, we can end up having n pairs of editors. To do this, a lot of refactoring was needed, which resulted in more than 300 lines of code changed. Firstly, I created a new node with paragraphs, which contain segments with specific ids. This was done in tmgmt_ckeditor.install, so that it happens when the module is enabled - which makes the usage and the progress of the plugin available right out of the box. After that, we needed an initial for loop over all source editors to populate the translation editors with data. This should be the only for loop that exists in the code. I removed the for loops that were used when searching for the same segments and marking them as active, since this is very time consuming and might result in poor performance when there are many editors on the page. We are now pairing the editors according to their id and name (in regex: *id*value-source-value$ and *id*value-translation-value$). This change also affected the way the areas below the editors are displayed and how the plugin works in general.
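Pairing by name can be done in a single pass, without nested loops. The following is a sketch under my own assumptions: the function pairEditors and the example editor names are hypothetical, and only the two suffixes are taken from the patterns above. In the browser, the list of names could come from Object.keys(CKEDITOR.instances).

```javascript
const SUFFIXES = {
  source: 'value-source-value',
  translation: 'value-translation-value',
};

// Group editor names into pairs, keyed by the part before the suffix.
function pairEditors(editorNames) {
  const pairs = new Map();
  for (const name of editorNames) {
    for (const role of Object.keys(SUFFIXES)) {
      const suffix = SUFFIXES[role];
      if (name.endsWith(suffix)) {
        const key = name.slice(0, -suffix.length);
        const pair = pairs.get(key) || {};
        pair[role] = name;
        pairs.set(key, pair);
      }
    }
  }
  return pairs;
}
```

Looking up the partner of an editor is then a single Map access instead of a loop over all editor instances.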

Fifth version of the plugin
With this iteration of the plugin, we got rid of one of the biggest blockers and made it (almost) fully usable.

Goals for next week

Finally, the moment has come to do a full code cleanup! This will help me significantly in future development cycles as I believe the code will be structured much better and will be easier to read.

For now, the only thing missing is that the segments don’t cover the case with HTML tags inside. I planned in my GSoC proposal to start working on it this week, and so I will. The masking of HTML tags will help us to understand the structure better and clearly show which opening/closing tags are missing inside a segment. I will first display an icon per defined tag (<p> and <div>), then I will create a toggle button to show and hide them.

As always, my progress and code can be found in my Github repository.

The 4th week of coding for Google Summer of Code is over and the evaluation period is starting. My goal for the first half of the coding process was to create a functional CKEditor plugin for displaying segments in connection with tmgmt_memory. My main part is the UI, with a specific focus on segment semantics and functionalities. The memory, on the other hand, contains suggested translations of segments, their quality and other useful information. Together with my mentors, we want to make TMGMT as pleasant as possible for the end users (translators). I believe my work is currently progressing really well and I think I’ve met most of our expectations so far. You can check my progress here.

Achievements

First of all, I wanted to make my plugin easier to use, right out of the box. That is why I’ve updated the tmgmt_ckeditor.install file to preload a new text format (translation_html) and a dummy translatable node with some dummy segments. This is all loaded when the module is enabled. I’ve also added new dependencies on the tmgmt_demo module and tmgmt_memory. These small additions speed up my development process and also allow others to easily reproduce my development status. Note: this addition of a dummy segment is just a temporary setup until tmgmt_memory becomes fully functional.

My main focus was on connecting the plugin with the backend. For this, I used the setup described above with two simple functions from the API - addSegment and addSegmentTranslation - to make sure that there is a translation for the segments in the example translatable node. After that, I implemented a route for the lookup of a selected segment in the translation memory. I also provided a controller to answer the AJAX call that I make from my JS file. To test my API responses I used Postman. This is an app that allows you to send POST/GET/PUT/DELETE etc. requests and see the API response. The code can be viewed in this commit.

I had a smaller issue here. My first version of the request was synchronous, which is deprecated because of its detrimental effects on the end user’s experience. In fact, synchronous requests block the execution of the code and wait until we get a valid response from our server. This can cause “freezing” on the screen and an unresponsive user experience. I fixed it by following this JSON Http Request example. I created a callback function that gets the translation of the segment from the memory and displays it as a suggestion in the area below the editor. The user can then click on the button and replace the segment with the suggested translation. With a simple right click, the segment can be marked as completed (and the counter below is increased). I implemented the context menu item last week - see here for more details.
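The difference comes down to the third argument of XMLHttpRequest’s open() method and to handling the response in a callback. Below is a minimal sketch of an asynchronous JSON request; the helper name getJSON and its XhrCtor parameter (which only exists so the helper can be tried out without a browser) are my assumptions, not the code from the commit.

```javascript
// Send an asynchronous GET request and pass the parsed JSON to a callback.
// XhrCtor defaults to the browser's XMLHttpRequest.
function getJSON(url, onSuccess, onError, XhrCtor = XMLHttpRequest) {
  const xhr = new XhrCtor();
  xhr.open('GET', url, true); // true = asynchronous, does not block the page
  xhr.onreadystatechange = function () {
    if (xhr.readyState !== 4) {
      return; // the request is not finished yet
    }
    if (xhr.status === 200) {
      onSuccess(JSON.parse(xhr.responseText));
    } else {
      onError(xhr.status);
    }
  };
  xhr.send();
}
```

The success callback is the place where the suggestion would be written into the area below the editor.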

As my mentor miro_dietiker pointed out, “classes are pseudo application states and usually done for very simple indication so that you can use CSS for applying styles”. In other words, since we are building a complex application with various states, we will be using data-attributes to address them. So I dropped the class usage and started working on adding more data attributes to segments. I started with toggling active status when the segment is clicked and adding the source attribute. We will have to store if the translation was done by a user, came from a machine translator or from the memory.
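As a sketch of what addressing state through data attributes could look like: the attribute names data-tmgmt-segment-active and data-tmgmt-segment-source below are hypothetical, since we have not settled on the naming yet, and the function only relies on the element having a setAttribute method.

```javascript
// The three possible origins of a translation.
const SOURCES = ['user', 'machine', 'memory'];

// Store the state of a segment in data attributes instead of classes.
function setSegmentState(element, active, source) {
  if (!SOURCES.includes(source)) {
    throw new Error('Unknown translation source: ' + source);
  }
  element.setAttribute('data-tmgmt-segment-active', active ? 'true' : 'false');
  element.setAttribute('data-tmgmt-segment-source', source);
  return element;
}
```

CSS can still style such segments through attribute selectors, so nothing is lost by dropping the classes.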

There is still work to be done on the data attributes, as seen below. We need to properly define their naming and replace ids with data-ids.

Fourth version of the plugin

Goals for next week

I plan to continue working on implementing the remaining data attributes. Another issue will be to support multiple translation memory matches. We will display N results, each with its own button to accept that translation suggestion.

My main focus will be on fixing the markup to do real pairing of the editors, because we have to support more than 2 editors on the page - translating paragraphs, for example, might result in many editor instances. For now, we have a for loop that goes through all editor instances on the page and checks for the same segments. This adds lag to the interaction and could become a real bottleneck, which is why it’s on top of our priority list.

I am also discussing with my mentors whether or not to use jQuery helpers in the plugin. Our initial plan was to have it written purely in Javascript, but I believe jQuery would provide helpers for many DOM lookups and would speed up my development process, as well as significantly clean up the code.

This week of my Google Summer of Code project went by pretty fast, with lots of discussions and new ideas for future work on segment semantics, related to my CKEditor plugin for TMGMT. My initial goal is to extend the CKEditor with a new plugin written in Javascript that defines segments as parts of the content and performs specific linguistic actions on them.

Achievements

My third week of work was quite busy, since I had to fix some issues, work on making the UI fully functional and add some new functionalities, as defined last week - described in my previous blog post.

The first fix I needed to do was to populate the segments in the translation editor from the source when the CKEditor is loaded. User actions can then be performed on these segments. All active segments are marked with colors in both editors. We could have an issue if there are more than two editors on the page - for example, if the source is a node with paragraphs, we could end up with 10 or more wysiwyg editors on the page. For now, we support only one wysiwyg field. In later iterations we might add support for more fields and for marking segments in the corresponding editors.

Apart from fixing stuff, I also added some new functionalities this week. For now, I hardcoded a translation proposal that will eventually be a string returned from the translation memory. With a click of a button, the user can confirm that the suggestion is right and replace the whole content of a segment with it. After that, the user can right click on a segment and select the option to set the segment’s status as completed. This marks the segment in green in both editors for easier visualization. We also added a counter in the area below to display the number of completed segments.

I also worked on the area below the editor. All relevant data is displayed in a much simpler and more understandable way. I will make some proposals in the future to restyle it for optimal (proper) usage, based on the UI of Google Translate and some notable CAT tools.

The image below demonstrates the current progress.

Contextual action on segments

Goals for next week

As scheduled, we’re moving to the most intriguing part: connecting the frontend (my plugin) with the backend. I will start the integration of tmgmt_memory (can be found in edurenye’s sandbox) by saving a few segments through the API and displaying the dummy translation below the editor, as we are doing now. These two functions are our starting point and we will build on top of them once they are done.

These operations could add some extra complexity. We will have to define what happens when we have multiple matches, for example. But these kinds of questions are not on our priority list for now.

Other than that, I will also need to fix a few smaller things, like enabling the contextual menu (right click item) only when the segments are displayed.

When all of the above is done, we will do a full code cleanup. The priority after that will be to make it really usable for users. I think one of the main blockers now is the fact that we support only two editors per page and I will focus on that part.

The discussions with my mentor always result in a lot of new scope and ideas. We defined new meta data for the segments to mark the quality of the translation and its source - whether it comes from the user, machine translation or the translation memory. We might do this by just setting the status to “needs review” and “needs work” initially. Also, another tag should be added if a segment is modified. With all this complexity of meta tags, we should define them clearly and keep good track of them. If I find the time, I might implement them this week, but the priority is connecting the translation memory.

All my code can be viewed on my github project. Feel free to check my progress over the weeks.