Arabic, PDF and Joomla!

Update August 2nd, 2007: I will be releasing this hack in the next few days, Insha’Allah.

Update September 4th, 2007: I’ve released the code on SourceForge; you can get it by checking out the CVS repository there.
Project URL: http://sourceforge.net/projects/arabic-tcpdf

CVS checkout command:

cvs -z3 -d:pserver:anonymous@arabic-tcpdf.cvs.sourceforge.net:/cvsroot/arabic-tcpdf co -P artcpdf

It is a known fact that Joomla! 1.0.x can export content to PDF. However, this feature doesn’t work well for Arabic content, whether it is encoded as windows-1256 or UTF-8.

To address this issue, the Joomla! core team has moved to the TCPDF library for PDF export. This library supports images and, more importantly, UTF-8, which means it can (theoretically speaking) be used to handle pretty much any language. This is true for many left-to-right languages; for a right-to-left language like Arabic, however, the situation is quite different. I’ll try to summarize the problem here.

First off, Arabic is a complex-script language. This means that putting letters side by side is not enough: a typical Arabic letter has different contextual forms depending on its position in the word.
For a PDF generator to handle Arabic text correctly, it must pre-process the input text and determine which form of each letter to render (“shaping”); otherwise the text will appear as separated Arabic letters, as shown in the following image. So far the Joomla! 1.5 PDF engine doesn’t take this into consideration, which means any Arabic text exported by Joomla! will appear as separated letters.

Original article

Original text as seen in the HTML version of the sample article

PDF output without modifications

PDF output of the same article (note that letters are separated and rendered from left to right).

To tackle this issue I’m using a class called ArGlyphs by Khaled Al-Shamaa.

Quoting from the project description on phpclasses.org:

The class takes as input Arabic text encoded using Windows-1256 character set and performs Arabic glyph joining to output a string encoded using UTF-8.

So basically this class fixes the shaping problem. I had to modify it a bit to accept UTF-8 input without messing it up; for that I used the phpUTF8 library, which is bundled with Joomla! 1.5.

Now the output looks like this:

PDF after shaping

The letters are no longer separated but are still rendered from left to right.
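
To make the shaping step concrete, here is a minimal sketch in Python. It is not the ArGlyphs code and not the PHP used in this hack; it only illustrates the idea of picking a contextual form per letter. It assumes the glyphs in the Unicode Arabic Presentation Forms-B block as the target and builds its table from Python's `unicodedata` module. The names `build_forms` and `shape` are made up for this example, and the sketch ignores diacritics and the lam-alef ligature, which a real shaper has to handle.

```python
import unicodedata


def build_forms():
    """Collect the contextual forms of each Arabic letter from Unicode data."""
    forms = {}
    # Arabic Presentation Forms-B holds the isolated/final/initial/medial glyphs.
    for cp in range(0xFE70, 0xFF00):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep entries like "<initial> 0628"; skip ligatures and unassigned code points.
        if len(parts) != 2:
            continue
        tag = parts[0].strip("<>")
        base = chr(int(parts[1], 16))
        forms.setdefault(base, {})[tag] = chr(cp)
    return forms


FORMS = build_forms()


def shape(text):
    """Replace each Arabic letter with the form that fits its neighbours."""
    shaped = []
    for i, ch in enumerate(text):
        own = FORMS.get(ch)
        if own is None:  # not an Arabic letter: leave it untouched
            shaped.append(ch)
            continue
        prev = FORMS.get(text[i - 1], {}) if i > 0 else {}
        nxt = FORMS.get(text[i + 1], {}) if i + 1 < len(text) else {}
        # Only letters that have an initial form can connect to the next letter.
        joins_prev = "initial" in prev and "final" in own
        joins_next = "initial" in own and "final" in nxt
        if joins_prev and joins_next:
            key = "medial"
        elif joins_prev:
            key = "final"
        elif joins_next:
            key = "initial"
        else:
            key = "isolated"
        shaped.append(own.get(key, ch))
    return "".join(shaped)
```

Calling `shape` on a logical-order string returns a string of the same length in which each Arabic letter has been swapped for its presentation form; non-Arabic characters pass through unchanged. The result is still in logical order, so it would still come out left to right.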

Which brings us to the second problem: the TCPDF library only outputs text from left to right, which causes RTL scripts to be displayed in reversed order. Simply reversing the string before passing it to the rendering engine won’t fix this. I think FriBidi may fix it, but it is not available with every PHP installation, so I can’t rely on it to fix this problem for Joomla!. So I gave it a shot and wrote my own code (I always like to write my own code ;P). The code I wrote simply breaks the text down into pieces based on their UTF-8 range, then reverses the pieces (“runs”) that are identified as RTL (like Arabic) and leaves the other runs unmodified.
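
As an illustration of the run-splitting idea, here is a minimal Python sketch. The actual hack is PHP and this is not that code. The code point ranges in `is_rtl`, the rule that whitespace inherits the direction of the run before it, and the names `split_runs` and `to_visual` are all assumptions made for the example. It is far simpler than the Unicode bidirectional algorithm.

```python
def is_rtl(ch):
    """Rough test for Hebrew/Arabic code points, including presentation forms."""
    cp = ord(ch)
    return (
        0x0590 <= cp <= 0x08FF      # Hebrew, Arabic and neighbouring RTL blocks
        or 0xFB1D <= cp <= 0xFDFF   # Hebrew and Arabic Presentation Forms-A
        or 0xFE70 <= cp <= 0xFEFC   # Arabic Presentation Forms-B
    )


def split_runs(text):
    """Split text into (is_rtl, substring) runs of a single direction."""
    runs = []
    for ch in text:
        rtl = is_rtl(ch)
        # Assumption: whitespace takes the direction of the run before it.
        if ch.isspace() and runs:
            rtl = runs[-1][0]
        if runs and runs[-1][0] == rtl:
            runs[-1] = (rtl, runs[-1][1] + ch)
        else:
            runs.append((rtl, ch))
    return runs


def to_visual(text):
    """Reverse RTL runs so a left-to-right renderer draws them correctly."""
    return "".join(s[::-1] if rtl else s for rtl, s in split_runs(text))
```

Because a plain string reversal also moves combining marks in front of their base letters, text with diacritics would need extra care that this sketch does not attempt.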

PDF after shaping and RTL’ing

Output after shaping and RTL’ing

Letters are now correctly shaped (with minor errors) and rendered right to left; however, the lines are in reverse order. You should be able to read the text correctly by starting from the bottom line and proceeding upwards. I’ve yet to fix this problem. If you have any suggestions on how to fix it, or know an easier way of doing the whole thing, please let me know; I think there’s got to be a better way of doing this.

Edit (8/5/2007): I have managed to fix that issue, but the code still needs much clean-up and bug-proofing. Here’s a sample of the output.
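
The post itself does not show how the line order was fixed. The approach described in the comments is to break the shaped text into lines first and then reverse each line separately. A minimal Python sketch of that idea follows; it assumes purely RTL text and wraps by character count with `textwrap`, whereas a PDF engine would measure line width with font metrics. The name `layout_rtl` is made up for this example.

```python
import textwrap


def layout_rtl(text, width):
    """Wrap logical-order RTL text, then reverse each line on its own."""
    # Wrapping before reversing keeps the first logical line at the top.
    lines = textwrap.wrap(text, width=width)
    return [line[::-1] for line in lines]
```

Reversing the whole paragraph first and wrapping afterwards puts the end of the text on the first line, which matches the bottom-to-top reading order seen in the earlier output.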



17 Responses to “Arabic, PDF and Joomla!”

  1. jawad says:

    Salam 3alaykom
    I am from Morocco. I want to develop a website in Arabic using Joomla.
    If there is an Arabic version, or any help available, please tell me as soon as possible.

  2. srini says:

    Hi,

    I am using TCPDF for PDF creation. As you said, it does not print Arabic characters right to left, and the characters are not joined as required. I have tried the ArGlyphs class to fix this problem, but still no luck. Could you elaborate on how you used this class to tackle the issue? Thanks in advance.

  3. مصطفى says:

    Unfortunately I’m having exams these days and I don’t have much time, so I’ll keep it brief for now and write a more elaborate explanation of the process as soon as I can. The basic idea is that I pass the Arabic text to ArGlyphs for reshaping. After that, I break the reshaped text into lines based on its length, reverse each line, and send the text line by line to the PDF engine.

  4. Srini says:

    Thanks for your reply!

    “The code I wrote simply breaks down text into pieces based on its UTF-8 range and then reverses the pieces “runs” that are identified as RTL “like Arabic” and leaves other runs without modification.”

    Could you please post the code for the modifications?

  5. Wow, amazing. You did a nice job. But can you please tell me how to download the modified files, please?

  6. Specialist says:

    Absolutely superb work, doc.
    Would you allow us to use it in its current form and try to modify and develop it?

    Please tell me how to download the modified files.

  7. مصطفى says:

    The files are in CVS. I did not provide the Joomla!-specific code because I have stopped working on it. I am currently working with the TCPDF library directly, and the code in CVS is for the library, not for Joomla!.

  8. عبدالرحمن says:

    This is useful work that fills an important gap in the use of Arabic on the Internet,
    and it serves millions of people, some of whom know about it and some of whom benefit from it without knowing.
    So, on behalf of those millions of people who speak and read Arabic, and on behalf of trillions of words and pages:

    Thank you as many times as there are of them … thank you, thank you.

    عبدالرحمن – الرياض

  9. مصطفى says:

    I thank all the brothers for their encouragement. The code is still experimental and has a long way to go. Anyone with PHP experience can get the code and try to improve it.

  10. Joomfa says:

    Hello Doctor drsh,
    Thanks,
    but I can’t download it.
    Can you provide a better link?
    This is just the svn version.

    Thanks

  11. صابر says:

    Thanks for the tremendous effort.
    I am trying to help improve the PDF feature in Joomla! 1.5 and, praise be to God, I have found some solutions, especially for displaying GIF images, which TCPDF does not support. This came after modifying the PHP 5 code

    here

    and I modified it to suit PHP 4, since Joomla! 1.5 uses PHP 4.
    The solution is here:
    http://forum.joomla.org/index.php/topic,228896.msg1130678.html#msg1130678

    What remains is displaying the Arabic language in the PDF.

  12. Rashid says:

    Hi, can someone please help me set up an Arabic Joomla website using Joomfish? The default language is English and I want to add Arabic. I have been trying to do so, but the Arabic characters don’t appear on the right-hand side, as they should.

    Thanks.

  13. محمد says:

    I don’t understand any of this. I have a Joomla! site and I have the Arabic PDF problem.

    I hope someone can explain the solution to me!

  14. Erika says:

    The current original TCPDF version (http://www.tcpdf.org) fully supports all RTL languages, including Arabic, using the bidirectional algorithm. Starting from version 3.0.007, TCPDF includes all the fixes needed to properly display Arabic text.

  15. مصطفى says:

    @Erika, I will test it and update this post if necessary.

  16. Syed Naqvi says:

    Salaams..how can we get the files? The link you mention at sourceforge doesn’t have any downloads.

  17. Traig says:

    I am looking for an Arabic PDF for Joomla 1.5. I am also looking for someone to redesign my Joomla home page. Thanks
