Bug #183

Wrong charset: foreign characters are not displayed

Added by patrick_elx over 2 years ago. Updated over 2 years ago.

Status: Closed
Start: 10/02/2009
Priority: High
Due date: 10/16/2009
Assigned to: jacobsj
% Done: 100%
Category: -
Target version: Caller ID Superfecta v 2.2.1 Maintenance Release

Description

When a source lookup returns a result that includes foreign extended characters (i.e. accents), the get_url_contents function in callerid.php does not translate the charset correctly.

I added $ret=utf8_encode($ret) to the code to force a UTF-8 encode from ISO-8859-1 (SVN rev 141).

So far it works well on all the sources I've tested, as all of them use ISO-8859-1. However, when we have a source whose HTML page uses another charset, I'm not sure this will fix it for them, but at least it won't be worse than it is right now.

Gurus of PHP and charset issues, please advise if there's a better solution.
Meanwhile, I would suggest adopting this patch (after some additional testing to check there's no regression), as it fixes the issue with the sources that we have right now.
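
To illustrate the concern about sources that do not use ISO-8859-1, here is a minimal sketch, not the code from SVN rev 141. The helper name to_utf8_if_needed is hypothetical, and the sketch assumes a current PHP with the mbstring extension. It converts only when the bytes are not already valid UTF-8, and it assumes any other input is ISO-8859-1. The last line shows what an unconditional conversion does to text that is already UTF-8.

```php
<?php
// Hypothetical helper, not part of callerid.php: convert to UTF-8 only when needed.
function to_utf8_if_needed(string $ret): string
{
    // Bytes that already form valid UTF-8 are returned untouched.
    if (mb_check_encoding($ret, 'UTF-8')) {
        return $ret;
    }
    // Assumption: anything else is ISO-8859-1, as with the sources tested so far.
    return mb_convert_encoding($ret, 'UTF-8', 'ISO-8859-1');
}

$latin1 = "Dupr\xE9";      // "Dupré" encoded as ISO-8859-1
$utf8   = "Dupr\xC3\xA9";  // "Dupré" encoded as UTF-8

var_dump(to_utf8_if_needed($latin1) === $utf8); // ISO-8859-1 input is converted
var_dump(to_utf8_if_needed($utf8) === $utf8);   // UTF-8 input is left alone

// Converting UTF-8 input unconditionally encodes it a second time:
var_dump(mb_convert_encoding($utf8, 'UTF-8', 'ISO-8859-1') === "Dupr\xC3\x83\xC2\xA9");
```

The validity check is only a heuristic: a few ISO-8859-1 byte sequences also happen to be valid UTF-8, so a charset declared by the page is a stronger signal when one is available.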

badchars.JPG (4.2 KB) tshif, 10/14/2009 07:38 pm


Related issues

related to Caller ID Superfecta - Bug #203: source-infobel returns odd characters in CID Closed 10/15/2009 10/26/2009
related to Caller ID Superfecta - Bug #215: Choose the CLID charset Closed 10/26/2009 10/26/2009

History

Updated by patrick_elx over 2 years ago

  • Target version set to Caller ID Superfecta v 2.2.0

Updated by tshif over 2 years ago

  • Target version changed from Caller ID Superfecta v 2.2.0 to Caller ID Superfecta v 2.2.1 Maintenance Release

Updated by jacobsj over 2 years ago

  • % Done changed from 70 to 100

Updated by tshif over 2 years ago

  • Status changed from Resolved to QA Testing
  • % Done changed from 100 to 90

Updated by tshif over 2 years ago

  • Assigned to set to tshif

Updated by tshif over 2 years ago

  • Status changed from QA Testing to Closed
  • % Done changed from 90 to 100

Accepted for build 2.2.1

Updated by patrick_elx over 2 years ago

I'm reopening this bug as I've seen some regression with infobel.

It's not a blocking issue, however we should find a way to fix it more nicely.

We should probably test, in the function get_url_contents($url), which charset the page uses, and then apply the proper decoding/encoding to UTF-8 if it's needed.
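
A rough sketch of that idea, assuming a current PHP with the mbstring extension. The function names detect_declared_charset and page_to_utf8 are hypothetical and are not part of callerid.php; how get_url_contents would expose the HTTP Content-Type header is also an assumption. The sketch reads the charset from the header or from a meta tag, falls back to ISO-8859-1, and converts only when the page is not already UTF-8.

```php
<?php
// Hypothetical helper: return the charset a page declares, or a default.
function detect_declared_charset(string $html, ?string $contentType = null, string $default = 'ISO-8859-1'): string
{
    // Prefer the HTTP Content-Type header when the caller has it.
    if ($contentType !== null && preg_match('/charset\s*=\s*"?([\w.:-]+)/i', $contentType, $m)) {
        return strtoupper($m[1]);
    }
    // Otherwise look for a <meta ... charset=...> declaration in the markup.
    if (preg_match('/<meta[^>]+charset\s*=\s*["\']?([\w.:-]+)/i', $html, $m)) {
        return strtoupper($m[1]);
    }
    return $default;
}

// Hypothetical helper: convert a fetched page to UTF-8 based on its declared charset.
function page_to_utf8(string $html, ?string $contentType = null): string
{
    $charset = detect_declared_charset($html, $contentType);
    if ($charset === 'UTF-8' || $charset === 'UTF8') {
        return $html; // already UTF-8, nothing to do
    }
    // Fall back to ISO-8859-1 when mbstring does not list the declared name.
    $known = array_map('strtoupper', mb_list_encodings());
    if (!in_array($charset, $known, true)) {
        $charset = 'ISO-8859-1';
    }
    return mb_convert_encoding($html, 'UTF-8', $charset);
}

$page = '<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">' . "Dupr\xE9";
var_dump(substr(page_to_utf8($page), -6) === "Dupr\xC3\xA9");
```

A declared charset can still be wrong, so this only improves on a blanket utf8_encode when the source labels its pages honestly.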

Updated by tshif over 2 years ago

  • Due date set to 10/16/2009
  • Status changed from Closed to Assigned
  • Assigned to changed from tshif to jacobsj
  • Priority changed from High to Urgent
  • % Done changed from 100 to 50

Ok Patrick - I agree. The ticket is open again and placed on Jeremy's work page. You are welcome to work on it also of course - Jeremy's availability is very tight right now.

Updated by jacobsj over 2 years ago

Patrick

take a look at the second function on this page:
http://mobile-website.mobi/php-utf8-vs-iso-8859-1-59

does that look like it might address our problem?

Updated by tshif over 2 years ago

Gentlemen - what is our present thinking on this bug?
I am concerned about this, as it affects one area that this update is hoped to address: NON USA CID, where we are most likely to find this occurring.

Updated by jacobsj over 2 years ago

Patrick, can you give me a phone number and a source that is giving the character problems?

Updated by patrick_elx over 2 years ago

It's with infobel, and only when a private number is returned (not a business).
In fact, the more I look at it, the more it seems to be a source problem rather than a charset issue.

try for instance: 33145229525
I have the last name then a black question mark then first name.

If you try a company 33153760789, it's ok.
A company with an é whose result is ok: 33145327298

A private number with an é: 33145482390. The é is ok, but there is still the black question mark after the name.

It could be that they added a character that we need to get rid of directly in the source-infobel file.
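
If it does come to stripping the character in the source file, a sketch could look like the following. It rests on two assumptions: that the name is valid UTF-8 at this point, and that the black question mark is U+FFFD (the Unicode replacement character) or a control character. The actual byte has not been identified in this ticket, and the function name clean_caller_name is hypothetical.

```php
<?php
// Hypothetical helper for a source file such as source-infobel.
function clean_caller_name(string $name): string
{
    // Assumption: the stray character is U+FFFD or a control character (\p{Cc}).
    $cleaned = preg_replace('/[\x{FFFD}\p{Cc}]+/u', ' ', $name);
    if ($cleaned === null) {
        // The /u modifier rejects invalid UTF-8; return the input unchanged in that case.
        return $name;
    }
    // Collapse the whitespace left behind and trim the ends.
    return trim((string) preg_replace('/\s+/u', ' ', $cleaned));
}

// "Dupré" + U+FFFD + " André", written as raw UTF-8 bytes.
$raw = "Dupr\xC3\xA9\xEF\xBF\xBD Andr\xC3\xA9";
var_dump(clean_caller_name($raw) === "Dupr\xC3\xA9 Andr\xC3\xA9");
```

Replacing the character with a space and then collapsing whitespace keeps the last name and first name separated even if the source had no space of its own between them.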

I won't be able to work on this problem for the next two days, so if you have a patch for it...

Updated by tshif over 2 years ago

If I understand correctly - we suspect this may be a source issue, rather than a generic charset issue. If that turns into a consensus - I will remove the infobel source from the live update repository until we get it hammered out.

Please advise -

Updated by tshif over 2 years ago

Pending further assessment and discussion - the infobel source has been removed from the live update repository.

Updated by tshif over 2 years ago

Here's my example.

View the results in web form here:

http://www.infobel.com/en/france/Inverse.aspx?q=France

The other view is from inside debug in superfecta.

The number I use is: 33145482390

In debug, the name is captured ok - even the accented e in Dupré André. But at the end of Dupré there is a very odd character. (I think it's a form-feed character.)

I tried a cut and paste here to demonstrate, we'll see how it goes:

Dupré? André

Weirdly enough - it's visible when I save this bug ticket! So - that's what it looks like. For me, I see the letters FF at the top of a square box, and the letters FD at the bottom.

When I saved the ticket, the special character was lost.

Here's what I saw:

Updated by tshif over 2 years ago

  • Priority changed from Urgent to High

So far, for me, only infobel produces this result. Can anyone verify?

If this only occurs with the infobel source - then the problem is not with the core module, but in the infobel source - this means good news for release on Friday.

Updated by tshif over 2 years ago

  • Status changed from Assigned to Closed
  • % Done changed from 50 to 100

Ok - I can't duplicate this except with the infobel data source.
This case is being closed as it seems that the problem is with the data source, not the module core itself.
See #203 for the continuation of this ticket.