Feature #248

User request data source for Argentina

Added by tshif about 2 years ago. Updated about 1 year ago.

Status: Closed
Start: 11/26/2009
Priority: Normal
Due date: 12/20/2010
Assigned to: tshif
% Done: 100%
Category: -
Target version: Caller ID Superfecta Source Files

Description

By SamK, Friday, October 30, 2009 @ 8:19 pm

Yell Group (http://www.yellgroup.com/) has reverse name look-up pages for Argentina, Chile and Peru:

Argentina - http://www.paginasblancas.com.ar/Telefonos.action
Chile - http://blancas.amarillas.cl/ See feature 400
Peru - http://m.paginasamarillas.pe/yell2mobile/pbol.do?method=init See feature 401

areacodes.txt (23.3 KB) jkiel, 12/17/2010 08:39 am

getar.php (702 Bytes) jkiel, 12/17/2010 08:39 am

areacodes.txt - Deduplicated (2.4 KB) jkiel, 12/17/2010 09:18 am

getar.php - With deduplication (961 Bytes) jkiel, 12/17/2010 09:18 am

History

Updated by lgaetz over 1 year ago

Since I wanted to learn how to extract CNAM data from a webpage, and since my Spanish is not terrible, I cobbled together a rough source for Argentina. This needs a bit of work: there is no checking for an acceptable number format because my knowledge of number formats for Argentina is nil. I also did not test this exhaustively, so I don't know if, say, business results are reported differently, or if multiple results are displayed on some searches, etc. It is a very good start though. CID must be passed to this lookup source as 11 digits with no non-digit characters.

<?php
//This file is designed to be used as an include that is part of a loop.
//If a valid match is found, it should give $caller_id a value.
//Available variables for use are: $thenumber
//Retrieve website contents using get_url_contents($url);

//configuration / display parameters
//The description cannot contain "a" tags, but can contain limited HTML. Some HTML (like the a tags) will break the UI.
$source_desc = "Searches http://www.paginasblancas.com.ar/ requires number in format XXXXXXXXXXX <br>This data source requires Superfecta Module version 2.2.1 or higher.";

//run this if the script is running in the "get caller id" usage mode.
if($usage_mode == 'get caller id')
{
    if($debug)
    {
        print "Searching http://www.paginasblancas.com.ar/ ... ";
    }

    //the url requires the number in this format: xxx-xxxx-xxxx
    $number_1 = substr($thenumber, 0, 3);
    $number_2 = substr($thenumber, 3, 4);
    $number_3 = substr($thenumber, 7, 4);

    //build the string that does the search
    $number_4 = $number_1."-".$number_2."-".$number_3;

    $url = "http://www.paginasblancas.com.ar/busqueda-telefono/argentina/".$number_4;
    $value = get_url_contents($url);

    //start with an empty name so the length check below never sees an undefined variable
    $name = "";

    //strpos() returns false when the text is absent and can return 0 for a match at the start,
    //so compare against false explicitly
    $notfound = (strpos($value, "Su búsqueda no produjo ningún resultado") !== false);

    if(!$notfound && strpos($value, '<TITLE>PaginasAmarillas.com.ar - Telefonos</TITLE>') !== false)
    {
        //the name is the text of the first H2 with class "alta"
        $heading = strpos($value, '<H2 class="alta">');
        if($heading !== false)
        {
            $begin = strpos($value, ">", $heading) + 1;
            $end = strpos($value, "<", $begin);
            $name = trim(substr($value, $begin, $end-$begin));
        }
    }

    if(strlen($name) > 1)
    {
        $caller_id = strip_tags($name);
    }
    else if($debug)
    {
        print "not found<br>\n";
    }
}
?>

Updated by lgaetz over 1 year ago

Did a bit of research on Argentina phone numbers on Wikipedia: http://en.wikipedia.org/wiki/Telephone_numbers_in_Argentina. As stated above, the code requires 11 digits, which it passes to the website in the form xxx-xxxx-xxxx, and I believe that will only work for Buenos Aires (BA) numbers. Argentina numbers outside of BA are composed of 3 parts: the first is the area code, and the second and third are the subscriber number split in two. The subscriber number can be 6 or 7 digits, so either XX-XXXX or XXX-XXXX, and the area code can be 4 or 5 digits starting with a 0. There are at least 5 pattern combinations (maybe more that I am missing), and the website requires the number to be split properly:

011-XXXX-XXXX (BA 011 plus 8 digits 3-4-4 so all 11 digits starting with 011)
0XXX-XX-XXXX (0 plus 9 digits 4-2-4 so all 10 digit numbers)
0XXX-XXX-XXXX (0 plus 10 digits 4-3-4)
0XXXX-XX-XXXX (0 plus 10 digits 5-2-4)
0XXXX-XXX-XXXX (0 plus 11 digits 5-3-4 so all 12 digit numbers)

I need some logic that will separate 3 and 4 on the list above so that the number can be split properly as 4-3-4 or as 5-2-4.
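One way the five patterns above could be told apart is sketched below. This is only an illustration, not the code in the uploaded source: the function name ar_split_number and the $five_digit_codes list are hypothetical, and it assumes the last group is always 4 digits and that an 11-digit number is 5-2-4 only when its first 5 digits are a known 5-digit area code.

```php
<?php
// Hypothetical helper, not part of Superfecta: formats an Argentine number
// as area-prefix-line for the search URL. $five_digit_codes is an assumed
// list of known 5-digit area codes (leading 0 included).
function ar_split_number($digits, array $five_digit_codes)
{
    // accept only 10 to 12 digits with a leading 0, as in the list above
    if (!preg_match('/^0\d{9,11}$/', $digits)) {
        return false;
    }
    $len = strlen($digits);

    if ($len === 11 && substr($digits, 0, 3) === '011') {
        $area_len = 3;   // Buenos Aires, 3-4-4
    } elseif ($len === 10) {
        $area_len = 4;   // 4-2-4
    } elseif ($len === 12) {
        $area_len = 5;   // 5-3-4
    } else {
        // 11 digits outside BA: 4-3-4 or 5-2-4, decided by the area code list
        $known = in_array(substr($digits, 0, 5), $five_digit_codes, true);
        $area_len = $known ? 5 : 4;
    }

    $area   = substr($digits, 0, $area_len);
    $prefix = substr($digits, $area_len, $len - $area_len - 4);
    $line   = substr($digits, -4);   // assumed: the last group is always 4 digits
    return $area . '-' . $prefix . '-' . $line;
}
```

With array('02202') as the list, '02202412345' would be split as 02202-41-2345, and with an empty list as 0220-241-2345, so the quality of the result depends entirely on how complete the area code list is.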

Updated by lgaetz over 1 year ago

I have uploaded a source for testing called source-Paginasblancas_AR.php. It seems to work well, but some of the area codes are missing. I need a better source of area codes than Wikipedia.

Updated by lgaetz over 1 year ago

I am stalled on this one ...

Argentina source works. I need someone with knowledge of AR phone numbers to help with the missing area codes and do some testing.

The Chile webpage listed above hides the URL when you search, so I don't know what URL to grab the contents of. If anyone can find this out, I can do some work on this.

I can't figure out how to enter a phone number in the Peru webpage to get a unique result, i.e. if I do a search for "hotel" and get a number that looks like this: " (+51) (43) 352281 ", I can only get a result if I do a number search with "352281", which yields more than one match. Anyone who can educate me on how to use the Peru reverse number search can get me started on coding.

Updated by tshif about 1 year ago

  • Status changed from Reviewed to Feedback
  • % Done changed from 0 to 10

lgaetz - should we consider releasing this data source now? As an Argentina only source?

Updated by lgaetz about 1 year ago

This ticket is actually three different feature requests. I have a good start on Argentina; the other two have not been addressed at all (yet!).

I am just starting to get a handle on the new functions, I am thinking of rewriting this to use John's area code function. It also needs a more complete list of area codes, so give me the weekend and with a bit of luck it should be ready for the grand unveiling.

Updated by tshif about 1 year ago

Awesome! :)

Feel free to consider breaking the data source into three separate data sources if you feel that's the right way to go.

Updated by lgaetz about 1 year ago

r318 is a totally revised Argentina lookup source. I have tested a fair bit using debug and numbers taken at random from Google, and everything looks clean to me.

It is still short of area codes, and this page http://www.cnc.gov.ar/infotecnica/numeracion/indicativosinter.asp?offset=0 seems to indicate that there are nearly 3000 (!) area codes. If anyone can figure out how to extract them all from this website, then this may well be complete.

Updated by jkiel about 1 year ago

Attached are the area codes harvested from http://www.cnc.gov.ar/infotecnica/numeracion/indicativosinter.asp and the simple script I wrote to harvest them.

Updated by lgaetz about 1 year ago

r319 updated with 2900 plus area codes and passed my testing

Updated by jkiel about 1 year ago

I noticed that there is a great amount of duplication in the area codes harvested. I'll change the script to eliminate duplicate area codes....

Updated by jkiel about 1 year ago

See attached for updated harvest script with deduplication, and a much smaller area codes file.
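For readers without the attachment, the deduplication step could look roughly like the sketch below. It is not the attached getar.php: the function name dedupe_area_codes is hypothetical, and it assumes the harvested codes are already available as an array of strings, which may not match the real areacodes.txt layout.

```php
<?php
// Hypothetical input: harvested codes, one per entry, possibly repeated
// and with stray whitespace. The real areacodes.txt layout may differ.
function dedupe_area_codes(array $harvested)
{
    $codes = array_map('trim', $harvested);
    // drop blank entries, then duplicates (the first occurrence is kept)
    $codes = array_filter($codes, 'strlen');
    // array_values() reindexes the result from 0
    return array_values(array_unique($codes));
}
```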

Updated by lgaetz about 1 year ago

r320 fixes duplicate area codes

Thanks for the script and the number checking, John. You would have thought that I might have glanced at the area codes at least once and noticed the duplication. In my defence it is Egg Nog season.

Updated by lgaetz about 1 year ago

  • Status changed from Feedback to QA Testing

Updated by jkiel about 1 year ago

I wish I could use Egg Nog season as my excuse .. but I don't have one other than simple absent mindedness.

I wish I knew more about phone numbers in Argentina.
Looking over the area codes further, I see a lot of potential matching issues, like:
"0220","02202"

A phone number like 02202555555 would match with 0220 before 02202, simply because 0220 comes first in the array. The question is, will this stop lookup from working?

Updated by lgaetz about 1 year ago

jkiel wrote:

Looking over the area codes further, I see a lot of potential matching issues, like: "0220","02202"

Yes this will almost certainly be a problem with the way it is written at the moment. Number formats in Argentina are really messy, and without intimate knowledge of how they work, the code as it exists now is as good as it will get. I spent (too much!) time on this, mostly as a learning exercise. I fear that my efforts may well be wasted; there doesn't appear to be a demand for it.

Updated by jkiel about 1 year ago

I'll update the cisf_find_area function in callerid.php to search for the longest match, rather than the first. I think that should take care of at least that issue.
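A longest-match search of the kind described here might look like the following sketch. It is not the actual cisf_find_area code, whose signature is not shown in this ticket; the function name find_longest_area and its parameters are hypothetical.

```php
<?php
// Hypothetical stand-in for the matching step; this is not the actual
// cisf_find_area code from callerid.php.
function find_longest_area(array $area_codes, $number)
{
    $number = (string) $number;
    $best = false;
    foreach ($area_codes as $code) {
        $code = (string) $code;
        // keep the code only if it is a prefix of the number
        // and longer than the best match found so far
        if (strncmp($number, $code, strlen($code)) === 0
            && ($best === false || strlen($code) > strlen($best))) {
            $best = $code;
        }
    }
    return $best; // false when no area code matches
}
```

With the list "0220","02202", the number 02202555555 would resolve to 02202 regardless of the order of the array, while a number such as 02205551234 would still resolve to 0220.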

Updated by lgaetz about 1 year ago

Instead of changing the area code function, can we not just reorder the area code list for the lookup source such that longer area codes are listed first? Delving into the function code means testing again ...
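The reordering idea could be done in code rather than by hand, roughly as sketched below. The function name sort_codes_longest_first is hypothetical, and the sketch assumes the area codes are held as an array of strings.

```php
<?php
// Hypothetical helper: order codes longest first so that a first-match
// search behaves like a longest-match search.
function sort_codes_longest_first(array $area_codes)
{
    usort($area_codes, function ($a, $b) {
        $diff = strlen($b) - strlen($a);
        // tie-break on the code itself so the order is deterministic
        return $diff !== 0 ? $diff : strcmp($a, $b);
    });
    return $area_codes;
}
```

Sorting when the list is loaded, instead of relying on the order in the file, would mean the stored list itself never has to be kept in a special order.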

Updated by tshif about 1 year ago

jkiel wrote:

I'll update the cisf_find_area function in callerid.php to search for the longest match, rather than the first. I think that should take care of at least that issue.

I know it means testing again - but this seems like the most robust way to fix it. If we depend on the order of the area codes - it will be much harder to maintain, and future maintainers might miss the requirement to 'specially organize' them. Just my 2 cents.

Updated by jkiel about 1 year ago

Looking forward, I think it's best to make the function easier to use and more flexible -- especially with how long area code lists can get.

I wouldn't worry about the testing -- any issues with the relatively simple function should make themselves apparent very quickly.

Updated by jkiel about 1 year ago

See r324 for changes to cisf_find_area

Updated by tshif about 1 year ago

  • Due date set to 12/20/2010
  • Assigned to set to lgaetz
  • % Done changed from 10 to 50

lgaetz - do you have time for acceptance testing this final upgrade/change?

Updated by lgaetz about 1 year ago

Tested using debug for dozens of numbers across lots of area codes, no issues no php errors.

I only now realize that the website search will still work if the leading 0 in the area code is omitted, and judging by how the numbers appear in Google searches, people seem to treat the leading 0 as optional. For this lookup source to work, the leading 0 must be present, so I committed a fix to the description that explains that. Users may have to use CID rules to add the 0 if not present. See r330.
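For illustration only, the effect of such a CID rule can be sketched in PHP as below. The function name ar_add_leading_zero is hypothetical and is not part of r330; the sketch also assumes the incoming CID carries no country code, since a number starting with 54 would simply get a 0 put in front of it.

```php
<?php
// Hypothetical pre-processing step, equivalent in spirit to a CID rule:
// strip non-digits, then prepend the leading 0 if it is absent.
function ar_add_leading_zero($cid)
{
    $digits = preg_replace('/\D/', '', (string) $cid);
    if ($digits === '' || $digits[0] === '0') {
        return $digits; // nothing to add
    }
    return '0' . $digits;
}
```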

Updated by lgaetz about 1 year ago

  • Subject changed from User request data soures for Argentina, Chile and Peru to User request data soures for Argentina

I have split off the two other lookup source requests into separate features, 400 and 401.

Updated by lgaetz about 1 year ago

  • Subject changed from User request data soures for Argentina to User request data source for Argentina

Updated by tshif about 1 year ago

Is this data source ready for QS/release?

Updated by lgaetz about 1 year ago

Yes.

Updated by lgaetz about 1 year ago

In case you missed it, this lookup depends on the 2.2.4 functions

Updated by tshif about 1 year ago

  • Assigned to changed from lgaetz to tshif
  • % Done changed from 50 to 70

Updated by tshif about 1 year ago

  • Status changed from QA Testing to Closed
  • % Done changed from 70 to 100

QS: Pass. No ill effects noted from this data source.
Accepted for Build 2.2.4
Can't be added to live update until 2.2.4 is released.
