Improve our PHP address scraper.

Closed Posted 7 years ago Paid on delivery

We require a basic postal address scraper: a function that scrapes and parses postal address data from arbitrary web pages. It is to be built by improving or replacing the supplied source code function (and its helpers).

This is part of a larger project. Success with this component may lead to work on other components. We are also looking for one permanent employee, who would move to Australia if necessary.

Summary

There is no international standard for postal addresses, and they differ even within the same country depending on who entered the data. The problem therefore requires a definition of what an address is.

The suggested answer (though not necessarily the only one) to this question is: a text string that normally contains a postcode and/or a country and/or a city name, plus at least one of the words or abbreviations "st", "rd", "street", "road", "center", "mall", "plaza" and so on. For full lists of indicators, countries, states, counties, postcodes and postcode regexes, see the supplied source code.
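The working definition above can be sketched as a simple predicate. This is only an illustration under stated assumptions: the function name looks_like_address, the short indicator list and the 4-digit postcode pattern are hypothetical stand-ins, not the lists or regexes from the supplied source code, and the country/city checks are left out.

```php
<?php
// Hypothetical helper: true when $text contains a street-type indicator
// AND a postcode-like token. Country and city checks are omitted here.
function looks_like_address($text)
{
    // Illustrative subset only; the supplied source has the full lists.
    $indicators = array('st', 'rd', 'street', 'road', 'center', 'mall', 'plaza');

    // Word boundaries stop "st" from matching inside words like "first".
    $indicatorPattern = '/\b(' . implode('|', $indicators) . ')\b/i';

    // Assumed postcode shape: a standalone 4-digit number (Australian style).
    $postcodePattern = '/\b\d{4}\b/';

    return preg_match($indicatorPattern, $text) === 1
        && preg_match($postcodePattern, $text) === 1;
}
```

A real version would probably score each signal separately instead of requiring both, since the definition says "normally", not "always".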

Step one is to find the page, starting from the supplied URL, that is most likely to contain an address. It is normally called “about us”, “contact us” or similar. Once the HTML of that page is obtained, we look for the indicators.
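One possible way to shortlist candidate pages is to scan the anchors of the landing page for contact-style keywords. The sketch below uses a regex instead of a parser library; the function name find_contact_links, the keyword list and the naive relative-URL handling are all assumptions made for illustration.

```php
<?php
// Hypothetical helper: returns absolute URLs of links whose href or label
// suggests a contact/about page. Fetching the pages is left to the caller.
function find_contact_links($html, $baseUrl)
{
    $keywords = array('contact', 'about', 'location'); // assumed keyword list
    $anchorPattern = '#<a\b[^>]*?\bhref\s*=\s*(["\'])(.*?)\1[^>]*>(.*?)</a>#is';

    if (!preg_match_all($anchorPattern, $html, $anchors, PREG_SET_ORDER)) {
        return array();
    }

    $found = array();
    foreach ($anchors as $anchor) {
        $href = trim($anchor[2]);
        if ($href === '') {
            continue;
        }
        // Search both the URL and the visible link text.
        $haystack = strtolower($href . ' ' . strip_tags($anchor[3]));
        foreach ($keywords as $keyword) {
            if (strpos($haystack, $keyword) !== false) {
                // Naive resolution: anything not absolute is appended to the base.
                if (!preg_match('#^https?://#i', $href)) {
                    $href = rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
                }
                $found[$href] = true; // array keys de-duplicate repeated links
                break;
            }
        }
    }
    return array_keys($found);
}
```

If no link matches, the landing page itself is still worth scanning, since many sites put the address in the footer.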

If any of these indicators appear in the text of an arbitrary web page, we extract the text on either side of the indicator and start pruning off things that do not belong, such as HTML tags and stray punctuation.
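The extract-and-prune step might look roughly like this. It is a sketch, not the supplied function: extract_candidates, the fixed character radius and the set of characters kept during pruning are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: for every indicator hit, take $radius bytes either
// side of it, then prune tags and punctuation that cannot be part of an address.
function extract_candidates($html, $indicators, $radius = 60)
{
    $pattern = '/\b(' . implode('|', array_map('preg_quote', $indicators)) . ')\b/i';
    $candidates = array();

    if (!preg_match_all($pattern, $html, $matches, PREG_OFFSET_CAPTURE)) {
        return $candidates;
    }

    foreach ($matches[0] as $match) {
        $offset = $match[1];
        $start  = max(0, $offset - $radius);
        $length = ($offset - $start) + strlen($match[0]) + $radius;
        $window = substr($html, $start, $length);

        // Drop tag fragments cut in half by the window edges.
        $window = preg_replace('/^[^<]*>|<[^>]*$/', ' ', $window);
        // Remove complete tags, then anything but word chars and , / . -
        $window = strip_tags($window);
        $window = preg_replace('/[^\w\s,\/.-]+/', ' ', $window);
        // Collapse whitespace runs left behind by the pruning.
        $candidates[] = trim(preg_replace('/\s+/', ' ', $window));
    }
    return $candidates;
}
```

A fixed radius is crude; cutting the window at tag or line boundaries instead would usually give cleaner candidates.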

The address string is then parsed, using either an API such as Google Maps or a library, to place the data into known fields such as “street number”, “street”, “suburb”, “state”, “postcode” and “country”.
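If the Google Maps Geocoding API is the chosen parser, its JSON response lists address_components, each with long_name, short_name and types. A minimal sketch of mapping those components onto the fields named above follows; the function name map_geocode_components and the choice of type-to-field mapping (for example locality as "suburb") are assumptions, and the HTTP request itself is left out.

```php
<?php
// Hypothetical helper: takes a geocoding response already decoded with
// json_decode($json, true) and returns the six fields as strings.
function map_geocode_components($response)
{
    $fields = array(
        'street_number' => '',
        'street'        => '',
        'suburb'        => '',
        'state'         => '',
        'postcode'      => '',
        'country'       => '',
    );

    // Assumed mapping from component types to our field names.
    $typeToField = array(
        'street_number'               => 'street_number',
        'route'                       => 'street',
        'locality'                    => 'suburb',
        'administrative_area_level_1' => 'state',
        'postal_code'                 => 'postcode',
        'country'                     => 'country',
    );

    if (empty($response['results'][0]['address_components'])) {
        return $fields; // nothing geocoded: every field stays empty
    }

    foreach ($response['results'][0]['address_components'] as $component) {
        foreach ($component['types'] as $type) {
            // Keep the first value seen for each field.
            if (isset($typeToField[$type]) && $fields[$typeToField[$type]] === '') {
                $fields[$typeToField[$type]] = $component['long_name'];
            }
        }
    }
    return $fields;
}
```

Only the first result is used here; a stricter version could reject responses whose first result lacks a street-level component.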

An existing PHP function will be provided that shows one attempt to accomplish this. It is only 20-50% effective, and its effectiveness needs to be increased to 80%+ for this project to be completed successfully. It is also missing the parsing stage; the successful developer can choose the library he or she prefers.

The existing function relies on several helper functions, which will also be supplied. The purpose of these helper functions can be explained on request. All helper functions and supplied source code are copyright protected.

Terms and Conditions

An NDA and copyright assignment need to be executed before work begins.

A small coding skills test is required to be passed before any proposal will be considered:

“Tell me what lines 1214-1220 of the supplied sample source code do.”

No funds will be released until a working beta function is delivered, at which point 33% of funds will be released. The beta function must be an improvement over the existing source code. Final funds will be released when the finished function is delivered and it can extract valid, parsed address data from an arbitrary list of 100 URLs (each of which either contains valid address data itself or links to a page that does) with an 80% success rate. The list will be supplied on request for final payment.
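As a rough illustration of the acceptance arithmetic, the success rate over a batch of results could be computed as below. The function name success_rate is hypothetical, and "success" is reduced here to both address keys being non-empty; whether an extracted address is actually valid would still have to be judged against the supplied URL list.

```php
<?php
// Hypothetical helper: share of result arrays that contain both a raw
// and a parsed address. Non-empty is only a proxy for "valid".
function success_rate(array $results)
{
    if (count($results) === 0) {
        return 0.0;
    }
    $ok = 0;
    foreach ($results as $res) {
        if (!empty($res['address']) && !empty($res['parsed_address'])) {
            $ok++;
        }
    }
    return $ok / count($results);
}
```

With the 100-URL list, at least 80 results would need to pass for the function to meet the 80% threshold.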

Needs to be a PHP 5.4.3 script function. It needs to return an array for use in another PHP function. You do not need to use HTMLSimple Dom; you can use any parser you want (as long as you provide any dependencies with the finished code) or no parser library. No frameworks! I am only using straight PHP 5.4.3.

Function to return the following array (with sample data):

$res['emails'] = ''; string of emails, semi-colon separated, such as: me at somewhere (dot) com ; me at somewhereelse (dot) co (dot) uk

$res['fax'] = ''; eg: 25734895875

$res['ph'] = ''; eg: 2345908455

$res['address'] = ''; eg: 24/43 xxxxxxx,34 xxxxxxxxx St,cccccccccc, cccccccc, wwwwww, 4444

$res['twitter'] = ''; eg: [login to view URL]

$res['facebook'] = ''; eg: [login to view URL]

$res['instagram'] = ''; eg: [login to view URL]

$res['pinterest'] = ''; eg: [login to view URL]

$res['linkedin'] = ''; eg: [login to view URL]

$res['parsed_address'] = ''; eg: address1:24/43 xxxxxxx;address2:34 xxxxxxxxx St,address3:cccccccccc, cccccccc,address4: wwwwww, 4444
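A skeleton for the return value might start like the sketch below, which fills every key with an empty string and then populates the emails and social links from raw HTML. The function name collect_contacts and both regexes are assumptions for illustration; phone, fax and address extraction are deliberately left empty.

```php
<?php
// Hypothetical helper: builds the result array with all ten keys present,
// then fills in the fields that simple regexes can find.
function collect_contacts($html)
{
    $res = array(
        'emails' => '', 'fax' => '', 'ph' => '', 'address' => '',
        'twitter' => '', 'facebook' => '', 'instagram' => '',
        'pinterest' => '', 'linkedin' => '', 'parsed_address' => '',
    );

    // Simplified email pattern; de-duplicate, then join with semi-colons.
    if (preg_match_all('/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,}/i', $html, $m)) {
        $res['emails'] = implode(';', array_unique($m[0]));
    }

    // First profile URL found for each network (assumes .com domains).
    foreach (array('twitter', 'facebook', 'instagram', 'pinterest', 'linkedin') as $network) {
        $pattern = '#https?://(?:www\.)?' . $network . '\.com/[^\s"\'<>]+#i';
        if (preg_match($pattern, $html, $m)) {
            $res[$network] = $m[0];
        }
    }
    return $res;
}
```

Returning every key even when nothing is found keeps the calling PHP function free of isset() checks.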

Remember:
* some address data is stored within HTML tags, so it is not possible to just strip tags too early. Also, tags can tell you a lot about where addresses start and end, where text alone really can't.
* some addresses do not have clear indicators like a city or a postcode, but they may have other indicators like: center, mall, building, street, road, avenue, etc. Use any indicators you want; mine are in the code examples.
* some addresses can only be identified by the white space around them at top, bottom, left and, most importantly, right. Working out an algorithm to find unused white space at the right of text might help to grab addresses.
* code examples contain cities and postcodes for many countries and some states/towns within countries. A lot more states, towns and postcodes for many more countries, and/or more regexes, may be required to hit the 80% success rate.
* Only focus on English-speaking countries. No point having an 80% success rate on non-English sites; that does not count. Only 85% success on English-language sites will be accepted.
* May call open source dependencies from inside the function; source for these must be supplied with the finished project.
* Full copyright in the source code remains with, or is assigned to, me.
* You do not need to use HTMLSimple Dom - you can use any parser you want (as long as you provide any dependencies with finished code) or no parser library.
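One way to act on the first reminder, that tags mark where addresses start and end, is to turn block-level boundaries into line breaks before stripping anything. The sketch below assumes a hypothetical function name html_to_lines and an illustrative list of boundary tags; it does not attempt the white-space layout analysis mentioned above.

```php
<?php
// Hypothetical helper: converts HTML into trimmed text lines, using
// block-level closing tags and <br> as line boundaries.
function html_to_lines($html)
{
    // Script and style bodies never contain address text.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);

    // Assumed set of boundary tags; each becomes a newline.
    $html = preg_replace(
        '#<\s*(?:br|/p|/div|/li|/td|/tr|/h[1-6]|/address)\b[^>]*>#i',
        "\n",
        $html
    );

    // Only now strip the remaining (inline) tags and decode entities.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    $lines = array();
    foreach (explode("\n", $text) as $line) {
        $line = trim(preg_replace('/\s+/', ' ', $line));
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    return $lines;
}
```

Consecutive short lines that end with a postcode-like token are then natural address candidates, which plain strip_tags() output would have run together.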

PHP

Project ID: #10448462

About the project

20 proposals Remote project Active 7 years ago

20 freelancers are bidding on average $495 for this job

mmagr99

I am very confident about the job as I have done alot of similar work................................................................

$777 AUD in 10 days
(190 Reviews)
8.3
hoangvandungbk

hello, i am expert with php and i can help you this project, please contact with me to start, i will complete good for you

$315 AUD in 5 days
(389 Reviews)
7.5
mantislin

Hi sir, I am scraping expert, I have did too many similar projects, please check my feedback then you will know. Can you tell me more details? then I will provide demo data for you. Thanks, Kimi

$503 AUD in 6 days
(283 Reviews)
7.5
omtechnologies

Hi, Our Zend Certified PHP Engineers with 9+ years of experience can do your project precisely and exactly as per your requirements. We have expertise in PHP / MySQL / JQuery / CSS / JavaScript / HTML / XML / SEO

$277 AUD in 10 days
(205 Reviews)
7.7
sonarkaushik

Sir, I am well versed in this kind of jobs and can do your project as per requirement. I have over 8 years of experiences. I am very much able to work on this. ***I am ready to start

$477 AUD in 5 days
(36 Reviews)
5.5
rana100

Dear Sir, I am very much interested to work in your project. I know the following languages that will meet the techincal needs of your project PHP, MYSQL,Wordpress,Javascript, CSS, Boostrap, responsive design. I c More

$555 AUD in 10 days
(25 Reviews)
5.1
projectivemotion

Check my profile. I'm very fast, reliable, and profesional. I'm easily 3 times faster than other programmers.

$421 AUD in 0 days
(15 Reviews)
4.8
whitealien

Lines 1214-1221 (1220 is not the complete loop) foreach($countries as $key => $value){ foreach($value as $xkey => $xval){ //echo $xkey." - ".$xval."<br />"; if($xkey<>"pformat"){ More

$555 AUD in 5 days
(12 Reviews)
5.0
dunitech

Hey There !! I have read the job description of yours and ready to start work with you as i am having major 12+ years experience in php, MySQL ,laravel, yii, wordpress , asp.net,C# mssql, .net, mvc, MVC4, MVC5,and More

$666 AUD in 7 days
(4 Reviews)
4.4
craftbox

Hello, I am expert in PHP and ANDROID, I know Scraping very well ! and i gauranteed that you will not disappointed trust me and please do not ignore this proposal

$250 AUD in 2 days
(2 Reviews)
1.6
hcldevelopment

We can get it done for you and also we are willing for small coding skills test . beta functions implementation and testing will all be our job.

$705 AUD in 10 days
(0 Reviews)
0.0
PolusSolutions

Hi there, I have gone through your requirements. Our expertise meets the same. We can do this project as we hold expertise in similar projects. We would like to discuss more on your requirements. Can we have an More

$555 AUD in 15 days
(0 Reviews)
0.0