Improve our PHP address scraper.
$250-750 AUD
Paid on delivery
We require a basic postal address scraper: a function that scrapes and parses postal address data from arbitrary web pages. It is to be built by improving or replacing the supplied source code function (with helpers).
This is part of a larger project. Success with this component may lead to work on other components. We are also looking for one permanent employee, who would move to Australia if necessary.
Summary
There is no international standard for postal addresses, and they differ even within the same country depending on who entered the data. The problem therefore requires a definition of what an address is.
The suggested answer (but not necessarily the only answer) to this question is: a text string which normally contains a postcode and/or a country and/or a city name, and at least one of the words or abbreviations "st", "rd", "street", "road", "center", "mall", "plaza" etc. For full lists of indicators, countries, states, counties, postcodes and postcode regexes, see the supplied source code.
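As a rough illustration of that definition, the check below treats a string as a likely address when it contains at least one indicator word plus either a postcode or a known place name. The function name `looks_like_address`, the shortened lists and the 4-digit postcode regex are assumptions made for this sketch; the real lists and regexes are in the supplied source code.

```php
<?php
// Hypothetical, heavily shortened lists; the supplied source code has the full ones.
$indicators = array('st', 'rd', 'street', 'road', 'center', 'mall', 'plaza');
$places = array('australia', 'sydney', 'melbourne');
// Assumption: an Australian-style 4-digit postcode.
$postcodeRegex = '/\b\d{4}\b/';

function looks_like_address($text, array $indicators, array $places, $postcodeRegex)
{
    // Tokenise into lowercase words so that "st" does not match inside "best".
    $words = preg_split('/[^a-z0-9]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    $hasIndicator = count(array_intersect($words, $indicators)) > 0;
    $hasPlace = count(array_intersect($words, $places)) > 0; // single-word names only
    $hasPostcode = preg_match($postcodeRegex, $text) === 1;

    return $hasIndicator && ($hasPostcode || $hasPlace);
}

$isAddress = looks_like_address('34 Example St, Sydney NSW 2000', $indicators, $places, $postcodeRegex);
$isNotAddress = looks_like_address('Call us today about our best offers', $indicators, $places, $postcodeRegex);
```

Matching on whole words keeps short abbreviations such as "st" and "rd" from firing inside ordinary words, at the cost of missing multi-word place names unless they are handled separately.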
Step one is to find the web page, starting from the supplied URL, that is most likely to contain an address; it is normally called “about us”, “contact us” or similar. Once the HTML of the appropriate page is obtained, we look for the indicators.
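One possible way to approach step one is to scan the links on the supplied page for labels or hrefs that mention "contact" or "about". The sketch below uses PHP's built-in DOMDocument; the function names `find_contact_links` and `resolve_link` and the keyword pattern are assumptions for illustration, and the URL resolution is deliberately simplified (it ignores ports, query strings and protocol-relative links).

```php
<?php
// Simplified resolver: handles absolute, root-relative and document-relative hrefs only.
function resolve_link($baseUrl, $href)
{
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    $p = parse_url($baseUrl);
    $root = $p['scheme'] . '://' . $p['host'];
    if (substr($href, 0, 1) === '/') {
        return $root . $href;
    }
    $path = isset($p['path']) ? $p['path'] : '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $href;
}

function find_contact_links($html, $baseUrl)
{
    $doc = new DOMDocument();
    @$doc->loadHTML($html); // suppress warnings from malformed real-world markup

    $hits = array();
    foreach ($doc->getElementsByTagName('a') as $a) {
        $href = trim($a->getAttribute('href'));
        // Skip empty, in-page and non-HTTP links.
        if ($href === '' || preg_match('#^(mailto:|tel:|javascript:|\#|//)#i', $href)) {
            continue;
        }
        // Look at both the visible label and the href itself.
        $label = strtolower($a->textContent . ' ' . $href);
        if (preg_match('/contact|about|location/', $label)) {
            $hits[] = resolve_link($baseUrl, $href);
        }
    }
    return array_values(array_unique($hits));
}

$sampleHtml = '<html><body><a href="/contact-us">Contact Us</a>'
    . '<a href="products.html">Products</a><a href="about.html">About us</a></body></html>';
$links = find_contact_links($sampleHtml, 'http://example.com/index.html');
```

If no link matches, the supplied page itself is probably the best candidate to search for indicators.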
If any of these indicators appear in the text of an arbitrary web page, we extract the text on either side of the indicator and start pruning things that do not belong, such as HTML tags and punctuation.
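A minimal sketch of that extract-and-prune step might look like the following. The function name `extract_candidates`, the `$radius` window size and the set of characters kept during pruning are all assumptions; the supplied function may well do this differently.

```php
<?php
function extract_candidates($html, array $indicators, $radius = 60)
{
    // Replace tags with a space (rather than deleting them) so adjacent words do not fuse.
    $text = html_entity_decode(preg_replace('/<[^>]+>/', ' ', $html), ENT_QUOTES, 'UTF-8');
    $text = trim(preg_replace('/\s+/', ' ', $text));

    // Build one case-insensitive, whole-word pattern from the indicator list.
    $quoted = array_map(function ($w) { return preg_quote($w, '/'); }, $indicators);
    $pattern = '/\b(?:' . implode('|', $quoted) . ')\b/i';

    $candidates = array();
    if (preg_match_all($pattern, $text, $m, PREG_OFFSET_CAPTURE)) {
        foreach ($m[0] as $hit) {
            // Take $radius bytes of context on either side of the indicator.
            $start = max(0, $hit[1] - $radius);
            $end = $hit[1] + strlen($hit[0]) + $radius;
            $snippet = substr($text, $start, $end - $start);

            // Prune characters that rarely belong in a postal address.
            $snippet = preg_replace('/[^A-Za-z0-9\s,\/.\-\']/', ' ', $snippet);
            $candidates[] = trim(preg_replace('/\s+/', ' ', $snippet));
        }
    }
    return $candidates;
}

$sample = '<div><p>Visit us:</p><p>34 Example Street,<br>Sydney NSW 2000</p></div>';
$found = extract_candidates($sample, array('street', 'road'), 20);
```

A fixed window will usually cut words at its edges, so each candidate still needs trimming to sensible boundaries before it is passed to the parsing stage.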
The address string is then parsed, using either an API such as Google Maps or a library, to place the data into known fields such as “street number”, “street”, “suburb”, “state”, “postcode” and “country”.
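If the Google Geocoding API is the chosen parser, its JSON results carry an `address_components` list whose entries have `long_name`, `short_name` and `types`. The sketch below maps those component types onto the fields named above. The function name `map_geocode_components`, the choice of `locality` for "suburb", and the hand-written sample JSON (not real API output) are assumptions; the HTTP request itself is left out.

```php
<?php
function map_geocode_components(array $result)
{
    // Geocoding component type => field name used in this project.
    $typeToField = array(
        'street_number' => 'street_number',
        'route' => 'street',
        'locality' => 'suburb',
        'administrative_area_level_1' => 'state',
        'postal_code' => 'postcode',
        'country' => 'country',
    );
    $fields = array_fill_keys(array_values($typeToField), '');
    if (!isset($result['address_components'])) {
        return $fields;
    }
    foreach ($result['address_components'] as $component) {
        foreach ($component['types'] as $type) {
            // Keep the first component seen for each field.
            if (isset($typeToField[$type]) && $fields[$typeToField[$type]] === '') {
                $fields[$typeToField[$type]] = $component['long_name'];
            }
        }
    }
    return $fields;
}

// Hand-written sample in the shape of a geocoding response (illustrative only).
$json = '{"status":"OK","results":[{"address_components":['
    . '{"long_name":"34","short_name":"34","types":["street_number"]},'
    . '{"long_name":"Example Street","short_name":"Example St","types":["route"]},'
    . '{"long_name":"Sydney","short_name":"Sydney","types":["locality","political"]},'
    . '{"long_name":"New South Wales","short_name":"NSW","types":["administrative_area_level_1","political"]},'
    . '{"long_name":"Australia","short_name":"AU","types":["country","political"]},'
    . '{"long_name":"2000","short_name":"2000","types":["postal_code"]}]}]}';

$decoded = json_decode($json, true);
$parsed = map_geocode_components($decoded['results'][0]);
```

A local parsing library would need a similar mapping layer so that the calling code sees the same field names regardless of which parser is behind it.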
An existing PHP function will be provided that shows one attempt to accomplish this. It is only 20-50% effective, and its effectiveness needs to be increased to 80%+ for this project to be completed successfully. It is also missing the parsing stage; the successful developer can choose the library he or she prefers.
The existing function relies on several helper functions, which will also be supplied. The purpose of these helper functions can be explained on request. All helper functions and supplied source code are copyright protected.
Terms and Conditions
An NDA and copyright assignment must be executed before work begins.
A small coding skills test must be passed before any proposal will be considered:
“Tell me what lines 1214-1220 of the supplied sample source code do.”
No funds will be released until a working beta function is delivered, at which point 33% of funds will be released. The beta function must be an improvement over the existing source code. Final funds will be released when the finished function is delivered and can extract valid, parsed address data from an arbitrary list of 100 URLs (each of which either contains valid address data itself or links to a page that does) with an 80% success rate (the list will be supplied on request for final payment).
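The acceptance check could be automated along the following lines. This is only a sketch: the function name `success_rate`, the stub scraper and the rule that a result counts as a success when both `address` and `parsed_address` are non-empty are assumptions, and a non-empty result is not proof that the address is valid, so some manual review would still be needed.

```php
<?php
// $scraper is any callable that takes a URL and returns the result array.
function success_rate(array $urls, $scraper)
{
    if (count($urls) === 0) {
        return 0.0;
    }
    $ok = 0;
    foreach ($urls as $url) {
        $res = call_user_func($scraper, $url);
        // Assumed success rule: both the raw and the parsed address were filled in.
        if (is_array($res) && !empty($res['address']) && !empty($res['parsed_address'])) {
            $ok++;
        }
    }
    return $ok / count($urls);
}

// Stub standing in for the real scraper, for demonstration only.
$stubScraper = function ($url) {
    if (strpos($url, 'bad') !== false) {
        return array('address' => '', 'parsed_address' => '');
    }
    return array('address' => '34 Example St', 'parsed_address' => 'address1:34 Example St');
};

$testUrls = array(
    'http://example.com/a', 'http://example.com/b', 'http://example.com/c',
    'http://example.com/d', 'http://example.com/bad',
);
$rate = success_rate($testUrls, $stubScraper);
$passes = $rate >= 0.8;
```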
Needs to be a PHP 5.4.3 script function, and needs to return an array for use in another PHP function. You do not need to use HTMLSimple Dom; you can use any parser you want (as long as you provide any dependencies with the finished code) or no parser library. No frameworks! I am only using straight PHP 5.4.3.
The function is to return the following array (with sample data):
$res['emails'] = ''; // string of emails, semicolon-separated, e.g. me at somewhere (dot) com ; me at somewhereelse (dot) co (dot) uk
$res['fax'] = ''; // e.g. 25734895875
$res['ph'] = ''; // e.g. 2345908455
$res['address'] = ''; // e.g. 24/43 xxxxxxx,34 xxxxxxxxx St,cccccccccc, cccccccc, wwwwww, 4444
$res['twitter'] = ''; // e.g. [login to view URL]
$res['facebook'] = ''; // e.g. [login to view URL]
$res['instagram'] = ''; // e.g. [login to view URL]
$res['pinterest'] = ''; // e.g. [login to view URL]
$res['linkedin'] = ''; // e.g. [login to view URL]
$res['parsed_address'] = ''; // e.g. address1:24/43 xxxxxxx;address2:34 xxxxxxxxx St,address3:cccccccccc, cccccccc,address4: wwwwww, 4444
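To make the shape of the return value concrete, the sketch below builds the array with every key present and fills in the emails and social links from raw HTML. The helper names `empty_result` and `fill_contacts` and both regexes are assumptions for illustration; phone, fax and address extraction are left to the main scraper.

```php
<?php
// Start from an array that always has every expected key.
function empty_result()
{
    return array_fill_keys(array(
        'emails', 'fax', 'ph', 'address', 'twitter', 'facebook',
        'instagram', 'pinterest', 'linkedin', 'parsed_address',
    ), '');
}

function fill_contacts($html, array $res)
{
    // Deliberately loose email pattern; duplicates are collapsed.
    if (preg_match_all('/[A-Z0-9._%+\-]+@[A-Z0-9.\-]+\.[A-Z]{2,}/i', $html, $m)) {
        $res['emails'] = implode(';', array_unique($m[0]));
    }
    // Take the first profile link found for each network (optional subdomain allowed).
    foreach (array('twitter', 'facebook', 'instagram', 'pinterest', 'linkedin') as $site) {
        $pattern = '#https?://(?:[a-z0-9-]+\.)?' . $site . '\.com/[^\s"\'<>]+#i';
        if (preg_match($pattern, $html, $m)) {
            $res[$site] = $m[0];
        }
    }
    return $res;
}

$pageHtml = '<a href="mailto:info@example.com">info@example.com</a> '
    . '<a href="https://www.facebook.com/example">Facebook</a>';
$res = fill_contacts($pageHtml, empty_result());
```

Returning every key even when a value is empty means the calling PHP function never has to test with isset() before reading a field.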
Remember:
* Some address data is stored within HTML tags, so tags must not be stripped too early. Tags can also tell you a lot about where addresses start and end, which the text alone often cannot.
* Some addresses do not have clear indicators like a city or a postcode, but they may have other indicators like:
center, mall, building, street, road, avenue, etc. Use any indicators you want; mine are in the code examples.
* Some addresses can only be identified by the white space around them at the top, bottom, left and, most importantly, right. Working out an algorithm to find unused white space at the right of the text might help to grab addresses.
* The code examples contain cities and postcodes for many countries and some states/towns within countries. Many more states, towns and postcodes in many more countries, and/or more regexes, may be required to hit the 80% success rate.
* Only focus on English-speaking countries. An 80% success rate on non-English sites does not count. Only 85% success on English-language sites will be accepted.
* You may call open source dependencies from inside the function; the source for these must be supplied with the finished project.
* Full copyright in the source code remains with, or is assigned to, me.
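One way to act on the point about tags marking where addresses start and end is to turn block-level boundaries into line breaks before stripping the remaining markup, so that each address line stays separate. This is a sketch under assumptions: the function name `html_to_lines` and the list of tags treated as boundaries are illustrative and not taken from the supplied code.

```php
<?php
function html_to_lines($html)
{
    // Remove script and style blocks entirely.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html);
    // Turn <br> and closing block-level tags into newlines before stripping the rest.
    $html = preg_replace('#<\s*(?:br|/p|/div|/li|/td|/tr|/h[1-6]|/address)\b[^>]*>#i', "\n", $html);
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES, 'UTF-8');

    $lines = array();
    foreach (explode("\n", $text) as $line) {
        // Collapse runs of spaces and tabs, then drop empty lines.
        $line = trim(preg_replace('/[ \t\r]+/', ' ', $line));
        if ($line !== '') {
            $lines[] = $line;
        }
    }
    return $lines;
}

$block = '<div class="contact"><h3>Head Office</h3>'
    . '<p>34 Example St<br/>Sydney NSW 2000</p><p>Phone: 0000 0000</p></div>';
$lines = html_to_lines($block);
```

Once the text is split this way, a line containing an indicator can be joined with its immediate neighbours to form an address candidate, instead of relying on a fixed character window.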
Project ID: #10448462
About the project
20 freelancers are bidding on average $495 for this job
I am very confident about the job as I have done a lot of similar work.
Hello, I am an expert in PHP and I can help you with this project. Please contact me to start; I will complete it well for you.
Hi sir, I am a scraping expert and have done many similar projects; please check my feedback and you will see. Can you tell me more details? Then I will provide demo data for you. Thanks, Kimi
Hi, Our Zend Certified PHP Engineers with 9+ years of experience can do your project precisely and exactly as per your requirements. We have expertise in PHP / MySQL / JQuery / CSS / JavaScript / HTML / XML / SEO
Sir, I am well versed in this kind of job and can do your project as per your requirements. I have over 8 years of experience and am very much able to work on this. I am ready to start.
Check my profile. I'm very fast, reliable, and professional. I'm easily 3 times faster than other programmers.
Lines 1214-1221 (1220 is not the complete loop):
foreach($countries as $key => $value){
    foreach($value as $xkey => $xval){
        //echo $xkey." - ".$xval."<br />";
        if($xkey<>"pformat"){ ...
Hello, I am an expert in PHP and Android and I know scraping very well. I guarantee that you will not be disappointed; trust me, and please do not ignore this proposal.
We can get it done for you, and we are also willing to take the small coding skills test. Beta function implementation and testing will all be our job.
Hi there, I have gone through your requirements. Our expertise meets the same. We can do this project as we hold expertise in similar projects. We would like to discuss more on your requirements. Can we have an ...