Saturday, 28 September 2013

Visual Web Ripper: Using External Input Data Sources

Sometimes it is necessary to use external data sources to provide parameters for the scraping process. For example, you might have a database with a list of ASINs and need to scrape the product information for each one of them. In Visual Web Ripper, an input data source provides a list of input values to a data extraction project, and the project is run once for each row of input values.

An input data source is normally used in one of these scenarios:

    To provide a list of input values for a web form
    To provide a list of start URLs
    To provide input values for Fixed Value elements
    To provide input values for scripts

Visual Web Ripper supports the following input data sources:

    SQL Server Database
    MySQL Database
    OleDB Database
    CSV File
    Script (A script can be used to provide data from almost any data source)

To see it in action you can download a sample project that uses an input CSV file with Amazon ASIN codes to generate Amazon start URLs and extract some product data. Place both the project file and the input CSV file in the default Visual Web Ripper project folder (My Documents\Visual Web Ripper\Projects).
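
The "one run per input row" idea can be sketched outside Visual Web Ripper. The Python snippet below is only an illustration, not how Visual Web Ripper works internally. It assumes a CSV with a column header named asin (a hypothetical name), made-up ASIN values, and a start URL pattern of http://www.amazon.com/gp/product/{asin}.

```python
import csv
import io

# Hypothetical input: a CSV file with one ASIN per row under an "asin" header.
SAMPLE_CSV = """asin
B000000001
B000000002
"""

# Assumed URL pattern; adjust it to the site you are scraping.
URL_TEMPLATE = "http://www.amazon.com/gp/product/{asin}"


def start_urls(csv_file):
    """Yield one start URL per non-empty input row."""
    for row in csv.DictReader(csv_file):
        asin = (row.get("asin") or "").strip()
        if asin:
            yield URL_TEMPLATE.format(asin=asin)


urls = list(start_urls(io.StringIO(SAMPLE_CSV)))
for url in urls:
    print(url)
```

To read a real file, replace io.StringIO(SAMPLE_CSV) with an open file handle.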

For further information, please see the manual topic explaining how to use an input data source to generate start URLs.


Source: http://extract-web-data.com/visual-web-ripper-using-external-input-data-sources/

Thursday, 26 September 2013

Using External Input Data in Off-the-shelf Web Scrapers

There is a question I’ve wanted to shed some light on for a long time: “What if I need to scrape several URLs based on data in some external database?”

For example, recently one of our visitors asked a very good question (thanks, Ed):

    “I have a large list of amazon.com asin. I would like to scrape 10 or so fields for each asin. Is there any web scraping software available that can read each asin from a database and form the destination url to be scraped like http://www.amazon.com/gp/product/{asin} and scrape the data?”

This question prompted me to investigate the matter. I contacted several web scraper developers, and they kindly provided detailed answers, which I have summarized below:
Visual Web Ripper

An input data source can be used to provide a list of input values to a data extraction project. A data extraction project will be run once for each row of input values. You can find additional information here.
Web Content Extractor

You can use the -at"filename" command line option to add new URLs from a TXT or CSV file:

    WCExtractor.exe projectfile -at"filename" -s

projectfile – the file name of the project (*.wcepr) to open.
filename – the file name of the CSV or TXT file that contains URLs separated by newlines.
-s – starts the extraction process.
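
The URL file for this option can be generated by a small script. The sketch below writes one URL per line and assembles the command line in the form quoted above. It does not run Web Content Extractor itself. The ASINs, the project name amazon.wcepr and the file name urls.txt are hypothetical.

```python
import tempfile
from pathlib import Path


def write_url_list(urls, path):
    """Write URLs to a text file, one per line, and return the path."""
    path = Path(path)
    path.write_text("\n".join(urls) + "\n", encoding="utf-8")
    return path


def build_command(project_file, url_file):
    """Assemble the command line in the form shown in the post."""
    return f'WCExtractor.exe {project_file} -at"{url_file}" -s'


asins = ["B000000001", "B000000002"]  # hypothetical ASINs
urls = [f"http://www.amazon.com/gp/product/{a}" for a in asins]

# A temporary directory keeps the example self-contained.
with tempfile.TemporaryDirectory() as tmp:
    url_file = write_url_list(urls, Path(tmp) / "urls.txt")
    written = url_file.read_text(encoding="utf-8").splitlines()
    command = build_command("amazon.wcepr", url_file.name)

print(command)
```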

You can find some options and examples here.
Mozenda

Since Mozenda is cloud-based, the external data needs to be uploaded to the user’s Mozenda account. That data can then be easily used as part of the data extraction process. You can construct URLs, search for strings that match your inputs, or carry through several data fields from an input collection and add data to it as part of your output. The easiest way to get input data from an external source is to use the API to populate data into a Mozenda collection (in the user’s account). You can also input data in the Mozenda web console by importing a .csv file or importing one through our agent building tool.

Once the data is loaded into the cloud, you simply initiate building a Mozenda web agent and refer to that Data list. By using the Load page action and the variable from the inputs, you can construct a URL like http://www.amazon.com/gp/product/%asin%.
Helium Scraper

Here is a video showing how to do this with Helium Scraper:


The video shows how to use the input data as URLs and as search terms. There are many other ways you could use this data, far too many to fit in a video. Also, if you know SQL, you could run a query to get the data directly from an external MS Access database, like this:
SELECT * FROM [MyTable] IN "C:\MyDatabase.mdb"

Note that the database needs to be a “.mdb” file.
WebSundew Data Extractor

WebSundew can take input data from external data sources: a CSV file, an Excel file or a database (MySQL, MSSQL, etc.). Here you can see how to do this with an external file, but a database works in a similar way (you just need to write an SQL query that returns the necessary data).

In addition to URLs, you can pass other input parameters from the external source as well (input fields, for example).
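
As a rough illustration of "an SQL query that returns the necessary data", the sketch below uses Python's built-in sqlite3 module with an in-memory database as a stand-in for MySQL or MSSQL. It is not WebSundew's own mechanism. The table name products, its columns and the ASIN values are all hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real database (MySQL, MSSQL, ...).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (asin TEXT PRIMARY KEY, active INTEGER)")
conn.executemany(
    "INSERT INTO products (asin, active) VALUES (?, ?)",
    [("B000000001", 1), ("B000000002", 0), ("B000000003", 1)],
)

# The query is the part that matters: it returns the input values to scrape.
rows = conn.execute(
    "SELECT asin FROM products WHERE active = 1 ORDER BY asin"
).fetchall()
conn.close()

# Each returned row becomes one URL to visit.
urls = [f"http://www.amazon.com/gp/product/{asin}" for (asin,) in rows]
print(urls)
```
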
Screen Scraper

Screen Scraper is really designed to be interoperable with all sorts of databases. We have composed a separate article where you can find a tutorial and a sample project about scraping Amazon products based on a list of their ASINs.


Source: http://extract-web-data.com/using-external-input-data-in-off-the-shelf-web-scrapers/

Wednesday, 25 September 2013

How to scrape Yellow Pages with ScreenScraper Chrome Extension

Recently I was asked to help with the job of scraping company information from the Yellow Pages website using the ScreenScraper Chrome Extension. After working with this simple scraper, I decided to create a tutorial on how to use this Google Chrome Extension for scraping pages similar to this one. Hopefully, it will be useful to many of you.
1. Install the Chrome Extension

You can get the extension here. After installation you should see a small monitor icon in the top right corner of your Chrome browser.
2. Open the source page

Let’s open the page from which you want to scrape the company information:


3. Determine the parent element (row)

The first thing you need to do for the scraping is to determine which HTML element will be the parent element. A parent element is the smallest HTML element that contains all the information items you need to scrape (in our case they are Company Name, Company Address and Contact Phone).  To some extent a parent element defines a data row in the resulting table.
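
To make the parent element idea concrete, here is a small Python sketch, separate from the Chrome extension. It treats each parent element as one row and each information item inside it as one column. The markup, class names and company details are invented for the example and do not match the real Yellow Pages page.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup; the real page uses different tags and classes.
PAGE = """
<div id="results">
  <div class="listing">
    <h3 class="name">Acme Plumbing</h3>
    <span class="address">1 Main St, Springfield</span>
    <span class="phone">(555) 010-0001</span>
  </div>
  <div class="listing">
    <h3 class="name">Best Bakery</h3>
    <span class="address">2 Oak Ave, Springfield</span>
    <span class="phone">(555) 010-0002</span>
  </div>
</div>
"""

FIELDS = ("name", "address", "phone")


def extract_rows(markup):
    """Return one dict per parent element, one key per information item."""
    root = ET.fromstring(markup)
    rows = []
    for parent in root.iterfind(".//div[@class='listing']"):
        row = {}
        for field in FIELDS:
            node = parent.find(f".//*[@class='{field}']")
            row[field] = node.text.strip() if node is not None and node.text else ""
        rows.append(row)
    return rows


rows = extract_rows(PAGE)
for row in rows:
    print(row)
```

ElementTree only parses well-formed markup, so this works on the tidy sample above but not on arbitrary real-world HTML.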

To determine it, open Google Chrome Developer Tools (by pressing Ctrl+Shift+I), click the magnifying glass (at the bottom of the window) and select the parent element on the page. I selected this one:

As soon as you have selected it, look into the developer tools window and you will see the HTML code related to this element:



Source: http://extract-web-data.com/how-to-scrape-yellow-pages-with-screenscraper-chrome-extension/

Tuesday, 24 September 2013

LiveHTTPHeaders Sniffer Review

This post deals with LiveHTTPHeaders, the Firefox add-on by Daniel Savard. This in-browser tool is designed to show live HTTP headers and traffic content. It takes its place among other HTTP traffic analysers, both simple and multifunctional.

The sniffer displays all the items related to requests and responses, such as cookies, headers, caching info and others. I like it for its simplicity. To see a request’s content, just mouse over the corresponding request line.


The sniffer supports HTTPS and makes it possible to edit and replay some (GET or POST) HTTP requests (on the Headers tab, choose a line and press ‘Replay’).

Results filtering is done only with Regexes (Config tab). The logs can be saved in any text format.
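
For readers new to regex filtering, the Python sketch below shows the general idea of excluding requests by pattern. It is not the add-on's configuration syntax. The request lines and the exclusion pattern are made up for the example.

```python
import re

# Hypothetical captured request lines; a real log would come from the sniffer.
REQUESTS = [
    "GET http://example.com/index.html",
    "GET http://example.com/static/logo.png",
    "POST http://example.com/api/login",
    "GET http://example.com/static/site.css",
]

# Exclude static assets such as images, stylesheets and scripts.
EXCLUDE = re.compile(r"\.(png|jpe?g|gif|css|js)(\?.*)?$", re.IGNORECASE)

kept = [line for line in REQUESTS if not EXCLUDE.search(line)]
for line in kept:
    print(line)
```
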
Summary

This HTTP sniffer is a simple, lightweight tool for quick monitoring of HTTP headers. It is certainly a help when debugging web applications.



Source: http://extract-web-data.com/livehttpheaders-sniffer-review/

Monday, 23 September 2013

Internet Outsourcing Data Entry to Third World Countries

Outsourcing pieces of your company is cost effective. The economic downturn has made companies explore more fiscally conservative options for their company. Internet outsourcing is one of the most popular options to effectively cut costs. Entire departments that cost companies millions a year can be shipped overseas. This allows companies to focus their resources on the crucial elements of their company and not use resources on trivial but necessary matters.

One of the most common departments outsourced is customer service. Maintaining a customer service department requires health benefits, rent, and costly salaries. This creates a huge expense for a company for simple tasks. Customer service departments are being outsourced to India and China for a fraction of the cost. Customer service often requires a straightforward question and answer script. The answers can be given to anyone who has the script. This makes outsourcing customer service effective.

If someone calls for customer support and the representative who answers does not know the answer, there is a solution. Calls can be transferred to representatives who have extensive product knowledge. This group can be located at corporate headquarters, or calls can go to a trained group of outsourced representatives whose knowledge goes beyond the script. This is one of the easiest ways to cut costs and maintain the value of the company. Over 90% of customer support questions are repeat questions that can be scripted.

Data entry is one of the most commonly outsourced departments. Data entry tasks can often be done by people who do not speak the language of the country the work comes from. This makes outsourcing data entry extremely cost effective. Numbers and symbols are universal, which makes data entry straightforward in most foreign countries.

All outsourcing tasks can be distributed online. Internet outsourcing is the future for big and small businesses creating cost effective business plans. Placing an order online for electronic equipment has become a normal way of shopping. Placing online orders for work will be common in the decades to come.

Companies worry about outsourcing because they're concerned about quality. Outsourcing has become big business in China, India and other developing and third world countries. Outsourced projects are taken very seriously and business management is similar to that of western societies. The regulations are often stricter than in the United States and the work is often held to a higher standard to ensure repeat business.



Source: http://ezinearticles.com/?Internet-Outsourcing-Data-Entry-to-Third-World-Countries&id=4617038

Friday, 20 September 2013

Why Outsource Data Entry Service?

Data entry is one of the most neglected responsibilities in any organization. Many organizations cannot give their data entry departments as much attention as other departments of the firm, so it is beneficial for them to outsource data entry services to BPO companies. Outsourcing is one of the most cost effective and reliable ways to manage your business data entry.

If you are thinking of outsourcing BPO services, India is the most preferred country for outsourcing data entry, data processing, data conversion and many more BPO services at an affordable rate. To save money and time, India is the central place in the world to outsource data entry services.

Some other benefits of outsourcing include:

- Reduced operating cost
- No need to hire and train employees
- Ability to focus on your core business
- Access to the expertise of BPO professionals
- Saved money and time can be invested in other areas of the business

Outsourcing is a profitable option for any business because its benefits boost business performance, increase productivity, and keep your database management system and workflow running smoothly and effectively.

Outsourcing services also offer additional benefits such as high quality processes, advanced technology, well-established infrastructure and expert professionals who can cover the entire range of data entry services at the lowest rates with 99.98% accuracy.

So, outsource your requirements to a reliable BPO company that can complete your data entry needs successfully and provide customized solutions for all your organization's requirements.

BPO companies provide quick, well-organized and secure solutions to retain their place in the competitive outsourcing market. Many of them offer a high level of accuracy with complete confidentiality. These companies also use proofreaders in an effort to deliver highly accurate service.




Source: http://ezinearticles.com/?Why-Outsource-Data-Entry-Service?&id=2728233

Thursday, 19 September 2013

Advantages of Online Data Entry Services

People all over the world are keen to buy online data entry services because they find them cost effective. Most of them feel that they get quality services for the price they pay. Entering data online is a great help to business units of all sizes, as they consider this data the main basis of their business.

Online data entry and typing service providers have skilled staff who deliver quality work on time. These service providers use modern technology to assure 100 percent security of data. Online data entry services include the following:

    Data entry
    Data Processing
    Product entry
    Data typing
    Data mining, Data capture/collection
    Business Process Outsourcing
    Data Conversion
    Form Filling
    Web and mortgage research
    Extraction services
    Online copying, pasting, editing, sorting, as well as indexing data
    E-books and e-magazines data entry

These companies provide quality services worldwide to business units of all sizes. Some of the common input formats are:

    PDF
    TIFF
    GIF
    XBM
    JPG
    PNG
    BMP
    TGA
    XML
    HTML
    SGML
    Printed documents
    Hard copies, etc

Benefits of outsourcing online data entering services:

A major benefit of data entry for business units is that they get the facts and figures that help in taking strategic decisions for the organization. Data expressed in numbers becomes a factor of evaluation that accelerates the progress of the business. Online data typing services maintain a high level of security by using highly protected systems.

The business organization progresses because of the right decisions taken with the help of the superior quality data available.

    Savings on operational overhead.
    Savings in time and space.
    Access to accurate services.
    Elimination of paper documents.
    Cost effectiveness.
    Data accessible from anywhere in the world.
    100% work satisfaction.
    Access to professional and experienced data typing services.
    Adequate knowledge of a wide range of industrial needs.
    Use of highly advanced technologies for quality results.

Business organizations benefit from outsourcing their projects to online data entry and typing services because it saves not only time but also a large amount of money.

Up-and-coming companies can focus on their key business functions instead of dealing with non-key business activities. They find it sensible to outsource their confidential and crucial projects to trustworthy online data entry services and stay free for their key business activities. These services have several layers of quality control, which assures 99.9% quality on online data entry projects.

Hi-tech BPO is an ideal solution for small, medium or large businesses. With its dedicated, skilled and experienced team of BPO professionals, Hi-tech BPO meets all the requirements of the current business world. Contact us at: http://www.hitechbpo.com/contactus.php





Source: http://ezinearticles.com/?Advantages-of-Online-Data-Entry-Services&id=6526483