
Tuesday, January 24, 2017

Massive Data Processing




Extract text targets by the hundreds or thousands,
whether web links, e-mail addresses, or any other combination of letters and words.
Use a Regular-Expression-Formularized Information Capture System
to process any batch of text line by line, page by page.
 
That’s what we do. -- David & Allen.
It works on target files in various formats, such as TXT, HTML, HTM, RTF, Word, and Excel. Through customized regular expressions, it extracts the needed information: website URL links, e-mail addresses, or other kinds of text.
 
 
- A tailored solution, built for concrete, target-focused searches only
- Suited to extracting website URL links and e-mail addresses (see the sketch below)
- Able to extract text from files of various formats, such as TXT, HTML, HTM, RTF, Word, and Excel
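 
As a rough illustration of the extraction itself (this is a minimal sketch, not David & Allen's actual code: the file name and the two patterns are placeholders, and real jobs would first convert RTF, Word, or Excel content to plain text), the core step in Visual Basic .NET might look like this:

Imports System.IO
Imports System.Text.RegularExpressions

Module ExtractSketch
    Sub Main()
        ' Hypothetical plain-text input file holding the batch of target text.
        Dim text As String = File.ReadAllText("targets.txt")

        ' Illustrative patterns only; the real formulas are customized per task.
        Dim urlPattern As String = "https?://[^\s""'<>]+"
        Dim mailPattern As String = "[\w.+-]+@[\w-]+(\.[\w-]+)+"

        ' Print every web link found in the batch.
        For Each m As Match In Regex.Matches(text, urlPattern)
            Console.WriteLine("URL:   " & m.Value)
        Next

        ' Print every e-mail address found in the batch.
        For Each m As Match In Regex.Matches(text, mailPattern)
            Console.WriteLine("EMAIL: " & m.Value)
        Next
    End Sub
End Module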
 
     When you face thousands of potential targets, have you ever wanted to pull out the web links and e-mail addresses of the thousands of entries scattered across separate pages?  With a tailored solution and a refined search, extracting the information one entry at a time in batched series, it can be done.  We call this solution Regular-Expression-Formularized Information Capture.
 
     The application is a programmed operation that repeats the mechanical steps of information extraction, combining regular expressions for collecting the target information with tailored Visual Basic code.  It first collects the target web links in batches, using a regular-expression formula tailored to each concrete task; it then visits those links one by one and extracts the website links that carry independent domain names; finally, by visiting the independent-domain sites, it collects the e-mail addresses related to the web links shown on the information platforms of various websites (whether comprehensive e-commerce platforms or sites focused on a specific industry).
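 
A rough sketch of that batched loop follows. It is not the actual solution code: the listing URL, patterns, and module name are invented for illustration, and the intermediate step of filtering for independent domain names is skipped for brevity.

Imports System.Net
Imports System.Text.RegularExpressions

Module BatchCaptureSketch
    Sub Main()
        Dim client As New WebClient()

        ' Step 1: pull the listing page and collect candidate links in one batch.
        Dim listingHtml As String = client.DownloadString("http://example-platform.com/listing")
        Dim linkPattern As String = "https?://[\w.-]+\.[a-z]{2,}[^\s""'<>]*"
        Dim mailPattern As String = "[\w.+-]+@[\w-]+(\.[\w-]+)+"

        For Each link As Match In Regex.Matches(listingHtml, linkPattern)
            Try
                ' Step 2: visit each collected link in turn...
                Dim page As String = client.DownloadString(link.Value)

                ' Step 3: ...and harvest the e-mail addresses it exposes.
                For Each mail As Match In Regex.Matches(page, mailPattern)
                    Console.WriteLine(link.Value & vbTab & mail.Value)
                Next
            Catch ex As WebException
                ' Skip links that cannot be fetched.
            End Try
        Next
    End Sub
End Module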
 
Currently there is no general-purpose software that can capture information in high-density batches across the various types of information platforms.  One likely reason is that, were such software to appear, capture on that scale would cross the security bottom line of the platforms themselves.  So whenever a platform judges its security insufficient, it will take measures to adjust the parameters or page layout that the capture software depends on.
 
     By this logic, as long as there is room for innovation, there will be no absolutely perfect general-purpose software.

