

Question details
English translation
Extracting structured data from Web sites is not a trivial task.
Most of the information on the Web today is in the form of
Hypertext Markup Language (HTML) documents which are
viewed by humans with a browser. HTML documents are
sometimes written by hand, sometimes with the aid of HTML
tools. Given that the format of HTML documents is designed for
presentation purposes, not automated extraction, and the fact that
most of the HTML content on the Web is ill-formed (“broken”),
extracting data from such documents can be compared to the task
of extracting structure from unstructured documents.
Answer (Chinese translation)
从网站中提取结构化数据并非易事。如今 Web 上的大部分信息都以超文本标记语言（HTML）文档的形式存在，供人们通过浏览器查看。HTML 文档有时是手工编写的，有时是借助 HTML 工具编写的。由于 HTML 文档的格式是为展示而设计的，而不是为自动提取而设计的，再加上 Web 上的大部分 HTML 内容都格式不规范（“残缺的”），从这类文档中提取数据，可以比作从非结构化文档中提取结构。