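This article looks at how to write a crawler with Storm. The example is the generic indexing bolt from the `com.digitalpebble.storm.crawler` code base. It does not hard-code an indexing backend. Instead, it reads the implementation class from the configuration and delegates all of its work to that class.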
package com.digitalpebble.storm.crawler.bolt.indexing;
import java.util.Map;
import org.slf4j.LoggerFactory;
import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;
import com.digitalpebble.storm.crawler.StormConfiguration;
import com.digitalpebble.storm.crawler.util.Configuration;
/**
 * A generic bolt for indexing documents which determines which endpoint to use
 * based on the configuration and delegates the indexing to it.
 */
@SuppressWarnings("serial")
public class IndexerBolt extends BaseRichBolt {
    private static final org.slf4j.Logger LOG = LoggerFactory
            .getLogger(IndexerBolt.class);

    // the concrete indexing bolt that every call is delegated to
    private BaseRichBolt endpoint;

    /**
     * Reads the implementation class from the "stormcrawler.indexer.class"
     * setting and instantiates it via reflection.
     */
    private static BaseRichBolt createEndpoint() {
        Configuration config = StormConfiguration.create();
        String className = config.get("stormcrawler.indexer.class");
        if (className == null) {
            throw new RuntimeException("No configuration found for indexing");
        }
        try {
            // asSubclass() checks the type at runtime, avoiding an unchecked cast
            return Class.forName(className).asSubclass(BaseRichBolt.class)
                    .getDeclaredConstructor().newInstance();
        } catch (final Exception e) {
            throw new RuntimeException("Couldn't create " + className, e);
        }
    }

    @Override
    public void prepare(Map conf, TopologyContext context,
            OutputCollector collector) {
        // get the implementation to use, instantiate it and prepare it
        endpoint = createEndpoint();
        endpoint.prepare(conf, context, collector);
    }

    @Override
    public void execute(Tuple tuple) {
        endpoint.execute(tuple);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Storm calls this while building the topology, before prepare(),
        // so endpoint is usually still null here: ask a temporary instance
        BaseRichBolt delegate = (endpoint != null) ? endpoint : createEndpoint();
        delegate.declareOutputFields(declarer);
    }
}
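You can try the delegation technique that `IndexerBolt` relies on without Storm at all. The following is a minimal, stdlib-only sketch with two stand-ins. A hypothetical `Indexer` interface takes the place of `BaseRichBolt`, and a plain `Map` takes the place of `StormConfiguration`. `Indexer`, `UpperCaseIndexer` and `DelegatingIndexer` are illustrative names and are not part of storm-crawler. Only the `stormcrawler.indexer.class` key comes from the code above.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical contract standing in for BaseRichBolt in IndexerBolt.
interface Indexer {
    String index(String document);
}

// Hypothetical endpoint; it needs a no-arg constructor so reflection can create it.
class UpperCaseIndexer implements Indexer {
    public String index(String document) {
        return "indexed:" + document.toUpperCase();
    }
}

// Mirrors IndexerBolt: read a class name from configuration, instantiate it
// reflectively, then forward every call to that instance.
class DelegatingIndexer implements Indexer {
    // Same key as in the article; the Map is a stand-in for StormConfiguration.
    static final String CLASS_KEY = "stormcrawler.indexer.class";

    private final Indexer endpoint;

    DelegatingIndexer(Map<String, String> config) {
        String className = config.get(CLASS_KEY);
        if (className == null) {
            throw new IllegalStateException("No configuration found for indexing");
        }
        try {
            // asSubclass() throws ClassCastException if the class is not an Indexer
            endpoint = Class.forName(className)
                    .asSubclass(Indexer.class)
                    .getDeclaredConstructor()
                    .newInstance();
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new IllegalStateException("Couldn't create " + className, e);
        }
    }

    @Override
    public String index(String document) {
        return endpoint.index(document);
    }
}
```

Swapping backends then only requires changing the configured class name. For example, you could set it to `UpperCaseIndexer.class.getName()`, and the caller never references the concrete class. Two kinds of misconfiguration fail fast in the constructor: a missing key, and a class that does not implement `Indexer`. This mirrors how `IndexerBolt` throws a `RuntimeException` when it cannot create its endpoint.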
Thanks for reading. That covers how to write a crawler with Storm. How it behaves in your own setup is best confirmed by trying it out. This is 天达云 (Tiandayun).