
Sina Tech Article Scraper Code

One-click scraper for Sina Tech articles, written for use with ThinkPHP
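The listing below calls a `CurlGetPage` helper that the post does not define. The following is only a minimal sketch of what such a helper might look like, assuming PHP's cURL extension is available. The 15-second timeout, the redirect handling, and the empty-string return on failure are assumptions, not the author's implementation.

```php
<?php
// Hypothetical helper: the original post uses CurlGetPage() without showing it.
// Fetches a URL with PHP's cURL extension and returns the body, or '' on failure.
function CurlGetPage($url, $timeout = 15)
{
    $ch = curl_init($url);
    if ($ch === false) {
        return '';
    }
    curl_setopt_array($ch, array(
        CURLOPT_RETURNTRANSFER => true,      // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,      // follow HTTP redirects
        CURLOPT_CONNECTTIMEOUT => $timeout,  // seconds allowed for connecting
        CURLOPT_TIMEOUT        => $timeout,  // seconds allowed for the whole transfer
    ));
    $body = curl_exec($ch);
    // curl_exec() returns false on error; normalize that to an empty string.
    return $body === false ? '' : $body;
}
```

Returning an empty string on failure means the `preg_match_all` calls in the listing simply find nothing on a failed fetch, so no error is raised at that point.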
/* Scrape Sina Tech articles */
public function sina_tech() {
    /* Number of list pages to crawl (defaults to 1) */
    $page_num = isset($_POST['get_post_page_num']) ? intval($_POST['get_post_page_num']) : 0;
    if ($page_num < 1) $page_num = 1;
    /* Article count before crawling */
    $post_count_a = M('post')->count();
    /* Crawl each list page */
    for ($page = 1; $page <= $page_num; $page++) {

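        /* Fetch the list page. Everything after "#" is a URL fragment and is not sent
           to the server, so the page parameter has no effect on the response as written. */
        $fullpage = CurlGetPage('http://roll.tech.sina.com.cn/s/channel.php?ch=05#col=30&spec=&type=&ch=05&k=&offset_page=0&offset_num=0&num=5&asc=&page='.$page);

        /* Isolate the list container and convert it from GB2312 to UTF-8.
           The container's opening tag is missing from the pattern in this listing. */
        preg_match_all('/\s+(.*)\s+<\/p>/Us', $fullpage, $match);
        if (empty($match[1][0])) continue;
        $fullpage = iconv("GB2312", "UTF-8", $match[1][0]);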

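        /* Split the list into <li> items */
        preg_match_all('/<li>(.*)<\/li>/isU', $fullpage, $in_li_tags);
        foreach (array_unique($in_li_tags[1]) as $row) {
            /* TITLE: the link text (opening tag assumed to be a plain <a ...>) */
            preg_match_all('/<a[^>]*>(.*)<\/a>/', $row, $title);
            $title = $title[1][0];
            /* LINK */
            preg_match_all('/href="([^"]*)"/', $row, $link);
            $link = $link[1][0];
            /* DATE: captured text (opening tag assumed to be a plain <span ...>),
               prefixed with the current year and suffixed with ":00" seconds */
            preg_match_all('/<span[^>]*>(.*)<\/span>/i', $row, $date);
            $date = date("Y-") . $date[1][0] . ':00';
            // echo $title.' '.$link.' '.$date.'<br>';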

            /* Fetch the article page */
            $fullpage_post = CurlGetPage($link);
            /* FIX TAGS: unwrap one kind of <p ...> block and delete another.
               The opening <p ...> tag of each pattern is missing from this listing. */
            $fullpage_post = preg_replace('/(.*)<\/p>/isU', '${1}', $fullpage_post);
            $fullpage_post = preg_replace('/(.*)<\/p>/Us', '', $fullpage_post);
            // echo htmlspecialchars($fullpage_post); die;

            /* POST CONTENT (the container's opening tag is missing from this pattern) */
            preg_match_all('/\s+(.*)\s+<\/p>/Us', $fullpage_post, $post_content);
            /* Strip <a> tags but keep their text */
            $post_content = preg_replace("/<a[^>]*>(.*)<\/a>/isU", '${1}', $post_content[1][0]);
            // echo $title . '<br>' . $link . '<br>' . $date . '<br>' . $post_content;

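            /* SAVE TO DB: insert only if no post with this title exists yet.
               An array condition is used so the title is not interpolated into raw SQL. */
            $post_title_count = M('post')->where(array('title' => $title))->count();
            if ($post_title_count == 0) {
                $dataMySql = array();
                $dataMySql["title"] = $title;
                $dataMySql["content"] = $post_content;
                $dataMySql["datetime"] = $date;
                M('post')->add($dataMySql);
            }
        }
    }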
    /* Article count after crawling */
    $post_count_b = M('post')->count();
    $post_add_num = $post_count_b - $post_count_a;
    /* Report the result as JSON */
    if ($post_count_a == $post_count_b) {
        echo '{"success":1,"msg":"No change in article count"}';
    } else {
        echo '{"success":1,"msg":"Collected ' . $post_add_num . ' new articles"}';
    }
}
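The per-item extraction in the loop above (title, link, date) can be tried in isolation, without touching the network or the database. The sketch below assumes each list item looks roughly like the sample fragment. That markup, the `c_time` class name, the `example.com` URL, and the `parse_list_item` function are illustrative guesses, not Sina's actual HTML or part of the original code.

```php
<?php
// Hypothetical helper: pulls title, link and date out of one <li> fragment.
// Returns null when the link or the date span is not found.
function parse_list_item($row, $year)
{
    if (!preg_match('/<a[^>]*href="([^"]*)"[^>]*>(.*?)<\/a>/is', $row, $a)) {
        return null;
    }
    if (!preg_match('/<span[^>]*>(.*?)<\/span>/is', $row, $span)) {
        return null;
    }
    return array(
        'title'    => trim(strip_tags($a[2])),
        'link'     => $a[1],
        // The span is assumed to hold "MM-DD HH:MM"; add the year and seconds.
        'datetime' => $year . '-' . trim($span[1]) . ':00',
    );
}

// Sample markup is an assumption for illustration only.
$row = '<a href="http://example.com/article.shtml" target="_blank">Example title</a>'
     . '<span class="c_time">03-14 09:26</span>';
$item = parse_list_item($row, '2014');
print_r($item);
```

Passing the year in explicitly keeps the function testable. It also makes visible a weakness of prefixing the current year: an item dated in December but crawled in January would be stamped with the wrong year.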
