
KDD Cup 2012 (Data Mining Comes to China This Year)


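KDD Cup 2012: this year's data mining conference is being held in China. It reflects the first real step forward by our IT companies after their "primitive era": they are starting to take machine learning seriously, a field the foreign giants have long coveted. The gold sponsors this time are Huawei, Tencent, and Baidu. The competition has two tracks. One uses microblog follow data to predict user relationships that are not given; the other appears to cover prediction for search engine ad recommendation. The data has not been released yet and is due on March 1, so stay tuned. Registration, data release, and submission: http://www.kddcup2012.org/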

 

 

The following is from the official website: http://www.kdd.org/kdd2012/

This year’s KDD Cup is sponsored by Tencent Inc., which is China’s largest Internet company in terms of active users (over 700 million users as of Jan. 2012). Tencent Inc. owns a full portfolio of popular products in China, including instant messaging, email, a news portal, a search engine, online games, blogging, and micro-blogging, offering a rich opportunity to build user models for highly effective user intent prediction and result recommendation. This year’s KDD Cup consists of two separate tasks.

User Modeling based on Microblog Data and Search Click Data

Task 1. Social Network Mining on Microblogs (Weibo)

Tencent Weibo (http://t.qq.com/) offers a wealth of social-networking information. For the 2012 KDD Cup, the released data represents a sampled snapshot of Tencent Weibo users’ preferences for various items: the recommendations made to users and their follow-relation history. Items are tied together within a defined hierarchy; that is, each person, organization, or group belongs to specific categories, and a category belongs to higher-level categories. In the competition, both users and items (persons, organizations, and groups) are represented as anonymized numeric IDs, so that no identifying information is revealed. The data consists of 10 million users and 50,000 items, with over 300 million recommendation records and about three million social-networking “following” actions. The privacy-protected user information is very rich as well, and the data has timestamps on user activities.

Task 1 is to predict which users a given user will follow, among all potential users.
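To make the shape of Task 1 concrete, here is a minimal baseline sketch in Python. It is not the organizers' method or the official data format: the `follows` dictionary, the function name `recommend_by_followees`, and the sample IDs are all hypothetical. The heuristic assumed here scores each candidate by how many of a user's current followees already follow it.

```python
from collections import Counter


def recommend_by_followees(follows, user, top_k=3):
    """Suggest accounts for `user` to follow.

    `follows` is an assumed input format: a dict mapping a user id to the
    set of ids that user already follows. A candidate's score is the number
    of the user's followees who follow that candidate.
    """
    already = follows.get(user, set())
    counts = Counter()
    for followee in already:
        for candidate in follows.get(followee, set()):
            # Skip the user themselves and accounts they already follow.
            if candidate != user and candidate not in already:
                counts[candidate] += 1
    # Highest score first; ties are broken by the smaller id for determinism.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [candidate for candidate, _ in ranked[:top_k]]


# Hypothetical toy follow graph with anonymized numeric ids.
sample_follows = {
    1: {2, 3},
    2: {4, 5},
    3: {4, 6},
    4: {1},
}

print(recommend_by_followees(sample_follows, user=1, top_k=2))
```

A competitive entry would presumably also use the item hierarchy, the recommendation records, and the timestamps described above; this sketch ignores all three.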

Task 2. User Click Modeling based on Search Engine Log Data

Online advertising has been the financial support of the Internet industry for years. Three successful kinds of computational ad systems are search ad, contextual ad, and social networking ad systems. Search ad systems retrieve and rank ads given a query, and display the resulting ads together with results from the search engine. Once a user clicks on an ad, the advertiser pays the search engine for its help with promotion. The ranking of ads aims to maximize users’ satisfaction, advertisers’ return on investment, and the search engine’s revenue. Contextual ad systems involve an additional role, the publishers, who own Internet properties like Web sites, forums, or mobile apps. Programs embedded in these properties request ads from ad systems. The ad system finds ads that semantically match the content of the properties. Recently, a third kind of computational ad system, social network ads, has gained a lot of attention; here the ad system ranks ads with consideration of social relationships.

In all aforementioned systems, a key algorithmic component is predicting the click-through rate (pCTR) of ads. This is because all such systems optimize monetization under the supervision of economic rules (e.g., the Generalized Second Price auction, the one behind Google AdWords and others), and these rules require ads’ pCTR values to rank ads and to price clicks. The closer the pCTR is to the truth, the more effective the monetization will be. User information, including demographics and historical behavior on search engines, e-business platforms, social networks, and micro-blogs, is likely to be valuable for improving the accuracy of ad pCTR in all of the above systems.
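The role pCTR plays in ranking and pricing can be illustrated with a small Python sketch of a quality-weighted second-price auction. It assumes the common textbook formulation, not anything specified in the announcement: ads are ranked by bid × pCTR, and each winner pays the smallest per-click price that would keep its position. The names `Ad` and `run_gsp`, the `reserve` parameter, and the sample numbers are hypothetical.

```python
from typing import NamedTuple


class Ad(NamedTuple):
    name: str
    bid: float   # maximum price per click the advertiser will pay
    pctr: float  # predicted click-through rate, assumed to be > 0


def run_gsp(ads, slots, reserve=0.0):
    """Rank ads by bid * pCTR and price each click by the next ad's score.

    Returns a list of (ad name, price per click) for the filled slots.
    """
    ranked = sorted(ads, key=lambda ad: ad.bid * ad.pctr, reverse=True)
    results = []
    for position, ad in enumerate(ranked[:slots]):
        if position + 1 < len(ranked):
            runner_up = ranked[position + 1]
            # Smallest bid at which this ad still outranks the next one.
            price = runner_up.bid * runner_up.pctr / ad.pctr
        else:
            price = reserve
        results.append((ad.name, max(price, reserve)))
    return results


# Hypothetical bids and predicted click-through rates.
sample_ads = [
    Ad("A", bid=2.0, pctr=0.10),
    Ad("B", bid=3.0, pctr=0.05),
    Ad("C", bid=1.0, pctr=0.08),
]

for name, price in run_gsp(sample_ads, slots=2):
    print(f"{name} pays {price:.2f} per click")
```

In this toy input, ad B bids the most but ranks below ad A because its predicted click-through rate is lower. This is why an inaccurate pCTR distorts both the ranking and the prices.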

Task 2's aim is to accurately predict the ads’ click-through rate in online computational ad systems.
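As a rough illustration of what a click-through-rate estimate can look like, the Python sketch below computes a smoothed empirical CTR from click and impression counts. It is only a simple baseline under stated assumptions: the function name `smoothed_ctr` and the defaults `prior_ctr` and `prior_strength` are hypothetical, and nothing here reflects the competition's actual data format or scoring.

```python
def smoothed_ctr(clicks, impressions, prior_ctr=0.05, prior_strength=100.0):
    """Estimate CTR by blending observed counts with a prior.

    The prior acts like `prior_strength` extra impressions clicked at rate
    `prior_ctr`, so ads with little history stay close to the prior while
    ads with a long history are dominated by their own counts.
    """
    if clicks < 0 or impressions < 0 or clicks > impressions:
        raise ValueError("need 0 <= clicks <= impressions")
    return (clicks + prior_strength * prior_ctr) / (impressions + prior_strength)


# Hypothetical counts: a brand-new ad, a small sample, and a large sample.
for clicks, impressions in [(0, 0), (10, 100), (1000, 10000)]:
    print(clicks, impressions, round(smoothed_ctr(clicks, impressions), 4))
```

A raw ratio would be undefined for the new ad and noisy for the small sample; smoothing is one common way to handle both. A full solution would likely condition on the query, ad, and user features mentioned above.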

Dates

Feb 20, 2012: Competition announcement linked to KDD official site

Mar 1, 2012: Registration opens (dataset ready for the public)

Mar 15, 2012: Competition begins

Jun 1, 2012: Competition ends (submission deadline)

Jun 5, 2012: Results compiled

Jun 8, 2012: Winners notified

Aug 12, 2012: Workshop

*Note that this is only an initial announcement. Stay tuned for more detailed announcements.

KDD Cup 2012 Organizers

  • Dr. Gordon Sun, Chief Scientist, Tencent Inc.
  • Dr. Yading Aden Yue, Expert Researcher, Tencent Inc.
  • Dr. Yi Wang, Deputy Director, Contextual Advertising Platform, Tencent Inc.
  • Mr. Jian Jimmy Hu, Scientist, Tencent Inc.
  • Dr. Yong Nicky Li, Leader, Data Mining Group, Tencent Inc.

Reposted from: https://www.cnblogs.com/aloe/archive/2012/03/17/2403006.html


