0.数据分析背景 1.数据来源和分析工具 2.数据分析目的 2.1 了解房屋和平台特点 2.2 了解评价模块 2.3 了解房东放租相关情况 3.理解数据 3.1 数据结构,理解字段 3.2 各个数据表的数据数 4.数据清洗 4.1 选择字段,设置类型 4.2 缺失值、异常值处理 5.数据分析主体 5.1 房屋与平台情况分析 5.1.1 未来一年放租情况 5.1.2 未来一年房屋接受预定的时间段 5.1.3 未来一年最受欢迎房屋的具体情况 5.2 评价模块分析 5.2.1 房屋在平台的总评论数排名与近12个月的评论数排名比较 5.2.2 各分项评分与总分概况 5.3 房东放租成租情况分析 5.3.1 房东注册时间一览 5.3.2 房东是否是超级房东与近12个月新生活跃房屋的关系 5.3.3 房东放租后成单需时概况 5.3.4 空租房屋概况 5.4 房客方相关分析 5.4.1 平台上评价较为活跃的房客 5.4.2 房客旅游类型,以及选择的房屋种类 5.4.3 房客人数、房容纳人数与房屋类型相关情况 6.分析结论 7.建议 0.数据分析背景爱彼迎是一家联系旅游人士和家有空房出租的房主的服务型网站,为用户提供多样的住宿信息。与国内其他如途牛旅游网,携程等不一样,爱彼迎作为一个直接面向出租房主和旅游人士的旅游平台,为房东和房客的交流提供了一个直接交流途径,其较为创新形式引起了大众关注。香港作为一个高度繁荣的国际大都市,是世界最具竞争力的城市之一,旅游景点众多,旅游业发达,有“东方之珠”“购物天堂”等美称。分析香港地区爱彼迎的相关数据,有利于我们去了解爱彼迎的相关市场讯息及其具有吸引力、创新性的地方,对旅游相关平台发展、制定策略具有一定的参考价值。 1.数据来源和分析工具【数据来源】:爱彼迎官网中国香港地区2019年7月13日数据 http://insideairbnb.com/get-the-data.html 【分析工具】X mind, Mysql , Excel, Wordart , 图表秀 2.数据分析目的2.1 了解房屋和平台特点 了解爱彼迎平台在香港地区面向的市场对象,受欢迎的房屋特点以及平台特性 2.2 了解评价模块 了解爱彼迎平台的评价板块设置,反馈活跃度,以及评价中的房屋的出租情况 2.3 了解房东放租相关情况 了解爱彼迎平台的成租情况,平台中房东方的活跃度、参与度等 3.理解数据3.1数据结构,理解字段 贯穿3个表的是listing id(红色下划线),红色的为房东方相关字段,蓝色的为房客方相关字段,绿色的为评价模块相关字段 3.2 各个数据表的数据数 Calendar有4587685条数据,有4587685条日历相关信息; listigns2有12569条数据,有12569间房屋; reviews2有262222条数据,有262222条评论 SELECT count(*) from calendar;SELECT count(*) from listings2; SELECT count(*) from reviews2; 4.数据清洗4.1在导入时候选择此次所需的字段,设置相应的类型 把calendar 中的listing_id,price, minimum_nights 字段,listing2 中的 id,host_id, host_listings_count, host_total_listings_count, accommodates, price, guests_included, review_scores_rating, review_scores_cleanliness, review_scores_checkin, review_scores_communication, review_scores_location, review_scores_value, calculated_host_listings_count字段,reviews2中的listing_id,id,reviewer_id设为Int类型,listing2中的 host_location, amenities字段设置为text,其余字段为varchar。保留字段及结构理解如下图 4.2通过升/降序查看每列数据的情况,对缺失值、异常值进行处理 4.2.1对于listing2表中host_location,host_identity_verified,accommodates,property_type字段原有在抓取时候即缺失、且无法通过其他途径统一为N/A值 SELECT * FROM listings2 where listings2.host_location is null;START TRANSACTION; update listings2 set listings2.host_location='N/A' where listings2.host_location is null; UPDATE listings2 set listings2.host_is_superhost= 'N/A' WHERE listings2.host_is_superhost is null; ; UPDATE listings2 set listings2.host_identity_verified= 'N/A' WHERE listings2.host_identity_verified is null; UPDATE listings2 set listings2.accommodates='N/A' where listings2.accommodates is null; UPDATE listings2 set listings2.property_type='N/A' where listings2.property_type is null; 4.2.2 对于定金缺失值,统一补全为0 START TRANSACTION;UPDATE listings2 set listings2.security_deposit =0 where listings2.security_deposit is null; 4.2.3 对于价格的缺失值,采用相同房东下相似房屋价格补全 SELECT listings2.host_id,listings2.id from listings2 where listings2.price=0;(1) SELECT * from listings2 where listings2.host_id=147762726;START TRANSACTION; update listings2 set listings2.price=501 where listings2.id=20690025; (2) SELECT * from listings2 where listings2.host_id=146408355;START TRANSACTION; update listings2 set listings2.price=681 where listings2.id=20716506; (3) SELECT * from listings2 where listings2.host_id=16155550;SELECT * from listings2 HERE name like '%rooftop%' and listings2.property_type='Apartment' and listings2.room_type='Entire home/apt' and listings2.accommodates=3 and listings2.beds=2; START TRANSACTION; update listings2 set listings2.price=688 where listings2.id=20947321; (4) SELECT * from listings2 where listings2.host_id=24358640 and listings2.accommodates=2 and listings2.beds=1 and listings2.bedrooms=1 and property_type='Condominium';START TRANSACTION; update listings2 set listings2.price=321 where listings2.host_id=24358640 and listings2.price=0; 4.2.4 host_listings_count, host_total_listings_count补全:找到该房屋房东其下所有房子的数据,并补全 Select * from listing2 where host_listings_count is null or host_total_listings_count is null;START TRANSACTION; UPDATE listings2 set listings2.host_listings_count= 1 WHERE listings2.id=32409603 ; UPDATE listings2 set listings2.host_total_listings_count= 1 WHERE listings2.id=32409603 ; 4.2.5 host_since异常值处理:host_since比first_review时间还要晚的,找到该房东的id,找到其下所有房子的首次评价数据,以最早的作为注册时间替代 【查找】 SELECT datediff(CAST(listings2.first_review AS DATE),CAST(listings2.host_since AS DATE)),listings2.id,listings2.host_id,listings2.first_review,listings2.host_since,listings2.last_review, listings2.price,listings2.number_of_reviews,listings2.number_of_reviews_ltm from listings2 where datediff(CAST(listings2.first_review AS DATE),CAST(listings2.host_since AS DATE))<0 ; 房东ID为251015872的注册时间比首评时间要晚 SELECT * from listings2 where listings2.host_id=251015872;查询到该房东下房屋最早的评价时间为2015-07-25 START TRANSACTION;update listings2 SET listings2.host_since='2015-07-25' where host_id=251015872; 同理修改3条信息。 5.数据分析主体5.1 房屋与平台情况分析 5.1.1未来一年放租情况 /*poplist_in365f 未来一年内已预订天数的房子(这里有一个VIEW)*/ create view poplist_in365f AS SELECT DISTINCT calendar.listing_id,calendar.date,calendar.minimum_nights,calendar.maximum_nights, calendar.available,count(calendar.available) as reserved from calendar GROUP BY calendar.listing_id ,calendar.available having calendar.available= 'f'; SELECT count(*) from poplist_in365f; -- 未来一年内1天至364天都有人定的情况分析 -- SELECT count(case when poplist_in365f.reserved=365 then listings2.id else null end) as '365天已订', count(case when poplist_in365f.reserved=365 then listings2.id else null end)/12569 as '365占比', count(case when poplist_in365f.reserved<365 and poplist_in365f.reserved>=180 then listings2.id else null end) as '180-364天已订', count(case when poplist_in365f.reserved<365 and poplist_in365f.reserved>=180 then listings2.id else null end)/12569 as '180-364占比', count(case when poplist_in365f.reserved<180 and poplist_in365f.reserved>=60 then listings2.id else null end) as '60-179天已预订', count(case when poplist_in365f.reserved<180 and poplist_in365f.reserved>=60 then listings2.id else null end)/12569 as'60-179占比', count(case when poplist_in365f.reserved<60 and poplist_in365f.reserved>=1 then listings2.id else null end) as '1-60天已预订', count(case when poplist_in365f.reserved<60 and poplist_in365f.reserved>=1 then listings2.id else null end)/12569 as'1-60天占比' from listings2 JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id; 全年已接受预定的占比21.79%,180-364已接受预定的占比41.47%,60-179的占比8.09%,1-60的占比20.2% 5.1.2 未来一年已接定的房屋接受预定的时间段 SELECT left(calendar.date,7) as '年月', right(calendar.date,2)as '日' ,count(calendar.listing_id)FROM calendarwhere calendar.available='f' GROUP BY left(calendar.date,7) , right(calendar.date,2); 把数据导出后以“年月”“日”分别作为表的统计字段,转置 时间主要集中在20190713当天,以及2020年4月的中下旬 5.1.3未来一年已接定的房屋(2739间)具体情况 (1)房东回复时间 SELECT listings2.host_response_time,count(listings2.host_response_time), poplist_in365f.reserved from listings2 JOIN poplist_in365f on listings2.id= poplist_in365f.listing_idwhere poplist_in365f.reserved=365 GROUP BY listings2.host_response_time order BY count(listings2.host_response_time) DESC; --剔除异常值后的总数-- SELECT count(listings2.host_response_time), poplist_in365f.reserved from listings2 JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 and listings2.host_response_time <> 'N/A'; --回复时间分组-- SELECT listings2.host_response_time,count(listings2.host_response_time),count(listings2.host_response_time)/665, poplist_in365f.reserved from listings2 JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 and listings2.host_response_time <> 'N/A' GROUP BY listings2.host_response_time order BY count(listings2.host_response_time) DESC; (2)房东是否提供个人照片 SELECT listings2.host_has_profile_pic,count(listings2.host_has_profile_pic), poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 GROUP BY listings2.host_has_profile_pic order BY count(listings2.host_has_profile_pic) DESC; (3)房东是否有个人认证 SELECT host_identity_verified,count(host_identity_verified), poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 GROUP BY host_identity_verified order BY count(host_identity_verified) DESC; (4)是否为超级房东 SELECT listings2.host_is_superhost,count(listings2.host_is_superhost), poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 GROUP BY listings2.host_is_superhost order BY count(listings2.host_is_superhost) DESC; (5)房东所在地 SELECT count( case when listings2.host_location like '%HK%' or listings2.host_location like'%Hong Kong%' orlistings2.host_location like '%Hong kong%'or listings2.host_location like '%Hong-Kong%' or listings2.host_location like '%Hongkong%' or listings2.host_location like '%新界%' OR listings2.host_location like '%香港%' OR listings2.host_location like '%Sai Kung%' then listings2.host_location ELSE null end) as '在香港本地', count(case when listings2.host_location like'%Guangdong%' then listings2.host_location ELSE null end) as '在周边城市', count(case when (listings2.host_location like '%China%' or listings2.host_location like '%CN%' OR listings2.host_location like '%Taiwan%' or listings2.host_location like '%Taipei%' ) AND (listings2.host_location not like '%HK%' and listings2.host_location not like'%Hong Kong%' and listings2.host_location not like '%Hong kong%'and listings2.host_location not like '%Hong-Kong%' and listings2.host_location not like '%Hongkong%' and listings2.host_location not like '%新界%' and listings2.host_location not like '%香港%'and listings2.host_location not like '%Sai Kung%') and (listings2.host_location not like '%Guangdong%') then listings2.host_location else null end ) as '国内其他城市', count(case when ( listings2.host_location like '%HK%' or listings2.host_location like'%Hong Kong%' or listings2.host_location like '%Hong kong%'or listings2.host_location like '%Hong-Kong%' or listings2.host_location like '%Hongkong%' or listings2.host_location like '%新界%' OR listings2.host_location like '%香港%' or listings2.host_location like '%Sai Kung%') OR (listings2.host_location like '%China%' or listings2.host_location like '%CN%' OR listings2.host_location like '%Taiwan%' or listings2.host_location like '%Taipei%') OR listings2.host_location like '%Guangdong%' then null when listings2.host_location='N/A' then null ELSE listings2.host_location end) as '常在国外', count(case when listings2.host_location='N/A' then listings2.host_location else null END) as '空值' from listings2 JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 ; (6)取定规则宽松度 SELECT listings2.cancellation_policy,count(listings2.cancellation_policy), poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 GROUP BY listings2.cancellation_policy order BY count(listings2.cancellation_policy) DESC; (7)是否闪定 SELECT listings2.instant_bookable,count(listings2.instant_bookable), poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 GROUP BY listings2.instant_bookable order BY count(listings2.instant_bookable) DESC; (8)定金分组 SELECT listings2.security_deposit,count(listings2.security_deposit), poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 and listings2.security_deposit is not null GROUP BY listings2.security_deposit order BY count(listings2.security_deposit) DESC; (9)需要房客线上提供身份证明 SELECT listings2.requires_license,count(listings2.requires_license), poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 GROUP BY listings2.requires_license order BY count(listings2.requires_license) DESC; 100%不用线上提供身份证明 (10)是否需要提供手机 SELECT listings2.require_guest_phone_verification,count(listings2.require_guest_phone_verification), poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 GROUP BY listings2.require_guest_phone_verification order BY count(listings2.require_guest_phone_verification) DESC; (11)是否需要房客提供个人照片 SELECT listings2.require_guest_profile_picture,count(listings2.require_guest_profile_picture), poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 GROUP BY listings2.require_guest_profile_picture order BY count(listings2.require_guest_profile_picture) DESC; (12)房间常有设施 SELECT listings2.amenities, poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 and listings2.amenities is not NULL ; 转出excel后分列,导入数据库(字段类型设置为text) CREATE VIEW amenities1 ASSELECT amenities.amenities1 from amenities UNION ALL SELECT amenities.amenities2 from amenities UNION ALL SELECT amenities.amenities3 from amenities UNION ALL SELECT amenities.amenities4 from amenities UNION ALL SELECT amenities.amenities5 from amenities UNION ALL SELECT amenities.amenities6 from amenities UNION ALL SELECT amenities.amenities7 from amenities UNION ALL SELECT amenities.amenities8 from amenities UNION ALL SELECT amenities.amenities9 from amenities UNION ALL SELECT amenities.amenities10 from amenities UNION ALL SELECT amenities.amenities11 from amenities UNION ALL SELECT amenities.amenities12 from amenities UNION ALL SELECT amenities.amenities13 from amenities UNION ALL SELECT amenities.amenities14 from amenities UNION ALL SELECT amenities.amenities15 from amenities UNION ALL SELECT amenities.amenities16 from amenities UNION ALL SELECT amenities.amenities17 from amenities UNION ALL SELECT amenities.amenities18 from amenities UNION ALL SELECT amenities.amenities19 from amenities UNION ALL SELECT amenities.amenities20 from amenities UNION ALL SELECT amenities.amenities21 from amenities UNION ALL SELECT amenities.amenities22 from amenities UNION ALL SELECT amenities.amenities23 from amenities UNION ALL SELECT amenities.amenities24 from amenities UNION ALL SELECT amenities.amenities25 from amenities UNION ALL SELECT amenities.amenities26 from amenities UNION ALL SELECT amenities.amenities27 from amenitis UNION ALL SELECT amenities.amenities28 from amenities UNION ALL SELECT amenities.amenities29 from amenities UNION ALL SELECT amenities.amenities30 from amenities UNION ALL SELECT amenities.amenities31 from amenities UNION ALL SELECT amenities.amenities32 from amenities UNION ALL SELECT amenities.amenities33 from amenities UNION ALL SELECT amenities.amenities34 from amenities UNION ALL SELECT amenities.amenities35 from amenities UNION ALL SELECT amenities.amenities36 from amenities UNION ALL SELECT amenities.amenities37 from amenities UNION ALL SELECT amenities.amenities38 from amenities UNION ALL SELECT amenities.amenities39 from amenities UNION ALL SELECT amenities.amenities40 from amenities UNION ALL SELECT amenities.amenities41 from amenities UNION ALL SELECT amenities.amenities42 from amenities UNION ALL SELECT amenities.amenities43 from amenities UNION ALL SELECT amenities.amenities44 from amenities UNION ALL SELECT amenities.amenities45 from amenities UNION ALL SELECT amenities.amenities46 from amenities UNION ALL SELECT amenities.amenities47 from amenities UNION ALL SELECT amenities.amenities48 from amenities UNION ALL SELECT amenities.amenities49 from amenities UNION ALL SELECT amenities.amenities50 from amenities UNION ALL SELECT amenities.amenities51 from amenities UNION ALL SELECT amenities.amenities52 from amenities UNION ALL SELECT amenities.amenities53 from amenities UNION ALL SELECT amenities.amenities54 from amenities UNION ALL SELECT amenities.amenities55 from amenities UNION ALL SELECT amenities.amenities56 from amenities UNION ALL SELECT amenities.amenities57 from amenities UNION ALL SELECT amenities.amenities58 from amenities; -- 设施最多的是 -- SELECT * from amenities1; SELECT amenities1.amenities1 , count(amenities1.amenities1 ) from amenities1 where amenities1.amenities1 <> ' ' GROUP BY amenities1.amenities1 ORDER BY count(amenities1.amenities1 ) DESC; (13)日租金区间 SELECT count(case when listings2.price <500 then listings2.price else null end )as '低于500每天',count(case when listings2.price >=500 and listings2.price<1000 then listings2.price else null end)as '500-1000每天', count(case when listings2.price >=1000 AND listings2.price<2000 then listings2.price else null end )as '1000-2000每天', count(case when listings2.price >=2000 and listings2.price<3000 then listings2.price else null end )as '2000-3000每天', count(case when listings2.price >=3000 and listings2.price<5000 then listings2.price else null end )as '3000-5000每天', count(case when listings2.price >5000 then listings2.price else null end )as '高于5000每天' from listings2 join poplist_in365f on listings2.id=poplist_in365f.listing_id where poplist_in365f.reserved=365; (14)房屋类型,房客数,日租金情况 SELECT listings2.property_type,listings2.guests_included,count(listings2.guests_included) , poplist_in365f.reserved from listings2join poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 GROUP BY listings2.property_type,listings2.guests_included ORDER BY count(listings2.guests_included) desc ; (15)房屋类型,房客数,日租金均值情况 SELECT listings2.property_type,listings2.guests_included,avg(listings2.price) from listings2 join poplist_in365f on listings2.id= poplist_in365f.listing_idwhere poplist_in365f.reserved=365 GROUP BY listings2.property_type,listings2.guests_included ORDER BY count(listings2.id) DESC;
5.2.1 房屋在平台的总评论数排名与近12个月的排名比较 (1)房屋在平台的总评论数排名与近12个月的排名升降情况 -- 这里有个VIEW!总评价rank 和近12个月的总评rank(8674:4505 大于0)--CREATE view review_ranking AS SELECT dense_rank() over(ORDER BY listings2.number_of_reviews) as rank1, dense_rank() over(ORDER BY listings2.number_of_reviews_ltm) as rank2, listings2.id from listings2 where listings2.number_of_reviews>0; SELECT count(*) from review_ranking; -- 近12个月对比总数排名是升是降的占比【描述近12个月业绩如何】 -- SET sql_mode = 'NO_UNSIGNED_SUBTRACTION'; SELECT count(rank1-rank2)/8854 as '近12个月对比总数排名降低的占比' from review_ranking where rank1-rank2<0; SELECT count(rank1-rank2)/8854 as '近12个月对比总数排名升高的占比' from review_ranking where rank1-rank2>0; (2)近12个月评论数排名 和 总排名 变化幅度 SELECT count(CASE when abs(rank1-rank2) >0 and abs(rank1-rank2)<50 then abs(rank1-rank2) else null end) as "low",count(CASE when abs(rank1-rank2)>=50 and abs(rank1-rank2) <100 then abs(rank1-rank2) else null end) as "middle", count(case when abs(rank1-rank2)>=100 then abs(rank1-rank2) else null end) as "high" from review_ranking ; 5.2.2 各分项评分与总分概况 (1)总分(总数、最大值、最小值、四分位数) SELECT count(listings2.id) from listings2 where listings2.review_scores_rating <>101;SELECT listings2.review_scores_rating, listings2.id from listings2 where listings2.review_scores_rating <>101; order by listings2.review_scores_rating DESC limit 0,1; SELECT listings2.review_scores_rating, listings2.id from listings2 where listings2.review_scores_rating <>101 order by listings2.review_scores_rating DESC limit 2162,1; SELECT listings2.review_scores_rating, listings2.id from listings2 where listings2.review_scores_rating <>101 order by listings2.review_scores_rating DESC limit 4235,1; SELECT listings2.review_scores_rating, listings2.id from listings2 where listings2.review_scores_rating <>101 order by listings2.review_scores_rating DESC limit 6488,1; SELECT listings2.review_scores_rating, listings2.id from listings2 where listings2.review_scores_rating <>101 order by listings2.review_scores_rating DESC limit 8650,1; 【最小估计值】 K=1.5: 88-1.5(99-88)=71.5 K=3: 88-3(99-88)=55 总分实际最小值20比k=3时的最小估计值55要小,极度异常; 【最大估计值】 K=1.5: 99+1.5(99-88)=115.5 总分实际最大值小于k=1.5时的最大估计值,处于正常范围。 (2)定位准确度(总数、最大值、最小值、四分位数) SELECT count(listings2.id) from listings2 where listings2.review_scores_accuracy <>101;SELECT listings2.review_scores_accuracy, listings2.id from listings2 where listings2.review_scores_accuracy <>101 order by listings2.review_scores_accuracy desc limit 0,1; SELECT listings2.review_scores_accuracy, listings2.id from listings2 where listings2.review_scores_accuracy <>101 order by listings2.review_scores_accuracy desc limit 2162,1; SELECT listings2.review_scores_accuracy, listings2.id from listings2 where listings2.review_scores_accuracy <>101 order by listings2.review_scores_accuracy desc limit 4324,1; SELECT listings2.review_scores_accuracy, listings2.id from listings2 where listings2.review_scores_accuracy <>101 order by listings2.review_scores_accuracy desc limit 6486,1; SELECT listings2.review_scores_accuracy, listings2.id from listings2 where listings2.review_scores_accuracy <>101 order by listings2.review_scores_accuracy desc limit 8648,1; 【最小估计值】 K=1.5: 9-1.5(10-9)=7.5 K=3: 9-3(10-9)=6 总分实际最小值2比k=3时的最小估计值6要小得多,极度异常; 【最大估计值】 K=1.5: 10+1.5(10-9)=11.5 总分实际最大值小于k=1.5时的最大估计值,处于正常范围。 (3)清洁度(总数、最大值、最小值、四分位数) SELECT count(listings2.id) from listings2 where listings2.review_scores_cleanliness <>101;SELECT listings2.review_scores_cleanliness, listings2.id from listings2 where listings2.review_scores_cleanliness <>101 order by listings2.review_scores_cleanliness DESC limit 0,1; SELECT listings2.review_scores_cleanliness, listings2.id from listings2 where listings2.review_scores_cleanliness <>101 order by listings2.review_scores_cleanliness DESC limit 2162,1; SELECT listings2.review_scores_cleanliness, listings2.id from listings2 where listings2.review_scores_cleanliness <>101 order by listings2.review_scores_cleanliness DESC limit 4324,1; SELECT listings2.review_scores_cleanliness, listings2.id from listings2 where listings2.review_scores_cleanliness <>101 order by listings2.review_scores_cleanliness DESC limit 6487,1; SELECT listings2.review_scores_cleanliness, listings2.id from listings2 where listings2.review_scores_cleanliness <>101 order by listings2.review_scores_cleanliness DESC limit 8649,1; 【最小估计值】 K=1.5: 8-1.5(10-8)=5 K=3: 8-3(10-8)=2 总分实际最小值6接近k=1.5时的最小估计值5,临近轻度异常; 【最大估计值】 K=1.5: 10+1.5(10-8)=13 总分实际最大值小于k=1.5时的最大估计值,处于正常范围。 (4)登入评价(总数、最大值、最小值、四分位数) SELECT count(listings2.id) from listings2 where listings2.review_scores_checkin <>101;SELECT listings2.review_scores_checkin, listings2.id from listings2 where listings2.review_scores_checkin <>101 order by listings2.review_scores_checkin DESC limit 0,1; SELECT listings2.review_scores_checkin, listings2.id from listings2 where listings2.review_scores_checkin <>101 order by listings2.review_scores_checkin DESC limit 2162,1; SELECT listings2.review_scores_checkin, listings2.id from listings2 where listings2.review_scores_checkin <>101 order by listings2.review_scores_checkin DESC limit 4324,1; SELECT listings2.review_scores_checkin, listings2.id from listings2 where listings2.review_scores_checkin <>101 order by listings2.review_scores_checkin DESC limit 6487,1; SELECT listings2.review_scores_checkin, listings2.id from listings2 where listings2.review_scores_checkin <>101 order by listings2.review_scores_checkin DESC limit 8649,1; 【最小估计值】 K=1.5: 9-1.5(10-9)=7.5 K=3: 9-3(10-9)=6 总分实际最小值2比k=3时的最小估计值6要小得多,极度异常; 【最大估计值】 K=1.5: 10+1.5(10-9)=11.5 总分实际最大值小于k=1.5时的最大估计值,处于正常范围。 (5)屋主交流回复(总数、最大值、最小值、四分位数) SELECT count(listings2.id) from listings2 where listings2.review_scores_communication <>101;SELECT listings2.review_scores_communication, listings2.id from listings2 where listings2.review_scores_communication <>101 order by listings2.review_scores_communication DESC limit 0,1; SELECT listings2.review_scores_communication, listings2.id from listings2 where listings2.review_scores_communication <>101 order by listings2.review_scores_communication DESC limit 2162,1; SELECT listings2.review_scores_communication, listings2.id from listings2 where listings2.review_scores_communication <>101 order by listings2.review_scores_communication DESC limit 4323,1; SELECT listings2.review_scores_communication, listings2.id from listings2 where listings2.review_scores_communication <>101 order by listings2.review_scores_communication DESC limit 6486,1; SELECT listings2.review_scores_communication, listings2.id from listings2 where listings2.review_scores_communication <>101 order by listings2.review_scores_communication DESC limit 8647,1; 【最小估计值】 K=1.5: 8-1.5(10-8)=5 K=3: 8-3(10-8)=2 总分实际最小值4比k=1.5时的最小估计值5要小,比k=3时的估计值2要大,中度异常; 【最大估计值】 K=1.5: 10+1.5(10-8)=13 总分实际最大值小于k=1.5时的最大估计值,处于正常范围。 (6)区位条件(总数、最大值、最小值、四分位数) SELECT count(listings2.id) from listings2 where listings2.review_scores_location <>101;SELECT listings2.review_scores_location, listings2.id from listings2 where listings2.review_scores_location <>101 order by listings2.review_scores_location DESC limit 0,1; SELECT listings2.review_scores_location, listings2.id from listings2 where listings2.review_scores_location <>101 order by listings2.review_scores_location DESC limit 2411,1; SELECT listings2.review_scores_location, listings2.id from listings2 where listings2.review_scores_location <>101 order by listings2.review_scores_location DESC limit 4323,1; SELECT listings2.review_scores_location, listings2.id from listings2 where listings2.review_scores_location <>101 order by listings2.review_scores_location DESC limit 6485,1; SELECT listings2.review_scores_location, listings2.id from listings2 where listings2.review_scores_location <>101 order by listings2.review_scores_location DESC limit 8646,1; _scores_communication, listings2.id from listings2 Q3=Q1=Q2=最大值,75%数据为10,剩下的25%中最小为6,分数总体较高 (7)性价比评价(总数、最大值、最小值、四分位数) SELECT count(listings2.id) from listings2 where listings2.review_scores_value <>101;SELECT listings2.review_scores_value, listings2.id from listings2 where listings2.review_scores_value <>101 order by listings2.review_scores_value DESC limit 0,1; SELECT listings2.review_scores_value, listings2.id from listings2 where listings2.review_scores_value <>101 order by listings2.review_scores_value DESC limit 2161,1; SELECT listings2.review_scores_value, listings2.id from listings2 where listings2.review_scores_value <>101 order by listings2.review_scores_value DESC limit 4323,1; SELECT listings2.review_scores_value, listings2.id from listings2 where listings2.review_scores_value <>101 order by listings2.review_scores_value DESC limit 6485,1; SELECT listings2.review_scores_value, listings2.id from listings2 where listings2.review_scores_value <>101 order by listings2.review_scores_value DESC limit 8646,1; 【最小估计值】 K=1.5: 9-1.5(10-9)=7.5 K=3: 9-3(10-9)=6 总分实际最小值8比k=1.5时的最小估计值7.5要大,临近轻度异常; 【最大估计值】 K=1.5: 10+1.5(10-9)=11.5 总分实际最大值小于k=1.5时的最大估计值,处于正常范围。 5.3 房东放租成租情况分析 5.3.1 房东注册时间一览 SELECT LEFT(listings2.host_since,4),count(listings2.host_since), poplist_in365f.reserved from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 GROUP BY LEFT(listings2.host_since,4) order BY count(listings2.host_since) DESC; 5.3.2 房东是否是超级房东与近12个月活跃房屋的关系 SELECT listings2.number_of_reviews_ltm/listings2.number_of_reviews as '近12个月活跃度' , count(listings2.id), listings2.host_is_superhost,listings2.host_since from listings2JOIN poplist_in365f on listings2.id= poplist_in365f.listing_id where poplist_in365f.reserved=365 and listings2.number_of_reviews_ltm/listings2.number_of_reviews=1 GROUP BY listings2.host_is_superhost ORDER BY listings2.number_of_reviews desc; 共有500个近12个月评论即为总评论数的房屋新生力量,这些房屋的房东大部分不是超级房东 5.3.3 房东放租后成单需时概况 (1)1000天以上0评房屋 SELECT count(listings2.id) as'一千天以上0评房屋', count(listings2.id)/12569 as '占总比'from listings2where listings2.number_of_reviews=0 and datediff(cast('2019-07-13' as date),CAST(listings2.host_since AS DATE))>1000; (2)注册很久才有首评的房子 SELECT count(listings2.id) as '一千天以上有首评房屋', count(listings2.id)/12569 as '占总比' from listings2where listings2.number_of_reviews>0 and datediff(cast(listings2.first_review as date),CAST(listings2.host_since AS DATE))>1000; (3)首评需时长的房屋占房东放租总房数的比例 CREATE VIEW longtime2 ASSELECT listings2.host_listings_count,listings2.host_id,count(listings2.host_id), count(listings2.host_id)/ listings2.host_listings_count as '首评需时长的占总放租房比', listings2.number_of_reviews,listings2.number_of_reviews_ltm,listings2.host_since, listings2.price,listings2.property_type,listings2.room_type, listings2.bathrooms,listings2.bedrooms,listings2.beds,listings2.bed_type, listings2.calendar_updated,listings2.cancellation_policy,listings2.instant_bookable,listings2.amenities, listings2.calculated_host_listings_count_entire_homes, listings2.calculated_host_listings_count_private_rooms, listings2.calculated_host_listings_count_shared_rooms, listings2.host_has_profile_pic,listings2.host_identity_verified,listings2.host_is_superhost, listings2.host_response_time,listings2.host_location from listings2 WHERE (listings2.number_of_reviews>0 and datediff(cast(listings2.first_review as date),CAST(listings2.host_since AS DATE))>1000) GROUP BY listings2.host_id; SELECT count(longtime2.host_id),longtime2.首评需时长的占总放租房比 from longtime2 where longtime2.首评需时长的占总放租房比=1; 5.3.4 空租房屋概况 (1)注册最久但无人问津的房屋 -- 注册久无人问津的房子 3715,最老的是2008-12-26日的 --SELECT datediff(cast('2019-07-13' as date),CAST(listings2.host_since AS DATE)), listings2.id,listings2.host_id,listings2.first_review,listings2.host_since,listings2.last_review, listings2.calendar_updated, listings2.price,listings2.number_of_reviews,listings2.number_of_reviews_ltm from listings2 where listings2.number_of_reviews=0 ORDER BY datediff(now(),CAST(listings2.host_since AS DATE)) DESC; (2)空租房占房东放租房数100%的 CREATE VIEW longtime1 ASSELECT listings2.host_listings_count,listings2.host_id,count(listings2.host_id), count(listings2.host_id)/ listings2.host_listings_count as '空租房占总放租房比', listings2.host_since, listings2.calculated_host_listings_count_entire_homes, listings2.calculated_host_listings_count_private_rooms, listings2.calculated_host_listings_count_shared_rooms, listings2.host_has_profile_pic,listings2.host_identity_verified,listings2.host_is_superhost, listings2.host_response_time,listings2.host_location from listings2 WHERE (listings2.number_of_reviews=0 and datediff(cast(listings2.first_review as date),CAST(listings2.host_since AS DATE))>1000) GROUP BY listings2.host_id; SELECT count(longtime1.host_id),longtime1.空租房占总放租房比 from longtime1 where longtime1.空租房占总放租房比=1; \ 仅有1间是空租房,且占房东总放租数100% 5.4 房客方相关分析 5.4.1 平台上评价较为活跃的房客 (1)平台评价活跃前20房客及评论数 SELECT count(reviews2.comments), reviews2.reviewer_id from reviews2GROUP BY reviews2.reviewer_id having count(reviews2.comments)>1 ORDER BY count(reviews2.comments) desc LIMIT 20; (2)评论房客的总人数 SELECT count( DISTINCT reviews2.reviewer_id) from reviews2;(3)按评论数看各评论次数人员占比 SELECT count(case when reviewssep.c=1 then reviewssep.id else null end) as '评论次数为1',count(case when reviewssep.c=1 then reviewssep.id else null end)/234025 as '占比1', count(case when reviewssep.c=2 then reviewssep.id else null end) as '评论次数为2', count(case when reviewssep.c=2 then reviewssep.id else null end)/234025 as '占比2', count(case when reviewssep.c=3 then reviewssep.id else null end) as '评论次数为3', count(case when reviewssep.c=3 then reviewssep.id else null end)/234025 as '占比3', count(case when reviewssep.c=4 then reviewssep.id else null end) as '评论次数为4', count(case when reviewssep.c=4 then reviewssep.id else null end)/234025 as '占比4', count(case when reviewssep.c>5 then reviewssep.id else null end) as '评论次数大于5', count(case when reviewssep.c>5 then reviewssep.id else null end)/234025 as '占比5' from reviewssep GROUP BY reviewssep.c; 5.4.2 房客旅游类型 SELECT listings2.is_business_travel_ready , count(listings2.id )as '个数' from listings2;5.4.3 房客人数、房容纳人数与房屋类型相关情况 SELECT listings2.guests_included,listings2.accommodates,listings2.property_type, count(listings2.id) as '个数' FROM listings2GROUP BY listings2.guests_included,listings2.property_type ORDER BY count(listings2.id) DESC; 6.分析结论【结论1】平台主要情况 平台主要面向旅行型的市场对象,一般房客会选择房容人数比实际房客数多的房屋,一般会比实际多1倍;最多人选择的是可容2-2人的公寓套房,或可容2人的共管公寓、客用套房。 【结论2】未来一年平台情况 平台在未来一年,半数以上房屋能被出租半年以上,其中21.79%房屋全年都已被预定。受香港至今还未平息的动乱事件影响,未来半年的预定趋冷,热点主要集中在统计当日以及半年后的2020年4月中下旬。 【结论3】未来365天都被预定的房屋情况 (1)近4成房东进行了个人认证,几乎全部房东会提供个人照片,且近半数房东在1小时内回复房客,超过7成房东在香港本地,房东主动性互动性较强,方便房客问询,提高房客对房东的认知度和安全感。 (2)房客方基本不需要在线上提供手机或个人照片等信息,完全不用提供证件信息;7成房屋免定,其余的随预定时间长短在100-29118间高低不等,租房门槛、成本都不高,为房客方租房提供便利。房屋取定规则较为宽松,严格的只占2成左右,房客因应实际需要进行调整的灵活度较高。 (3)房屋设施种类繁多,共127种,其中空调、WIFI、必备用品、厨房、洗浴设施用具等基本配备,设施较为完备,考虑的方面从带小孩旅行到商务类型都已涵盖。 (4)近4成的房屋支持闪定,在平台房屋中普及率还不算高,。 (5)公寓套房出租最受欢迎,其中以1-2人租客成行居多,日租金在700-800。2人租公寓套房、1人租共管公寓/独立房屋/服务式住宅/招待所也较受欢迎。2人租共管公寓/服务式住宅、5人租服务式住宅、4人租共管公寓/独立房屋、16人租公寓套房偏冷。租次较少的房屋日租人均价在200-500间浮动,比1-2人租要低不少。以此可吸引家庭团或小团体组队旅行的房客 【结论4】评价体系 (1)52.2%房屋在近12个月的评论数有所上升,排名变化大多在50位左右,只有4.4%的房屋在近12个月排名浮动超过100位,排名较为稳定,且稳中带升,从平台评价看,平台内放租房屋活跃度较高,发展势头不错。 (2)平台房客方对房屋的评价较高,总体评分分值在88以上,分项分值普遍在6分以上;其中总分、房屋登入、定位的评分出现最低分为20,2,2,最小值极度异常,屋主交流、房屋区位最低分为4,6,中度异常,均有恶意差评的嫌疑。 (3)平台上反复评价的房客不多,2次以上评价的占比不到1成,最高评次为25,有效长评活跃度不高,反复评价的房客活跃度需要激活。 【结论5】放租的房东 (1)房东在2015年注册数增长呈最大值,而后增长逐渐趋缓,新增用户后劲力度不足; (2)从房屋数看,房东注册后1000天以上才成租的房屋占比不到2成,放租后成租速度比较快,但占房东放租房屋100%的房屋中有976间从注册到成租需时超过1000天,对这部分房东而言成租信心受挫。 (3)注册后至今0评处于空租状态的房屋占比约2成,注册后1000天以上0评处于空租的占比18.07%,相当一部分房东的房屋活跃度不高,亟待催活。 7.建议【建议1】 旅游型房客市场已定调,尝试开拓商务类的市场,如结合在商务区附近的放租民宿,在平台社区做连片联合专题,结合商务会议热点地点做相关的交通指引专题或分区讨论,在平台时提高和方便商务相关设施的曝光率。 【建议2】 受动乱事件影响,未来相当一段时间,香港的旅游业会迎来寒冬期,在此期间只能消化中长期定单,对短期租的定金适度下调,中长期单的以稳定为主。 【建议3】 关于闪定功能可再根据房东房客方的实际需要进行调整改善,以及进行必要的宣传科普 【建议4】 以日租人均价比1-2人租低不少,鼓励家庭团或小团体组队旅行订房。 【建议5】 评价体系需要优化:将分制划分为高中低三个范围,加颜色标注,方便房客定分;对于给予低分的,需要房客提供进一步的图文证明,并要求房东方给予合理解释,减少恶意差评;鼓励房客方给予长评,对有效长评方给予一定的线上奖励,如积分机制或称号加持。 【建议6】 推出片区联动、联系节日或相关主题带动新房东增长,鼓励已注册用户或房客等分享推荐平台(给予线上奖励),扩大平台知名度;对于长期空租的房屋及房东给予一定的支持计划,激活房东方,结合热评受欢迎房屋的相关给房东方提供改进调整建议,结合社区发展拉动静止用户活跃起来。 |