CHAPTER 1
INTRODUCTION

DATA MINING

Data mining is the process of extracting knowledge from massive amounts of data. It is used to locate information such as patterns, associations, anomalies and significant structures in large amounts of data kept in databases, data warehouses, or other data repositories. Owing to the availability of huge amounts of data in electronic form, and the need to turn that data into useful information and knowledge for broad applications such as market analysis, business analysis, and data processing, data mining has attracted a great deal of attention in the information industry.

Data mining is popularly treated as a synonym for Knowledge Discovery in Databases (KDD), while others regard it as an essential step within the process of knowledge discovery. Knowledge discovery as a process consists of an iterative sequence of the following steps:
data cleaning (to remove noise or irrelevant data),
data integration (where multiple data sources may be combined),
data selection (where data relevant to the analysis task are retrieved from the database),
data transformation (where data are transformed or consolidated into forms appropriate for mining by performing summary or aggregation operations, for instance),
data mining (an essential process where intelligent methods are applied in order to extract data patterns),
pattern evaluation (to identify the truly interesting patterns representing knowledge, based on some interestingness measures), and
knowledge presentation (where visualization and knowledge representation techniques are used to present the mined knowledge to the user).

Data Mining Tasks

Data mining tasks can be classified into two categories: descriptive data mining and predictive data mining.

Summarization is the generalization or abstraction of data. A set of relevant data is abstracted and summarized, resulting in a smaller set that gives a general overview of the data.

Clustering is segregating similar groups from unstructured data. It is the task of grouping a set of objects in such a way that objects in the same group are more similar to one another than to those in other groups. Once the clusters are formed, the objects are labelled with their corresponding clusters, and the common features of the objects in a cluster are summarized to form a class description.

Classification is learning rules that can be applied to new data, and it usually includes the following steps: preprocessing of data, designing the model, learning or feature selection, and validation/evaluation. Classification predicts categorical class labels, whereas regression models continuous-valued functions. Classification is the derivation of a model that determines the class of an object based on its attributes. A set of objects is given as the training set, in which each object is represented by a vector of attributes together with its class. By analyzing the relationship between the attributes and the classes of the objects in the training set, a classification model can be constructed.

Regression is finding a function that models the data with the least error. It is a statistical methodology that is most often used for numeric prediction. Regression analysis is widely used for prediction and forecasting, where it has substantial overlap with the field of machine learning. Regression analysis is also used to understand which of the independent variables are related to the dependent variable, and to explore the forms of these relationships.

Association is looking for relationships between variables or objects. It aims to extract interesting associations, correlations or causal structures among the objects, i.e. cases where the appearance of one set of objects implies the appearance of another set of objects. Association rules can be helpful for marketing, commodity management, advertising, etc. Association rule learning is a popular and well-researched technique for discovering interesting relations between variables in large databases.

MOTIVATION

Data mining is one of the ways of handling huge volumes of information for mining competitors. With a huge amount of unstructured review data, both the competitor and the customer face the crucial challenge of extracting useful information. This project is about a recommender system for both the customer and the competitor: an information filtering system that seeks to predict the rating or review that a customer provides.

The dataset is collected online and consists of customer reviews of hotels. Preprocessing removes irrelevant data, and the processed data is then analyzed to identify the top-k business competitors in a particular location of a city. Customers find it difficult to choose the best hotel to visit and enjoy. A customer can find hotel reviews in web search results, but these do not provide proper information, which leaves the customer confused about which hotel to choose. Competitors, in order to raise their competitive level, collect feedback from customers, which helps them address the negative comments about the hotel. The motivation of the project is to overcome the above problem and help the customer reach a clear decision through the analysis of reviews. Similarly, it helps hotel competitors identify steps to improve their service.

OBJECTIVE

To identify hotel competitors based on customer reviews of the business.
To determine how the hotel business can be improved.
To identify fake reviews by unauthorized users.
To recommend the best hotel to customers.

ORGANIZATION OF THE THESIS

This section gives a short description of each chapter. Chapter 1 provides a general introduction to data mining, introduces the project, and describes its motivation and objectives. Chapter 2 is the literature survey of the various approaches used and how they can be applied in identifying
the business competitors in the project. Chapter 3 explains the algorithms used for the components involved, and gives information about the tool used and the dataset used for analysis. Chapter 4 provides information about the implementation of the project and the process followed to achieve its objectives. Chapter 5 gives the conclusion and the future action plan.

CHAPTER 2
LITERATURE SURVEY

When mining the competitors of a given item, the most influential factors of the item that satisfy customer needs can be extracted from the data typically stored in the database. This section covers two types of literature: competitor mining and unstructured data management. Unstructured data sources come in different formats that do not fall under any predefined category. When managing thousands of customers, a business will have difficulty sustaining the rising costs created by interactions among people.

2.1 ONLINE REVIEWS:

Jin et al. [1]: Information from the web presents customer opinion from different perspectives. Each customer has different opinions, and competitor analysis is carried out over large volumes of web information. Therefore, one of the best competitive strategies is the successful utilization of web data for decision support. Customer reviews for business competitor mining are collected through several methods and are usually unstructured data. Most data mining technologies can only handle structured data. So, during the mining process, unstructured data is not taken into account and much valuable service information is lost. Structured systems are those where the data and the computing activity are predetermined and well-defined. Unstructured systems are those that have no predetermined form or structure and are usually full of textual data. Typical unstructured data include email, reports, letters, and other communications.

2.2 ANALYSIS OF COMPETITORS IN BUSINESS:

Lappas et al. [2]: Competitor mining is done across different domains in order to get an appropriate result. Queries are formed according to the customer's preferences and submitted to a search engine for matching results. Finally, the customer goes with the choice of the search engine. Sometimes the exact customer preference is not identified, and the customer settles for the best of the search results obtained, which match only a few of the preferences. However, this technique runs into problems such as finding the top-n business competitors of an item and handling structured data.

Li et al. [3]: To accomplish competitor mining, competitive information from the web is required, such as details about the company, its products, or the people who work in that company. An algorithm called "CoMiner" extracts a set of comparative items for the input entity, ranks them according to their similarity or comparativeness, and finally finds the competitive items. CoMiner is designed to support a particular domain. Its disadvantage is that in many domains competitors will be difficult to identify.

Pant et al. [4]: Web footprint refers to information from online metrics used for top competitor identification. A firm's web site presents the firm's activities, products and services to its various stakeholders. The approach is based on the data, firm links and website information that are stored as logs to identify the presence of online isomorphism. Here, competitive isomorphism is the tendency of competing firms to become similar as they mimic each other under common market forces. Predictive models for competitor identification based on online metrics are reported to outperform those based on offline data. Combining online and offline metrics further boosts predictive performance. Social media platforms such as Twitter and Facebook are popular information exchange platforms that are increasingly used by firms to communicate with various stakeholders. Online news refers to stories available on the web, from a large number of news sources, that mention the firm.

Shenghua Bao et al. [5]: The problem of ambiguity can be solved by providing the input entity with additional restrictions. CoMiner is an algorithm for discovering competitors, their competitive domains, and detailed competitive evidence by mining web resources. CoMiner extracts the competitive domains in which the given entity and its competitors play against each other by mining salient phrases from a set of web phrases.

2.3 RATING:

Li et al. [6]: Ranking methods are used to present competitors in ranked order. Data from location-based social media are used for ranking the competitors. The PageRank model and its variants are used to obtain the competitive rank of firms. However, mining competitors from social media raises many privacy-related issues. Also, social media information is not always accurate, so competitor prediction may lead to incorrect results.

Tania Ferreira et al. [7]: Gathering knowledge about the customers of e-commerce platforms makes it possible to:
analyze behaviors,
find purchasing patterns,
develop better relationship management with customers,
improve stock management,
optimize the organization's processes,
support the creation of marketing actions,
achieve greater competitiveness, and
achieve better financial performance.

E-commerce is a concept applicable to any type of business or trade transaction that allows consumers to transact goods and services electronically without barriers of time or distance.

Advantages of e-commerce are:
greater convenience in purchasing the product or service,
no more standing in queues or being placed on hold,
24-hour availability,
access at any time from devices with an Internet connection,
access to stores located remotely,
easier price comparison, and
reduced employee costs.

Disadvantages of e-commerce are:
the need for an Internet access device and connection,
the inability to experience the product before purchase,
the vulnerability of confidential data,
technical problems, and
possible delays or product damage during delivery.

2.4 INFORMATION RETRIEVAL:

Mohamed Reda Bouadjenek et al. [8]: Information retrieval (IR) is the science that deals with the representation, storage, organization of, and access to information items in order to satisfy user requirements concerning that information. The classic IR process can be improved, and the number of irrelevant documents reduced, in three ways:
query reformulation, which includes expansion or reduction of the query,
post-filtering or re-ranking of the retrieved documents, and
improvement of the IR model, i.e. the way documents and queries are represented and matched to quantify their similarities.

Query reformulation is the process of transforming an initial query Q into another query Q'. This transformation may be either a reduction or an expansion. Query reduction shortens the query so that superfluous information is removed, while query expansion enhances the query with additional information likely to occur in relevant documents.

Wan et al. [9]: This work considers competitiveness in the context of product design. The initial step is the definition of a dominance function that represents the value of a product. The demand for the product is identified and the same level is provided across the entire domain. The goal is then to use the function to create items that are not dominated by others, or to maximize the number of items with the maximum possible dominance value. Similarly, it represents items as points in a multidimensional space and looks for subspaces where the appeal of the item is maximized.

2.5 OPINION MINING:

Marrese-Taylor et al. [10]: The overall opinion polarity is calculated and classified as positive or negative. At the sentence level, each sentence in the document is analyzed to determine whether the opinion it expresses is positive, negative, or neutral. In opinion mining, the term aspect means an important feature of a product rated by customers (for example, in the case of a restaurant: food, service, cleanliness, etc.). Product and restaurant reviews are a mixture of positive and negative opinions about different aspects. Mining these mixed opinions needs a more fine-grained analysis of reviews, and aspect-level analysis performs this task. Hence aspect-based opinion mining is preferred in this work. The core tasks in aspect-based opinion mining are aspect identification, aspect-based opinion word identification, and orientation detection.

Vlachou et al. [11]: Top-k queries are widely applied for retrieving the k most interesting objects based on individual user preferences. Clearly, an object (product) that is highly ranked by many users (customers) has a wider visibility and impact in the market. Thus, an intuitive definition of the influence of a product in the market is the number of customers that consider it appealing (the product belongs to their top-k results) based on their preferences. Identifying the most influential objects in a given database of products is important for market analysis and decision-making and is beneficial for several real-life applications.

Ana Valdivia et al. [11]: Sentiment classification, the best-known sentiment analysis task, aims to detect sentiments within a document, a sentence, or an aspect. This task can be divided into three steps: polarity detection (label the sentiment of the text as positive, negative, or neutral), aspect selection/extraction (obtain the features for structuring the text), and classification (apply machine learning or lexicon approaches to classify the text). The detection of ironic expressions in TripAdvisor reviews is an open problem whose solution could help to extract more valuable information. New approaches are needed to settle positivity, negativity and neutrality via consensus among sentiment analysis methods (SAMs).

Farman Ali et al. [12]: A merged ontology and SVM-based recommendation and information extraction system automates the extraction of precise data from the Internet and suggests accurate items for disabled users. A number of reasonable issues are effectively considered, including the conversion of spoken questions into the right format for a keyword-based computer program. The system categorizes the retrieved information and effectively computes the polarity of the items that need to be recommended.

CHAPTER 3
SYSTEM DESIGN

PROPOSED SYSTEM ARCHITECTURE

Fig 3.1 Proposed System Architecture

During the mining process, unstructured data is not taken into account and much valuable service information is lost. Structured systems are those where the data is predetermined and well-defined. Customer reviews are usually unstructured data, which must be converted to structured data before the modified data can be used for further processing.

COMPONENTS

Data Collection

Fig 3.2 Data Collection (figure labels: Database, Customer Reviews)

Data collection is carried out through the customers of the hotel, who provide the reviews. Reviews obtained for a hotel may differ from customer to customer; they are based solely on the customer's preference and perspective. Reviews help both the customer and the competitor to identify the advantages and disadvantages of a specific hotel. Customer reviews about the hotel are stored in the database of the hotel. Reviews stored in the database are in the form of unstructured data.

Data Preprocessing

Data preprocessing involves the transformation of unstructured data into structured data. A customer review is unstructured data: it is incomplete, proper information may be missing, and it may contain many errors. Data preprocessing is a proven method of resolving such issues. Data preprocessing uses NLP techniques such as tokenization, removal of irrelevant data, and stemming. It prepares the data for further processing.

Algorithm for Data Preprocessing:
Process the reviews one by one for each hotel.
Remove the whitespace or extra spaces from the review.
Remove the new lines from the review.
Remove the emoticons and smileys from the review.
Break the sentences into partitions.

The problem with Small-Space is that the number of subsets $\ell$ that S is partitioned into is limited, since the intermediate medians have to be stored in memory. So, if M is the size of memory, S needs to be partitioned into $\ell$ subsets such that each subset fits in memory ($n/\ell$), and so that the weighted $\ell k$ centers also fit in memory, $\ell k < M$.

Identification of Fake Reviews: Fake reviews can be provided only by unauthorized users. A customer can provide a review only with the bill number, which helps to reduce fake reviews. Reviews by paid agents can also be reduced, because the hotel owners will not provide the bill information to the agents. Fake bills will not be generated in any hotel.

Feature Selection and Extraction

Feature selection refers to selecting the most relevant attributes, and feature extraction is combining attributes into a new reduced set of features.

Algorithm for Feature Selection: Minimum Description Length is an information-theoretic model selection principle. It assumes that the simplest, most compact representation of the data is the best and most probable explanation of the data. It considers each attribute as a simple predictive model of the target class.

Algorithm for Feature Extraction: Non-negative Matrix Factorization (NMF) is a state-of-the-art feature extraction algorithm. NMF is useful when there are many attributes and the attributes are ambiguous or have weak predictability. NMF produces meaningful patterns.

Pseudo code for Naive Bayes:
    Step 1: Convert the data set into a frequency table.
    Step 2: Create the likelihood table.
    Step 3: Use the Naive Bayesian equation to calculate the posterior probability for each class. The class with the highest posterior probability is the outcome of the prediction.

    Pr(Category | Word) = Pr(Word | Category) · Pr(Category) / Pr(Word)

    /* Here Category represents Positive, Negative or Advice, and Word represents Good, Bad or Improve. */
    /* Naive Bayes is used to predict the probability of different classes based on various attributes. This algorithm is mostly used in text classification and with problems having multiple classes. */

Pseudo code for Sentiment Analysis:
    For each word in the review:
        if the word is in the NegationList: negNum = -1
        else if the word is in the AdvList: advNum = 1
        else if the word is in the MainWordList:
            if the word value is Positive: posCount = posCount * negNum + advNum
            else: negCount = negCount * negNum - advNum
    Add as Positive review if posCount + negCount > accuracy: positive++
    Add as Negative review if posCount + negCount < -accuracy: negative++

3.2.4 Visualization

Visualization is the diagrammatic or statistical representation of the processed data as an output. A bar chart is used, where the x-axis represents the hotels and the y-axis represents the level of rating.

CHAPTER 4
IMPLEMENTATION AND RESULTS

The implementation is done in the Java language. The algorithms can either be applied directly to a dataset or called from Java code.

The data collection process is implemented by getting the reviews from the customer. The customer can choose the hotel name and provide a review. Reviews provided by the customer are based on the customer's own opinion. Reviews of the same hotel may differ from customer to customer. Customer reviews are stored in the database. Information stored in the database of a hotel is in an unstructured format.

Hotel Name                            Rating  Review
Hotel Russo Palace                    4       Good location away from the crouds
Hotel Russo Palace                    5       Great hotel with Jacuzzi bath!
Little Paradise Hotel                 5       Pure delight!
Little Paradise Hotel                 4       Nice Hotel
Fairfield Inn By Marriott Binghamton  4       Good place to visit
Fairfield Inn By Marriott Binghamton  4       Overall good
Fairfield Inn By Marriott Binghamton  3       Disappointed
Fairfield Inn By Marriott Binghamton  4       Enjoyable
Fairfield Inn By Marriott Binghamton  4       Great hotel - great location
Fairfield Inn By Marriott Binghamton  5       Good hotel.
Days Inn El Reno Ok                   1       Decent Place
Days Inn El Reno Ok                   3       Noisy
Days Inn El Reno Ok                   4       Nice hotel
Days Inn El Reno Ok                   2       Disgusting
Days Inn El Reno Ok                   2       Old smelly rooms

Table 4.1 Hotel Review Dataset

Datasets are also downloaded from online sources [15, 16, 17]. The hotel dataset size is 1 GB. The downloaded datasets are also in an unstructured format and need to be converted to a structured format by preprocessing the data.

Data pre-processing involves correcting the information in the datasets and making the data complete and structured. Pre-processing is done with the help of the Weka tool, where the hotel dataset file is pre-processed. To preprocess in Weka, the dataset should be in .arff format, which is Weka's native file format. The first step is to convert the file from .csv to .arff format. Then the hotel dataset in .arff format is loaded, and Weka lists the attributes in the loaded dataset file. Empty or null values in the dataset are eliminated or filtered. Finally, the processed dataset is obtained, which is used for further processing.

In the first phase, the data is collected and the collected data is preprocessed. With the preprocessed data, each review can be identified as a positive review, a negative review, or an advice review for the customer or the hotel owner (competitor). In the second phase, feature extraction is done and the result is shown in the form of a chart that helps the customer to choose the best hotel, and helps the competitor to consolidate its position by identifying its strengths and the services to improve in future.

SCREENSHOT

Home page:
Reviews about each hotel:
Holiday Inn review page:
Taj Hotel review page:
Customer review page:
Customer overall rating:

CHAPTER 5
CONCLUSION AND FUTURE WORK

5.1 CONCLUSION

The best hotels are recommended for business competitors and customers. The Naive Bayes algorithm was used to identify the competitors of selected hotels. It helps to improve the business and also provides the customer with appropriate competitors of the business according to the customer's need. The proposed work helps the competitor to find ways of building the business, and helps the customer to choose the best hotel that satisfies the need.

5.2 FUTURE WORK

As a future enhancement, additional features and processes need to be considered in the algorithms for better results. The algorithm needs to be modified so that the result helps other customers identify the top hotels in the city even more effectively.
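
The three Naive Bayes steps given in Chapter 3 (frequency table, likelihood table, posterior probability) can be sketched in Java as follows. This is only a minimal illustration under stated assumptions, not the project's implementation: the class name WordNaiveBayes and its methods are hypothetical, it scores one word at a time exactly as the formula Pr(Category | Word) is written, and it applies no smoothing.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: names are illustrative and not taken from the project code.
final class WordNaiveBayes {
    // Step 1: frequency table, category -> (word -> count).
    private final Map<String, Map<String, Integer>> frequency = new HashMap<>();
    private final Map<String, Integer> categoryTotals = new HashMap<>();
    private final Map<String, Integer> wordTotals = new HashMap<>();
    private int total = 0;

    // Record one labelled word, e.g. observe("positive", "good").
    void observe(String category, String word) {
        frequency.computeIfAbsent(category, c -> new HashMap<>()).merge(word, 1, Integer::sum);
        categoryTotals.merge(category, 1, Integer::sum);
        wordTotals.merge(word, 1, Integer::sum);
        total++;
    }

    // Steps 2 and 3: Pr(category | word) = Pr(word | category) * Pr(category) / Pr(word).
    double posterior(String category, String word) {
        int inCategory = categoryTotals.getOrDefault(category, 0);
        int wordCount = wordTotals.getOrDefault(word, 0);
        if (inCategory == 0 || wordCount == 0) {
            return 0.0; // unseen category or word: no estimate (assumption: no smoothing)
        }
        double likelihood = frequency.get(category).getOrDefault(word, 0) / (double) inCategory;
        double prior = inCategory / (double) total;
        double evidence = wordCount / (double) total;
        return likelihood * prior / evidence;
    }

    // The category with the highest posterior is the prediction; null for unseen words.
    String predict(String word) {
        String best = null;
        double bestPosterior = 0.0;
        for (String category : categoryTotals.keySet()) {
            double p = posterior(category, word);
            if (p > bestPosterior) {
                bestPosterior = p;
                best = category;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        WordNaiveBayes nb = new WordNaiveBayes();
        nb.observe("positive", "good");
        nb.observe("negative", "bad");
        nb.observe("advice", "improve");
        System.out.println(nb.predict("improve"));
    }
}
```

Scoring a whole review would require combining the per-word probabilities, which the pseudo code does not specify, so the sketch stops at single words.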

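
The data preprocessing steps listed in Chapter 3 (remove extra spaces, new lines and smileys, then break the review into sentences) can likewise be sketched in Java. The class ReviewPreprocessor and its regular expressions are assumptions made for illustration. In particular, the smiley pattern is deliberately small and will miss many emoticons, and sentences are split only after ".", "!" or "?".

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of the preprocessing steps; not the project's actual code.
final class ReviewPreprocessor {
    // Text smileys such as :) ;-( :D (an assumed, deliberately small pattern).
    private static final Pattern SMILEY = Pattern.compile("[:;=][-']?[)(DPp]");
    // Unicode "other symbol" code points, which cover most emoji, plus stray surrogates.
    private static final Pattern EMOJI = Pattern.compile("[\\p{So}\\p{Cs}]");
    // Any run of whitespace, including new lines.
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");
    // Whitespace that follows sentence-ending punctuation.
    private static final Pattern SENTENCE_BREAK = Pattern.compile("(?<=[.!?])\\s+");

    static List<String> preprocess(String review) {
        // Remove emoticons and smileys.
        String text = SMILEY.matcher(review).replaceAll(" ");
        text = EMOJI.matcher(text).replaceAll(" ");
        // Remove extra spaces and new lines by collapsing them to single spaces.
        text = WHITESPACE.matcher(text).replaceAll(" ").trim();
        // Break the review into sentences.
        List<String> sentences = new ArrayList<>();
        for (String sentence : SENTENCE_BREAK.split(text)) {
            if (!sentence.isEmpty()) {
                sentences.add(sentence);
            }
        }
        return sentences;
    }

    public static void main(String[] args) {
        for (String s : preprocess("Great hotel!  :) \n Good location.   Would visit again")) {
            System.out.println(s);
        }
    }
}
```

Tokenization and stemming, which the chapter also mentions, are left out here because the text does not say which tokenizer or stemmer is used.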