cnn_dailymail_6789_3000_1500_train
This is a BERTopic model. BERTopic is a flexible, modular topic modeling framework that generates easily interpretable topics from large datasets.
Usage
To use this model, please install BERTopic:
pip install -U bertopic
You can use the model as follows:
from bertopic import BERTopic
topic_model = BERTopic.load("KingKazma/cnn_dailymail_6789_3000_1500_train")
topic_model.get_topic_info()
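Beyond listing topics, a loaded model can also label new text. The following is a minimal sketch, not the author's workflow: `assign_topics` is a hypothetical helper (not part of BERTopic), and it assumes the checkpoint can resolve its embedding model at load time. If it cannot, an `embedding_model` argument would need to be passed to `BERTopic.load`.

```python
def assign_topics(docs, model_id="KingKazma/cnn_dailymail_6789_3000_1500_train"):
    """Return (topic_id, top keywords) for each document in `docs`.

    Hypothetical helper for illustration; not part of the BERTopic API.
    """
    # Imported lazily so this module can be imported without bertopic installed.
    from bertopic import BERTopic

    topic_model = BERTopic.load(model_id)
    # transform() returns the predicted topic IDs and their probabilities.
    topics, _probs = topic_model.transform(docs)

    results = []
    for topic_id in topics:
        topic_id = int(topic_id)
        # get_topic() yields (word, score) pairs; keep the five strongest words.
        keywords = [word for word, _score in topic_model.get_topic(topic_id)[:5]]
        results.append((topic_id, keywords))
    return results
```

Calling `assign_topics(["some news article text"])` would download the model on first use. A returned topic ID of -1 means the document was treated as an outlier.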
Topic overview
- Number of topics: 54 (topics 0 to 52, plus the outlier topic -1)
- Number of training documents: 3000
Overview of all topics:
Topic ID | Topic Keywords | Topic Frequency | Label
---|---|---|---
-1 | said - people - one - police - year | 10 | -1_said_people_one_police |
0 | player - league - cup - club - game | 1072 | 0_player_league_cup_club |
1 | police - said - death - murder - found | 291 | 1_police_said_death_murder |
2 | obama - president - republicans - house - republican | 152 | 2_obama_president_republicans_house |
3 | labour - mr - cameron - minister - prime | 98 | 3_labour_mr_cameron_minister |
4 | hospital - baby - surgery - heart - doctor | 77 | 4_hospital_baby_surgery_heart |
5 | iphone - apple - user - device - phone | 74 | 5_iphone_apple_user_device |
6 | doll - fashion - look - collection - like | 69 | 6_doll_fashion_look_collection |
7 | syria - isis - syrian - iraq - iraqi | 46 | 7_syria_isis_syrian_iraq |
8 | pakistan - taliban - al - drone - afghanistan | 45 | 8_pakistan_taliban_al_drone |
9 | food - restaurant - menu - burger - coffee | 43 | 9_food_restaurant_menu_burger |
10 | car - driver - vehicle - crash - driving | 41 | 10_car_driver_vehicle_crash |
11 | space - tower - car - airport - nasa | 40 | 11_space_tower_car_airport |
12 | property - house - home - apartment - room | 40 | 12_property_house_home_apartment |
13 | school - rape - sexual - student - sex | 36 | 13_school_rape_sexual_student |
14 | nfl - rice - quarterback - said - coach | 36 | 14_nfl_rice_quarterback_said |
15 | music - album - song - miley - cnn | 33 | 15_music_album_song_miley |
16 | olympic - gold - olympics - athlete - world | 33 | 16_olympic_gold_olympics_athlete |
17 | zoo - bear - tian - elephant - ivory | 33 | 17_zoo_bear_tian_elephant |
18 | flight - plane - aircraft - pilot - airport | 32 | 18_flight_plane_aircraft_pilot |
19 | flu - bacteria - vaccine - health - disease | 31 | 19_flu_bacteria_vaccine_health |
20 | dog - animal - pet - cat - dogs | 30 | 20_dog_animal_pet_cat |
21 | school - education - exam - child - degree | 30 | 21_school_education_exam_child |
22 | kenya - kenyan - mall - said - nairobi | 28 | 22_kenya_kenyan_mall_said |
23 | cent - per - price - cadbury - christmas | 27 | 23_cent_per_price_cadbury |
24 | french - france - sarkozy - hollande - minister | 26 | 24_french_france_sarkozy_hollande |
25 | russian - ukraine - russia - putin - ukrainian | 25 | 25_russian_ukraine_russia_putin |
26 | iran - nuclear - iranian - israel - irans | 24 | 26_iran_nuclear_iranian_israel |
27 | film - bond - novel - the - cnn | 24 | 27_film_bond_novel_the |
28 | lava - fire - snow - pahoa - volcano | 24 | 28_lava_fire_snow_pahoa |
29 | drug - mexican - chavez - cartel - said | 23 | 29_drug_mexican_chavez_cartel |
30 | ship - vessel - captain - crew - coast | 23 | 30_ship_vessel_captain_crew |
31 | snowden - us - intelligence - information - gebregeorgis | 23 | 31_snowden_us_intelligence_information |
32 | match - wimbledon - federer - final - open | 22 | 32_match_wimbledon_federer_final |
33 | chinese - china - beijing - hong - protester | 21 | 33_chinese_china_beijing_hong |
34 | jury - white - ferguson - police - said | 21 | 34_jury_white_ferguson_police |
35 | weather - temperature - rain - warm - park | 21 | 35_weather_temperature_rain_warm |
36 | prince - royal - william - princess - queen | 20 | 36_prince_royal_william_princess |
37 | weight - fat - diet - gym - size | 19 | 37_weight_fat_diet_gym |
38 | golf - mcilroy - round - pga - championship | 19 | 38_golf_mcilroy_round_pga |
39 | hamilton - race - rosberg - prix - button | 19 | 39_hamilton_race_rosberg_prix |
40 | north - kim - korean - korea - koreas | 18 | 40_north_kim_korean_korea |
41 | human - found - fossil - ancient - fish | 18 | 41_human_found_fossil_ancient |
42 | climate - change - global - energy - wind | 17 | 42_climate_change_global_energy |
43 | school - teacher - pupil - schools - ofsted | 17 | 43_school_teacher_pupil_schools |
44 | ebola - virus - health - outbreak - liberia | 17 | 44_ebola_virus_health_outbreak |
45 | whale - nyad - shark - swim - beach | 17 | 45_whale_nyad_shark_swim |
46 | money - kallakis - foster - court - wines | 15 | 46_money_kallakis_foster_court |
47 | painting - art - portrait - auction - artist | 14 | 47_painting_art_portrait_auction |
48 | solar - planet - sun - bubble - earth | 14 | 48_solar_planet_sun_bubble |
49 | tsarnaev - oswald - boston - marathon - kennedy | 14 | 49_tsarnaev_oswald_boston_marathon |
50 | patient - care - va - hospital - patients | 14 | 50_patient_care_va_hospital |
51 | love - woman - im - relationship - men | 13 | 51_love_woman_im_relationship |
52 | marijuana - alcohol - drug - hangover - liver | 11 | 52_marijuana_alcohol_drug_hangover |
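The frequencies are easier to compare as shares of the 3000 training documents. The sketch below uses only the first five rows of the table; the names `topic_counts`, `TOTAL_DOCS`, and `topic_share` are illustrative and do not come from BERTopic.

```python
# Frequencies copied from the first five rows of the topic table.
topic_counts = {
    -1: 10,
    0: 1072,
    1: 291,
    2: 152,
    3: 98,
}
TOTAL_DOCS = 3000  # "Number of training documents" from the overview


def topic_share(topic_id):
    """Fraction of training documents assigned to the given topic."""
    return topic_counts[topic_id] / TOTAL_DOCS


for topic_id, count in topic_counts.items():
    print(f"topic {topic_id:>2}: {count:>4} docs ({topic_share(topic_id):.1%})")
```

Topic 0 (football) alone covers about 35.7% of the training documents, while only 10 documents (about 0.3%) were left in the outlier topic -1.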
Training hyperparameters
- calculate_probabilities: True
- language: english
- low_memory: False
- min_topic_size: 10
- n_gram_range: (1, 1)
- nr_topics: None
- seed_topic_list: None
- top_n_words: 10
- verbose: False
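The values above map directly onto keyword arguments of the `BERTopic` constructor. The sketch below shows how a model with the same settings might be instantiated; `HYPERPARAMS` and `build_topic_model` are illustrative names. It does not reproduce this model, because the embedding model, training documents, and random seeds are not listed here.

```python
# Hyperparameters as listed in this model card.
HYPERPARAMS = {
    "calculate_probabilities": True,
    "language": "english",
    "low_memory": False,
    "min_topic_size": 10,
    "n_gram_range": (1, 1),
    "nr_topics": None,
    "seed_topic_list": None,
    "top_n_words": 10,
    "verbose": False,
}


def build_topic_model():
    """Create an untrained BERTopic instance with the settings above."""
    # Imported lazily so the settings can be inspected without bertopic installed.
    from bertopic import BERTopic

    return BERTopic(**HYPERPARAMS)
```

The returned instance is untrained; calling `fit_transform` on a list of documents would be the next step.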
Framework versions
- Numpy: 1.22.4
- HDBSCAN: 0.8.33
- UMAP: 0.5.3
- Pandas: 1.5.3
- Scikit-Learn: 1.2.2
- Sentence-transformers: 2.2.2
- Transformers: 4.31.0
- Numba: 0.56.4
- Plotly: 5.13.1
- Python: 3.10.6
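If loading the model fails, comparing the local environment against these versions can be a useful first check. The following is a sketch using only the standard library; the distribution names in `PINNED` (for example `umap-learn` and `scikit-learn`) are assumed PyPI names, and `version_mismatches` is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions from this model card, keyed by assumed PyPI distribution name.
PINNED = {
    "numpy": "1.22.4",
    "hdbscan": "0.8.33",
    "umap-learn": "0.5.3",
    "pandas": "1.5.3",
    "scikit-learn": "1.2.2",
    "sentence-transformers": "2.2.2",
    "transformers": "4.31.0",
    "numba": "0.56.4",
    "plotly": "5.13.1",
}


def version_mismatches(pinned=PINNED):
    """Map each differing package to (expected, installed or None)."""
    mismatches = {}
    for package, expected in pinned.items():
        try:
            installed = version(package)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != expected:
            mismatches[package] = (expected, installed)
    return mismatches
```

An empty result means every listed package matches. A mismatch does not necessarily mean the model will fail to load.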