SentenceTransformer based on sentence-transformers/multi-qa-mpnet-base-dot-v1
This is a sentence-transformers model finetuned from sentence-transformers/multi-qa-mpnet-base-dot-v1. It maps sentences & paragraphs to a 768-dimensional dense vector space and can be used for semantic textual similarity, semantic search, paraphrase mining, text classification, clustering, and more.
Model Details
Model Description
- Model Type: Sentence Transformer
- Base model: sentence-transformers/multi-qa-mpnet-base-dot-v1
- Maximum Sequence Length: 512 tokens
- Output Dimensionality: 768 dimensions
- Similarity Function: Dot Product
Model Sources
- Documentation: Sentence Transformers Documentation
- Repository: Sentence Transformers on GitHub
- Hugging Face: Sentence Transformers on Hugging Face
Full Model Architecture
SentenceTransformer(
(0): Transformer({'max_seq_length': 512, 'do_lower_case': False}) with Transformer model: MPNetModel
(1): Pooling({'word_embedding_dimension': 768, 'pooling_mode_cls_token': True, 'pooling_mode_mean_tokens': False, 'pooling_mode_max_tokens': False, 'pooling_mode_mean_sqrt_len_tokens': False, 'pooling_mode_weightedmean_tokens': False, 'pooling_mode_lasttoken': False, 'include_prompt': True})
)
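The pooling configuration above takes the CLS token as the sentence embedding. As a rough illustration only (the recommended loading path is shown in the Usage section below), this sketch reproduces that pooling with plain transformers; the checkpoint name is the one from the usage example.

# Sketch: CLS-token pooling with plain transformers, mirroring the
# Transformer + Pooling modules above. Prefer SentenceTransformer for real use.
import torch
from transformers import AutoModel, AutoTokenizer

model_id = "BenElliot27/multi-qa-mpnet-base-dot-v1-ATLAS-TALK"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModel.from_pretrained(model_id)

batch = tokenizer(
    ["How do I download a single file with rucio?"],
    padding=True, truncation=True, max_length=512, return_tensors="pt",
)
with torch.no_grad():
    out = model(**batch)

# pooling_mode_cls_token=True: the embedding is the first token's hidden state
embedding = out.last_hidden_state[:, 0]
print(embedding.shape)  # torch.Size([1, 768])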
Usage
Direct Usage (Sentence Transformers)
First install the Sentence Transformers library:
pip install -U sentence-transformers
Then you can load this model and run inference.
from sentence_transformers import SentenceTransformer
# Download from the 🤗 Hub
model = SentenceTransformer("BenElliot27/multi-qa-mpnet-base-dot-v1-ATLAS-TALK")
# Run inference
sentences = [
'Failure to read evgen file in Rivet grid job? What I see is that you are not producing ttz_analysis.yoda, and the code is producing a user.narayan.6831283.EXT0._000026.ttz_analysis.yoda with zero file size.\nCheers,\nAlden',
'Hi all,\nI\'m having trouble running Rivet in Athena with private evgen as input, see:\nhttp://bigpanda.cern.ch/task/4532210/\nThe error given is "pilot: Encountered zero file size for file user.mcfayden.4532210.EXT0._000003.187522.Zjets.yoda", but actually the problem seems to be due to the evgen file not being opened correctly which means Rivet had no events to run over:\nEventSelector INFO EventSelection with query\nDbSession Info Open DbSession\nDomain[ROOT_All] Info > Access DbDomain READ [ROOT_All]\nDomain[ROOT_All] Info -> Access DbDatabase READ [ROOT_All] 9651733E-360B-424D-B1BC-51B25F68B05D\nDomain[ROOT_All] Info user.mcfayden.4532110.EXT2._000001.mc12_7TeV.187522.EVNT.root\nRootDBase.open Success user.mcfayden.4532110.EXT2._000001.mc12_7TeV.187522.EVNT.root File version:53005\nImplicitCollection Info Opened the implicit collection with connection string "PFN:user.mcfayden.4532110.EXT2._000001.mc12_7TeV.187522.EVNT.root"\nImplicitCollection Info and a name "POOLContainer(DataHeader)"\nAthenaSummarySvc INFO -> file incident: FID:9651733E-360B-424D-B1BC-51B25F68B05D [GUID: FID:9651733E-360B-424D-B1BC-51B25F68B05D]\nPoolSvc INFO Failed to find container MetaDataHdrDataHeader to get Token.\nEventPersistenc... INFO Added successfully Conversion service:AthenaPoolCnvSvc\nAthenaPoolConve... ERROR Failed to convert persistent object to transient: FID "74385E9E-38B2-7F4F-A610-B42059934C68" is not existing in the catalog ( POOL : "PersistencySvc::UserDatabase::connectForRead" from "PersistencySvc" )\nAthenaPoolConve... ERROR createObj PoolToDataObject() failed, Token = [DB=9651733E-360B-424D-B1BC-51B25F68B05D][CNT=MetaDataHdr(DataHeader)][CLID=D82968A1-CF91-4320-B2DD-E0F739CBC7E6][TECH=00000202][OID=000000000000000C-0000000000000000]\nDataProxy WARNING accessData: conversion failed for data object 222376821/;00;MetaDataSvc\n Returning NULL DataObject pointer\nMetaDataSvc ERROR Could not get DataHeader, will not read Metadata\nFull log here:\nhttp://aipanda057.cern.ch/media/filebrowser/be0426bd-2ec0-49bc-9462-eb822ca3c9f3/tarball_PandaJob_2330211990_ANALY_NIKHEF-ELPROD_SHORT/athena_stdout.txt\nRunning the same job but using officially produced evgen (from a much older release) as input works just fine, see:\nhttp://bigpanda.cern.ch/task/4532297/\nIt even works with privately produced evgen from a few months ago, see:\nhttp://bigpanda.cern.ch/task/4511752/\nAlso, if I download the input file and run on it locally it runs with no problems.\nAny ideas what the problem might be here?\nCheers,\nJosh.\n\nHi\nThank you for looking into it. But I figured out the problem. I was sending an so file which was compiled with a different athena release than the grid version \nNow that I have figured out the problem, it works fine\nCheers\nRohin\n\nHi Josh.\nYour read on the situation matches mine. The input file is in place and of the right size throughout the operation.\nItems to troubleshoot from here out include: possible Athena version compatibility or ROOT version mismatch, or a subtle site error. It’s failed on a retry, so that’s not good.\nCould you download the exact file, if you haven’t already, and run it locally. Send me the output and the results of the ls and env commands?\nThanks,\nAlden\n\nDear experts,\nplease excuse me referring back to this old thread. I’m struggling with the same problem, running my custom rivet code on evgen files on the grid (http://bigpanda.cern.ch/task/8787362/). 
Locally the code runs just fine on these files, not on the grid though.\nThe error occurs while accessing the evgen files and the job finishes with:\nPilot error 1191: Encountered zero file size for file user.tkupfer.8787362.EXT0._000003.WYWb900LH05.yoda\nHere is a part of the athena stdout:\nRootCollection Info Opening Collection File dcap://dcache-atlas-dcap.desy.de:22125//pnfs/desy.de/atlas/dq2/atlaslocalgroupdisk/rucio/user/fschenck/06/5e/mc15_13TeV.WYWb900LH05.10000.1.evgen.root in mode: READ\nRootCollection Info File dcap://dcache-atlas-dcap.desy.de:22125//pnfs/desy.de/atlas/dq2/atlaslocalgroupdisk/rucio/user/fschenck/06/5e/mc15_13TeV.WYWb900LH05.10000.1.evgen.root opened\nDbSession Info Open DbSession \nDomain[ROOT_All] Info > Access DbDomain READ [ROOT_All] \nDomain[ROOT_All] Info -> Access DbDatabase READ [ROOT_All] 4B75BCC9-FAA4-4E2F-AC15-A2B26FF20048\nDomain[ROOT_All] Info dcap://dcache-atlas-dcap.desy.de:22125//pnfs/desy.de/atlas/dq2/atlaslocalgroupdisk/rucio/user/fschenck/06/5e/mc15_13TeV.WYWb900LH05.10000.1.evgen.root\nRootDatabase.open Success dcap://dcache-atlas-dcap.desy.de:22125//pnfs/desy.de/atlas/dq2/atlaslocalgroupdisk/rucio/user/fschenck/06/5e/mc15_13TeV.WYWb900LH05.10000.1.evgen.root File version:53413\nImplicitCollection Info Opened the implicit collection with connection string "PFN:dcap://dcache-atlas-dcap.desy.de:22125//pnfs/desy.de/atlas/dq2/atlaslocalgroupdisk/rucio/user/fschenck/06/5e/mc15_13TeV.WYWb900LH05.10000.1.evgen.root"\nImplicitCollection Info and a name "POOLContainer(DataHeader)"\nAthenaSummarySvc INFO -> file incident: FID:4B75BCC9-FAA4-4E2F-AC15-A2B26FF20048 [GUID: FID:4B75BCC9-FAA4-4E2F-AC15-A2B26FF20048]\nPoolSvc INFO Failed to find container MetaDataHdrDataHeader to get Token.\nEventPersistenc... INFO Added successfully Conversion service:AthenaPoolCnvSvc\nAthenaPoolConve... ERROR Failed to convert persistent object to transient: FID "613EA41B-C384-2247-96A7-82EEABEA23B1" is not existing in the catalog ( POOL : "PersistencySvc::UserDatabase::connectForRead" from "PersistencySvc" )\nAthenaPoolConve... 
ERROR createObj PoolToDataObject() failed, Token = [DB=4B75BCC9-FAA4-4E2F-AC15-A2B26FF20048][CNT=MetaDataHdr(DataHeader)][CLID=D82968A1-CF91-4320-B2DD-E0F739CBC7E6][TECH=00000202][OID=000000000000000B-0000000000000000]\nDataProxy WARNING accessData: conversion failed for data object 222376821/;00;MetaDataSvc\n Returning NULL DataObject pointer \nMetaDataSvc ERROR Could not get DataHeader, will not read Metadata\nMetaDataSvc WARNING Unable to load MetaData Proxies\n\nI\'ve tried to figure out whether different athena releases are used to compile the .so files and to run the code on the grid, since\nthis seems to have solved the problem before.\nI\'ve already tried many combinations of commands to specify the AthenaTag and to set up the local athena version on lxplus, but without any success..\n\nBeing very precise on the version in the end:\nasetup 20.1.8.3,AtlasProduction,64,here (locally)\n--athenaTag=20.1.8.3,AtlasProduction,64 (grid)\n\nwasn\'t successful neither and I\'ve still suspicious about the proper athena setup because it says:\ntransUses : Atlas-20.1.8 \ntranshome : AnalysisTransforms-AtlasProduction_20.1.8.3\n\nI\'m not very used to the grid and most likely I\'m doing something stupid.\nSo, please let me know if there is any trick to set up athena on the grid properly, or if this problem has been solved any other way.\n\nThanks in advance!\n\nBest,\nTobias\n\nHi Alden,\nRunning on the same file locally works without any problem\n(File: user.mcfayden.evnt.test.2014-12-08_124829.187522.test_EXT2/user.mcfayden.4532110.EXT2._000001.mc12_7TeV.187522.EVNT.root)\nThe full log and output of ls and env are attached.\nCheers,\nJosh.\nlog.txt (77 KB)\nls.txt (708 Bytes)\nenv.txt (40.1 KB)\n\nRight. Looks like it runs well – so I am at a loss.\nI’ll put some more time into this tomorrow. Sorry.\nCheers,\nAlden\n\nHi Alden,\nI think I might have found the issue.\nI’m just waiting for some jobs to finish to confirm this, so maybe wait before putting too much time into this. \nCheers,\nJosh.\n\nHi again,\nYep, it looks like the problem is due to the fact that I had this in my pathena command:\n–extOutFile=“*mc12_7TeV.187522.EVNT.root”\nI think that this was required in the pre-JEDI days when running two transforms in one job to retrieve the intermediate files.\nAnd it essentially meant that I had the same output file in two output containers, *_EXT1 and *_EXT2.\nMore details:\nFailed task: http://bigpanda.cern.ch/task/4548422/ (with input from: http://bigpanda.cern.ch/task/4547050/)\nSucceeded task: http://bigpanda.cern.ch/task/4548421/ (with input from: http://bigpanda.cern.ch/task/4546848/)\nI have no idea why this causes the file not to be read properly as input for other tasks… but at least I have a fix!\nCheers,\nJosh.\n\nThanks, Josh – that looks like a good fix.\nCheers,\nAlden',
'Hi UK loud support,\nwould you please check what is the issue in accessing these files\nin (*).\nI have checked this one and the error is here:\nTrying SURL srm://srm-atlas.gridpp.rl.ac.uk:8443/srm/managerv2?SFN=/castor/ads.rl.ac.uk/prod/atlas/StripDeg/atlasgroupdisk/phys-beauty/rucio/data11_7TeV/a8/23/DAOD_ONIAMUMU.594591._000001.pool.root.1 ...\n[SE][Ls][SRM_INVALID_PATH] No such file or directory\n Thanks.\n Cheers,\n Farida\n(*)\ndata11_7TeV:DAOD_ONIAMUMU.594591._000001.pool.root.1\ndata12_8TeV:DAOD_JPSIMUMU.01237672._000076.pool.root.1\ndata12_8TeV:DAOD_JPSIMUMU.01237615._000026.pool.root.1\n\nDear Farida, dear UK cloud support,\nsorry for disturbing you again,\nis there some progress for recovering those three DAOD files?\ndata11_7TeV:DAOD_ONIAMUMU.594591._000001.pool.root.1\ndata12_8TeV:DAOD_JPSIMUMU.01237672._000076.pool.root.1\ndata12_8TeV:DAOD_JPSIMUMU.01237615._000026.pool.root.1\nBest regards,\nVladimir.\n\nHi UK cloud support,\nUser is still waiting for your feedback to fix the issue related to the below files. I have just tried and seems the issue persist (*)\nThanks for looking it this.!\n Cheers,\n Farida\n(*)\nrucio download --protocol srm data11_7TeV:DAOD_ONIAMUMU.594591._000001.pool.root.1\n2016-04-14 22:57:48,933 INFO [Starting download for data11_7TeV:DAOD_ONIAMUMU.594591._000001.pool.root.1 with 1 files]\n2016-04-14 22:57:49,014 INFO [Starting the download of data11_7TeV:DAOD_ONIAMUMU.594591._000001.pool.root.1]\n2016-04-14 22:57:50,884 WARNING [Source file not found.\nDetails: Source file not found.\nDetails: Could not open source: error on the turl request : [SE][PrepareToGet][SRM_INVALID_PATH] No such file or directory]\n2016-04-14 22:57:51,082 WARNING [Source file not found.\nDetails: Source file not found.\nDetails: Could not open source: error on the turl request : [SE][PrepareToGet][SRM_INVALID_PATH] No such file or directory]\n2016-04-14 22:57:51,345 WARNING [Source file not found.\nDetails: Source file not found.\nDetails: Could not open source: error on the turl request : [SE][PrepareToGet][SRM_INVALID_PATH] No such file or directory]\n2016-04-14 22:57:51,579 WARNING [Source file not found.\nDetails: Source file not found.\n\nHi Vladimir\nThe first file is meant to be at RAL (data11_7TeV:DAOD_ONIAMUMU.594591._000001.pool.root.1). I have checked and it does not exist. As this was the only replica of that data it is unfortunately lost.\nThe other two files are meant to be at Lancaster, I will ask the site admin to check but I suspect they are likely to be lost too.\nSorry about this. I’ll start a separate thread in the B-Physics mail list about what we can do to recover them.\nAlastair',
]
embeddings = model.encode(sentences)
print(embeddings.shape)
# [3, 768]
# Get the similarity scores for the embeddings
similarities = model.similarity(embeddings, embeddings)
print(similarities.shape)
# [3, 3]
Training Details
Training Dataset
Unnamed Dataset
- Size: 11,044 training samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 14 tokens, mean: 297.34 tokens, max: 512 tokens | min: 34 tokens, mean: 458.93 tokens, max: 512 tokens |
- Samples:
anchor | positive
We need a PATHelp thread here @jburr mentioned that we could create a PATHelp thread here. I don’t think I have permissions, @lheinric what do you think? I guess it’s hard without Attila and friends on board.
I don’t think they would be opposed to the idea, though whether they have the time to provide significant support is another question.
Either way - this would need to be well-advertised as an alternative (or even replacement!?) for PAT help.
Significant support just means watching a category and replying to it though, right? If we could get a few people behind it it might build some momentum.
I’m also curious as to whether it might get some help from above: I believe that Microsoft is increasing our licensing fees for things like sharepoint so in some kind of ideal world we’d move away from it entirely. Of course getting rid of it entirely will take many years and I don’t know if there’s any financial advantage to a partial migration…
By an astounding coincidence take a look at the last slide in today’s ASG intro…
https://indico.cern.ch/event/801152/contributions/3329455/attachments/1800673/2936936/ASGIntro22022019.pdf
Awesome, well as a starting point I guess we need @akras to complete to something.
A question raised from the ASG meeting: is it possible to tag a mailing list or something similar?
I’m sure you can make a bot that does it, but that goes back to the same issue where we have to implement that ourselves.
So… February 22 was quite a while ago. Just saying Any ideas to improve usage?
I didn’t attend the ASG meeting when people discussed this, but I think I might just start posting my stupid questions here.
But as a starting point encourage people to set categories they are interested in to “watching first post” which will notify them when there are new threads.
I’m watching ASG and Machine Learning now. There’s nothing to watch, but I’m watching it like a boss.
Did this ever go anywhere? I’m from DAST, and it would also make sense to move DAST here.
But yeah, there’s a lot of momentum behind the mailing list, so being able to forward mails from the mailing list here, and send responses to the user’s mail would be a good way to get the migration going.
It seems like it could be done: https://meta.discourse.org/t/configuring-reply-via-email-e-mail/42026
Personally I think funneling DAST through this thread would be great. The only downsides I see are that:
From that link it looks like we can only use one email account. Their suggested workaround was to forward everything to one account and then filter into categories on the discourse side.
Someone would have to set it up. @fschenck are you volunteering?
The overall result will probably be better if we can get everyone using the forum directly rather than just using it to log emails. That said, I think anything which gets people onto a forum is still better than what we do now.
@lheinric what do you think?
Ah, I didn’t know it’s one email account for the entire discourse, but a filter would be simple.
Well, I could give it a go.
I also think it makes sense to move DAST here as we end up answering the same questions a lot, as searching the mailing list archives is a PITA even if you know what you’re doing.
But it would be important to keep both running for a while as people know the mailing list.
@fschenck this is me replying to see if your mailing list integration worked.Alternatives to twiki This doesn’t really seem like a replacement for twiki, but that being said it would be really awesome if we did have a replacement for twiki (i.e. I’d use it for whatever groups I lead). I’ve been looking around and found fosswiki and XWiki both of which look like better maintained alternatives. @lheinric, do you know if CERN IT supports anything like this (or who I should ask about what they support)?
While I’ll never be the first person to defend twikis, unless there is some way to wholesale copy across the entire existing twiki I don’t think any replacement will be viable. There is just far too much documented on twikis and we run this risk…
Right, I knew which one that was before I clicked on it.
But to be honest I don’t see the risk: there’s a well established interface between the two (the URL), and if there’s something out there that performs better that outweighs the inconvenience in my mind. I’m admittedly not an expert on twiki, but I’m not convinced that there are any features there that lock us in.
I’m nothing if not predictable
My worry would be that it leaves there being yet one more place to look for documentation, so documentation can get lost in more places. It also leaves people learning yet another system (which, let’s face it, physicists are rather loath to do).
From that wikipedia page it does sound like Foswiki is broadly compatible so it might be possible to transfer things across. Still, transferring how your whole documentation is structured is hardly going to be a small project. Links from slides, code, etc will become dead (unless we can set up some sort of automatic forwarding).
Again, not saying that I think it’s doomed to fail but I doubt it would be a simple switch.
I think the gitbook one is nice for some uses-cases of documentation that can be authored collaboratively. I somewhat like the idea of an say analysis-specific gitbook, but it’s definitely not a “wiki” (also turnaround time to publishing is a bit higher)
We’re toying with the idea of replacing the ML Forum twiki with CodiMD. There are still some quarks to work out (the indexing between pages is a bit sloppy) but so far it seems like a nice alternative.How do I
rucio get
one file? I have a dataset:
mc16_13TeV:mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.merge.AOD.e6337_e5984_s3126_r10201_r10210_tid14774488_00
and my job is failing on the file
AOD.14774488._000007.pool.root.1
in that dataset. How do I download this file alone?
I tried
rucio get AOD.14774488._000007.pool.root.1
and
rucio get mc16_13TeV:mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.merge.AOD.e6337_e5984_s3126_r10201_r10210_tid14774488_00/AOD.14774488._000007.pool.root.1
but both print something like
2019-07-17 08:53:22,099 INFO Processing 1 item(s) for input
2019-07-17 08:53:22,099 INFO Getting sources of DIDs
2019-07-17 08:53:22,244 INFO Using main thread to download 0 file(s)
2019-07-17 08:53:22,244 ERROR None of the requested files have been downloaded.Hi @dguest
I first checked that the file you are requesting exists in the dataset by doing,
rucio list-files mc16_13TeV:mc16_13TeV.410470.PhPy8EG_A14_ttbar_hdamp258p75_nonallhad.merge.AOD.e6337_e5984_s3126_r10201_r10210_tid14774488_00
and get the following output (truncated and ending at the sought-for file):
+---------------------------------------------+--------------------------------------+-------------+------------+----------+
- Loss: CachedMultipleNegativesRankingLoss with these parameters:
  { "scale": 1.0, "similarity_fct": "dot_score" }
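As a rough sketch of how an anchor/positive dataset like this pairs with the loss above, using the scale and similarity function listed; the example rows are illustrative, not the actual training data:

# Sketch: anchor/positive pairs with CachedMultipleNegativesRankingLoss,
# scale=1.0 and dot-product similarity as in the parameters above.
# The example rows below are made up for illustration.
from datasets import Dataset
from sentence_transformers import SentenceTransformer
from sentence_transformers.losses import CachedMultipleNegativesRankingLoss
from sentence_transformers.util import dot_score

model = SentenceTransformer("sentence-transformers/multi-qa-mpnet-base-dot-v1")

train_dataset = Dataset.from_dict({
    "anchor": ["How do I rucio get one file?"],
    "positive": ["You can download a single file from a dataset by giving its scope and name."],
})

loss = CachedMultipleNegativesRankingLoss(model, scale=1.0, similarity_fct=dot_score)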
Evaluation Dataset
Unnamed Dataset
- Size: 2,762 evaluation samples
- Columns: anchor and positive
- Approximate statistics based on the first 1000 samples:
| | anchor | positive |
|---|---|---|
| type | string | string |
| details | min: 21 tokens, mean: 278.68 tokens, max: 512 tokens | min: 19 tokens, mean: 440.02 tokens, max: 512 tokens |
- Samples:
anchor | positive
Job submission failure Hello,
Same here. I submitted 2 jobs over 3 and the failed one has the error:
Failed to connect to host.
ERROR : Failed to get allowed site list
Thanks,
ClementHi DAST experts,
While submitting jobs, some of my jobs (not all ) wouldn’t get submitted and I am getting the following prun error messages:
Error1:
ERROR: Failed to get allowed site list
Failed to connect to host. / >> SSL connect error. The SSL handshaking failed.
Error2:
ERROR: failed to upload source files with 255
ERROR: Could not check Sandbox duplication with 35
My Setup:
localSetupDQ2Client --skipConfirm
localSetupPandaClient --noAthenaCheck
voms-proxy-init -voms atlas
Can you please point me the source of this problem? Thank you for your time in advance.
Cheers,
Hasib
Hi,
I have the same error.
Best, Haifeng
Hi Hasid,
It seems to me an issue with the first authentication which is failing now for some users too.
Some issue on the ATLAS VO (.. of unavailable CRL) can provide such error,
I will cc voms experts to see whether is a problem with the host server.
Users get these errors () after doing the panda_client setup.
Thanks!
Cheers,
Farida
()
Error1:
ERROR: Failed to get allowed site list
>> Failed to connect to host. / >> SSL connect error. The SSL handshaking failed.
Error2:
ERROR: failed to upload source files with 255
Hi,
I do not think this is a VO issue but a pandaserver timeout issue …
note that my rf. compliant pilots are failing …as well as interactive users who want to submit jobs,
eg:
31 Oct 15:43:06dq2-get: globus_xio Input/output error Dear all,
I’m trying since few days now to download some datasets which fail due to Input/output error.
For example, trying dq2-get user.cinca.169889_E01-00_S01-00_tag0_JES_EFFECTIVE_STATISTICAL1_DW.SelHadTop_mySimpleTree.root/:
stderr:Using grid catalog type: UNKNOWN
Using grid catalog : (null)
VO name: atlas
Checksum type: None
Trying SURL srm://svr018.gla.scotgrid.ac.uk:8446/srm/managerv2?SFN=/dpm/gla.scotgrid.ac.uk/home/atlas/atlasscratchdisk/rucio/user/cinca/e0/c9/user.cinca.4322300._000003.mySimpleTree.root …
Source SE type: SRMv2
Source SRM Request Token: 82e64a92-eef9-4994-bf9d-4779aa505e57
Source URL: srm://svr018.gla.scotgrid.ac.uk:8446/srm/managerv2?SFN=/dpm/gla.scotgrid.ac.uk/home/atlas/atlasscratchdisk/rucio/user/cinca/e0/c9/user.cinca.4322300._000003.mySimpleTree.root
File size: 125606257
Source URL for copy: gsiftp://disk046.gla.scotgrid.ac.uk/disk046.gla.scotgrid.ac.uk:/gridstore3/atlas/2014-10-28/user.cinca.4322300._000003.mySimpleTree.root.232695315.0
Destination URL: file:/afs/cern.ch/work/c/cinca/eos/atlas/user/c/cinca/E01-00_S01-00/lo_JES/user.cinca.169889_E01-00_S01-00_tag0_JES_EFFECTIVE_STATISTICAL1_DW.SelHadTop_mySimpleTree.root.8510447/user.cinca.4322300._000003.mySimpleTree.root
streams: 1
globus_xio: Unable to open file /afs/cern.ch/work/c/cinca/eos/atlas/user/c/cinca/E01-00_S01-00/lo_JES/user.cinca.169889_E01-00_S01-00_tag0_JES_EFFECTIVE_STATISTICAL1_DW.SelHadTop_mySimpleTree.root.8510447/user.cinca.4322300._000003.mySimpleTree.root
globus_xio: System error in open: Input/output error
globus_xio: A system call failed: Input/output error
Could you please tell me how I could solve this problem as at some points the samples will be erased from the grid ?
Thanks for your help,
DianeHi Diane,
Maybe your home directory partition is full. Try if you can create new file. Also try to download the file in /tmp directory?
Better way, if you need to keep the dataset for longer time, is to request a DaTRI transfer to your LOCALGROUPDISK.
Cheers,
Yun-Ha
Hi Yun-Ha,
thanks, I asked for transfer to our local group disk.
The download succeeds in tmp/ repository, it may be that my eos quota is exceeded, which I find strange.
But I’ll check.
Thanks for your help in fixing this !
DianeERROR Missing DCS field information: solenoid 0 toroid 0 Hi,
I have a big set of jobs running on the grid, and ~98% have finished
successfully. However, for the remaining jobs I keep getting the
following error:
MagFieldAthenaSvc ERROR Missing DCS
field information: solenoid 0 toroid 0
IOVSvcTool ERROR Problems
calling MagFieldAthenaSvc[0xd3c6b64]+31
Skipping all subsequent callbacks.
IncidentSvc ERROR Standard
std::exception is caught handling incident0xff9aad54
etc
at multiple sites and with multiple retries. Is this a known issue? If
so, what is the work-around?
Cheers,
CameronPlease provide a Panda Monitor link to one or a few of the jobs i question.
Mattias Ellert
ATLAS DAST
Essentially any of the failed jobs here:
http://panda.cern.ch/server/pandamon/query?job=*&ui=user&name=Cameron%20Cuthbert
E.g.
http://panda.cern.ch/server/pandamon/query?job=2303236765
http://panda.cern.ch/server/pandamon/query?job=2303230620
I have downloaded one of the files failing and can confirm it fails locally, too. I think the issue is with the infile DCS metadata.
The log.log file ends with:
Shortened traceback (most recent user call last):
File "./BPhysAnalysisMasterAuto.py", line 55, in <module>
print "Setting evtMax to ",EvtMax
NameError: name 'EvtMax' is not defined
Py:Athena INFO leaving with code 8: "an unknown exception occurred"
Athena tries the print the variable “EvtMax” that is not defined.
Mattias Ellert
ATLAS DAST
Yes, but log.log is an old log file I created with an earlier version of the code.
"BPhysAnalysisMasterAuto.py" no longer contains these lines. The actual error in this case is in athena_stdout.txt. E.g. :
http://aipanda048.cern.ch:25880/monitor/logs/517f5c02-c1be-4c1a-9500-37899a64cdb0/tarball_PandaJob_2303236765_ANALY_RAL_SL6/athena_stdout.txt
Cameron
[cuthbert@sydui1 totalSumOfWeights]$
Hi Cameron.
I forward your question to the database experts.
Mattias Ellert
ATLAS DAST
Hi Mattias,
Thanks. I am also getting a second class of error which may be DB related:
ToolSvc.CaloNoiseToolDefault.sysInitialize() FATAL Standard std::exception is caught
ToolSvc.CaloNoiseToolDefault.sysInitialize() ERROR CaloCondBlobBase::getAddress: Index out of range: 100608 >= 95616
StatusCodeSvc FATAL Unchecked StatusCode in AlgTool::sysInitialize() from lib /cvmfs/atlas.cern.ch/repo/sw/software/i686-slc5-gcc43-opt/17.2.1/GAUDI/v22r1p7-lcg61d/InstallArea/i686-slc5-gcc43-opt/lib/libGaudiKernel.so
See:
http://panda.cern.ch/server/pandamon/query?overview=viewlogfile&nocachemark=yes&guid=9f934b7c-238c-4db2-95a0-9b6bd070ef29&lfn=group.phys-beauty.data12_8TeV.periodI.physics_Bphysics.PhysCont.DAOD_UPSIMUMU.grp14_v03_p1425.v1.log.4288166.001234.log.tgz&site=RAL-LCG2_SCRATCHDISK&scope=group.phys-beauty
Cheers,
Cameron
Any news on this?
I do not follow very well what you are doing Cameron, but looking at your log file it seems to me that the error
is related to time used to access the DCS folder.
Nevertheless I cannot check this, because the information is not printed.
May be you could try to use a DEBUG level of logging ? I do not think that to force the system to access Oracle-Frontier
you need to put the override line you mention. There should be something else at athena level, but there again
I cannot really help.
A.
Hi,
Ok I will try to run some test jobs on DEBUG and send you through the logs.
In the mean time, who can I ask about accessing the 'HEAD' of the conditions database?
Cheers,
Cameron
Hi
Regarding
This has probably nothing to do with DCS folders
This can be related to the fact that you are using a “new” condition tag and an old software version and the two are not compatible
(we have only backward compatibility not forward compatibility)
Guillaume
Hi,
Is there anything that can be done to fix this issue then, aside from using a different software version (not an option)?
Cheers,
Cameron
Hi
Maybe you can say which software release you are using and which condition tag you are using ?
Guillaume
Hi,
Py:Athena INFO using release [AtlasOffline-17.2.1] [i686-slc5-gcc43-opt] [17.2.X-VAL/rel_3] -- built on [2012 03/27 21:48]
TrfJobReport metaData_conditionsTag = COMCOND-BLKPA-006-07
Cheers,
Cameron
Here is the log file from a run with atlas -l DEBUG. Not sure it gives you any extra info in this case...
outDEBUG.log (1.6 MB)
- Loss: CachedMultipleNegativesRankingLoss with these parameters:
  { "scale": 1.0, "similarity_fct": "dot_score" }
Training Hyperparameters
Non-Default Hyperparameters
- eval_strategy: steps
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- warmup_ratio: 0.1
- fp16: True
- batch_sampler: no_duplicates
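The non-default values above map roughly onto the trainer arguments as in the sketch below; the output directory is a placeholder, and the epoch count is taken from the full hyperparameter list that follows.

# Sketch: the non-default hyperparameters expressed as
# SentenceTransformerTrainingArguments. "output/..." is a placeholder path.
from sentence_transformers import SentenceTransformerTrainingArguments
from sentence_transformers.training_args import BatchSamplers

args = SentenceTransformerTrainingArguments(
    output_dir="output/multi-qa-mpnet-base-dot-v1-ATLAS-TALK",
    num_train_epochs=3,                      # from the full list below
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    warmup_ratio=0.1,
    fp16=True,
    eval_strategy="steps",
    batch_sampler=BatchSamplers.NO_DUPLICATES,
)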
All Hyperparameters
Click to expand
- overwrite_output_dir: False
- do_predict: False
- eval_strategy: steps
- prediction_loss_only: True
- per_device_train_batch_size: 32
- per_device_eval_batch_size: 32
- per_gpu_train_batch_size: None
- per_gpu_eval_batch_size: None
- gradient_accumulation_steps: 1
- eval_accumulation_steps: None
- torch_empty_cache_steps: None
- learning_rate: 5e-05
- weight_decay: 0.0
- adam_beta1: 0.9
- adam_beta2: 0.999
- adam_epsilon: 1e-08
- max_grad_norm: 1.0
- num_train_epochs: 3
- max_steps: -1
- lr_scheduler_type: linear
- lr_scheduler_kwargs: {}
- warmup_ratio: 0.1
- warmup_steps: 0
- log_level: passive
- log_level_replica: warning
- log_on_each_node: True
- logging_nan_inf_filter: True
- save_safetensors: True
- save_on_each_node: False
- save_only_model: False
- restore_callback_states_from_checkpoint: False
- no_cuda: False
- use_cpu: False
- use_mps_device: False
- seed: 42
- data_seed: None
- jit_mode_eval: False
- use_ipex: False
- bf16: False
- fp16: True
- fp16_opt_level: O1
- half_precision_backend: auto
- bf16_full_eval: False
- fp16_full_eval: False
- tf32: None
- local_rank: 0
- ddp_backend: None
- tpu_num_cores: None
- tpu_metrics_debug: False
- debug: []
- dataloader_drop_last: False
- dataloader_num_workers: 0
- dataloader_prefetch_factor: None
- past_index: -1
- disable_tqdm: False
- remove_unused_columns: True
- label_names: None
- load_best_model_at_end: False
- ignore_data_skip: False
- fsdp: []
- fsdp_min_num_params: 0
- fsdp_config: {'min_num_params': 0, 'xla': False, 'xla_fsdp_v2': False, 'xla_fsdp_grad_ckpt': False}
- fsdp_transformer_layer_cls_to_wrap: None
- accelerator_config: {'split_batches': False, 'dispatch_batches': None, 'even_batches': True, 'use_seedable_sampler': True, 'non_blocking': False, 'gradient_accumulation_kwargs': None}
- deepspeed: None
- label_smoothing_factor: 0.0
- optim: adamw_torch
- optim_args: None
- adafactor: False
- group_by_length: False
- length_column_name: length
- ddp_find_unused_parameters: None
- ddp_bucket_cap_mb: None
- ddp_broadcast_buffers: False
- dataloader_pin_memory: True
- dataloader_persistent_workers: False
- skip_memory_metrics: True
- use_legacy_prediction_loop: False
- push_to_hub: False
- resume_from_checkpoint: None
- hub_model_id: None
- hub_strategy: every_save
- hub_private_repo: False
- hub_always_push: False
- gradient_checkpointing: False
- gradient_checkpointing_kwargs: None
- include_inputs_for_metrics: False
- eval_do_concat_batches: True
- fp16_backend: auto
- push_to_hub_model_id: None
- push_to_hub_organization: None
- mp_parameters:
- auto_find_batch_size: False
- full_determinism: False
- torchdynamo: None
- ray_scope: last
- ddp_timeout: 1800
- torch_compile: False
- torch_compile_backend: None
- torch_compile_mode: None
- dispatch_batches: None
- split_batches: None
- include_tokens_per_second: False
- include_num_input_tokens_seen: False
- neftune_noise_alpha: None
- optim_target_modules: None
- batch_eval_metrics: False
- eval_on_start: False
- eval_use_gather_object: False
- batch_sampler: no_duplicates
- multi_dataset_batch_sampler: proportional
Training Logs
Epoch | Step | Training Loss | Validation Loss |
---|---|---|---|
0.2890 | 100 | 0.7838 | 1.1991 |
0.5780 | 200 | 0.4176 | 0.6541 |
0.8671 | 300 | 0.2991 | 0.6290 |
1.1561 | 400 | 0.4573 | 0.6447 |
1.4451 | 500 | 0.1258 | 0.6278 |
1.7341 | 600 | 0.0781 | 0.6762 |
2.0231 | 700 | 0.1254 | 0.6074 |
2.3121 | 800 | 0.0727 | 0.7019 |
2.6012 | 900 | 0.0199 | 0.6263 |
2.8902 | 1000 | 0.025 | 0.6574 |
Framework Versions
- Python: 3.12.8
- Sentence Transformers: 3.2.1
- Transformers: 4.44.0
- PyTorch: 2.4.1
- Accelerate: 1.3.0
- Datasets: 3.2.0
- Tokenizers: 0.19.1
Citation
BibTeX
Sentence Transformers
@inproceedings{reimers-2019-sentence-bert,
title = "Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks",
author = "Reimers, Nils and Gurevych, Iryna",
booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
month = "11",
year = "2019",
publisher = "Association for Computational Linguistics",
url = "https://arxiv.org/abs/1908.10084",
}
CachedMultipleNegativesRankingLoss
@misc{gao2021scaling,
title={Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup},
author={Luyu Gao and Yunyi Zhang and Jiawei Han and Jamie Callan},
year={2021},
eprint={2101.06983},
archivePrefix={arXiv},
primaryClass={cs.LG}
}